
Summarization for Pydantic AI

Automatic conversation summarization for unlimited context

Three strategies for managing agent context: intelligent LLM-based summarization, zero-cost sliding window trimming, and real-time context manager middleware with token tracking.

Installation

Terminal
pip install summarization-pydantic-ai

Three approaches keep agent conversations within context limits. LLM-based summarization intelligently compresses older messages while preserving key information; it can be triggered by message count, token count, or context fraction. Sliding-window trimming is zero-cost: it simply drops the oldest messages, with a safe cutoff that never splits a tool call from its response. Finally, a real-time context manager middleware tracks token usage live, truncates long tool outputs, and auto-detects each model's context window.
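The safe-cutoff idea behind sliding-window trimming can be sketched in a few lines of plain Python. This is an illustrative sketch, not the library's internals: it assumes messages are dicts with a "role" key, whereas the real processor operates on Pydantic AI's message types.

```python
# Hypothetical sketch: keep at most `keep` recent messages, but never let
# the window start on a tool response whose originating tool call was
# dropped. Message shape (dicts with a "role" key) is an assumption.

def sliding_window(messages: list[dict], keep: int) -> list[dict]:
    """Drop the oldest messages, keeping at most `keep`, with a safe cutoff."""
    if len(messages) <= keep:
        return messages
    cutoff = len(messages) - keep
    # Walk the cutoff forward past any orphaned tool responses so that
    # every tool response in the window still has its call.
    while cutoff < len(messages) and messages[cutoff]["role"] == "tool":
        cutoff += 1
    return messages[cutoff:]
```

Moving the cutoff forward (dropping the orphaned response) rather than backward (re-including its call) keeps the window size bounded at the cost of occasionally trimming one extra message.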

Features

LLM Summarization
Sliding Window
Real-time Context Manager
Token Tracking
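Token tracking and context-fraction triggering can be illustrated with a rough sketch. Both the 4-characters-per-token heuristic and the function names here are illustrative assumptions; the library's actual tracker uses real token counts and auto-detected context windows.

```python
# Hypothetical sketch of a context-fraction trigger: estimate token usage
# with a crude ~4-chars-per-token heuristic and fire once usage crosses a
# fraction of the model's context window.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)

def should_summarize(
    messages: list[str], context_window: int, fraction: float = 0.8
) -> bool:
    """Return True once estimated usage reaches `fraction` of the window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= context_window * fraction
```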

Quick Start

summarization_example.py
import asyncio

from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

# Summarize once the history exceeds 100k tokens, keeping the 20 most
# recent messages verbatim.
processor = create_summarization_processor(
    trigger=("tokens", 100_000),
    keep=("messages", 20),
)

agent = Agent(
    "openai:gpt-4o",
    history_processors=[processor],
)

async def main() -> None:
    # agent.run is a coroutine, so it must be awaited inside an async function
    result = await agent.run("Hello!")
    print(result.output)

asyncio.run(main())
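The tool-output truncation performed by the real-time context manager can be sketched as a simple character-budget clip. The function name, the default budget, and the marker text are illustrative assumptions, not the middleware's actual API.

```python
# Hypothetical sketch: clip a long tool result to a character budget and
# append an explicit marker so the model knows content was cut.

def truncate_tool_output(output: str, max_chars: int = 2000) -> str:
    """Return `output` unchanged if short, else clip it to `max_chars`."""
    if len(output) <= max_chars:
        return output
    marker = "\n…[output truncated]"
    # Reserve room for the marker so the result never exceeds the budget.
    return output[: max_chars - len(marker)] + marker
```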

Use Cases

Long Conversations

Keep agents running for hours without hitting context limits — older messages get summarized automatically.

Customer Support Bots

Preserve key customer details (name, issue, order ID) while discarding routine back-and-forth exchanges.

Research Assistants

Maintain research context across deep investigation sessions where accumulated findings would exceed the context window.

Cost-Sensitive Apps

Choose zero-cost sliding window for maximum throughput, or LLM summarization when quality matters more than speed.
