
Summarization for Pydantic AI

Automatic conversation summarization for unlimited context

Three strategies for managing agent context: intelligent LLM-based summarization, zero-cost sliding window trimming, and real-time context manager middleware with token tracking.

Installation

Terminal
pip install summarization-pydantic-ai

Three approaches keep agent conversations within context limits. LLM-based summarization intelligently compresses older messages while preserving key information; it can be triggered by message count, token count, or context fraction. Sliding-window trimming is zero-cost: it simply drops the oldest messages, with a safe cutoff that never splits a tool call from its response. Finally, a real-time context manager middleware tracks token usage live, truncates long tool outputs, and auto-detects each model's context window.
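The safe-cutoff idea behind sliding-window trimming can be sketched in a few lines of plain Python. This is an illustrative sketch, not the library's internals: it assumes messages are dicts with a "role" key, whereas the real processor operates on Pydantic AI's message types.

```python
# Hypothetical sketch: keep at most `keep` recent messages, but never let
# the window start on a tool response whose originating tool call was
# dropped. Message shape (dicts with a "role" key) is an assumption.

def sliding_window(messages: list[dict], keep: int) -> list[dict]:
    """Drop the oldest messages, keeping at most `keep`, with a safe cutoff."""
    if len(messages) <= keep:
        return messages
    cutoff = len(messages) - keep
    # Walk the cutoff forward past any orphaned tool responses so that
    # every tool response in the window still has its call.
    while cutoff < len(messages) and messages[cutoff]["role"] == "tool":
        cutoff += 1
    return messages[cutoff:]
```

Moving the cutoff forward (dropping the orphaned response) rather than backward (re-including its call) keeps the window size bounded at the cost of occasionally trimming one extra message.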

Features

LLM Summarization
Sliding Window
Real-time Context Manager
Token Tracking
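Token tracking and context-fraction triggering can be illustrated with a rough sketch. Both the 4-characters-per-token heuristic and the function names here are illustrative assumptions; the library's actual tracker uses real token counts and auto-detected context windows.

```python
# Hypothetical sketch of a context-fraction trigger: estimate token usage
# with a crude ~4-chars-per-token heuristic and fire once usage crosses a
# fraction of the model's context window.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)

def should_summarize(
    messages: list[str], context_window: int, fraction: float = 0.8
) -> bool:
    """Return True once estimated usage reaches `fraction` of the window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used >= context_window * fraction
```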

Quick Start

summarization_example.py
import asyncio

from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

# Summarize once the history exceeds 100k tokens, keeping the 20 most
# recent messages verbatim.
processor = create_summarization_processor(
    trigger=("tokens", 100_000),
    keep=("messages", 20),
)

agent = Agent(
    "openai:gpt-4o",
    history_processors=[processor],
)

async def main() -> None:
    # agent.run is a coroutine, so it must be awaited inside an async function
    result = await agent.run("Hello!")
    print(result.output)

asyncio.run(main())
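The tool-output truncation performed by the real-time context manager can be sketched as a simple character-budget clip. The function name, the default budget, and the marker text are illustrative assumptions, not the middleware's actual API.

```python
# Hypothetical sketch: clip a long tool result to a character budget and
# append an explicit marker so the model knows content was cut.

def truncate_tool_output(output: str, max_chars: int = 2000) -> str:
    """Return `output` unchanged if short, else clip it to `max_chars`."""
    if len(output) <= max_chars:
        return output
    marker = "\n…[output truncated]"
    # Reserve room for the marker so the result never exceeds the budget.
    return output[: max_chars - len(marker)] + marker
```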

Use Cases

Long Conversations

Keep agents running for hours without hitting context limits — older messages get summarized automatically.

Customer Support Bots

Preserve key customer details (name, issue, order ID) while discarding routine back-and-forth exchanges.

Research Assistants

Maintain research context across deep investigation sessions where accumulated findings would exceed the context window.

Cost-Sensitive Apps

Choose zero-cost sliding window for maximum throughput, or LLM summarization when quality matters more than speed.
