## Overview
In long-running voice AI conversations, context grows with every exchange. This increases token usage, raises costs, and can eventually hit context window limits. Pipecat includes built-in context summarization that automatically compresses older conversation history while preserving recent messages and important context.

## How It Works
Context summarization triggers automatically when either condition is met:

- **Token limit reached**: Context size exceeds `max_context_tokens` (estimated using ~4 characters per token)
- **Message count reached**: The number of new messages exceeds `max_unsummarized_messages`

Either trigger can be disabled by setting it to `None`, but at least one must remain active. Summarization always generates a summary and cannot be reduced to pure truncation.
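The token trigger uses a character-count heuristic rather than a real tokenizer. A minimal sketch of the idea (illustrative only, not Pipecat's actual implementation):

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return total_chars // 4
```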
When triggered, the system:

- Sends an `LLMContextSummaryRequestFrame` to the LLM service
- The LLM generates a concise summary of older messages
- Context is reconstructed as: `[system_message (if present)] + [summary] + [recent_messages]`
- Incomplete function call sequences and recent messages are preserved
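For illustration, here is roughly what that reconstruction looks like on a plain message list. The message contents, and the role under which the summary is inserted, are hypothetical:

```python
# Hypothetical context before summarization: a system prompt plus many turns.
before = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "..."},       # older messages: these get
    {"role": "assistant", "content": "..."},  # compressed into the summary
    {"role": "user", "content": "What time is my flight?"},
    {"role": "assistant", "content": "It departs at 6:15 PM."},
]

# After summarization: [system_message] + [summary] + [recent_messages].
after = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "assistant", "content": "Summary: the user is planning a trip..."},
    {"role": "user", "content": "What time is my flight?"},
    {"role": "assistant", "content": "It departs at 6:15 PM."},
]
```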
Context summarization is asynchronous and happens in the background without blocking the pipeline. The system uses request IDs to match summary requests with results and handles interruptions gracefully.
## Enabling Context Summarization
Enable summarization by setting `enable_auto_context_summarization=True` in `LLMAssistantAggregatorParams`:
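A minimal sketch of enabling it; the import path and the `create_context_aggregator` wiring are assumptions based on typical Pipecat usage, so check the reference for your version:

```python
# Import path is an assumption; verify it for your Pipecat version.
from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams

# `llm` and `context` are assumed to be set up elsewhere in your pipeline.
context_aggregator = llm.create_context_aggregator(
    context,
    assistant_params=LLMAssistantAggregatorParams(
        enable_auto_context_summarization=True,  # disabled by default
    ),
)
```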
Summarization is disabled by default (`enable_auto_context_summarization=False`). When enabled with the default configuration, summarization triggers at 8000 estimated tokens or after 20 new messages, whichever comes first.
## Customizing Behavior
Use `LLMAutoContextSummarizationConfig` and `LLMContextSummaryConfig` to tune the summarization triggers and output:
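A sketch of tuning both configs. The trigger and output parameter names come from this page; the import path and which parameter lives on which config are assumptions:

```python
# Import path is an assumption; check the reference for your version.
from pipecat.processors.aggregators.llm_context import (
    LLMAutoContextSummarizationConfig,
    LLMContextSummaryConfig,
)

# Summarize sooner than the defaults (8000 tokens / 20 messages).
auto_config = LLMAutoContextSummarizationConfig(
    max_context_tokens=4000,       # token trigger; set to None to disable
    max_unsummarized_messages=10,  # message-count trigger; None to disable
)

# Control the summary output: keep the last 6 messages uncompressed.
summary_config = LLMContextSummaryConfig(
    min_messages_after_summary=6,
)
```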
## What Gets Preserved
Context summarization intelligently preserves:

- **System messages**: If the first message (`messages[0]`) is a system message, it is preserved as the initial system prompt. Mid-conversation system messages (e.g., idle notifications or context injections) are treated as regular messages and included in the summarization range. When using `system_instruction` in LLM Settings instead, the system prompt is not part of the context messages and is automatically prepended by the service on each request, so there is nothing to preserve in the context.
- **Recent messages**: The last N messages stay uncompressed (configured by `min_messages_after_summary`)
- **Function call sequences**: Incomplete function call/result pairs are not split during summarization
- **Developer messages are NOT preserved**: Developer messages (`"role": "developer"`) are included in the summarization range like any other message and may be compressed or dropped. If instructions need to survive summarization, use `system_instruction` instead.
## Custom Summarization Prompts
You can override the default summarization prompt to control how the LLM generates summaries:
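For example, a sketch of overriding the prompt; the field name `summary_prompt` is an assumption, so verify it against the reference:

```python
# Import path is an assumption; check the reference for your version.
from pipecat.processors.aggregators.llm_context import LLMContextSummaryConfig

summary_config = LLMContextSummaryConfig(
    # Hypothetical field name for the prompt override.
    summary_prompt=(
        "Summarize the conversation so far in under 150 words. "
        "Preserve user preferences, open tasks, and unresolved questions."
    ),
)
```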
## Dedicated Summarization LLM

By default, summarization uses the same LLM service that handles the conversation. You can route summarization to a separate, cheaper model by setting the `llm` field:
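For example, a sketch routing summaries to a smaller OpenAI model; the service import path varies by Pipecat version:

```python
import os

# Import paths are assumptions; check the reference for your version.
from pipecat.processors.aggregators.llm_context import LLMContextSummaryConfig
from pipecat.services.openai.llm import OpenAILLMService

summary_config = LLMContextSummaryConfig(
    # A cheaper model dedicated to generating summaries.
    llm=OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini",
    ),
)
```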
## On-Demand Summarization
In addition to automatic summarization, you can trigger context summarization on demand by pushing an `LLMSummarizeContextFrame` into the pipeline. This is useful when you want to give users explicit control over when summarization happens, for example via a function call tool.

On-demand summarization works even when `enable_auto_context_summarization` is `False`; the summarizer is always created internally to handle manually pushed frames. You can also pass a per-request `LLMContextSummaryConfig` to override the default settings:
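A sketch of pushing the frame from a running pipeline task; the frame's import path and constructor arguments are assumptions:

```python
# Import paths are assumptions; check the reference for your version.
from pipecat.frames.frames import LLMSummarizeContextFrame
from pipecat.processors.aggregators.llm_context import LLMContextSummaryConfig

# Trigger a summary now, with a per-request config override. `task` is the
# running PipelineTask; the frame's keyword argument is an assumption.
await task.queue_frame(
    LLMSummarizeContextFrame(
        summary_config=LLMContextSummaryConfig(min_messages_after_summary=2)
    )
)
```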
## Observability
The summarizer emits an `on_summary_applied` event after each successful summarization, providing message count metrics:
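A sketch of subscribing to the event on the assistant context aggregator; the handler signature and the payload's field names are assumptions, so inspect the event or the reference for the exact metrics:

```python
@context_aggregator.assistant().event_handler("on_summary_applied")
async def on_summary_applied(aggregator, event):
    # Payload field names are assumptions; log the raw event to inspect them.
    print(f"Summary applied: {event}")
```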
## Next Steps
- **Context Summarization Reference**: Full reference for configuration parameters, events, and classes.
- **Context Management**: Learn how Pipecat manages conversation context in pipelines.