Description
Context
LlamaIndex's March 26 blog post ("Files Are All You Need") makes a compelling case for files as the primary context management abstraction for long-running agents, including storing compressed conversation histories when context compaction triggers.
This pattern solves the token budget problem, but it creates a new monitoring problem: file-based context compaction is a behavioral boundary, and there is currently no standardized way to observe whether agent behavior changed after crossing one.
What I mean
When an agent compacts context into a file (or summarizes + discards older messages), two things happen:
- The agent's effective "memory" is now a summary, not the original trace
- The vocabulary, task focus, and tool-use patterns may have shifted silently
The agent keeps running. Without an instrument watching for the shift, you won't find out until an output is visibly wrong, which in long-horizon agents is often too late.
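To make those two effects concrete, here is a minimal sketch of a compaction step. It assumes a plain list of string messages and uses a hypothetical `summarize` placeholder instead of an LLM call. `compact`, `summarize`, and the file layout are illustrative names, not LlamaIndex APIs.

```python
from pathlib import Path
import tempfile


def summarize(messages: list[str], max_chars: int = 200) -> str:
    # Hypothetical placeholder: a real agent would call an LLM here.
    joined = " ".join(m.strip() for m in messages)
    return joined[:max_chars]


def compact(messages: list[str], keep_last: int, summary_path: Path) -> tuple[list[str], str]:
    """Summarize all but the last `keep_last` messages, persist the summary
    to a file, and return the shortened history plus the summary text."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(messages) <= keep_last:
        return list(messages), ""  # nothing to compact
    cutoff = len(messages) - keep_last
    dropped, retained = messages[:cutoff], messages[cutoff:]
    summary = summarize(dropped)
    summary_path.write_text(summary, encoding="utf-8")
    # From here on, the dropped span survives only as this one summary message.
    return [f"[summary] {summary}"] + retained, summary


# Usage: a 10-message history compacted down to 1 summary + 3 recent messages.
history = [f"user: step {i} uses tool search" for i in range(10)]
with tempfile.TemporaryDirectory() as d:
    new_history, summary = compact(history, keep_last=3, summary_path=Path(d) / "summary.txt")
    print(len(history), "->", len(new_history))
```

Nothing in the returned history records which tool calls or topics were in the dropped span; that information exists only implicitly in whatever the summarizer chose to keep.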
The gap
LlamaIndex has excellent per-query and per-tool instrumentation via callbacks. What's missing is a compaction boundary event with enough metadata to enable cross-boundary behavioral comparison:
- Which messages were dropped?
- What summary was produced?
- Did topic focus, tool-use distribution, or vocabulary shift between pre/post windows?
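The third question can be approximated with cheap statistics once both windows are observable. The following is a minimal sketch, assuming the pre- and post-compaction windows are available as lists of message texts and tool names. The metric choices (Jaccard overlap for vocabulary, total variation distance for tool use) are illustrative, and all function names are hypothetical.

```python
from collections import Counter
import re


def vocabulary(texts: list[str]) -> set[str]:
    # Lowercased word tokens; a crude stand-in for real tokenization.
    return {w for t in texts for w in re.findall(r"[a-z0-9_]+", t.lower())}


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    # 1.0 = identical vocabularies, 0.0 = no overlap.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tool_distribution(tool_calls: list[str]) -> dict[str, float]:
    # Fraction of calls going to each tool in the window.
    counts = Counter(tool_calls)
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()} if total else {}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    # 0.0 = same tool mix, 1.0 = completely disjoint tool mix.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_report(pre_texts: list[str], post_texts: list[str],
                 pre_tools: list[str], post_tools: list[str]) -> dict[str, float]:
    return {
        "vocab_jaccard": jaccard_similarity(vocabulary(pre_texts), vocabulary(post_texts)),
        "tool_tv_distance": total_variation(tool_distribution(pre_tools),
                                            tool_distribution(post_tools)),
    }


report = drift_report(
    pre_texts=["search the billing logs", "open invoice 42"],
    post_texts=["draft a reply to the customer"],
    pre_tools=["search", "search", "read_file"],
    post_tools=["write_file"],
)
print(report)
```

In this toy example the vocabularies share only "the" and the tool mixes are disjoint, so both numbers signal a sharp shift. Computing them requires knowing exactly where the boundary fell, which is what a framework-level event would provide.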
What I'm proposing
A CompactionEvent or equivalent callback hook (similar to existing CBEventType patterns) that fires at the context compaction boundary, emitting:
```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CompactionEvent:
    pre_compaction_message_count: int
    post_compaction_message_count: int
    summary_text: str
    dropped_token_count: int
    timestamp: datetime
```

This would let observability tools, monitoring libraries, and production operators attach a behavioral fingerprint before and after compaction, enabling rollback, alerting, and drift detection without modifying the core compaction logic.
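A minimal sketch of how such an event could be consumed, assuming a standalone dispatcher rather than LlamaIndex's actual callback manager. `CompactionDispatcher`, `register`, `emit`, `alert_on_large_drop`, and the 4,000-token threshold are hypothetical names and values chosen for illustration.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class CompactionEvent:
    pre_compaction_message_count: int
    post_compaction_message_count: int
    summary_text: str
    dropped_token_count: int
    timestamp: datetime


CompactionHandler = Callable[[CompactionEvent], None]


class CompactionDispatcher:
    """Hypothetical hook registry: compaction code calls emit(), observers register()."""

    def __init__(self) -> None:
        self._handlers: list[CompactionHandler] = []

    def register(self, handler: CompactionHandler) -> None:
        self._handlers.append(handler)

    def emit(self, event: CompactionEvent) -> None:
        for handler in self._handlers:
            # Isolate observers: a failing monitor must not break the agent loop.
            try:
                handler(event)
            except Exception:
                logger.exception("compaction handler %r failed", handler)


# Example observer: flag compactions that discard a large amount of context.
alerts: list[str] = []


def alert_on_large_drop(event: CompactionEvent, threshold: int = 4000) -> None:
    if event.dropped_token_count > threshold:
        alerts.append(f"{event.timestamp.isoformat()}: dropped {event.dropped_token_count} tokens")


dispatcher = CompactionDispatcher()
dispatcher.register(alert_on_large_drop)
dispatcher.emit(CompactionEvent(
    pre_compaction_message_count=120,
    post_compaction_message_count=9,
    summary_text="User is reconciling Q1 invoices; search tool used heavily.",
    dropped_token_count=5200,
    timestamp=datetime.now(timezone.utc),
))
print(alerts)
```

Catching handler exceptions is a deliberate design choice here: in a long-running agent, a broken monitor should degrade to a log line rather than abort the run.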
Reference
I built a toolkit for exactly this gap: compression-monitor. It currently hooks into frameworks via filesystem inspection (LangChain compaction markers), but first-class events from the framework would be cleaner and more reliable.
Happy to draft a PR for the callback type if there's interest in adding this to the core.callbacks surface.