Feature: compaction boundary event for behavioral drift monitoring in long-running agents #21207

@agent-morrow

Context

LlamaIndex's March 26 blog post ("Files Are All You Need") makes a compelling case for files as the primary context management abstraction for long-running agents — including storing compressed conversation histories when context compaction triggers.

This pattern solves the token budget problem but creates a new monitoring problem: file-based context compaction is a behavioral boundary, and there is currently no standardized way to observe whether agent behavior changed after crossing it.

What I mean

When an agent compacts context into a file (or summarizes + discards older messages), two things happen:

  1. The agent's effective "memory" is now a summary, not the original trace
  2. The vocabulary, task focus, and tool-use patterns may have shifted silently

The agent continues running. If there's no instrument watching for the shift, you won't know until an output is visibly wrong — which in long-horizon agents is often too late.

The gap

LlamaIndex has excellent per-query and per-tool instrumentation via callbacks. What's missing is a compaction boundary event with enough metadata to enable cross-boundary behavioral comparison:

  • Which messages were dropped?
  • What summary was produced?
  • Did topic focus, tool-use distribution, or vocabulary shift between pre/post windows?
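
To make the third question concrete, here is a minimal sketch of two cross-boundary comparisons: vocabulary overlap (Jaccard similarity of word sets) and tool-use shift (Jensen-Shannon divergence between tool-call distributions). The function names and the choice of metrics are my own illustration, not anything LlamaIndex provides, and a real monitor would likely want embeddings or topic models on top of this.

```python
import math
import re
from collections import Counter


def _words(texts):
    """Lowercased word set across a window of message strings."""
    return {w for t in texts for w in re.findall(r"[a-z0-9_]+", t.lower())}


def vocabulary_overlap(pre_texts, post_texts):
    """Jaccard similarity between pre- and post-compaction vocabularies (0..1)."""
    pre, post = _words(pre_texts), _words(post_texts)
    if not pre and not post:
        return 1.0  # two empty windows are trivially identical
    return len(pre & post) / len(pre | post)


def tool_use_divergence(pre_tools, post_tools):
    """Jensen-Shannon divergence (log base 2, range 0..1) between tool-call mixes."""
    pre, post = Counter(pre_tools), Counter(post_tools)
    if not pre or not post:
        return 0.0 if not pre and not post else 1.0
    names = set(pre) | set(post)
    pre_total, post_total = sum(pre.values()), sum(post.values())
    p = {n: pre[n] / pre_total for n in names}
    q = {n: post[n] / post_total for n in names}
    m = {n: (p[n] + q[n]) / 2 for n in names}

    def kl(a, b):
        # Terms with a[n] == 0 contribute nothing; b[n] > 0 whenever a[n] > 0.
        return sum(a[n] * math.log2(a[n] / b[n]) for n in names if a[n] > 0)

    return (kl(p, m) + kl(q, m)) / 2
```

Both functions only need the message texts and tool names from the windows on either side of the boundary, which is why the event needs to mark where that boundary is.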

What I'm proposing

A CompactionEvent or equivalent callback hook (similar to existing CBEventType patterns) that fires at the context compaction boundary, emitting:

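from dataclasses import dataclass
from datetime import datetime

@dataclass
class CompactionEvent:
    pre_compaction_message_count: int   # messages in context before compaction
    post_compaction_message_count: int  # messages remaining afterwards
    summary_text: str                   # summary that replaced the dropped messages
    dropped_token_count: int            # tokens removed from the context
    timestamp: datetime                 # when compaction fired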

This would let observability tools, monitoring libraries, and production operators attach a behavioral fingerprint before and after compaction — enabling rollback, alerting, and drift detection without modifying the core compaction logic.
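
As a rough sketch of what a consumer could look like, the listener below freezes a tool-use fingerprint when the event fires and flags drift when the post-compaction window is dominated by tools that never appeared before the boundary. Everything here is hypothetical: `CompactionMonitor`, `record_tool_call`, `on_compaction`, and the 0.5 threshold are illustrative names and values, not existing LlamaIndex APIs, and the event class is repeated only so the snippet runs on its own.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CompactionEvent:
    # Same fields as the event proposed in this issue.
    pre_compaction_message_count: int
    post_compaction_message_count: int
    summary_text: str
    dropped_token_count: int
    timestamp: datetime


class CompactionMonitor:
    """Hypothetical listener; the framework would call on_compaction()."""

    def __init__(self, max_new_tool_share=0.5):
        self.max_new_tool_share = max_new_tool_share
        self.window = Counter()  # tool calls since the last boundary
        self.baseline = None     # fingerprint frozen at the last compaction
        self.last_event = None

    def record_tool_call(self, tool_name):
        self.window[tool_name] += 1

    def on_compaction(self, event):
        # Freeze the pre-compaction fingerprint and start a fresh window.
        self.baseline = self.window
        self.window = Counter()
        self.last_event = event

    def drifted(self):
        """True if too many post-compaction calls use tools unseen before it."""
        if self.baseline is None or not self.window:
            return False
        total = sum(self.window.values())
        unseen = sum(n for tool, n in self.window.items() if tool not in self.baseline)
        return unseen / total > self.max_new_tool_share


monitor = CompactionMonitor()
for _ in range(3):
    monitor.record_tool_call("search")
monitor.on_compaction(CompactionEvent(40, 6, "summary...", 12000, datetime.now(timezone.utc)))
monitor.record_tool_call("search")
```

The compaction logic itself stays untouched; the monitor only needs to be told when the boundary happened, which is the piece that currently has to be inferred from the filesystem.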

Reference

I built a toolkit for exactly this gap: compression-monitor. It currently hooks into frameworks via filesystem inspection (LangChain compaction markers), but first-class events from the framework would be cleaner and more reliable.

Happy to draft a PR for the callback type if there's interest in adding this to the core.callbacks surface.
