Brief #80
Context engineering is entering a compression crisis: practitioners are abandoning progressive disclosure patterns because LLMs won't reliably invoke retrieval, and are instead embedding entire knowledge bases directly into system prompts. Meanwhile, the infrastructure layer is fragmenting: protocol choices (MCP vs A2A) now determine whether agent intelligence compounds or resets between sessions.
LLMs Are Too Lazy for Smart Retrieval
Progressive disclosure and RAG-style retrieval fail in production because models won't reliably choose to fetch context. Practitioners are winning by embedding compressed knowledge directly into system prompts, removing the agent's decision entirely.
Vercel engineer ships 8KB compressed Next.js API index in system context after Skills-based retrieval failed. Result: 100% eval performance vs unreliable autonomous retrieval. Core insight: passive context beats active retrieval.
Haiku + 3 well-designed skills matches Opus performance, but adding >3 skills causes context bloat and collapse. Reveals there's an optimal density where passive context works, beyond which the system breaks.
Irrelevant context hurts accuracy even well within window capacity: models degrade when presented with too much data regardless of available space. Validates that compression and relevance filtering matter more than retrieval sophistication.
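The passive-context pattern above can be sketched as follows. This is an illustrative assumption, not Vercel's implementation: the index format and the `build_system_prompt` helper are hypothetical, and the point is only that the whole compressed index is inlined so the model never has to decide to fetch.

```python
# Hypothetical sketch: embed a compressed API index directly in the
# system prompt instead of exposing a retrieval tool the model may
# never call. Index entries and helper names are assumptions.

API_INDEX = {
    # route -> one-line signature summary (compressed, not full docs)
    "next/navigation.useRouter": "() -> AppRouterInstance",
    "next/headers.cookies": "() -> ReadonlyRequestCookies",
    "next/server.NextResponse.json": "(body, init?) -> NextResponse",
}

def build_system_prompt(task_instructions: str) -> str:
    """Inline the entire compressed index; the agent makes no retrieval decision."""
    index_block = "\n".join(f"- {name}: {sig}" for name, sig in API_INDEX.items())
    return f"{task_instructions}\n\nAPI index (always available):\n{index_block}"

prompt = build_system_prompt("Answer Next.js API questions.")
```

The trade is deliberate: a few KB of always-present context in exchange for removing an unreliable autonomous decision.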
SDK-as-Interface Beats Tool Enumeration at Scale
When agents need access to 100+ API endpoints, enumerating each as an MCP tool exhausts context. Code Mode pattern wins: give agents a typed SDK and let them discover the API surface through code completion instead of reading schemas.
Cloudflare shows agents don't need individual tool definitions if they can write code against a typed SDK. Anthropic independently converged on the same pattern. Context compression through code abstraction vs schema documentation.
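A minimal sketch of the Code Mode idea, under assumptions: the `CrmClient` SDK and the single `execute_agent_code` tool are hypothetical stand-ins, not Cloudflare's or Anthropic's actual code. Instead of registering one tool schema per endpoint, the agent sees one "execute" tool with a typed client in scope.

```python
# Sketch (assumed shape): one typed SDK replaces 100+ per-endpoint
# tool schemas; the agent discovers the surface by writing code.

class CrmClient:
    """Typed SDK standing in for many REST endpoints."""
    def list_contacts(self, limit: int = 10) -> list:
        return [{"id": i, "name": f"contact-{i}"} for i in range(limit)]

    def create_note(self, contact_id: int, text: str) -> dict:
        return {"contact_id": contact_id, "text": text}

def execute_agent_code(code: str):
    """The single tool the agent is given: run its code with the SDK in scope."""
    scope = {"crm": CrmClient()}
    exec(code, scope)  # agent-written code; sandbox this in production
    return scope.get("result")

result = execute_agent_code(
    "result = crm.create_note(crm.list_contacts(limit=1)[0]['id'], 'hi')"
)
```

The context cost is one SDK description rather than 100+ JSON schemas, and composition (chaining two calls above) happens in code rather than across tool-call round trips.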
File System as Memory Interface Survives Compaction
Static context injection fails when agents hit compaction—flushed memories disappear. Practitioners are treating memory as a queryable file system so agents can re-access historical context through list/read/search operations after summarization.
OpenClaw contributor explains namespace abstraction: store history/memory/artifacts as addressable files. When compaction occurs, the agent still has file handles and can re-read what's relevant instead of losing it to summarization. Three memory tiers: scratchpad/episodic/fact.
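The tier names come from the brief; the directory layout and `FileMemory` helper below are assumptions, sketching how list/read/search over files survives compaction because the handles outlive the summarized context.

```python
# Sketch of memory-as-filesystem (layout and helper names assumed).
# After compaction the agent keeps file handles and can re-read.
import os
import tempfile

class FileMemory:
    TIERS = ("scratchpad", "episodic", "fact")  # tiers named in the brief

    def __init__(self, root: str):
        self.root = root
        for tier in self.TIERS:
            os.makedirs(os.path.join(root, tier), exist_ok=True)

    def write(self, tier: str, name: str, text: str) -> str:
        path = os.path.join(self.root, tier, name)
        with open(path, "w") as f:
            f.write(text)
        return path  # addressable handle that survives context compaction

    def list(self, tier: str) -> list:
        return sorted(os.listdir(os.path.join(self.root, tier)))

    def read(self, tier: str, name: str) -> str:
        with open(os.path.join(self.root, tier, name)) as f:
            return f.read()

    def search(self, tier: str, needle: str) -> list:
        return [n for n in self.list(tier) if needle in self.read(tier, n)]

mem = FileMemory(tempfile.mkdtemp())
mem.write("fact", "deploy.md", "prod deploys go through CI only")
```

The key property: summarization can drop the content from the window, but the path remains cheap to re-derive and re-read on demand.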
Protocol Choice Determines Intelligence Persistence
MCP vs A2A isn't just a vendor preference—it's an architecture decision about whether agent context compounds across sessions. Stateful protocols preserve learning; stateless protocols reset intelligence every interaction.
Major platforms (Anthropic, Google, OpenAI) converging on context/memory protocols as critical infrastructure. Pattern mirrors REST API evolution: proprietary integrations → ecosystem protocols → interoperable agents. Without protocol-level persistence, every session resets.
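The stateful/stateless contrast can be reduced to a toy model. These are assumed interfaces for illustration only, not the actual MCP or A2A wire formats: the stateless handler rebuilds context on every call, while the stateful one accumulates it under a session id.

```python
# Toy contrast (interfaces assumed): without protocol-level session
# state, accumulated context resets on every interaction.

class StatelessAgent:
    def handle(self, message: str) -> int:
        context = [message]  # rebuilt fresh each call: learning resets
        return len(context)

class StatefulAgent:
    def __init__(self):
        self.sessions = {}  # session_id -> accumulated context

    def handle(self, session_id: str, message: str) -> int:
        context = self.sessions.setdefault(session_id, [])
        context.append(message)  # context compounds across calls
        return len(context)

stateless, stateful = StatelessAgent(), StatefulAgent()
stateless.handle("fix bug")
shallow = stateless.handle("same bug again")       # depth stays 1
stateful.handle("s1", "fix bug")
deep = stateful.handle("s1", "same bug again")     # depth grows to 2
```

Everything else in this section (memory protocols, interoperable agents) is about moving the `sessions` dict out of one vendor's process and into shared infrastructure.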
Multi-Agent Coordination Breaks at Context Boundaries
Multi-agent systems fail when subagent completion states don't propagate upward. The coordination problem isn't prompting—it's ensuring child agent context explicitly injects into parent agent awareness so intelligence compounds across hierarchy.
OpenAI Codex ships explicit message injection from child to parent agents. Problem: when subagents completed or got blocked, parent agents weren't receiving status updates, breaking information flow. Solution: explicit state propagation plus naming/color-coding for visibility.
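A sketch of explicit child-to-parent propagation. The class and message shapes below are assumptions (the brief attributes the pattern, not this code, to Codex): the point is that a subagent's terminal state is always pushed into the parent's transcript rather than left for the parent to notice.

```python
# Sketch (shapes assumed): subagent completion/blocked state is
# explicitly injected into the parent's context.
from dataclasses import dataclass, field

@dataclass
class ParentAgent:
    transcript: list = field(default_factory=list)

    def inject(self, msg: dict):
        self.transcript.append(msg)  # child state made visible to parent

@dataclass
class SubAgent:
    name: str
    color: str  # naming/color-coding for visibility, per the brief

    def finish(self, parent: ParentAgent, status: str, summary: str):
        # On completion OR block, always propagate upward.
        parent.inject({"from": self.name, "color": self.color,
                       "status": status, "summary": summary})

parent = ParentAgent()
SubAgent("tester", "green").finish(parent, "blocked", "missing API key")
```

The failure mode this guards against is silent termination: a blocked child that never reports leaves the parent planning against stale assumptions.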
Context Format is Model-Specific, Not Universal
Research shows file-based context retrieval improves frontier model accuracy by 2.7% but degrades open-source models by 7.7%. Teams optimizing context for one model family are accidentally pessimizing for others.
9,649 experiments across 11 models show context format choice is model-dependent, not universal. Frontier models (Claude, GPT) benefit from file-based retrieval (+2.7%); open-source models show -7.7% in aggregate. A context structure that works for Claude may actively degrade Llama performance.
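The operational takeaway can be sketched as a per-family dispatcher. The family names and strategy labels below are assumptions drawn from the brief's numbers, not from the cited experiments; the structure is what matters: context packaging selected per model, not globally.

```python
# Toy dispatcher (families and labels assumed): choose the context
# delivery format per model family instead of one format for all.

CONTEXT_STRATEGY = {
    "claude": "file_based_retrieval",  # frontier: +2.7% in the cited study
    "gpt": "file_based_retrieval",
    "llama": "inline_context",         # open-source aggregate: -7.7% with files
}

def strategy_for(model_name: str) -> str:
    family = model_name.split("-")[0].lower()
    # Default to inline context, the format the study found safer
    # for non-frontier models.
    return CONTEXT_STRATEGY.get(family, "inline_context")
```

Teams with eval harnesses can populate the table empirically per model rather than hard-coding family assumptions like these.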
CLAUDE.md as Feedback Loop Formalization
Persistent system prompts that capture feedback as rules create compounding improvements. Without explicit formalization (mistake → correction → rule → persistence), AI systems repeat errors across sessions instead of learning.
Claude Code best practices from Boris Cherny (Anthropic): feedback must be explicitly captured as rules in CLAUDE.md. Without persistent structure, corrections are session-local and don't compound. Pattern: AI makes mistake → human corrects → pattern captured in rule → future interactions reference → error rate drops.
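The mistake-to-rule loop above reduces to a small append operation. The rule format and `capture_rule` helper are assumptions for illustration; only the file name, CLAUDE.md, comes from the source.

```python
# Sketch of the correction-to-rule loop (rule format assumed): persist
# each correction so it outlives the session instead of staying local.
import os
import tempfile

def capture_rule(claude_md: str, mistake: str, rule: str) -> None:
    """Append a correction as a durable rule in CLAUDE.md."""
    with open(claude_md, "a") as f:
        f.write(f"\n## Rule\nObserved mistake: {mistake}\nRule: {rule}\n")

path = os.path.join(tempfile.mkdtemp(), "CLAUDE.md")
capture_rule(path, "used npm in a pnpm repo", "Always use pnpm here.")
```

Because the file is loaded into every future session's context, each captured rule is a one-time cost that suppresses a whole class of repeat errors.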
Session Checkpointing Prevents Context Debt
Context degrades across long sessions as unrelated instructions accumulate. Practitioners are using git commits as session boundaries—compacting context, creating stable baselines, and preventing 'context debt' where irrelevant history poisons future work.
Session checkpointing via git commits resets context and creates stable baselines. Proactive compaction before hitting 100% utilization. Pattern: progressive context disclosure (declare only Skills/Agents needed for current task), compact before saturation, commit to checkpoint.
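The compact-before-saturation pattern can be sketched as a threshold check. The 80% trigger and the `Session` class are assumptions; the list of checkpoint ids stands in for actual git commits, which the brief uses as the real boundary mechanism.

```python
# Sketch (threshold and class assumed): compact proactively before
# hitting 100% utilization, then checkpoint as a stable baseline.

class Session:
    def __init__(self, window: int):
        self.window = window
        self.context = []
        self.checkpoints = []  # stands in for git commits

    def utilization(self) -> float:
        return len(self.context) / self.window

    def add(self, item: str):
        self.context.append(item)
        if self.utilization() >= 0.8:  # compact before saturation
            self.compact_and_checkpoint()

    def compact_and_checkpoint(self):
        summary = f"summary of {len(self.context)} items"
        self.context = [summary]                # compaction
        self.checkpoints.append(len(self.checkpoints) + 1)  # "git commit"

session = Session(window=5)
for i in range(6):
    session.add(f"instruction {i}")
```

Checkpointing at the moment of compaction is what prevents context debt: the summary that enters the next stretch of work corresponds to a known-good committed state, not to an arbitrary mid-task snapshot.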