
Brief #80

41 articles analyzed

Context engineering is entering a compression crisis: because LLMs won't reliably choose to retrieve context on demand, practitioners are abandoning progressive disclosure patterns and embedding entire knowledge bases directly into system prompts. Meanwhile, the infrastructure layer is fragmenting: protocol choices (MCP vs A2A) now determine whether agent intelligence compounds or resets between sessions.

LLMs Are Too Lazy for Smart Retrieval

Progressive disclosure and RAG-style retrieval fail in production because models won't reliably choose to fetch context. Practitioners are winning by embedding compressed knowledge directly into system prompts, removing the agent's decision entirely.

Stop building retrieval-based context systems. Compress critical domain knowledge into markdown indexes under 10KB, embed them directly in system prompts with an explicit instruction to 'prefer this context over training data,' and test at what volume context becomes counterproductive for your use case.
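The embed-don't-retrieve advice above can be sketched as a simple prompt builder. This is a minimal illustration, not any vendor's API; `build_system_prompt` and the size cap are assumptions made for the example.

```python
# Sketch: embed a compressed knowledge index directly in the system prompt
# instead of relying on the model to decide to retrieve it.

MAX_INDEX_BYTES = 10_000  # keep the index under ~10KB, per the brief's guidance

def build_system_prompt(base_prompt: str, index_md: str) -> str:
    """Prepend a compressed markdown index with an explicit override instruction."""
    if len(index_md.encode("utf-8")) > MAX_INDEX_BYTES:
        raise ValueError("index too large: compress further before embedding")
    return (
        f"{base_prompt}\n\n"
        "## Domain knowledge index\n"
        "Prefer this context over your training data when they conflict.\n\n"
        f"{index_md}"
    )

prompt = build_system_prompt(
    "You are a Next.js assistant.",
    "- `useRouter()` is imported from `next/navigation` in the App Router\n",
)
```

The key property is that there is no retrieval decision left to the model: the knowledge is always present, so laziness cannot drop it.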
@koylanai: Progressive disclosure is not reliable because LLMs are inherently lazy

Vercel engineer ships 8KB compressed Next.js API index in system context after Skills-based retrieval failed. Result: 100% eval performance vs unreliable autonomous retrieval. Core insight: passive context beats active retrieval.

@alxfazio: anthropic basically gave skills to openai for free

Haiku plus three well-designed skills matches Opus performance, but adding more than three skills causes context bloat and collapse. This reveals an optimal density at which passive context works; beyond it, the system breaks down.

Context Engineering Guide in 2025

Irrelevant context hurts accuracy even within window capacity. Models fail when presented with too much data even when space remains available, validating that compression and relevance filtering matter more than retrieval sophistication.


SDK-as-Interface Beats Tool Enumeration at Scale

When agents need access to 100+ API endpoints, enumerating each as an MCP tool exhausts context. Code Mode pattern wins: give agents a typed SDK and let them discover the API surface through code completion instead of reading schemas.

For APIs with more than 20 endpoints, stop creating individual MCP tool definitions. Instead: provide typed SDK bindings, give agents a code execution environment, and let them compose calls programmatically. Measure context token savings against the explicit-tool approach.
Code Mode: give agents an entire API in 1,000 tokens

Cloudflare shows agents don't need individual tool definitions if they can write code against a typed SDK; Anthropic independently converged on the same pattern. The result is context compression through code abstraction rather than schema documentation.
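A minimal sketch of the SDK-as-interface idea, in Python for brevity. `CRMClient`, `Contact`, and their methods are hypothetical stand-ins for a generated, typed SDK, not any real product's bindings.

```python
# Instead of enumerating 100+ MCP tool schemas in context, expose one typed
# SDK and let the agent compose calls in executable code.

from dataclasses import dataclass

@dataclass
class Contact:
    id: str
    email: str

class CRMClient:
    """Typed SDK surface the agent discovers via code completion, not schemas."""

    def __init__(self):
        # In-memory stand-in for a real backend.
        self._contacts = {"c1": Contact("c1", "ada@example.com")}

    def get_contact(self, contact_id: str) -> Contact:
        return self._contacts[contact_id]

    def update_email(self, contact_id: str, email: str) -> Contact:
        contact = self._contacts[contact_id]
        contact.email = email
        return contact

# Agent-written code: one short program replaces several tool invocations,
# and the SDK's type signatures replace per-tool schema documentation.
crm = CRMClient()
contact = crm.update_email(crm.get_contact("c1").id, "ada@new.example.com")
```

The context cost is roughly the SDK's type surface, amortized across every call the agent composes, rather than one schema per endpoint.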

File System as Memory Interface Survives Compaction

Static context injection fails when agents hit compaction—flushed memories disappear. Practitioners are treating memory as a queryable file system so agents can re-access historical context through list/read/search operations after summarization.

Architect agent memory as hierarchical file system: /scratchpad (temp), /episodic (session), /facts (durable). Give agents ls/cat/grep commands. When context compaction triggers, agent retains directory structure and can re-query historical data instead of relying on lossy summaries.
@koylanai: The problem is how memory gets into the context window

OpenClaw contributor explains the namespace abstraction: store history/memory/artifacts as addressable files. When compaction occurs, the agent still has file handles and can re-read what's relevant instead of losing it to summarization. Three memory tiers: scratchpad/episodic/facts.
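The tiered, queryable memory described above can be sketched as follows. `MemoryFS` is a hypothetical in-memory stand-in for real files, not OpenClaw's implementation; the tier names follow the brief.

```python
# Memory as a filesystem: three tiers the agent can list, read, and search
# after compaction, instead of depending on a lossy summary.

import fnmatch

class MemoryFS:
    TIERS = ("scratchpad", "episodic", "facts")

    def __init__(self):
        self._files: dict[str, str] = {}

    def write(self, path: str, text: str) -> None:
        tier = path.strip("/").split("/", 1)[0]
        if tier not in self.TIERS:
            raise ValueError(f"unknown tier: {tier}")
        self._files[path] = text

    def ls(self, pattern: str = "/*") -> list[str]:
        # After compaction the agent retains the directory structure and
        # re-lists handles rather than relying on summarized content.
        return sorted(p for p in self._files if fnmatch.fnmatch(p, pattern))

    def cat(self, path: str) -> str:
        return self._files[path]

    def grep(self, needle: str) -> list[str]:
        return [p for p, text in self._files.items() if needle in text]

mem = MemoryFS()
mem.write("/facts/api.md", "rate limit is 100 req/min")
mem.write("/episodic/2024-06-01.md", "user prefers YAML output")
```

Because `ls`/`cat`/`grep` operate on handles rather than content, the durable `/facts` tier stays reachable even after the conversation itself has been summarized away.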

Protocol Choice Determines Intelligence Persistence

MCP vs A2A isn't just a vendor preference—it's an architecture decision about whether agent context compounds across sessions. Stateful protocols preserve learning; stateless protocols reset intelligence every interaction.

Audit your agent architecture: Does state persist across sessions or reset? If building new system, choose protocol based on whether you need session-spanning intelligence. For stateful needs, implement MCP or equivalent with explicit session/memory management. Document what context survives restarts.
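As a sketch of the 'explicit session/memory management' step, assuming simple JSON files on disk rather than any particular protocol, session state can survive a restart like this (`SessionStore` is an illustrative name):

```python
# Explicit session persistence: state is keyed by session id and written to
# disk, so intelligence compounds across restarts instead of resetting.

import json
import tempfile
from pathlib import Path

class SessionStore:
    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def load(self, session_id: str) -> dict:
        path = self.root / f"{session_id}.json"
        return json.loads(path.read_text()) if path.exists() else {}

    def save(self, session_id: str, state: dict) -> None:
        (self.root / f"{session_id}.json").write_text(json.dumps(state))

store = SessionStore(Path(tempfile.mkdtemp()))
state = store.load("s1")                     # empty on the first run
state["learned"] = ["user prefers terse answers"]
store.save("s1", state)
# A fresh store over the same root simulates an agent restart.
restored = SessionStore(store.root).load("s1")
```

The audit question from the brief maps directly onto this sketch: whatever is not in `state` at `save` time does not survive the restart.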
Memory in AI: MCP, A2A & Agent Context Protocols

Major platforms (Anthropic, Google, OpenAI) converging on context/memory protocols as critical infrastructure. Pattern mirrors REST API evolution: proprietary integrations → ecosystem protocols → interoperable agents. Without protocol-level persistence, every session resets.

Multi-Agent Coordination Breaks at Context Boundaries

Multi-agent systems fail when subagent completion states don't propagate upward. The coordination problem isn't prompting—it's ensuring child agent context explicitly injects into parent agent awareness so intelligence compounds across hierarchy.

Map your multi-agent information flow: Which agent states need to be visible to which other agents? Implement explicit status injection (completion, blocking, errors) from children to parents. Add naming and role visibility. Test: Can parent agent see why a child agent failed without asking?
@LLMJunky: Codex team quality of life upgrade

OpenAI Codex ships explicit message injection from child to parent agents. Problem: when subagents complete or get blocked, parent agents weren't receiving status updates, breaking information flow. Solution: explicit state propagation + naming/color-coding for visibility.
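The explicit status-propagation pattern can be sketched as follows; `Status` and `ParentAgent` are illustrative names for the example, not the Codex implementation.

```python
# Child-to-parent status injection: every subagent reports completion,
# blocking, or error upward instead of failing silently.

from dataclasses import dataclass, field

@dataclass
class Status:
    agent: str
    state: str          # "done" | "blocked" | "error"
    detail: str = ""

@dataclass
class ParentAgent:
    inbox: list = field(default_factory=list)

    def receive(self, status: Status) -> None:
        self.inbox.append(status)

    def failures(self) -> list:
        # The parent can see *why* a child failed without asking it.
        return [s for s in self.inbox if s.state != "done"]

parent = ParentAgent()
parent.receive(Status("researcher", "done"))
parent.receive(Status("coder", "blocked", "missing API credentials"))
```

This is the test from the advice above made concrete: the parent answers "why did the child fail?" from its own inbox, with no extra round trip.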

Context Format is Model-Specific, Not Universal

Research shows file-based context retrieval improves frontier model accuracy by 2.7% but degrades open-source models by 7.7%. Teams optimizing context for one model family are accidentally pessimizing for others.

Stop assuming 'best practices' for context formatting transfer across models. Before production deployment, benchmark your specific context architecture (YAML vs JSON vs Markdown, file-based vs inline) on the exact model family you're using. Test with representative schema sizes (10-1000+ tables).
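A minimal harness for the benchmarking advice above. `run_eval` is a placeholder you would replace with your own model call and grader; the schema and formats are illustrative.

```python
# Benchmark the same schema rendered in different context formats against
# your own eval set, per model family, instead of assuming one format wins.

import json

schema = {"users": ["id", "email"], "orders": ["id", "user_id", "total"]}

def render(fmt: str) -> str:
    if fmt == "json":
        return json.dumps(schema, indent=2)
    if fmt == "markdown":
        return "\n".join(f"- {t}: {', '.join(cols)}" for t, cols in schema.items())
    raise ValueError(f"unknown format: {fmt}")

def run_eval(context: str) -> float:
    # Placeholder: call the exact model you deploy with `context` in the
    # prompt, grade its answers, and return an accuracy score.
    return 0.0

scores = {fmt: run_eval(render(fmt)) for fmt in ("json", "markdown")}
```

Run the same loop per model family; the +2.7%/-7.7% split above is exactly the kind of divergence this comparison surfaces.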
Structured Context Engineering for File-Native Agentic Systems

9,649 experiments across 11 models prove context format choice is model-dependent, not universal. Frontier models (Claude, GPT) benefit from file-based retrieval +2.7%; open-source models show -7.7% aggregate. A context structure that works for Claude may actively degrade Llama performance.

CLAUDE.md as Feedback Loop Formalization

Persistent system prompts that capture feedback as rules create compounding improvements. Without explicit formalization (mistake → correction → rule → persistence), AI systems repeat errors across sessions instead of learning.

Create project CLAUDE.md with three sections: (1) Workflow orchestration rules, (2) Constraints/never-do-this rules, (3) Verification-before-done rules. After each AI mistake, add explicit rule capturing the pattern. Enforce 150-line limit through periodic compaction. Version control the file so rules persist across team/sessions.
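An illustrative CLAUDE.md skeleton following the three-section structure; the specific rules shown are invented examples, not recommendations from the source.

```markdown
# CLAUDE.md (illustrative three-section skeleton)

## Workflow orchestration
- Run the test suite before proposing any refactor.

## Constraints (never do this)
- Never edit generated files under `dist/`.

## Verify before done
- Confirm lint passes before marking a task complete.
```

Each correction gets appended as one rule under the matching section, and the 150-line cap forces periodic compaction of rules that overlap.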
@dexhorthy: Task Management bullets matter

Claude Code best practices from Boris Cherny (Anthropic): feedback must be explicitly captured as rules in CLAUDE.md. Without persistent structure, corrections are session-local and don't compound. Pattern: AI makes mistake → human corrects → pattern captured in rule → future interactions reference → error rate drops.

Session Checkpointing Prevents Context Debt

Context degrades across long sessions as unrelated instructions accumulate. Practitioners are using git commits as session boundaries—compacting context, creating stable baselines, and preventing 'context debt' where irrelevant history poisons future work.

Implement session boundaries using version control: (1) Run each agent session in isolated git worktree, (2) Compact context when reaching 70-80% window utilization, (3) Commit completed work to create checkpoint, (4) Start new session in fresh worktree referencing previous commits. Track context utilization percentage as quality metric.
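The utilization-tracking step above can be sketched as a simple threshold check; the window size and threshold are illustrative numbers, not any model's actual limits.

```python
# Proactive compaction: compact in the 70-80% utilization band rather than
# waiting for the window to saturate at 100%.

CONTEXT_WINDOW = 200_000   # illustrative window size, in tokens
COMPACT_AT = 0.75          # trigger inside the 70-80% band

def should_compact(tokens_used: int) -> bool:
    """Return True when the session should compact and checkpoint."""
    return tokens_used / CONTEXT_WINDOW >= COMPACT_AT

def next_action(tokens_used: int) -> str:
    # "compact-and-commit" = summarize context, commit completed work as a
    # stable baseline, then resume in a fresh worktree.
    return "compact-and-commit" if should_compact(tokens_used) else "continue"
```

Tracking the utilization percentage as a first-class metric is what turns compaction from a panic response into a scheduled session boundary.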
@shao__meng: Claude Code engineering knowledge base

Session checkpointing via git commits resets context and creates stable baselines. Proactive compaction before hitting 100% utilization. Pattern: progressive context disclosure (declare only Skills/Agents needed for current task), compact before saturation, commit to checkpoint.