
Brief #32

16 articles analyzed

Context architecture—not model capability—is emerging as the primary engineering discipline for AI systems. Practitioners are hitting architectural limits: context lock-in between tools, metadata pollution degrading performance over time, and coordination overhead killing multi-agent reliability. The shift is from prompt engineering to systems engineering.

Context Filtering Must Happen Before Token Consumption

Systems that filter unwanted context at the prompt layer (after loading) waste tokens and leak instructions. Production systems need infrastructure-layer preprocessing to remove metadata, comments, and maintenance notes before context windows are consumed—this is the difference between systems that degrade over time and those that compound intelligence.

If you're building Skills, MCP tools, or RAG systems: implement preprocessing that strips maintenance metadata, comments, and human-only annotations BEFORE feeding context to the model. Don't rely on system prompts to ignore content—that wastes tokens and leaks instructions. Build this as infrastructure, not prompt-layer filtering.
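A minimal sketch of what infrastructure-layer preprocessing could look like. The patterns here are illustrative assumptions (HTML-style maintainer comments and `version:`/`changelog:`-style header lines), not any tool's actual metadata convention; a real system would match its own annotations.

```python
import re

# Hypothetical maintenance markers; substitute your own conventions.
MAINTENANCE_PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),  # HTML-style maintainer comments
    re.compile(r"^(version|changelog|last-reviewed|owner):.*$",
               re.MULTILINE | re.IGNORECASE),  # header-style metadata lines
]

def strip_maintenance_metadata(text: str) -> str:
    """Remove human-only annotations before text reaches a context window."""
    for pattern in MAINTENANCE_PATTERNS:
        text = pattern.sub("", text)
    # Collapse the blank runs left behind by removed blocks.
    return re.sub(r"\n{3,}", "\n\n", text).strip()

doc = "version: 1.2\n<!-- reviewed by Dani -->\nUse this skill to parse invoices."
clean = strip_maintenance_metadata(doc)  # "Use this skill to parse invoices."
```

The point is where this runs: at load time, before the model ever sees the text, so no tokens are spent carrying or ignoring the metadata.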
Dani Avila: Skills maintenance metadata pollution

User reports that maintenance metadata (versioning, sourcing, docs) contaminates Claude's context window, making Skills less effective as they age. The accumulation of human-only notes is a compounding problem.

Dani Avila: Workaround with separate maintenance files

Practitioner identifies that API-layer preprocessing (filtering before context consumption) is superior to prompt-layer filtering (after loading). Requests infrastructure support for ignoring maintenance comments—reveals this is a missing primitive in current tools.

Hesamation: Code organization determines AI effectiveness

From a practitioner who ships with AI: codebase structure and documentation quality are the bottleneck, not prompting. Clean context enables models to understand intent; messy context creates a ceiling regardless of prompt quality. This validates that context quality must be architected, not prompt-engineered.


Call-Stack Context Beats Linear Chat History

Organizing agent context as a hierarchical task stack (push subtasks, pop completions) eliminates lossy summarization because closed contexts can be fully removed rather than compressed. This mirrors how engineers actually decompose work and solves the context window problem architecturally, not through better compression.

When building long-running agents, structure context as a task stack: high-level goals push subtasks onto the stack, completion pops them. Store completed work in structured logs (not summarized chat history) so context can be retrieved hierarchically when needed. This architectural choice eliminates most of the need for lossy summarization.
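The stack discipline above can be sketched in a few lines. This is a toy illustration of the push/pop-with-structured-log idea, not a reconstruction of the POC; all class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    goal: str
    notes: list[str] = field(default_factory=list)

class TaskStackContext:
    """Context organized as a call stack: push subtasks, pop completions."""

    def __init__(self) -> None:
        self.stack: list[Task] = []   # open tasks: the only live context
        self.log: list[dict] = []     # structured record of completed work

    def push(self, goal: str) -> None:
        self.stack.append(Task(goal))

    def pop_completed(self, result: str) -> None:
        task = self.stack.pop()
        # Closed work goes to a structured log, not a summarized transcript,
        # so it can be removed from the window entirely and retrieved later.
        self.log.append({"goal": task.goal, "result": result, "notes": task.notes})

    def active_context(self) -> list[str]:
        return [t.goal for t in self.stack]

ctx = TaskStackContext()
ctx.push("Refactor auth module")
ctx.push("Write tests for token refresh")
ctx.pop_completed("12 tests passing")
# Only the parent goal remains in the window; the subtask lives in the log.
```

The key property: popping a completed task is lossless removal, whereas summarizing a linear transcript is lossy compression of the same information.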
Dan B: Call-stack context architecture POC

Practitioner built POC showing that structuring context as call stacks (hierarchical tasks) reduces need for lossy compaction. Completed tasks pop cleanly rather than requiring summarization. The key insight: linear chat history is the wrong mental model—tasks are hierarchical.

Multi-Agent Reliability Requires Context-First Architecture

Multi-agent systems fail in production not because models lack capability, but because context decay and coordination overhead are treated as afterthoughts rather than first-class constraints. Reliable designs constrain agent scope to bound context windows, make orchestration removable/testable, and measure outcomes rather than agent-level metrics.

Before adding more agents or smarter models, architect context preservation: (1) Constrain each agent's scope to bound its context window, (2) Make orchestration layers removable/testable so dependencies are explicit, (3) Measure outcomes (did the system achieve the goal?) not agent metrics (did this agent perform well?). Treat coordination costs as your primary optimization target.
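One way to make points (2) and (3) concrete: keep orchestration as a plain, replaceable composition and judge success only at the outcome level. This is a deliberately minimal sketch under those assumptions; the agents and the outcome check are placeholders.

```python
from typing import Callable

# Each agent is just a bounded function: bounded input -> bounded output,
# which also bounds the context it needs.
Agent = Callable[[str], str]

def run_pipeline(agents: list[Agent], task: str,
                 outcome_check: Callable[[str], bool]) -> tuple[str, bool]:
    """Removable orchestration: a plain sequential composition that can be
    swapped or deleted without touching any agent. Success is judged by the
    end-to-end outcome, not by per-agent metrics."""
    result = task
    for agent in agents:
        result = agent(result)  # each agent sees only its own bounded input
    return result, outcome_check(result)

# Toy usage: a single "agent" and an outcome-level check.
output, ok = run_pipeline([str.upper], "deploy", lambda s: s == "DEPLOY")
```

Because the orchestrator is an ordinary function over ordinary functions, dependencies stay explicit and each agent can be tested in isolation or removed without a rewrite.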
Aishwarya Srinivasan: Multi-agent production failures

Practitioner identified root cause of multi-agent failures: context decay and coordination overhead treated as secondary concerns. Working architectures have narrow agent scopes (bounded contexts), removable orchestration (testable dependencies), and outcome-level measurement (shared understanding without context-destroying metrics).

Context Portability Lock-In Is the New Vendor Lock-In

AI tools create lock-in not through features but through incompatible context/configuration schemas. Agent definitions, command structures, hooks, and MCP integrations don't port between tools—forcing practitioners to rebuild intelligence from scratch when switching. The absence of portable context standards fragments the ecosystem.

Demand portable context standards from tool vendors. When evaluating AI dev tools, test whether your agent configurations, memory structures, and tool integrations can export/import cleanly. Prefer tools that support open protocols (MCP, Agent Protocol) over proprietary schemas. Document your configurations in vendor-neutral formats so you can rebuild if needed.
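A cheap smoke test for the export/import criterion: keep the agent definition as plain data and verify it survives a round trip through a neutral format. The schema below is hypothetical and illustrative only; it is not Claude Code's, OpenCode's, or any tool's actual format.

```python
import json

# Hypothetical vendor-neutral agent definition; field names are illustrative.
agent_config = {
    "name": "code-reviewer",
    "instructions": "Review diffs for security issues.",
    "tools": ["read_file", "search"],
    "mcp_servers": [{"name": "github", "transport": "stdio"}],
}

def round_trips(config: dict) -> bool:
    """Portability smoke test: can the config survive export -> import
    through a neutral format with nothing lost or coerced?"""
    exported = json.dumps(config, sort_keys=True)
    return json.loads(exported) == config
```

If a tool's configuration can't pass even this, rebuilding elsewhere will mean manual translation rather than migration.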
Alex Fazio: Can't port configurations between Claude Code and OpenCode

Practitioner reports inability to switch tools without rewriting agent YAML frontmatter, command definitions, hooks, and MCP server configs. Each tool uses proprietary schemas for defining agent behavior, creating context lock-in.

Models Learn Context Management Through RL, Not Architecture

RL post-training is teaching models to compensate for architectural constraints by externalizing memory through tool use (file reading, retrieval) rather than relying on attention windows. This blurs the line between model capability and context engineering: models are learning to manage their own context, which makes explicit preservation strategies less necessary but also makes context behavior harder to control.

Don't assume models will automatically manage context well—even if they CAN use tools for memory, they need clear guidance on WHEN to externalize vs keep in-context. Add explicit constraints to system prompts about memory/retrieval strategy. Test whether your model is learning to preserve critical context or just thrashing between tools. Budget token usage like you budget time.
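The "when to externalize vs keep in-context" guidance can be encoded as an explicit policy rather than left to emergent behavior. The thresholds and priority labels below are assumptions for illustration; tune them against your own token budget and retrieval latency.

```python
def placement(item_tokens: int, priority: str, remaining_budget: int) -> str:
    """Decide whether a context item stays in-window or is externalized to a
    tool-accessible store (file, retrieval index). Thresholds are illustrative."""
    if priority == "critical":
        return "in-context"      # never push essentials behind a tool call
    if item_tokens > remaining_budget // 4:
        return "externalize"     # large, non-critical: park it in storage
    return "in-context"

# A large, non-critical document gets externalized; a critical one never does.
placement(5000, "normal", 8000)    # "externalize"
placement(9999, "critical", 100)   # "in-context"
```

Logging these decisions also gives you the test the paragraph asks for: you can see whether the model (or your policy) is preserving critical context or just thrashing between tools.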
Andrew Carr: RL overcomes linear attention limits via tool use

Analysis that models with constrained attention windows will learn to externalize memory via tools (reading, retrieval) through RL. This is emergent context engineering—models learning behavioral adaptations rather than engineers designing context preservation.