Brief #32
Context architecture—not model capability—is emerging as the primary engineering discipline for AI systems. Practitioners are hitting architectural limits: context lock-in between tools, metadata pollution degrading performance over time, and coordination overhead killing multi-agent reliability. The shift is from prompt engineering to systems engineering.
Context Filtering Must Happen Before Token Consumption
Systems that filter unwanted context at the prompt layer (after loading) waste tokens and leak instructions. Production systems need infrastructure-layer preprocessing to remove metadata, comments, and maintenance notes before context windows are consumed—this is the difference between systems that degrade over time and those that compound intelligence.
User reports that maintenance metadata (versioning, sourcing, docs) contaminates Claude's context window, making Skills less effective as they age. The accumulation of human-only notes is a compounding problem.
Practitioner identifies that API-layer preprocessing (filtering before context consumption) is superior to prompt-layer filtering (after loading) and requests infrastructure support for ignoring maintenance comments, revealing a missing primitive in current tools.
From a practitioner who ships with AI: codebase structure and documentation quality are the bottleneck, not prompting. Clean context enables models to understand intent; messy context creates a ceiling regardless of prompt quality. This validates that context quality must be architected, not prompt-engineered.
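A minimal sketch of what infrastructure-layer preprocessing could look like: a filter that strips maintenance notes before any tokens are consumed. The `strip_maintenance` function and its patterns are illustrative assumptions, not from any real tool.

```python
import re

# Hypothetical patterns for human-only maintenance notes; a real
# deployment would tailor these to its own metadata conventions.
MAINTENANCE_PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),                       # HTML-style comments
    re.compile(r"^(version|last-updated|maintainer|changelog):.*$",
               re.IGNORECASE | re.MULTILINE),                   # metadata fields
]

def strip_maintenance(text: str) -> str:
    """Remove maintenance metadata before the text reaches the context window."""
    for pattern in MAINTENANCE_PATTERNS:
        text = pattern.sub("", text)
    # Collapse the blank lines the removals leave behind.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

The point is placement: this runs at the API layer, so the model never spends attention or tokens on notes meant only for maintainers.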
Call-Stack Context Beats Linear Chat History
Organizing agent context as a hierarchical task stack (push subtasks, pop completions) eliminates lossy summarization because closed contexts can be fully removed rather than compressed. This mirrors how engineers actually decompose work and solves the context window problem architecturally, not through better compression.
Practitioner built a POC showing that structuring context as call stacks (hierarchical tasks) reduces the need for lossy compaction. Completed tasks pop cleanly rather than requiring summarization. The key insight: linear chat history is the wrong mental model—tasks are hierarchical.
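The push/pop model can be sketched in a few lines. The `Frame`/`ContextStack` names are illustrative, not taken from the practitioner's POC.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """One open task and the messages accumulated while working on it."""
    task: str
    messages: list = field(default_factory=list)

class ContextStack:
    """Hierarchical task context: push subtasks, pop completions."""
    def __init__(self, root_task: str):
        self.frames = [Frame(root_task)]

    def push(self, subtask: str) -> None:
        self.frames.append(Frame(subtask))

    def pop(self, result: str) -> None:
        # A finished subtask is removed wholesale; only its result
        # survives in the parent frame, so nothing needs summarizing.
        done = self.frames.pop()
        self.frames[-1].messages.append(f"[{done.task} done] {result}")

    def window(self) -> list:
        # Only open frames contribute to the prompt.
        out = []
        for frame in self.frames:
            out.append(f"TASK: {frame.task}")
            out.extend(frame.messages)
        return out
```

When `pop` runs, the subtask's intermediate chatter disappears entirely; the parent keeps a one-line result, which is exactly the "closed contexts can be fully removed" property described above.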
Multi-Agent Reliability Requires Context-First Architecture
Multi-agent systems fail in production not because models lack capability, but because context decay and coordination overhead are treated as afterthoughts rather than first-class constraints. Reliable designs constrain agent scope to bound context windows, make orchestration removable/testable, and measure outcomes rather than agent-level metrics.
Practitioner identified root cause of multi-agent failures: context decay and coordination overhead treated as secondary concerns. Working architectures have narrow agent scopes (bounded contexts), removable orchestration (testable dependencies), and outcome-level measurement (shared understanding without context-destroying metrics).
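A toy sketch of those three properties, under the assumption that agents are plain functions over text; the `bounded` and `run_pipeline` helpers are hypothetical, not from any framework.

```python
from typing import Callable

# An agent is just a function from context to output in this sketch.
Agent = Callable[[str], str]

def bounded(agent: Agent, max_chars: int) -> Agent:
    """Constrain an agent's scope by hard-bounding its input context."""
    def wrapped(context: str) -> str:
        return agent(context[-max_chars:])  # keep only the most recent slice
    return wrapped

def run_pipeline(agents: list, task: str,
                 measure: Callable[[str], bool]) -> bool:
    """Orchestration as a plain loop: removable and testable in isolation."""
    out = task
    for agent in agents:
        out = agent(out)
    return measure(out)  # measure the outcome, not per-agent metrics
```

Because orchestration is an ordinary function, it can be deleted, mocked, or unit-tested without touching any agent, and the only metric that matters is whether `measure` accepts the final outcome.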
Context Portability Lock-In Is the New Vendor Lock-In
AI tools create lock-in not through features but through incompatible context/configuration schemas. Agent definitions, command structures, hooks, and MCP integrations don't port between tools—forcing practitioners to rebuild intelligence from scratch when switching. The absence of portable context standards fragments the ecosystem.
Practitioner reports inability to switch tools without rewriting agent YAML frontmatter, command definitions, hooks, and MCP server configs. Each tool uses proprietary schemas for defining agent behavior, creating context lock-in.
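One way to picture the problem: a tool-neutral agent definition must be re-projected into each tool's proprietary schema. Both target schemas below are invented for illustration; real tools each have their own frontmatter and config keys.

```python
# A tool-neutral agent definition (field names are assumptions).
NEUTRAL_AGENT = {
    "name": "reviewer",
    "instructions": "Review diffs for regressions.",
    "tools": ["read_file", "grep"],
}

def to_tool_a(agent: dict) -> dict:
    # Hypothetical tool A: flat keys, "prompt" instead of "instructions".
    return {"id": agent["name"],
            "prompt": agent["instructions"],
            "allowed_tools": agent["tools"]}

def to_tool_b(agent: dict) -> dict:
    # Hypothetical tool B: nested "spec" object with different key names.
    return {"agent": agent["name"],
            "spec": {"system": agent["instructions"],
                     "mcp": agent["tools"]}}
```

Without a shared standard, every practitioner ends up writing (and maintaining) these exporters by hand for each tool they adopt, which is the lock-in the section describes.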
Models Learn Context Management Through RL, Not Architecture
RL post-training is teaching models to compensate for architectural constraints by externalizing memory through tool use (file reading, retrieval) rather than relying on attention windows. This blurs the line between model capability and context engineering—models are learning to manage their own context, making explicit preservation strategies less important but also less controllable.
Analysis arguing that models with constrained attention windows will learn, through RL, to externalize memory via tools (file reading, retrieval). This is emergent context engineering: models learning behavioral adaptations rather than engineers designing context preservation.
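A toy sketch of the externalized-memory pattern being described, with a hypothetical key-value store standing in for a filesystem or retrieval index.

```python
class ExternalMemory:
    """Stands in for a filesystem or retrieval index the model can call."""
    def __init__(self):
        self.store = {}

    def write(self, key: str, value: str) -> str:
        self.store[key] = value
        return key  # only the pointer survives in the context window

    def read(self, key: str) -> str:
        return self.store.get(key, "")

def recall(memory: ExternalMemory, key: str, query: str) -> str:
    # Re-read on demand instead of holding the contents in attention.
    notes = memory.read(key)
    return notes if query in notes else ""
```

The behavioral shift the analysis predicts is exactly this: the model keeps pointers, not contents, in its window, and issues a tool call whenever it needs the material back.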