Brief #99
Context cost economics is inverting the optimization stack: practitioners succeed by preserving intelligence across sessions through persistent state layers and clarifying constraints upfront, not by adding more orchestration or larger context windows.
Context Is Now More Expensive Than Inference
As inference becomes commoditized (competition, optimization), the binding constraint shifts from model compute to context management—token efficiency, retrieval quality, and state preservation become the optimization frontier.
Direct statement of the economic inversion: inference cost is declining while context management becomes the bottleneck
Tooling exists specifically to reduce the context overhead of server sprawl, validating that the context budget is precious
An empirical ceiling shows context windows have hard limits; optimization must work within those constraints
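If context is the scarce resource, the first practical lever is enforcing a token budget when assembling context. A minimal sketch, using an assumed 4-characters-per-token heuristic (real tokenizers differ) and a hypothetical `fit_to_budget` helper:

```python
# Sketch: greedy packing of retrieved chunks under a fixed token budget.
# The 4-chars-per-token ratio is a rough assumption, not a real tokenizer.
def fit_to_budget(chunks, max_tokens, chars_per_token=4):
    """Pack chunks (assumed pre-sorted by retrieval score) until budget is spent."""
    budget = max_tokens * chars_per_token
    packed, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            continue  # skip chunks that would overflow the budget
        packed.append(chunk)
        used += len(chunk)
    return packed

context = fit_to_budget(["short note", "x" * 10_000], max_tokens=100)
```

The point is architectural, not the heuristic: the budget is enforced by the harness before anything reaches the model.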
Persistent File-Based Memory Beats Ephemeral Prompts
Practitioners achieving production reliability use file-based context persistence (CLAUDE.md, .learnings/, Keep.md+GitHub) to compound intelligence across sessions rather than re-establishing context through prompts.
A CLAUDE.md + skills + hooks architecture achieves 100% vs. 20% utilization; the difference is persistent context infrastructure
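The `.learnings/` pattern reduces to two operations: append lessons as they are learned, and reload them at session start instead of re-establishing context through prompts. A minimal sketch; the file path and function names are illustrative, not a standard:

```python
from pathlib import Path

LEARNINGS = Path(".learnings") / "notes.md"  # path mirrors the .learnings/ pattern

def record_learning(note: str) -> None:
    """Append a lesson so the next session starts with it, not a blank prompt."""
    LEARNINGS.parent.mkdir(exist_ok=True)
    with LEARNINGS.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def load_learnings() -> str:
    """Read accumulated context back in at session start."""
    return LEARNINGS.read_text(encoding="utf-8") if LEARNINGS.exists() else ""
```

Because the state lives on disk rather than in a chat transcript, it compounds across sessions and survives context-window resets.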
Specification Clarity Beats Model Capability Gains
Practitioners report that agent failures stem from ambiguous specs, not model limitations: the bottleneck is humans articulating constraints clearly upfront, not waiting for better models.
Direct practitioner observation: spec quality determines output quality more than model choice
Safety-in-Prompts Dies When Model Stops Listening
Production incidents reveal prompt-based safety constraints are ephemeral—real safety requires encoding boundaries in infrastructure (permissions, rollback, sandboxing) outside the model's control.
Catastrophic Terraform deletion occurred despite prompt warnings, proof that safety context at the instruction layer is fragile
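Encoding boundaries in infrastructure means the gate runs in the harness, where the model cannot talk its way past it. A minimal sketch of a command gate with an illustrative, deliberately incomplete denylist (the patterns and function names are assumptions):

```python
import shlex

# Sketch: a destructive-command gate enforced outside the model.
# This denylist is illustrative only; real harnesses deny by default.
DESTRUCTIVE = {("terraform", "destroy"), ("rm", "-rf")}

def is_allowed(command: str) -> bool:
    """Check the command's leading tokens against known-destructive patterns."""
    head = tuple(shlex.split(command)[:2])
    return head not in DESTRUCTIVE

def run_guarded(command: str) -> str:
    if not is_allowed(command):
        raise PermissionError(f"blocked outside the model: {command!r}")
    return f"would run: {command}"  # a real harness would exec in a sandbox
```

Unlike a prompt warning, this check fires whether or not the model is still "listening"; rollback and sandboxing layer the same property onto state changes that slip through.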
Phase-Based Model Routing Optimizes Context Budget
Split complex reasoning (use expensive models) from execution (use cheaper models) to preserve context budget—different task phases have different context/capability requirements.
Explicit phase separation: complex reasoning needs Opus capacity, while execution works with Sonnet efficiency
MCP Task Primitive Gaps Block Production Deployment
MCP Tasks shipped experimentally but lacks retry semantics, state expiry rules, and transport scalability—production deployments revealed infrastructure needs that lab experiments missed.
Official acknowledgment: Tasks primitive in production but has lifecycle gaps, needs governance and enterprise features
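The lifecycle gaps named above (retry semantics, state expiry) are generic task-runner concerns. A sketch of what those semantics look like in isolation; the class and method names are hypothetical and do not reflect MCP's actual API:

```python
import time
from dataclasses import dataclass, field

# Sketch of the lifecycle semantics the brief says are missing:
# bounded retries and state expiry. Not MCP's API; names are assumed.
@dataclass
class TaskState:
    result: object
    created_at: float = field(default_factory=time.monotonic)

class TaskRunner:
    def __init__(self, max_retries=3, ttl_seconds=300.0):
        self.max_retries = max_retries
        self.ttl = ttl_seconds
        self.states: dict[str, TaskState] = {}

    def run(self, task_id, fn):
        """Retry fn up to max_retries times, storing the result on success."""
        last_err = None
        for _ in range(self.max_retries):
            try:
                self.states[task_id] = TaskState(fn())
                return self.states[task_id].result
            except Exception as err:
                last_err = err
        raise RuntimeError(f"task {task_id!r} failed after {self.max_retries} tries") from last_err

    def get(self, task_id):
        """Return a stored result, evicting it if past its TTL."""
        state = self.states.get(task_id)
        if state is None or time.monotonic() - state.created_at > self.ttl:
            self.states.pop(task_id, None)  # expired state is evicted
            return None
        return state.result
```

Without expiry rules, state accumulates unbounded; without retry bounds, transient failures either kill tasks or loop forever. These are exactly the behaviors production deployments surface first.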
Hardwired Agent Pipelines Decay on Model Updates
Hand-crafted LangGraph/AutoGen orchestration becomes obsolete when underlying models improve—declarative architecture with semantic wiring survives model iteration without manual rewrites.
Declarative spec + IoC framework allows swapping execution engines without rewriting agent logic
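The IoC pattern here is: agent behavior lives in a data spec, and the execution engine is injected, so swapping engines on a model update touches no agent logic. A minimal sketch with assumed spec fields and engine signatures:

```python
# Sketch: agent behavior as declarative data, execution engine injected
# (inversion of control). Spec fields and engine signature are assumptions.
AGENT_SPEC = {
    "role": "code reviewer",
    "steps": ["summarize changes", "flag risks"],
}

def run_agent(spec: dict, engine) -> list[str]:
    """Engine is any callable(step, spec) -> str; the spec never changes."""
    return [engine(step, spec) for step in spec["steps"]]

# Two interchangeable engines: swapping them requires no spec rewrite.
def tagged_engine(step, spec):
    return f"[{spec['role']}] {step}"

def plain_engine(step, spec):
    return step.upper()
```

When the underlying model improves, only the engine implementation is replaced; the hand-crafted wiring that decays in hardwired pipelines never existed here.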