Brief #66
Context engineering has matured from prompt optimization to supply chain security and architecture patterns. Practitioners are discovering that context boundaries—what gets preserved, shared, and executed—are now security attack surfaces, coordination mechanisms, and the primary design constraint outweighing model capability.
Markdown Documentation Becomes Executable Malware Vector
Agent ecosystems that consume skills/plugins from untrusted sources face supply chain attacks where documentation context (Markdown) doubles as code execution instructions, bypassing traditional security boundaries because agents treat installation steps as legitimate commands.
1Password VP documents actual malware in agent skill ecosystem where Markdown installation instructions executed shell commands, downloaded crypto miners. Six-step attack chain exploited that agents don't distinguish informational vs executable context.
Practitioner lost irreplaceable family photos when Claude Code executed filesystem operations without safety constraints. Demonstrates that capability without explicit destructive-operation boundaries creates catastrophic risk—same context boundary failure as Markdown malware.
Tool schema context being ignored by LLMs shows context injection failures aren't malicious-only—systems fail to preserve structural constraints across execution boundaries even in benign scenarios.
Shared Context File Beats Agent-Specific Prompts
Multi-agent coordination scales better through a single shared context document (CLAUDE.md) that all agents reference, rather than duplicating system prompts per agent—small tweaks cascade improvements because intelligence compounds in one place.
Practitioner discovered centralized CLAUDE.md as coordination hub—'tweaks' to shared context improve collaboration across agents without per-agent reconfiguration.
Context Phase Decomposition Creates Depth Not Models
Deep research outputs require multi-phase context decomposition where each phase builds on prior outputs (source selection → analysis → synthesis) with explicit structural schemas—speed-optimized context produces shallow results regardless of model quality.
Practitioner workflow: context clarity (audience/problem) → source specificity → phase decomposition (each builds on last) → output schema = depth. Without phase context preservation, AI scrapes shallowly regardless of capability.
Humans Stop Reading Code They're Shipping
Practitioners are shipping production codebases they've never read, relying entirely on AI-maintained context and artifact inspection—this works for greenfield single-contributor projects but creates invisible knowledge debt and single-point-of-failure risk.
Practitioner admits shipping multiple useful codebases without reading any code—interaction limited to artifacts. Works because: greenfield (clear problem), sole contributor (no coordination), AI maintains context. Feels 'alien' but functional.
Multi-Agent Orchestration Creates Context Waste Not Intelligence
Practitioners abandoning multi-agent patterns because agent switching forces context reloading, duplicated reads, and wasted tokens—for linear workflows, single-threaded context preservation outperforms orchestration complexity.
Experienced engineer (libGDX creator) reports multi-agent delegation created redundant operations: context switching, file re-reading after context clears, duplicated processing. Linear feature implementation didn't need orchestration—needed single-threaded context.
Context Window Degradation Breaks Agent Loops Mid-Task
Local models in agentic workflows fail not from reasoning quality but from cache-clearing when context fills—agents enter loops instead of progressing, revealing context budget management as distinct failure mode from hallucination.
Practitioner testing local models found glm 4.7 doesn't hallucinate tool calls but degrades mid-task: as context fills and cache clears, agent loops instead of completing. This is context budget failure, not reasoning failure.
Agents Should Pull Context Not Receive It
Inversion-of-control pattern from software engineering applies to context: instead of harnesses pushing context to agents, agents should decide what context they need and pull it—this enables organic accumulation and better retrieval strategies across turns.
AI researcher maps inversion-of-control to agent context: current harnesses push context to agents (like hardcoded dependencies), but better pattern is agents pull what they need (like dependency injection). RLM experiments show agents build better retrieval loops when controlling their own context decisions.
Token Consumption Not Price Determines Real Cost
Per-token pricing is misleading because models consume vastly different token volumes on identical tasks—teams must benchmark actual token usage on their workload or risk 3x cost surprises despite identical sticker prices.
Context Bench leaderboard operator reports Opus 4.6 costs 3x more than 4.5 despite same per-token price—because it uses 3x tokens on code tasks. Total cost = unit price × consumption. Consumption varies by model AND task type.