Brief #64
Context engineering is fracturing into two worlds: practitioners are discovering that *context visibility and structural constraints* matter more than model capability, while vendors race to add features that paradoxically consume more context. The surprise isn't that AI agents are getting better; it's that the best practitioners are succeeding by *removing context* and *making actions impossible* rather than by adding sophistication.
MCP Servers Are Context Budget Vampires
MCP tool integrations consume 25%+ of context windows before any actual work begins. Practitioners are optimizing by disabling unused MCP searches rather than adding more tools—the bottleneck is context visibility, not capability.
A tutorial demonstrates how the /context command reveals that MCP tools consume 25% of tokens before any work begins. Disabling unused searches reduced overhead to 14%. The pattern: measurement → understanding → selective enablement.
The MCP Apps announcement from a core engineer shows protocol maturity but doesn't address context consumption costs. Vendor framing emphasizes capability expansion while practitioners report budget exhaustion.
A practitioner suspects Anthropic runs a private extended-context model internally, because agent teams generate uncompactable context bloat. MCP integrations compound the problem: each tool adds baseline overhead.
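The measurement → selective-enablement pattern can be sketched in code. A minimal sketch, assuming a config that maps each MCP server to its tool schemas; the chars/4 token heuristic and the `server_overhead`/`servers_to_disable` helpers are illustrative assumptions, not part of any MCP SDK:

```python
import json

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token (a real setup
    would use the model's tokenizer)."""
    return len(text) // 4

def server_overhead(servers: dict) -> dict:
    """Estimated token cost of each server's tool schemas."""
    return {
        name: sum(estimate_tokens(json.dumps(tool)) for tool in tools)
        for name, tools in servers.items()
    }

def servers_to_disable(servers: dict, budget_fraction: float = 0.15,
                       window: int = 200_000) -> list:
    """Keep the cheapest servers that fit the budget; flag the rest
    for disabling."""
    overhead = server_overhead(servers)
    budget = int(window * budget_fraction)
    keep, spent = set(), 0
    for name, cost in sorted(overhead.items(), key=lambda kv: kv[1]):
        if spent + cost <= budget:
            keep.add(name)
            spent += cost
    return [name for name in servers if name not in keep]
```

The greedy keep-cheapest policy mirrors the practitioner move: measure first, then enable only what fits a deliberate budget.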
CLAUDE.md Should Document Failures, Not Structure
Effective CLAUDE.md files contain *only* what Claude cannot observe from code—constraints, hidden dependencies, what breaks. Practitioners waste token budget documenting discoverable information (tech stack, file structure) instead of irreplaceable knowledge.
Direct practitioner experience: 'you're writing a CLAUDE dot md? let me guess. this project uses React...' The quip dismisses documenting observable code and advocates a minimal viable approach: start empty, add entries only when Claude fails.
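A hypothetical CLAUDE.md following the failure-driven approach; every entry, path, and command below is invented for illustration:

```markdown
# CLAUDE.md

<!-- Hypothetical entries: each line exists because Claude once got it wrong. -->

- `npm run build` fails silently if `POSTGRES_URL` is unset; check env first.
- Never edit `src/generated/`: those files are overwritten by `make codegen`.
- The staging API rate-limits at 10 req/s; batch requests in integration tests.
```

Note what is absent: no tech stack, no file tree, nothing Claude can read from the repo itself.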
Control Layer Architecture Beats Behavioral Prompting
After two years of practice, advanced teams encode constraints as structural impossibilities (a control layer makes illegal actions inexpressible) rather than as behavioral rules (prompts asking the model to refuse). This eliminates the 'explain again' problem across sessions.
A practitioner at dottxt.ai reports 'two years' validating control-layer constraint architecture. Key insight: an architectural layer between intent and expression makes violations impossible rather than merely unlikely. The constraint is baked into the decision space.
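The inexpressibility idea can be shown in a few lines of plain Python, independent of any particular library; the `Action` enum and `parse_action` helper are illustrative assumptions:

```python
from enum import Enum

class Action(Enum):
    """The closed decision space. A destructive action is not a
    member, so no parse can ever produce one."""
    READ_FILE = "read_file"
    LIST_DIR = "list_dir"
    SEARCH = "search"

def parse_action(model_output: str) -> Action:
    """Map raw model text into the action space. Violations are
    not refused; they are unparseable."""
    token = model_output.strip().lower()
    for action in Action:
        if action.value == token:
            return action
    raise ValueError(f"not in decision space: {token!r}")
```

A prompt-level rule asks the model not to emit a forbidden action; here the forbidden action has no representation at all, which is the structural-impossibility distinction.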
Codex 5.3 Context Performance Cliff at 45%
Practitioners empirically observe that Codex 5.3 enters a 'dumb zone' when context utilization exceeds 45%, and that model behavior changed unpredictably between 5.2 and 5.3, consuming 50%+ of context before producing output versus 20% previously. Context optimization creates version-specific technical debt.
Mario Zechner reports a concrete empirical threshold: performance degradation begins at 45% context utilization. Below the threshold the model performs normally; above it, the model enters the 'dumb zone.' This suggests a non-linear relationship between context utilization and utility.
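Guarding against the cliff is straightforward once utilization is tracked. A sketch, assuming token counts are available from API responses; the 0.45 default is the reported threshold and, being version-specific, belongs in config rather than code:

```python
class ContextBudget:
    """Track utilization and trigger compaction before the cliff."""

    def __init__(self, window: int, threshold: float = 0.45):
        self.window = window        # model context window, in tokens
        self.threshold = threshold  # version-specific; keep in config
        self.used = 0

    def add(self, tokens: int) -> None:
        """Record tokens consumed by a turn (prompt + completion)."""
        self.used += tokens

    def should_compact(self) -> bool:
        """True once utilization reaches the degradation threshold."""
        return self.used / self.window >= self.threshold
```

Checking `should_compact()` after every turn lets an agent loop summarize or prune before entering the degraded regime.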
Agent Teams Need Dedicated Maintenance Agents
Large agent swarms degrade over time from resource leaks and stuck processes. Practitioners discover they need meta-agents with SSH access and infrastructure knowledge to maintain other agents—operational context is as critical as execution context.
A practitioner reports it is 'incredibly useful to have dedicated agent with SSH access and infrastructure knowledge' to maintain agent swarms. This reveals a gap: swarms need maintenance context about operational state, not just execution context.
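The maintenance check at the heart of such a meta-agent can be sketched without real SSH plumbing; `AgentProcess`, the heartbeat field, and the 600-second timeout are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class AgentProcess:
    name: str
    last_heartbeat: float  # unix timestamp of the last progress report

def find_stuck(agents: list, now: float, timeout: float = 600.0) -> list:
    """The meta-agent's core check: any agent silent past the
    timeout is presumed stuck and queued for restart."""
    return [a.name for a in agents if now - a.last_heartbeat > timeout]
```

In a real swarm the heartbeats would come from process inspection over SSH, and the returned names would feed a restart routine.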
Tool Amnesia Requires Active Context Injection
Agents lose awareness of available tools across conversation turns, underutilizing enabled functionality. Tool definitions in system prompts don't persist effectively—practitioners need re-injection strategies or persistent tool state outside context windows.
A practitioner reports an OpenClaw bot 'keeps forgetting it can do stuff' despite tool definitions in the system prompt, and asks about persistent tool memory solutions. This reveals a tool-availability amnesia pattern.
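One re-injection strategy is to append a compact tool summary every few user turns. A sketch assuming an OpenAI-style message list; the cadence and reminder shape are assumptions:

```python
def with_tool_reminder(messages: list, tools: list, every: int = 5) -> list:
    """Append a compact tool summary every `every` user turns, since
    definitions at the top of the prompt fade from attention."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    if user_turns and user_turns % every == 0:
        reminder = {"role": "system",
                    "content": "Available tools: " + ", ".join(tools)}
        return messages + [reminder]
    return messages
```

The alternative is keeping tool state outside the context window entirely and injecting it fresh on every request, at the cost of a few tokens per turn.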
Session Duration Inversely Correlates With Quality
Practitioners observe output quality degrades within extended sessions and recovers after breaks, suggesting context window saturation or attention dilution. The bottleneck isn't model capability—it's lack of active mid-session context management.
A practitioner asks whether others experience quality degradation in extended sessions (ChatGPT/Gemini); quality recovers after stepping away. This suggests session boundaries matter more than assumed: context accumulates noise, and the signal-to-noise ratio degrades.
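Active session management can start with a simple budget. A sketch; the turn and wall-clock budgets are placeholder assumptions, not measured thresholds:

```python
import time

class SessionGuard:
    """Suggest a fresh session once a turn or wall-clock budget runs out."""

    def __init__(self, max_turns: int = 30, max_seconds: float = 3600.0):
        self.max_turns = max_turns
        self.max_seconds = max_seconds
        self.turns = 0
        self.started = time.monotonic()

    def record_turn(self) -> None:
        self.turns += 1

    def should_restart(self) -> bool:
        elapsed = time.monotonic() - self.started
        return self.turns >= self.max_turns or elapsed >= self.max_seconds
```

Forcing the boundary, rather than waiting to notice degraded output, turns the observed recovery-after-a-break effect into a deliberate policy.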
Subscription Models Break Agent Automation Economics
Practitioners discover that unlimited subscription models (Claude Max) enforce ToS against automated agent access, causing random suspensions. API-key-based consumption with fallback model architecture is the only reliable approach for agent workflows.
A practitioner reports OpenClaw fails randomly with a Claude Max subscription due to ToS enforcement against automated access. Solution: API keys plus a fallback model configuration. A subscription designed for human chat doesn't handle agent workloads.
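The fallback architecture reduces to an ordered retry over models. A sketch; `call_model` is a hypothetical adapter for whatever client library is in use:

```python
def call_with_fallback(prompt: str, models: list, call_model) -> str:
    """Try each model in order; call_model is expected to raise on
    rate limits, suspensions, or other errors."""
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as err:
            last_error = err
    raise RuntimeError(f"all models failed: {last_error}")
```

Because each attempt uses metered API access rather than a chat subscription, a suspension of one provider degrades the agent instead of halting it.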