Brief #107
Agent reliability bottlenecks have shifted from model capability to architectural discipline. Practitioners found that state management, context portability, and security boundaries matter more than framework features, yet vendor tooling still optimizes for the wrong problems.
Agent Skills Supply Chain Is Already Compromised
CONTRADICTS model-context-protocol: baseline assumes MCP provides security guarantees; skills bypass MCP entirely.
MCP standardization created an attack surface before security primitives existed. Skills bypass MCP boundaries entirely via embedded shell commands, making tool directories a vector for arbitrary code execution.
Demonstrated that Agent Skills bypass MCP tool-calling boundaries completely: Markdown files can contain direct shell commands, bundled scripts, and hardcoded credentials, all without any MCP mediation.
Context-Minimization pattern shows architectural constraints prevent injection cascades—but only if tools are isolated. Skills that bundle execution bypass these protections
Repository-defined .mcp.json configurations can be exploited to override MCP servers and exfiltrate credentials—attack vector enabled by trusting skill definitions
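A rough sketch of what a pre-install audit of a skill file might look like, assuming skills are plain Markdown text. The function name `audit_skill` and both regular expressions are illustrative assumptions, not part of any MCP or Agent Skills tooling, and heuristics this simple would miss many real attacks.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example readable

# Hypothetical heuristics: a fenced block tagged as shell, and key=value secrets.
SHELL_FENCE = re.compile(r"^" + re.escape(FENCE) + r"(?:bash|sh|shell|zsh)\b", re.MULTILINE)
SECRET_HINT = re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*\S+")


def audit_skill(markdown_text: str) -> list[str]:
    """Return human-readable findings for one skill file (empty list if none)."""
    findings = []
    if SHELL_FENCE.search(markdown_text):
        findings.append("embedded shell block")
    if SECRET_HINT.search(markdown_text):
        findings.append("possible hardcoded credential")
    return findings
```

A check like this only flags files for human review; it does not restore the mediation that the skill bypassed.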
Context Portability Breaks on Harness-Specific Tool Design
Shared skill directories assumed tools are context-independent, but most tools embed implicit harness dependencies (CLI paths, prompts referencing specific frameworks) that break when moved between agents.
Identified three types of implicit context coupling: harness-specific CLI tools, prompts that reference the harness, hardcoded paths/ports. Skills designed for one harness fail silently when moved.
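The three coupling types above could be surfaced with a small lint pass before a skill is shared. This is a minimal sketch under assumptions: `coupling_report`, the path prefixes, and the caller-supplied list of harness names are all hypothetical, and the patterns cover only the most obvious cases.

```python
import re

# Assumed patterns: common absolute path roots and local host:port pairs.
ABS_PATH = re.compile(r"(?<![\w.])/(?:usr|opt|home|Users)/[\w./-]+")
LOCAL_PORT = re.compile(r"\b(?:localhost|127\.0\.0\.1):\d{2,5}\b")


def coupling_report(skill_text: str, harness_names: list[str]) -> dict[str, list[str]]:
    """Collect hints that a skill depends on one specific harness or machine."""
    lowered = skill_text.lower()
    return {
        "paths": ABS_PATH.findall(skill_text),
        "ports": LOCAL_PORT.findall(skill_text),
        # Names are matched case-insensitively as plain substrings.
        "harness_refs": [n for n in harness_names if n.lower() in lowered],
    }
```

An empty report does not prove a skill is portable; it only means none of these obvious markers were found.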
Explicit State Files Prevent Context Drift Disasters
Agent frameworks that rely on implicit state tracking (conversation history, cached assumptions) accumulate technical debt that manifests as catastrophic failures. Explicit state files—tracked in version control—are the only reliable context preservation mechanism.
A missing state file caused Claude to create duplicate resources, nuking production. The disaster resulted directly from implicit state management: there was no ground truth for what already existed.
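One way to give an agent that ground truth is a small JSON state file that is checked before anything is created. The sketch below assumes a single-writer workflow and a caller-supplied `create` function; the file layout and the name `ensure_resource` are illustrative, not taken from the incident described.

```python
import json
from pathlib import Path
from typing import Callable


def load_state(path: Path) -> dict:
    """Read the state file, or start with an empty record if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"resources": {}}


def ensure_resource(path: Path, name: str, create: Callable[[str], str]) -> tuple[str, bool]:
    """Create `name` only if the state file has no record of it.

    Returns (resource_id, created). `create` is a hypothetical provisioning call.
    """
    state = load_state(path)
    if name in state["resources"]:
        return state["resources"][name], False
    resource_id = create(name)
    state["resources"][name] = resource_id
    # Sorted, indented output keeps version-control diffs small and reviewable.
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
    return resource_id, True
```

Committing the state file alongside the code means a fresh agent session can read what exists instead of inferring it from conversation history.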
Demand-Driven Tool Discovery Beats Context Stuffing
Loading full API specs (2.3M tokens) into context destroys agent performance. Successful systems query tool catalogs on-demand when agents need capabilities, not upfront.
Cloudflare's 2.3M token OpenAPI spec can't fit in context. Solution: agents query available tools when needed rather than loading full catalog upfront.
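The on-demand pattern can be sketched as a catalog lookup that returns only a few matching tool names, so that only those definitions are loaded into context. The catalog shape, the keyword-overlap scoring, and the name `search_tools` are assumptions for illustration; this is not Cloudflare's implementation.

```python
def search_tools(catalog: list[dict], query: str, limit: int = 3) -> list[str]:
    """Return up to `limit` tool names whose name or description shares words with the query."""
    terms = set(query.lower().split())
    scored = []
    for tool in catalog:
        text = (tool["name"] + " " + tool["description"]).lower().replace("_", " ")
        score = len(terms & set(text.split()))
        if score:
            scored.append((score, tool["name"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]
```

The agent would call something like this when it needs a capability, then fetch the full schema for only the returned names.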
RAG Lost to Pre-Indexed Integrations
Retrieval pipelines proved less effective than pre-indexed connectors (Microsoft 365, filesystem abstractions) because chunk-based retrieval loses document structure that users actually see.
Mollick declares RAG era over, with Microsoft 365 connectors on every Claude plan. Signals shift from retrieve-on-demand to pre-indexed access.
Task Complexity Classification Determines Agent Strategy
Agents fail when practitioners use the same interaction strategy (planning mode, one-shot) for all tasks. Order-0 problems need one-shot prompting, order-1 problems need planning, and order-n problems need recursive decomposition. Clarity about task order is the bottleneck.
A practitioner found that using planning mode for simple problems wastes tokens. Task complexity (order-0, order-1, order-n) determines the optimal context strategy.
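The mapping from task order to strategy can be written down explicitly. In this minimal sketch the names `Strategy` and `pick_strategy` are hypothetical, and deciding which order a task belongs to is left to the practitioner, since the source does not say how to measure it.

```python
from enum import Enum


class Strategy(Enum):
    ONE_SHOT = "one-shot"
    PLANNING = "planning"
    RECURSIVE = "recursive decomposition"


def pick_strategy(order: int) -> Strategy:
    """Map a task's order (0, 1, or higher) to an interaction strategy."""
    if order < 0:
        raise ValueError("task order must be non-negative")
    if order == 0:
        return Strategy.ONE_SHOT
    if order == 1:
        return Strategy.PLANNING
    return Strategy.RECURSIVE
```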
LLM-Maintained Context Indices Beat External Search
Using LLMs to maintain summary indices and metadata files (rather than external vector stores) enables knowledge compounding—wiki grows smarter across sessions because the LLM both consumes and produces context.
An LLM-maintained wiki with auto-updating summaries supported queries over 400K words without RAG. Knowledge accumulated across sessions, so research compounded rather than resetting.
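A summary index of this kind might be kept current by re-summarizing only the pages that changed. The sketch below is an assumption about how such a wiki could work, not a description of the system cited: `refresh_index` is a hypothetical name, and `summarize` stands in for whatever LLM call produces a summary.

```python
import hashlib
from typing import Callable


def refresh_index(pages: dict[str, str], index: dict[str, dict],
                  summarize: Callable[[str], str]) -> dict[str, dict]:
    """Update `index` in place so every page has a summary matching its current text."""
    for title, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        entry = index.get(title)
        # Only call the (potentially expensive) summarizer for new or changed pages.
        if entry is None or entry["digest"] != digest:
            index[title] = {"digest": digest, "summary": summarize(body)}
    # Drop entries for pages that no longer exist.
    for title in list(index):
        if title not in pages:
            del index[title]
    return index
```

At query time the agent would read the compact index first and open full pages only when a summary looks relevant.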