Brief #37

29 articles analyzed

Context engineering is splitting into two distinct disciplines: architectural patterns for multi-agent coordination (harnesses, orchestration, isolation) and micro-patterns for single-agent effectiveness (framing, planning, constraint design). The bottleneck isn't models or frameworks—it's practitioners' ability to design context flow at both levels simultaneously.

Context Architecture Beats Model Selection Every Time

Practitioners report spending 80% of effort on planning/context design and 20% on execution, with framework choice mattering far less than how context flows through the system. The performance lever is what information reaches the model and when, not which model you use.

Stop optimizing model selection and framework comparisons. Invest 80% of your time designing what context reaches your model at each step: create explicit context flow diagrams, define what information persists across turns, and document decision rules for context filtering.
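The recommendation above can be sketched in a few lines. This is a minimal, illustrative example (the step names, field names, and rule tables are invented for this sketch, not from any cited framework): each step declares which context fields it may see, and only explicitly persistent fields survive the turn, so the context flow is documented and auditable rather than implicit.

```python
# Illustrative context-flow rules: what reaches the model at each step,
# and what persists across turns, is declared up front.

PERSISTENT_FIELDS = {"goal", "decisions"}        # survive across turns
STEP_RULES = {                                   # per-step filtering rules
    "plan":    {"goal", "user_request"},
    "execute": {"goal", "decisions", "current_file"},
    "review":  {"goal", "decisions", "diff"},
}

def build_context(step: str, state: dict) -> dict:
    """Return only the fields this step is allowed to see."""
    allowed = STEP_RULES[step]
    return {k: v for k, v in state.items() if k in allowed}

def end_turn(state: dict) -> dict:
    """Carry forward only fields declared persistent."""
    return {k: v for k, v in state.items() if k in PERSISTENT_FIELDS}
```

The point of making the rule tables explicit is that they double as the "context flow diagram": anyone can read off exactly what information reaches the model at each step.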
@petergyang: Is it just me or do you also spend 80% of your time coding doing planning

Practitioner directly quantifies that 80% of AI coding work is context/planning infrastructure, not generation

How to think about agent frameworks - LangChain Blog

LangChain author concludes framework choice doesn't matter—'the hard part is making sure the LLM has appropriate context at each step'

A Practitioner's Take on Context Engineering Survey

1,400+ paper survey identifies context engineering (retrieval, processing, management) as the primary performance lever over model size

@alexhillman: the wildest thing I'm experiencing since sharing my claude code stuff

Code sharing fails to transfer value because 'decisions, thinking, and planning' (context architecture) are non-transferable, proving context design is the actual bottleneck


Expanded Requirements Artifacts Prevent Premature Completion

Long-running agents fail by declaring projects 'done' prematurely because they lack persistent, itemized scope artifacts. Converting vague requests into 200+ item checklists with explicit status tracking creates a shared truth that prevents scope hallucination.

For any multi-step project, insert an 'initializer phase' where an agent or human expands the vague request into an itemized checklist (50-200+ items) with explicit status fields. Make this artifact the source-of-truth for all subsequent agents, not the original prompt. Each work session updates the artifact, not the conversation history.
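A minimal sketch of that artifact, assuming a JSON checklist file (the function names and status values are illustrative, not Anthropic's implementation): later sessions read and update the file, and completion can only be claimed when every item is done.

```python
# Hypothetical initializer-phase artifact: itemized checklist with explicit
# status fields, persisted as JSON so agents update the file, not the chat.
import json
from pathlib import Path

def init_checklist(path: Path, items: list[str]) -> None:
    """Expand a vague request into a source-of-truth checklist."""
    artifact = [{"id": i, "task": t, "status": "todo"} for i, t in enumerate(items)]
    path.write_text(json.dumps(artifact, indent=2))

def set_status(path: Path, item_id: int, status: str) -> None:
    """Each work session updates the artifact, e.g. todo -> in_progress -> done."""
    artifact = json.loads(path.read_text())
    artifact[item_id]["status"] = status
    path.write_text(json.dumps(artifact, indent=2))

def is_done(path: Path) -> bool:
    """An agent may claim completion only when every item is done."""
    return all(item["status"] == "done" for item in json.loads(path.read_text()))
```

Because `is_done` checks the artifact rather than the conversation, a later agent cannot hallucinate scope down to whatever it happens to remember.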
Effective harnesses for long-running agents

Anthropic demonstrates using initializer agent to expand vague requests into 200+ item feature lists with status tracking, preventing agents from claiming completion prematurely

Boundary-Injected Secrets Beat Agent Memory Storage

Storing secrets or credentials in agent context creates security liabilities. Production systems are inverting the pattern: secrets never enter agent memory—instead, they're injected at request boundaries by trusted infrastructure that intercepts and rewrites agent calls.

Audit your agent systems for secrets in context/memory. Redesign them so agents never see credentials—use interceptor layers (proxies, middleware, MCP servers) that inject secrets only when the agent makes specific API calls. Log all injection events for audit trails.
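A hedged sketch of the interceptor pattern (the proxy class, placeholder syntax, and header handling are all illustrative assumptions): the agent builds requests with a placeholder token, and trusted infrastructure swaps in the real credential and logs the event. The secret never enters agent context or memory.

```python
# Illustrative boundary-injection proxy: agents emit "$SECRET:name"
# placeholders; only this trusted layer holds real credentials.

class SecretInjectingProxy:
    def __init__(self, secrets: dict[str, str]):
        self._secrets = secrets                  # held by infrastructure only
        self.audit_log: list[str] = []           # injection events for audit

    def send(self, request: dict) -> dict:
        """Rewrite placeholder headers with real credentials, logging each event."""
        headers = dict(request.get("headers", {}))
        for key, value in headers.items():
            if isinstance(value, str) and value.startswith("$SECRET:"):
                name = value.removeprefix("$SECRET:")
                headers[key] = f"Bearer {self._secrets[name]}"
                self.audit_log.append(f"injected {name} for {request['url']}")
        return {**request, "headers": headers}
```

Note the original request dict is left untouched: anything the agent retains in its own memory still contains only the placeholder, never the token.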
@jonas: Treat your AI agents in sandboxes the same way that awful corporations treat

Explicit recommendation to inject secrets at boundaries via proxy/middleware rather than loading into agent context, treating agents as untrusted sandboxes

Collaborative Framing Outperforms Imperative Commands

How you linguistically frame requests to AI systems affects output quality independently of semantic content. Inquiry framing ('can we?', 'what would it look like?') triggers different reasoning patterns than imperative commands ('do xyz'), suggesting framing is a hidden context engineering layer.

Rewrite your system prompts and task instructions from imperative commands to collaborative inquiry. Replace 'Generate X' with 'Can we explore generating X?' and 'Implement Y' with 'What would it look like if we implemented Y?' Test whether this shifts output quality for your specific use case.
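For systematic A/B testing, the rewrite can be mechanized. This is a toy sketch (the verb table is invented, and the source reports no formal evals, so treat the framing effect as a hypothesis to test, not an established win):

```python
# Toy imperative-to-inquiry rewriter for A/B testing prompt framing.

INQUIRY_TEMPLATES = {
    "generate":  "Can we explore generating {}?",
    "implement": "What would it look like if we implemented {}?",
    "fix":       "How might we fix {}?",
}

def to_inquiry(instruction: str) -> str:
    """Rewrite a leading imperative verb into collaborative-inquiry framing."""
    verb, _, rest = instruction.partition(" ")
    template = INQUIRY_TEMPLATES.get(verb.lower())
    return template.format(rest) if template else instruction
```

Running the same task through both framings against your own evals is the only way to know whether the effect holds for your use case.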
@alexhillman: i haven't tested this with formal evals but I seem to get much better results

Practitioner discovers that collaborative/inquiry framing produces better results than imperative framing without changing the technical request

Harness Engineering Compounds Faster Than Model Capability

Multi-model harnesses with observability, consensus checks, and cost-optimized routing outperform single frontier models because harnesses create feedback loops that improve reliability over time. Each model generation will require re-engineered harnesses, but the harness discipline itself won't become obsolete as models advance.

Stop waiting for the next model generation to solve reliability problems. Build harnesses with: (1) adversarial review layers where cheap models check expensive model outputs, (2) consensus mechanisms across multiple models, (3) observability hooks that log decision paths. These compound reliability faster than model upgrades.
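The three layers above can be combined in one control loop. A hedged sketch, with model calls stubbed as plain callables (a real system would wire in actual model APIs; all names here are illustrative):

```python
# Illustrative harness: adversarial review, consensus fallback, decision log.
from collections import Counter
from typing import Callable

def harness(task: str,
            expensive: Callable[[str], str],
            cheap_review: Callable[[str, str], bool],
            fallback_models: list[Callable[[str], str]],
            log: list[str]) -> str:
    draft = expensive(task)
    log.append(f"draft produced for: {task}")         # observability hook
    if cheap_review(task, draft):                     # adversarial review layer
        log.append("review passed")
        return draft
    log.append("review failed; taking consensus vote")
    votes = Counter(m(task) for m in fallback_models) # consensus mechanism
    answer, count = votes.most_common(1)[0]
    log.append(f"consensus: {answer} ({count}/{len(fallback_models)})")
    return answer
```

The log is the compounding asset: every review failure and consensus vote becomes data for tuning routing and prompts, which is exactly the feedback loop a bare model call never produces.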
@irl_danB: 'all your harnesses will be bitter lessoned into irrelevance'

Argues that frontier model capability + harness engineering together outperform either alone; harnesses solve orthogonal problems (cost, reliability, audit) that scale doesn't address

AI Agents Excel as Context Translators for Complex Systems

When systems have poor UX but rich APIs/structure, AI agents bypass UI friction by reading documentation and executing API calls directly. Agents preserve full system context without UI confusion, making them ideal for interfacing with legacy or complex systems.

Identify systems in your stack with terrible UIs but good APIs (legacy enterprise software, complex configuration systems, poorly-documented tools). Build AI agents to translate natural language intent into API calls. Give agents read access to documentation, example configurations, and system state—let them bypass the UI entirely.
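An illustrative sketch of the translation layer (the endpoint paths echo Home Assistant's REST style but the intent table and router are invented; in practice the LLM does the routing after reading the real docs):

```python
# Toy intent-to-API router standing in for the LLM's doc-driven translation.

API_DOCS = {                                     # what the agent reads
    "turn_on":   {"method": "POST", "path": "/api/services/light/turn_on"},
    "get_state": {"method": "GET",  "path": "/api/states/{entity}"},
}

def intent_to_call(intent: str, entity: str) -> dict:
    """Translate a natural-language-level intent into a concrete API call."""
    spec = API_DOCS[intent]
    path = spec["path"].format(entity=entity) if "{entity}" in spec["path"] else spec["path"]
    body = {"entity_id": entity} if spec["method"] == "POST" else None
    return {"method": spec["method"], "path": path, "body": body}
```

The agent never touches the UI: given read access to the docs and current system state, it goes straight from intent to API call, which is why this pattern succeeds where human UI interaction stalls.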
@alexhillman: i forgot why I had given up on home assistant previously

Home Assistant UI too confusing to use directly; Claude Code bypasses UI by reading docs and executing API calls, succeeding where human UI interaction failed