
Brief #38

26 articles analyzed

The field is splitting into two approaches: training models to self-manage context (RLMs, agent trajectories) versus building external systems that structure context through artifacts and architecture. Practitioners are validating that clarity of problem definition—not model capability—is the actual bottleneck, while simultaneously proving that intelligence compounds when context is preserved through deliberate architectural choices (stateless sessions with artifacts, MCP bridges, file-based knowledge stores).

Recursive Language Models: Context Self-Management as Training Target

The research frontier is shifting from external prompt engineering to training models that introspect and optimize their own context usage. This represents a fundamental architectural bet: context management as a learned capability rather than an engineering problem.

Monitor RLM research, but don't wait for it: the underlying insight (models need an explicit context-management strategy) applies now. Document your context engineering patterns; they could become training data for future context-aware models.
@lateinteraction: Excellent bet as usual from the @PrimeIntellect folks

Omar Khattab endorses Prime Intellect's research direction that models trained for self-context-management will achieve breakthrough long-horizon capabilities

@slow_developer: i think the next big breakthrough is making context engineering trainable

Proposes making context robustness and uncertainty handling optimization targets during training via RL, shifting from application-level to training-level solution

@a1zhang: Much like the switch in 2025 from language models to reasoning models

RLMs allow models to treat their own prompts as code-manipulable objects, enabling recursive prompt optimization and self-improving context

@rosinality: Report on building a small LLM, from pre-training to post-training

Tencent research demonstrates agent trajectory training improves multi-turn context coherence, proving context sequencing patterns are learnable


Stateless Sessions with Structured Artifacts Beat Conversation Memory

Practitioners are discovering that fresh LLM invocations with complete structured context (git state + progress narrative + work queue) outperform conversation-based memory. Intelligence compounds through artifacts, not chat history.

Stop relying on conversation memory for long-running agent work. Instead: (1) Structure context as durable artifacts (specs, progress logs, state files), (2) Invoke fresh sessions with complete context, (3) Use version control to track context evolution. Your artifacts ARE your memory.
@jonas: In the limit, these self-evolving intelligent systems are just a magic transform

Runs autonomous multi-sprint work by feeding fresh Claude invocations (git status + progress.txt + TODO) each iteration—no conversation context needed
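The artifact-driven loop above can be sketched in a few lines. This is an illustrative sketch, not the author's actual harness; the file names `progress.txt` and `TODO.md` and the prompt wording are assumptions.

```python
from pathlib import Path
import subprocess

def read_or_empty(path: str) -> str:
    """Return an artifact's contents, or "" if it doesn't exist yet."""
    p = Path(path)
    return p.read_text() if p.exists() else ""

def build_context(git_status: str, progress: str, todo: str) -> str:
    """Assemble the complete context for a fresh, stateless LLM invocation.

    The artifacts ARE the memory: no conversation history is carried over.
    """
    return "\n\n".join([
        "## Current git state\n" + git_status,
        "## Progress so far\n" + progress,
        "## Work queue\n" + todo,
        "Continue the next item on the work queue. "
        "Update progress.txt and TODO.md before finishing.",
    ])

def fresh_session_prompt(repo_dir: str = ".") -> str:
    """Gather git state plus durable artifacts for one fresh invocation."""
    git_status = subprocess.run(
        ["git", "-C", repo_dir, "status", "--short"],
        capture_output=True, text=True,
    ).stdout
    return build_context(
        git_status,
        read_or_empty("progress.txt"),
        read_or_empty("TODO.md"),
    )
```

Each iteration calls `fresh_session_prompt()` and starts a brand-new session; version control then tracks how the artifacts (and thus the context) evolved.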

Instruction Engineering Outperforms Model Selection by 3:1

Blind testing reveals that rubric/instruction clarity drives a 77% user preference, while model selection drives only 56% (barely above a coin flip). The bottleneck is problem definition, not model capability, which validates the core thesis with empirical data.

Stop model shopping. Invest 3x more effort in clarifying instructions, rubrics, and success criteria. Run blind A/B tests on instruction variants before switching models. Document your rubrics as compounding assets.
@Vvsotnikov: Ran a very interesting @DSPyOSS GEPA experiment over the last month

DSPy GEPA experiment shows rubric v1 vs v2 (same model) = 77% preference, while model swap (same rubric) = 56% preference
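A blind preference test like the one described is easy to run yourself. A minimal sketch, assuming a `judge` callable (human or LLM grader) that returns 0 if it prefers the first output shown and 1 for the second; the randomized presentation order is what keeps the test blind.

```python
import random

def blind_preference_rate(outputs_a, outputs_b, judge, seed=0):
    """Blind pairwise A/B test over paired outputs from two variants.

    Presentation order is randomized per pair so the judge cannot
    systematically favor a side; returns the fraction of wins for A.
    """
    rng = random.Random(seed)
    wins_a = 0
    for out_a, out_b in zip(outputs_a, outputs_b):
        if rng.random() < 0.5:
            first, second, a_is_first = out_a, out_b, True
        else:
            first, second, a_is_first = out_b, out_a, False
        choice = judge(first, second)  # 0 = prefers first, 1 = prefers second
        if (choice == 0) == a_is_first:
            wins_a += 1
    return wins_a / len(outputs_a)
```

Run this once for rubric v1 vs v2 on a fixed model, and again for model A vs B on a fixed rubric, and compare the two preference rates before deciding where to invest.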

MCP Security: Context Integration Creates Attack Surface

As MCP adoption scales, the field is discovering that each context integration point (tool, data source, prompt template) is simultaneously a capability multiplier and a prompt-injection attack surface. Context architecture decisions determine system safety.

Treat every MCP integration as a security boundary. Implement: (1) Tool output validation before feeding to context, (2) Explicit user approval gates for high-stakes actions, (3) Read-only vs write access separation, (4) Audit logs of what context was accessed when. Test for prompt injection at every tool interface.
Model Context Protocol - Wikipedia

The Wikipedia article documents prompt injection, tool poisoning, and context-boundary vulnerabilities in deployed MCP implementations
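The four mitigations listed above can be combined into a thin wrapper around each tool. A hypothetical sketch in Python; `GuardedTool` and its `validate`/`approve` hooks are illustrative names, not part of the MCP spec.

```python
import time

class GuardedTool:
    """Treat one tool integration as a security boundary (illustrative sketch)."""

    def __init__(self, name, fn, writes=False, validate=None, approve=None):
        self.name, self.fn, self.writes = name, fn, writes
        # (1) output validation hook, applied before results reach the context
        self.validate = validate or (lambda out: True)
        # (2) approval gate; (3) write-capable tools are denied unless a gate
        # is explicitly supplied (deny by default)
        self.approve = approve or (lambda name, args: False)
        self.audit_log = []  # (4) record of what was accessed, and when

    def __call__(self, **args):
        if self.writes and not self.approve(self.name, args):
            raise PermissionError(f"approval denied for {self.name}")
        result = self.fn(**args)
        if not self.validate(result):
            raise ValueError(f"output of {self.name} failed validation")
        self.audit_log.append((time.time(), self.name, sorted(args)))
        return result
```

The deny-by-default gate on write-capable tools is the key design choice: a missing approval hook fails closed, not open.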

Hypothetico-Deductive Loops Break Agent Hallucination Cycles

Forcing agents to make explicit predictions and verify them via executable code creates epistemological grounding that prevents doom loops. This is context engineering at the architectural level: structure the context to include prediction, verification, and feedback.

For any agent workflow involving system understanding or data analysis, add verification steps: (1) Force agent to state predictions explicitly, (2) Provide tools to test predictions (code execution, API calls, queries), (3) Feed results back into context, (4) Require agent to reconcile prediction vs reality before proceeding.
@MaximeRivest: RLM is a very inspiring type of general inference strategy for LLM agents

RLM research proposes Hypothetico-Deductive Loop pattern: agent predicts system behavior → writes test code → observes results → updates understanding
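The four-step verification workflow above can be expressed as a small driver loop. A sketch under stated assumptions: `predict`, `run_test`, and `reconcile` are placeholders standing in for an agent call, code execution, and a context update respectively; none are names from the RLM work.

```python
def hd_loop(predict, run_test, reconcile, state, max_iters=5):
    """Hypothetico-deductive loop: demand an explicit prediction, verify it
    by execution, feed the result back, and only proceed once reconciled."""
    for _ in range(max_iters):
        hypothesis = predict(state)             # (1) state the prediction
        observation = run_test(hypothesis)      # (2) test it against reality
        confirmed = observation == hypothesis["expected"]
        # (3) feed results back into context; (4) reconcile before proceeding
        state = reconcile(state, hypothesis, observation, confirmed)
        if confirmed:
            return state                        # grounded: safe to proceed
    raise RuntimeError("agent never reconciled prediction with observation")

# Toy usage: the "agent" guesses what sorted() returns and self-corrects.
result = hd_loop(
    predict=lambda s: {"claim": "sorted([3, 1, 2])", "expected": s["guess"]},
    run_test=lambda h: sorted([3, 1, 2]),
    reconcile=lambda s, h, obs, ok: {**s, "guess": obs, "grounded": ok},
    state={"guess": [3, 1, 2], "grounded": False},
)
```

The first iteration falsifies the wrong guess and updates state from the observation; the second confirms it, which is exactly the doom-loop-breaking property: the agent cannot exit the loop on an unverified belief.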