context compression
66 articles · 15 co-occurring · 1 contradiction · 12 briefs
"Context compression refers to techniques that reduce the volume of information in an agent's working memory while preserving the details relevant to completing the task." — Article provides an explicit definition of context compression.
[STRONG] "burn context fast, and can loop when the cache starts clearing" — GLM 4.7 Flash exhibits rapid context exhaustion and degraded performance as its context cache fills, a constraint on its agentic capability.
"Context engineering is about designing the entire information environment around the AI. Not just what you ask, but what the AI already knows when you ask it." — Directly defines context engineering as shaping the full information environment, not just the prompt.
"compressing large amounts of data to improve efficiency" — Article explicitly identifies data compression as a context engineering technique to improve efficiency.
"We also maintain the same rolling compression system from slate V0 that let it run single sessions for as long as 2 days" — Article describes a rolling compression system as key to enabling long-running sessions.
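Slate's actual implementation is not public, but the rolling-compression idea above can be sketched as: keep the most recent turns verbatim and fold older turns into a running summary. All names and thresholds here are illustrative, and `summarize` is a crude stand-in for an LLM summarization call.

```python
def summarize(turns):
    # Stand-in for an LLM summarization call: keep only the
    # first clause of each folded turn.
    return " | ".join(t.split(".")[0] for t in turns)

def roll(history, summary, keep_last=4, max_turns=8):
    """When history exceeds max_turns, fold the oldest turns into
    the rolling summary and keep only the last keep_last verbatim."""
    if len(history) <= max_turns:
        return history, summary
    old, recent = history[:-keep_last], history[-keep_last:]
    folded = summarize(old)
    summary = f"{summary} | {folded}" if summary else folded
    return recent, summary

history = [f"turn {i}. details about step {i}" for i in range(10)]
history, summary = roll(history, "")
print(len(history))  # 4 verbatim turns remain; the rest live in the summary
```

Because the fold happens continuously rather than once at a hard limit, a session can in principle run indefinitely at a bounded context size.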
"take a large set of tokens and turn it into a smaller set of tokens that is most relevant and meaning-rich for the task at hand" — Article provides a concrete definition and implementation of context compression.
Auto-compaction is the explicit mechanism for context compression described in the tweet
Lecture explicitly focuses on context compression as a technique for managing large, complex context windows.
"Master the Context Stack: system prompts, tasks, RAG, tool outputs, and history" — Book directly teaches context stack engineering as a core architecture pattern for autonomous agents.
"AI agents need a large number of tools to complete real-world tasks, but every tool description consumes precious context space, constraining the task input space" — Article identifies context space as a critical constraint for tool-heavy agents and proposes dynamic tool discovery (a search() interface) as a remedy.
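The search() interface idea can be sketched as follows: instead of injecting every tool description into the prompt, the agent is given one search tool and loads matching descriptions on demand. The tool registry and matching logic here are hypothetical.

```python
# Illustrative tool registry; in a real agent these descriptions would
# otherwise all sit in the prompt permanently.
TOOLS = {
    "read_file":  "Read a file from disk and return its contents.",
    "write_file": "Write text to a file on disk.",
    "web_search": "Search the web and return top results.",
    "run_tests":  "Run the project's test suite and report failures.",
}

def search(query: str, limit: int = 2):
    """Return up to `limit` (name, description) pairs matching the query."""
    q = query.lower()
    hits = [(name, desc) for name, desc in TOOLS.items()
            if q in name.lower() or q in desc.lower()]
    return hits[:limit]

print(search("file"))  # only the matching descriptions enter context
```

Only the handful of descriptions returned by a call ever occupy context, so the registry can grow without shrinking the task input space.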
"compress the context. find everything necessary and gather good context." — Article explicitly recommends context compression as a practical strategy to manage complexity and prevent issues.
Best practices for avoiding context distraction involve periodically summarizing or compressing conversation history, pruning outdated details, and prioritizing recent context through scoring or re-ranking mechanisms.
"Prompt compaction = when the context window gets close to full, model generates a shorter summary" — Article describes prompt compaction as a concrete implementation of context compression.
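A minimal sketch of that threshold-triggered compaction, with a word count standing in for a real tokenizer and a placeholder string standing in for the model-generated summary:

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # crude word-count stand-in for a tokenizer

def compact(messages, budget=50, threshold=0.8):
    """Once usage crosses threshold * budget, replace all but the
    latest message with a (placeholder) model-generated summary."""
    used = sum(count_tokens(m) for m in messages)
    if used < budget * threshold:
        return messages
    summary = f"[summary of {len(messages) - 1} earlier messages]"
    return [summary, messages[-1]]

msgs = ["long message " * 10 for _ in range(5)]
print(len(compact(msgs)))  # compacted down to summary + latest message
```

The essential design point is that compaction is lossy by construction, which is why several snippets below pair it with durable memory written outside the window.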
"Supports automatic compression of history to fit the model's context window" — Demonstrates automatic history compression as a mechanism to manage context window constraints.
"4 formats (YAML, Markdown, JSON, Token-Oriented Object Notation [TOON])" — Evaluating multiple file-format representations is a direct exploration of how to compress and encode structured schemas for the model's context.
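A rough illustration of why the encoding choice matters: the same records serialized as JSON versus a compact header-plus-rows layout (in the spirit of TOON, though the exact TOON syntax is not reproduced here). Character count stands in for token count.

```python
import json

rows = [{"id": i, "name": f"item{i}", "qty": i * 2} for i in range(20)]

# JSON repeats every key in every record.
as_json = json.dumps(rows)

# A tabular encoding states the keys once and then emits only values.
header = "id,name,qty"
as_table = "\n".join([header] + [f"{r['id']},{r['name']},{r['qty']}"
                                 for r in rows])

print(len(as_json), len(as_table))
assert len(as_table) < len(as_json)  # keys are not repeated per row
```

The saving grows with the number of rows, since the per-record key overhead in JSON is constant while the tabular header is paid once.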
"Performance gains in 2026 come from dynamic context selection, compression, and memory management." — Article directly lists compression as one of three core performance drivers in modern production systems.
"Summarization, compression, external memory, retrieval, and subagents are the primary techniques" — Article explicitly identifies compression as a primary technique for effective context engineering.
"One idea I had is to compress the web search results after each tool call." — Article demonstrates context compression as a practical solution for managing accumulated results.
"Based on the current task, it selects the most relevant parts from a large pool of 'context files', prioritizes and compresses them, and generates a 'Manifest'." — The context constructor component directly implements context compression through prioritization and compression as part of the pipeline.
"context processing is especially important because it decides how retrieved information is cleaned, organized, and compressed before reaching the model." — Article directly discusses compression as a core stage of context processing.
"Slate has episodic memory that actually makes sense. The system retains only the tool calls that contribute to its success. We also maintain the same rolling compression system from slate V0 that let it run single sessions for as long as 2 days." — Article pairs selective episodic memory with rolling compression to sustain long sessions.
"A 100-page construction change order parses into roughly 200,000 lines of JSON... the genuinely useful contract clauses and unit-price tables make up a tiny fraction; most of it is 'coordinate arrays' and 'metadata fields'" — Article demonstrates the problem context compression solves: stripping non-content metadata reduces ~200k tokens to the actionable content.
Component 4 explicitly names 'context reduction: clip, dedup, compress' as critical. This is an applied context compression strategy.
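The clip/dedup/compress pass named in that snippet can be sketched as a three-stage pipeline; the function bodies here are illustrative stand-ins, not the article's implementation.

```python
def clip(chunks, max_len=80):
    """Truncate each chunk to a fixed character budget."""
    return [c[:max_len] for c in chunks]

def dedup(chunks):
    """Drop exact repeats while preserving first-seen order."""
    seen, out = set(), []
    for c in chunks:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out

def compress(chunks, keep=3):
    """Stand-in for LLM summarization or relevance ranking."""
    return chunks[:keep]

raw = ["error: timeout"] * 5 + ["stack trace line\n" * 50, "ok", "done"]
reduced = compress(dedup(clip(raw)))
print(reduced)
```

Ordering matters: clipping before dedup lets near-identical long chunks collapse to one entry, and dedup before compression keeps the budget from being spent on repeats.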
"Summarization for Compression: Condense long conversations into shorter summaries." — Article explicitly mentions summarization as a compression strategy for managing context.
"The code acts as a compact plan. The model can explore tool operations, compose multiple calls, and return just the data it needs" — Code Mode exemplifies context compression by converting verbose tool interactions into a compact code plan.
"Context engineering employs four key strategies to manage the context window effectively: writing, selecting, compressing, and isolating context." — Article identifies context compression as one of four core strategies.
"compaction strategies" — Article demonstrates practical application of context compression techniques as a key strategy for scaling coding agents in production.
"large files attached to a chat can be condensed to fit in the context limit" — Demonstrates practical context compression technique in modern AI tools like Cursor.
"MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) is the solution to this... Applies random linear projection to compress each sub-vector (following the Johnson-Lindenstrauss Lemma to preserve distances)" — Article describes projection-based compression of multi-vector representations.
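A toy, pure-Python illustration of the compression step attributed to MUVERA: a random linear projection shrinks a vector while roughly preserving inner products in expectation (the Johnson-Lindenstrauss guarantee). The dimensions here are far smaller than anything used in practice.

```python
import random

random.seed(0)
d_in, d_out = 64, 16

# Gaussian projection matrix scaled by 1/sqrt(d_out), the standard
# JL construction.
P = [[random.gauss(0, 1) / d_out ** 0.5 for _ in range(d_in)]
     for _ in range(d_out)]

def project(v):
    """Map a d_in-dimensional vector to d_out dimensions."""
    return [sum(p_i * v_i for p_i, v_i in zip(row, v)) for row in P]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = [random.gauss(0, 1) for _ in range(d_in)]
v = [random.gauss(0, 1) for _ in range(d_in)]
print(dot(u, v), dot(project(u), project(v)))  # similar in expectation
```

With d_out this small the variance of the estimate is large; real systems pick d_out from the JL bound for their target distortion.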
"Let it compact. I don't know how they do it but it's great." — Author demonstrates practical use of automatic context compaction (compression) in Codex and reports zero drift over extended sessions.
"The human visual system actually processes 40 to 50 bits per second after spatial compression. Much, much less if you add temporal compression over a long time horizon." — Article explicitly discusses compression in biological perception as an analogy for context compression.
"Layered retrieval on read: pull the summary first, ask the LLM 'is that enough?', and only drill down to specific facts if not." — Article demonstrates context compression through hierarchical retrieval: summaries first, drilling into facts only when needed.
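That summary-first pattern can be sketched as below; the store contents are invented, and `is_enough` is a crude stand-in for the LLM judgment call the snippet describes.

```python
STORE = {
    "trip-2024": {
        "summary": "Week-long trip to Kyoto in April 2024.",
        "facts": ["Flight JL123 on April 3", "Hotel near Gion",
                  "Visited Fushimi Inari on April 5"],
    },
}

def is_enough(summary: str, question: str) -> bool:
    # Stand-in for asking the LLM whether the summary answers the
    # question; here, only flight-level questions require drilling down.
    return "which flight" not in question.lower()

def retrieve(key: str, question: str):
    entry = STORE[key]
    context = [entry["summary"]]
    if not is_enough(entry["summary"], question):
        context.extend(entry["facts"])  # drill down only when needed
    return context

print(len(retrieve("trip-2024", "Where did you go?")))        # summary only
print(len(retrieve("trip-2024", "Which flight did you take?")))  # drilled down
```

Most queries pay only the summary's token cost; the full fact list is charged only when the summary demonstrably falls short.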
"Large or potentially irrelevant data (such as tool outputs, chat history, terminal sessions) is converted into files or references rather than injected directly into the prompt" — The approach compresses context by turning large data into file references retrieved dynamically instead of inlined.
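A minimal sketch of that reference-passing move, assuming a character threshold for what may be inlined; paths and the handle format are illustrative.

```python
import os
import tempfile

INLINE_LIMIT = 200  # characters allowed directly in the prompt

def to_context(tool_output: str) -> str:
    """Inline small outputs; spill large ones to disk and return a handle."""
    if len(tool_output) <= INLINE_LIMIT:
        return tool_output
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(tool_output)
    return f"[output saved to {path}; {len(tool_output)} chars; read on demand]"

big = "line\n" * 1000
ref = to_context(big)
print(ref.startswith("[output saved to"))  # a short handle, not the payload
```

The agent later reads slices of the file through its normal file tools, so the full payload only ever enters context in the pieces that are actually needed.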
"presumably Suhail is thinking about the compaction problem as it occurs in long running agents like claude code" — Author identifies that the call-stack architecture addresses the compaction problem that arises in long-running agents.
"Memory files use a hierarchical directory structure; file names and directory levels are themselves navigation signals. The file tree is always in the system prompt, so the agent always knows what it 'remembers'" — Hierarchical directory structure with progressive disclosure (system/ for resident memory, selective loading elsewhere) reduces token load.
"The agent silently writes durable memories to disk before compaction hits. But after the window resets, the agent can't systematically browse what it flushed." — Article demonstrates a practical failure mode of compaction: flushed memories become hard to rediscover.
A useful memory system needs to handle: write control (what deserves to become a memory vs. passing noise), deduplication (collapsing repeated information into canonical facts), and reconciliation (handling conflicting or updated information).
"Handles inputs 100× beyond model context windows" — The symbolic handles and recursive approach let RLMs compress context representation, enabling processing of inputs far larger than the window.
Knowledge flowing human → agent1 → agent2 → human is a context compression pipeline. Each hop filters and encodes knowledge more efficiently than the original codebase would.
Explicitly discusses 'compressed context' as the mechanism by which orchestrator prepares subagent input; critical under bandwidth/latency constraints.
Explicitly identifies context compression as architectural decision with consequences: 'what information will be retained, what will be discarded' determines memory quality
Claude provides `/compact`, which also runs faster in CC 2.0, but sometimes I prefer to make Claude write down what happened in the current session (with some specific details) before I kill it and start a new one.
Compression reduces cost and maintains continuity but may impact quality" — Article directly discusses compression as a context management strategy within the tradeoff triangle
"In this definitive guide, we won't just explore what context is. We'll break it down like systems engineers: how to structure it, isolate it, store it, retrieve it, compress it, and shape it over time."
"With continuous background rebuilding, there's no cliff. The cache stays clean because you're always pruning slightly, invisibly, as you go." — Article explains continuous cache pruning and summarization as an alternative to cliff-edge compaction.
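The "no cliff" idea contrasts with the threshold-triggered compaction described earlier: prune a small fraction every turn instead of compacting everything at once. A minimal sketch, with an assumed pruning fraction and floor:

```python
def prune_step(history, fraction=0.1, floor=5):
    """Drop the oldest ~fraction of turns each step, never below floor."""
    n_drop = max(0, min(int(len(history) * fraction), len(history) - floor))
    return history[n_drop:]

history = list(range(100))
for _ in range(20):          # simulate 20 turns of gentle pruning
    history = prune_step(history)
print(len(history))          # shrinks gradually, no sudden collapse
```

Because each step removes only a sliver, no single turn pays the latency or quality cost of a full compaction, which is the cliff the snippet is describing.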