← All concepts

token efficiency

143 articles · 15 co-occurring · 8 contradictions · 13 briefs

"the result: ~415 tokens at session start + ~50 per turn. compared to Mem0 burning ~1,600 tokens of LLM input every time you save a single fact. 22x cheaper" — Provides specific quantitative evidence of token-efficiency gains from the described memory design.

@SIGKITTEN: seems like that whole claude-code .69 update is a dud.

[INFERRED] "improve performance of the parent" — Article indicates async agents feature fails to achieve its performance goal (parent context compression) because subagent tool activity remains visible to parent, defeating token optimization strategy.

@jarrodwatts: why is claude code taking up 180% of my cpu usage

[INFERRED] "why is claude code taking up 180% of my cpu usage" — Social media report of excessive CPU consumption by Claude Code tool, indicating performance or resource efficiency concern

Claude Code Is Getting Bad - YouTube

[INFERRED] "Claude Code is very CPU hungry and inefficient now" — Article reports negative performance characteristics contradicting expectations of efficiency

@harjtaggar: Everybody I know using AI is working more hours not less.

[STRONG] "AI does not reduce work. It intensifies it." — Directly challenges assumption that AI adoption reduces workload; empirical study contradicts common productivity expectations

@helloiamleonie: Is it actually a productivity boost if I spent all the time saved on figuring...

[INFERRED] "Is it actually a productivity boost if I spent all the time saved on figuring out how to further use AI in my work?" — Article questions whether AI productivity gains are real when setup/learning costs are factored in. Challenges assumption that time savings equal actual productivity boost.

@badlogicgames: we all should do this. token count as a kpi is fucking insane.

[INFERRED] "token count as a kpi is fucking insane" — Article critiques using token count as a key performance indicator, arguing it drives the wrong optimization priorities and incentives in AI development

@dhasandev: tldr

[STRONG] "When you already know what to write on the test, you don't need to think hard. Finally, an honest optimization." — Article documents the system intentionally degrading reasoning effort (reducing token usage for thinking) when trajectory guidance is available, undermining genuine problem-solving optimization.

Anthropic tries to hide Claude's AI actions. Devs hate it | Hacker News

[STRONG] "The issue of it burning through tokens grepping around *should* be fixed with language server integration, but that's broken in Claude Code and the MCP code nav tools seem to use more tokens than just a home-built code map in markdown files" — Developer criticizes inefficient token usage in MCP code navigation tools, arguing they consume more tokens than simpler approaches, pointing to token-efficiency problems

2026-W15: 661 · 2026-W14: 471 · 2026-W12: 1

"$225/month on pure Opus vs $19/month with hierarchy" — Article demonstrates token efficiency gains through hierarchical routing, achieving 10x cost reduction by matching model capability to task complexity.
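The routing pattern behind that claim can be sketched in a few lines. This is a hypothetical illustration, not the article's implementation: the model names, per-token prices, the 0.7 complexity cutoff, and the workload mix are all invented for the sketch.

```python
# Hypothetical sketch of hierarchical model routing: a cheap model handles
# simple tasks; the expensive model is invoked only when complexity demands it.
# Model names and per-million-token prices are illustrative, not real pricing.

PRICE_PER_MTOK = {"small": 0.25, "opus": 15.00}  # USD per million input tokens

def route(task_complexity: float) -> str:
    """Pick the cheapest model whose capability covers the task."""
    return "opus" if task_complexity > 0.7 else "small"

def monthly_cost(tasks: list[float], tokens_per_task: int = 10_000) -> float:
    total = 0.0
    for c in tasks:
        total += tokens_per_task * PRICE_PER_MTOK[route(c)] / 1_000_000
    return total

# A mostly-simple workload (90 easy tasks, 10 hard ones): routing keeps cost
# roughly an order of magnitude below sending everything to the large model.
workload = [0.2] * 90 + [0.9] * 10
hier = monthly_cost(workload)
pure = sum(10_000 * PRICE_PER_MTOK["opus"] / 1_000_000 for _ in workload)
print(round(pure / hier, 1))  # → 8.7
```

The exact multiplier depends entirely on the workload's easy/hard split; the article's 10x figure implies a workload dominated by tasks the small model can absorb.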

"Programmatic Tool Calling: ~37% reduction in token consumption; Dynamic Filtering: ~24% reduction in input tokens; Tool Search Tool: ~85% reduction in tool-definition tokens" — Article presents quantified token reduction strategies with specific percentage improvements.

Monitor tool is explicitly designed as 'big token saver' by reducing wasted polling cycles

The entire announcement is about choosing an I/O pattern specifically to reduce token consumption—a core context engineering concern.

[HIGH] "On an energy basis, my best estimate is human efficiency for solving simple ARC v1 tasks is 1,000,000X higher than last December's unreleased o3 (High) preview" — Article provides an empirical energy-efficiency comparison between human and model problem-solving.

"Set thinking.type: "adaptive" and let Claude decide based on task complexity." — Article presents adaptive thinking as a solution to manual token tuning, showing it intelligently adjusts resources based on task complexity.

"normal claude: ~180 tokens for a web search task; caveman claude: ~45 tokens for the same task" — Provides concrete quantified evidence of token reduction (75% efficiency gain) through practical application.

"My more important one is token efficiency. I had to run a prompt 100,000,000 times per week with Llama 8B and I worked to make my output only 1 token long." — Author demonstrates a concrete implementation of output-token minimization at extreme scale.

"Single-agent designs reduce token usage and API calls by maintaining context within one entity." — Article directly addresses token usage reduction as a benefit of single-agent architecture, making a direct efficiency claim.

"On realistic end-to-end large-session workloads (not toy microbenchmarks), pi_agent_rust is now: extremely low memory footprint (8–13x lower than the Node.js version)" — The 8–13x lower memory footprint compared to the Node.js version directly demonstrates resource-efficiency gains.

"Significantly reduces context token usage, e.g. an average 46.9% reduction in runs using MCP tools" — The article provides concrete quantitative evidence of token efficiency gains through dynamic context loading, with a specific 46.9% reduction metric.

"Token-Oriented Object Notation [TOON]" — TOON format is explicitly designed for token efficiency in context representation, demonstrating a practical token-aware encoding approach.

"Compressing context - retaining only the tokens required to perform a task." — Article demonstrates token optimization through context compression, showing practical implementation of token efficiency.

"tokens above 200k are now charged at the same (not higher) per-token rate" — Article provides concrete pricing optimization data: a flat per-token rate above 200k removes the prior penalty for extended contexts.

"Keep the system prompt stable (don't swap models/context) to hit the prompt cache; summarize history with /compress in long sessions; offload parallel tasks to Sub Agents via delegate_task" — Provides concrete implementation tactics (prompt caching, history compression, task delegation) achieving >50% token savings.

"Code Mode: LLM writes TS code → runs in isolate → calls typed APIs → only final result returns to context. 81% fewer tokens vs sequential tool calls." — Demonstrates quantified token savings (81%) through the code-execution I/O pattern.
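The core of the Code Mode saving can be shown without any model at all. A minimal sketch, in Python rather than the TS-in-isolate setup the quote describes: the `len(text) // 4` token estimate and the record format are assumptions, not a real tokenizer or API.

```python
# Sketch of the "Code Mode" I/O pattern: with sequential tool calls, every
# intermediate result re-enters the model's context; with code execution,
# the records are processed inside the sandbox and only a small final
# result returns. Token counts are estimated as len(text) // 4.

def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def sequential_tool_calls(records: list[str]) -> int:
    """Context cost when every tool result is appended to the context."""
    return sum(est_tokens(r) for r in records)

def code_mode(records: list[str]) -> int:
    """Context cost when only a computed summary returns to the model."""
    summary = f"{len(records)} records, longest={max(map(len, records))}"
    return est_tokens(summary)

records = ["row-%04d: %s" % (i, "x" * 200) for i in range(50)]
print(sequential_tool_calls(records) > 10 * code_mode(records))  # → True
```

The saving scales with how much intermediate data the task touches, which is why the quoted 81% figure applies to multi-step tool workflows rather than single calls.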

[DIRECT] "Codemode mcp -> ~50k tokens total, 1.5 mins. Perfect answer." — Direct evidence of token efficiency in practice - codemode MCP achieved identical results using significantly fewer tokens (~50k total).

"Cost Optimization Through Context: smaller context means lower API costs per call." — Article directly links token management to cost efficiency, showing how context management reduces API expenses.

"Improved cost optimization: Static system prompt gets cached, dynamic context comes through tool messages." — Provides direct evidence that using cached static prompts with dynamic tool messages improves cost efficiency.
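The layout that entry describes is easy to illustrate. This is a generic sketch, not any specific provider's request schema: the message shapes, role names, and the review-assistant prompt are invented for the example.

```python
# Cache-friendly request layout: the system prompt stays byte-identical
# across calls (so a prompt cache can serve it as a stable prefix), while
# volatile data rides in per-turn messages and never invalidates the cache.

STATIC_SYSTEM = "You are a code-review assistant. Follow the house style guide."

def build_request(dynamic_context: str, user_msg: str) -> dict:
    return {
        "system": STATIC_SYSTEM,  # identical every call → cacheable prefix
        "messages": [
            {"role": "tool", "content": dynamic_context},  # changes per turn
            {"role": "user", "content": user_msg},
        ],
    }

r1 = build_request("diff: +1 line", "review this")
r2 = build_request("diff: -3 lines", "review that")
print(r1["system"] == r2["system"])  # → True
```

The design point: anything that varies per call (diffs, timestamps, retrieved files) must stay out of the cached prefix, or every call becomes a cache miss.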

"This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover specific details when needed (via filesystem search)." — Article describes a summary-plus-filesystem design that balances context compression against detail recovery.

"This changes the economics of AI coding with MCP." — Article argues that Tool Search fundamentally improves token efficiency in MCP-based systems, with a documented 46.9% reduction supporting better token economics.

"A single Docker MCP server could consume 125,000 tokens just to define its 135 tools." — Article quantifies the token inefficiency problem and presents MCP Tool Search as enabling better token efficiency.
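The Tool Search idea that entry points to reduces to a registry lookup. A hedged sketch: the tool names, descriptions, substring-matching search, and `len // 4` token estimate are assumptions for illustration, not the MCP Tool Search implementation.

```python
# Sketch of on-demand tool loading: instead of injecting every tool
# definition into context up front, keep definitions in a registry and
# load only those matching the current task's query.

TOOL_REGISTRY = {
    "docker_ps":   "List running containers. Args: all (bool).",
    "docker_logs": "Fetch logs for a container. Args: id (str), tail (int).",
    "git_status":  "Show working-tree status. Args: none.",
    # ...imagine ~135 entries for a large server
}

def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def load_all() -> int:
    """Context cost of shipping every definition up front."""
    return sum(est_tokens(n + d) for n, d in TOOL_REGISTRY.items())

def search_tools(query: str) -> dict[str, str]:
    """Return only definitions whose name or description mentions the query."""
    q = query.lower()
    return {n: d for n, d in TOOL_REGISTRY.items() if q in n or q in d.lower()}

def load_for_task(query: str) -> int:
    return sum(est_tokens(n + d) for n, d in search_tools(query).items())

print(load_for_task("docker") < load_all())  # → True
```

With 3 tools the saving is modest; at the quoted 135 tools, loading only the two or three relevant definitions is where the ~85% reduction in tool-definition tokens comes from.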

"A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans" — Article demonstrates how prompt caching directly reduces token consumption and cost.

"~390X efficiency improvement in one year" — Article quantifies cost-per-task improvement: $4.5k → $11.64 with 390X efficiency gain, demonstrating significant progress in model cost efficiency.

"Claude monitors the CI pipeline in the background. With 'auto-fix' enabled, it automatically tries to repair failures; with 'auto-merge', it merges the PR automatically once checks pass." — Claude automates CI monitoring, failure fixing, and PR merging in the background without developer intervention.

"set a token threshold for when compaction kicks in" — Extends token optimization by allowing fine-grained control over when context compaction triggers, enabling intelligent token budget management based on session needs.
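Threshold-triggered compaction can be sketched in a dozen lines. This is a minimal illustration, not any tool's actual compaction logic: the `len // 4` token estimate, the `summarize` stub (a real system would call a model here), and the keep-two-recent-turns policy are all assumptions.

```python
# Minimal sketch of threshold-triggered context compaction: once the
# estimated token count of the history crosses the budget, everything but
# the most recent turns is collapsed into a one-line summary.

def est_tokens(msgs: list[str]) -> int:
    return sum(max(1, len(m) // 4) for m in msgs)

def summarize(msgs: list[str]) -> str:
    # Stub: a real implementation would ask a model for the summary.
    return f"[summary of {len(msgs)} earlier turns]"

def maybe_compact(history: list[str], threshold: int, keep_recent: int = 2) -> list[str]:
    """Leave history alone while under budget; compact old turns when over."""
    if est_tokens(history) <= threshold or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"turn {i}: " + "x" * 400 for i in range(20)]
compacted = maybe_compact(history, threshold=500)
print(len(compacted), est_tokens(compacted) < est_tokens(history))  # → 3 True
```

The threshold is the tuning knob the quote is about: too low and you pay summarization cost (and lose detail) constantly; too high and the context fills before compaction ever fires.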

"distributed architectures, token-based representation, and compression techniques to enhance memory efficiency and reduce computational overhead" — Article explicitly describes using token-based representation to improve memory efficiency.

"The obvious reasons intelligence-per-watt is going up so fast: more efficient architectures, more efficient hardware, and higher quality data." — Article provides evidence that efficiency gains come from several compounding factors rather than a single breakthrough.

"The agent can use historical checkpoints to avoid repeating errors it has already corrected" — By capturing checkpoints with full session history, agents can avoid repeating previous errors and corrections, reducing token waste.

"Additional metadata like input token count is persisted with the span so that we can estimate the cost of the procedure." — Article shows how tracing captures input and output token counts at the span level to support cost estimation.

"token usage is still a problem - spoke to one person who's spending $1-$2k a month on openai plans, very token optimized. he said he is going through ~1B tokens per day across all of his claws" — Article documents extreme token consumption (~1B tokens/day) even among heavily optimized users.

"If you are in the yolo-let-the-harness-handle it you will never get comparable results to someone who is intentional with their context window" — Article argues that intentional context window management is essential for comparable results.

"it wastes tokens and invites hallucinations" — Article explicitly critiques traditional approaches for wasting tokens and advocates for token-efficient skill design patterns.

"This was context pollution, and it was killing MCP adoption." — Article demonstrates the token inefficiency problem where unused tool definitions waste tokens, and shows how MCP Tool Search improves token efficiency.

"also saves tokens because claude stops wasting context searching for the wrong files" — Article provides direct evidence that improved tool integration enables token efficiency by preventing Claude from searching the wrong files.

"95% Context reduction via lazy loading" — Article highlights a specific token efficiency metric showing 95% reduction achievable through lazy loading techniques, demonstrating practical token optimization.

"With 'context rot', the more tokens an LLM accumulates, the more important information gets buried and accuracy drops" — Article describes the context rot phenomenon where increased tokens reduce precision - a fundamental motivation for token efficiency.

"Markdown is the most token-efficient format for Claude to read" — Article provides explicit guidance on token efficiency by recommending Markdown format for context files.

"For us a fresh session in our monorepo costs a baseline ~20k tokens (10%) with the remaining 180k for making your change — which can fill up quite fast" — Adds specific quantification of token costs at session start in a large repository.

"let it eat 5-10k tokens checking the output 24 times in 2 minutes rather than just waiting for the command to finish" — Demonstrates a concrete token waste scenario: polling generates 5-10k tokens of consumption that simply waiting would avoid.

"Because context windows are getting bigger (1 million tokens!)" — Article uses token scaling as evidence that naive context dumping is a widespread misconception despite larger windows.

"They are so insanely token efficient" — Article directly demonstrates that codemaps reduce token consumption compared to traditional code context methods.

"took forever, and the token count was enormous" — Adds practical insight that naive approaches to LLM-assisted code generation incur massive token costs, driving the need for optimization strategies.

"converting HTML to markdown can reduce a 16,000-token page to ~3,000" — Demonstrates a concrete technique for reducing input tokens through format conversion, directly supporting token efficiency.
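The mechanism behind that reduction is simply that markup is overhead the model rarely needs. A rough sketch using only the standard library: real converters (html2text and similar) also preserve links and headings, and the `len // 4` token estimate is an assumption.

```python
# Rough sketch of format conversion as token reduction: strip tags and
# script/style bodies so only the readable text is sent to the model.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # depth inside <script>/<style> blocks

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    p = TextExtractor()
    p.feed(html)
    return "\n".join(p.parts)

def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

page = ("<html><head><style>body{color:red}</style></head>"
        "<body><div class='nav'><p>Hello <b>world</b></p></div></body></html>")
text = html_to_text(page)
print(est_tokens(text) < est_tokens(page))  # → True
```

The quoted 16,000 → ~3,000 ratio is typical of real pages, where navigation, styling, and attributes dominate the raw HTML byte count.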

[INFERRED] "inference is cheap, context is expensive" — Establishes that optimization focus should be on context efficiency rather than inference efficiency; inverts typical cost assumptions

"CI-friendly: GitHub Action example with --provider=local --ci --threshold=0.8" — Article provides a concrete example of Skillgrade integration with GitHub Actions CI, demonstrating practical agent evaluation in CI/CD pipelines.

"I need a way to handle this accumulation efficiently to optimize latency and token usage." — Article identifies token usage optimization as a key requirement for the context management solution.

query this concept
$ db.articles("token-efficiency")
$ db.cooccurrence("token-efficiency")
$ db.contradictions("token-efficiency")