← All concepts

token efficiency

192 articles · 15 co-occurring · 10 contradictions · 57 briefs

the result: ~415 tokens at session start + ~50 per turn. compared to Mem0 burning ~1,600 tokens of LLM input every time you save a single fact. 22x cheaper" — Provides specific quantitative evidence o

@dexhorthy: keep the lights on

[strong] "Velocity isn't going up when you consider how much effort was spent fixing things post-merge, pre-deploy" — Article challenges simplistic velocity metrics, showing they hide downstream rework costs from AI-generated code issues

@burkeholland: "Going forward, companies should budget $100 USD to $500 USD per day, per dev...

High token budgets suggest inefficiency in context management. Good context engineering should *reduce* token consumption, not require higher budgets. This tweet frames costs as fixed, but the CE thesis suggests costs are a symptom of poor context design.

@dbreunig: So let's recap. If I have this right…

[strong] "he's on claude max 20x at $200 a month. yesterday claude code hit him with "you're out of extra usage" out of nowhere. his dashboard showed 13% weekly usage. 0% current session. 86% of his plan was sitting there untouched but $200.98 in extra usage already burned through" — Documents a real billing bug where token usage was incorrectly charged despite available plan allocation, showing failures in token management systems.

@SIGKITTEN: seems like that whole claude-code .69 update is a dud.

[INFERRED] "improve performance of the parent" — Article indicates async agents feature fails to achieve its performance goal (parent context compression) because subagent tool activity remains visible to parent, defeating token optimization strategy.

@jarrodwatts: why is claude code taking up 180% of my cpu usage

[INFERRED] "why is claude code taking up 180% of my cpu usage" — Social media report of excessive CPU consumption by Claude Code tool, indicating performance or resource efficiency concern

Claude Code Is Getting Bad - YouTube

[INFERRED] "Claude Code is very CPU hungry and inefficient now" — Article reports negative performance characteristics contradicting expectations of efficiency

@harjtaggar: Everybody I know using AI is working more hours not less.

[strong] "AI does not reduce work. It intensifies it." — Directly challenges assumption that AI adoption reduces workload; empirical study contradicts common productivity expectations

@helloiamleonie: Is it actually a productivity boost if I spent all the time saved on figuring...

[INFERRED] "Is it actually a productivity boost if I spent all the time saved on figuring out how to further use AI in my work?" — Article questions whether AI productivity gains are real when setup/learning costs are factored in. Challenges assumption that time savings equal actual productivity boost.

@badlogicgames: we all should do this. token count as a kpi is fucking insane.

[INFERRED] "token count as a kpi is fucking insane" — Article critiques using token count as a key performance indicator, arguing it drives the wrong optimization priorities and incentives in AI development

@dhasandev: tldr

[STRONG] "When you already know what to write on the test, you don't need to think hard. Finally, an honest optimization." — Article documents the system intentionally degrading reasoning effort (reducing token usage for thinking) when trajectory guidance is available, undermining genuine problem-solving optimization.

2026-W22
168
2026-W21
1156
2026-W20
1111
2026-W19
770
2026-W18
1068
2026-W17
1029
2026-W16
998
2026-W15
1079
2026-W14
471
2026-W12
1

the result: ~415 tokens at session start + ~50 per turn. compared to Mem0 burning ~1,600 tokens of LLM input every time you save a single fact. 22x cheaper" — Provides specific quantitative evidence o

AI does not reduce work. It intensifies it." — Directly challenges assumption that AI adoption reduces workload; empirical study contradicts common productivity expectations

$225/month on pure Opus vs $19/month with hierarchy" — Article demonstrates token efficiency gains through hierarchical routing, achieving 10x cost reduction by matching model capability to task compl

Programmatic Tool Calling:减少约 37% token 消耗;Dynamic Filtering:减少约 24% 输入 token;Tool Search Tool:减少约 85% 工具定义 token" — Article presents quantified token reduction strategies with specific percentage imp

Both examples directly reduce token consumption through design choices rather than model improvements

Author's observation about 1/10th tool calls is directly measuring token efficiency—smarter model uses context window more efficiently by making better decisions.

Article explicitly frames context engineering as cost-efficiency mechanism via 'keeping every token high-signal.'

Author's core problem is token waste from unnecessary polling cycles. This is a direct token efficiency issue.

Monitor tool is explicitly designed as 'big token saver' by reducing wasted polling cycles

The entire announcement is about choosing an I/O pattern specifically to reduce token consumption—a core context engineering concern.

[high] "On an energy basis, my best estimate is human efficiency for solving simple ARC v1 tasks is 1,000,000X higher than last December's unreleased o3 (High) preview" — Article provides empirical en

Set thinking.type: "adaptive" and let Claude decide based on task complexity." — Article presents adaptive thinking as a solution to manual token tuning, showing it intelligently adjusts resources bas

normal claude: ~180 tokens for a web search task caveman claude: ~45 tokens for the same task" — Provides concrete quantified evidence of token reduction (75% efficiency gain) through practical applic

My more important one is token efficiency. I had to run a prompt 100,000,000 times per week with Llama 8B and I worked to make my output only 1 token long." — Author demonstrates concrete implementati

Single-agent designs reduce token usage and API calls by maintaining context within one entity." — Article directly addresses token usage reduction as a benefit of single-agent architecture, making a

On realistic end-to-end large-session workloads (not toy microbenchmarks), pi_agent_rust is now: 极低的内存占用(比 Node.js 版本低 8–13 倍)" — The 8-13x lower memory footprint compared to Node.js version directly

显著减少上下文 token 使用,例如在使用 MCP 工具的运行中平均降低 46.9%" — The article provides concrete quantitative evidence of token efficiency gains through dynamic context loading, with specific 46.9% reduction metric

Token-Oriented Object Notation [TOON]" — TOON format is explicitly designed for token efficiency in context representation, demonstrating practical token-aware encoding approach

Compressing context - retaining only the tokens required to perform a task." — Article demonstrates token optimization through context compression, showing practical implementation of token efficiency

tokens above 200k are now charged at the same (not higher) per-token rate" — Article provides concrete pricing optimization data: flat per-token rate above 200k removes prior penalty for extended cont

保持系统提示稳定(不变换模型/上下文)以命中提示缓存;长会话用 /compress 总结历史;并行任务用 delegate_task 分流 Sub Agent" — Provides concrete implementation tactics (prompt caching, history compression, task delegation) achieving >50% token

Code Mode: LLM writes TS code → runs in isolate → calls typed APIs → only final result returns to context. 81% fewer tokens vs sequential tool calls." — Demonstrates quantified token savings (81%) thr

[DIRECT] "Codemode mcp -> ~50k tokens total, 1.5 mins. Perfect answer." — Direct evidence of token efficiency in practice - codemode MCP achieved identical results using significantly fewer tokens (50

Cost Optimization Through Context Smaller context means lower API costs per call." — Article directly links token management to cost efficiency, showing how context management reduces API expenses

Improved cost optimization: Static system prompt gets cached, dynamic context comes through tool messages." — Provides direct evidence that using cached static prompts with dynamic tool messages impro

This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover specific details when needed (via filesystem search)." — Ar

This changes the economics of AI coding with MCP." — Article argues that Tool Search fundamentally improves token efficiency in MCP-based systems, with 46.9% reduction documented, supporting better to

A single Docker MCP server could consume 125,000 tokens just to define its 135 tools." — Article quantifies token inefficiency problem and presents MCP Tool Search as enabling better token efficiency

A high prompt cache hit rate decreases costs and helps us create more generous rate limits for our subscription plans" — Article demonstrates how prompt caching directly reduces token consumption and

Agents don't know how to effectively use extra resources. They leave 85% of their budget untouched." — Research demonstrates agents fail to efficiently utilize allocated computational resources, suppo

LangChain formalised four strategies: write (persist context externally), select (retrieve what's relevant via RAG), compress (summarise and compact), and isolate (separate contexts for different agen

[direct] "Every token processed and every millisecond of compute impacts scalability, user experience, and sustainability." — Article provides explicit evidence that token-level and latency optimizati

Provides concrete measurement and optimization strategies for reducing token waste in multi-turn sessions.

~390X efficiency improvement in one year" — Article quantifies cost-per-task improvement: $4.5k → $11.64 with 390X efficiency gain, demonstrating significant progress in model cost efficiency.

Claude 会在后台监控 CI 流程。如果启用 "auto-fix",它会自动尝试修复失败;启用 "auto-merge",则在检查通过后自动合并 PR。" — Claude automates CI monitoring, failure fixing, and PR merging in background without developer intervention.

set a token threshold for when compaction kicks in" — Extends token optimization by allowing fine-grained control over when context compaction triggers, enabling intelligent token budget management ba

distributed architectures, token-based representation, and compression techniques to enhance memory efficiency and reduce computational overhead" — Article explicitly describes using token-based repre

The obvious reasons intelligence-per-watt is going up so fast: more efficient architectures, more efficient hardware, and higher quality data." — Article provides evidence that efficiency gains come f

Agent 能利用历史 checkpoint,少重复已纠正过的错误" — By capturing checkpoints with full session history, agents can avoid repeating previous errors and corrections, reducing token waste

Additional metadata like input token count is persisted with the span so that we can estimate the cost of the procedure." — Article shows how tracing captures input and output token counts at span lev

The issue of it burning through tokens grepping around *should* be fixed with language server integration, but that's broken in Claude Code and the MCP code nav tools seem to use more tokens than just

token usage is still a problem - spoke to one person who's spending $1-$2k a month on openai plans, very token optimized. he said he is going through ~1B tokens per day across all of his claws" — Arti

If you are in the yolo-let-the-harness-handle it you will never get comparable results to someone who is intentional with their context window" — Article argues that intentional context window managem

it wastes tokens and invites hallucinations" — Article explicitly critiques traditional approaches for wasting tokens and advocates for token-efficient skill design patterns

This was context pollution, and it was killing MCP adoption." — Article demonstrates token inefficiency problem where unused tool definitions waste tokens, and shows how MCP Tool Search improves token

also saves tokens because claude stops wasting context searching for the wrong files" — Article provides direct evidence that improved tool integration enables token efficiency by preventing Claude fr

95% Context reduction via lazy loading" — Article highlights specific token efficiency metric showing 95% reduction achievable through lazy loading techniques, demonstrating practical token optimizati

LLMが『コンテキストロット』によりトークンが増えるほど重要情報が埋もれ、精度が下がる" — Article describes context rot phenomenon where increased tokens reduce precision - fundamental motivation for token efficiency

Markdown is the most token-efficient format for Claude to read" — Article provides explicit guidance on token efficiency by recommending Markdown format for context files

pricing models are now debated almost as intensely as capabilities, especially as more tools move toward usage-based billing and tighter limits" — Article emphasizes token/usage-based billing as criti

query this concept
$ db.articles("token-efficiency")
$ db.cooccurrence("token-efficiency")
$ db.contradictions("token-efficiency")