Decompute is a Claude Code cost-control gateway that compresses logs, diffs, tool output, and repeated context before they hit your Anthropic bill.
Less context. Same Claude.
Free VS Code extension · prefer a terminal? Set up via pip
Every request carries more than your prompt: large diffs, test output, logs, project files, MCP and tool results, and context repeated turn after turn. Costs scale with context size — so token usage compounds quietly until the bill arrives.
Four layers of Claude API cost reduction, applied automatically to every request.
Logs, diffs, JSON, and tool output are compressed before they're sent — fewer input tokens, same meaning.
Stable context stays byte-stable, so Anthropic prompt caching hits more often and more of your bill lands at the cheaper cache-read rate.
Bulky blobs become opaque references. Originals wait in an encrypted store, retrieved only when Claude actually needs them.
Raw versus sent tokens on every request, by workload and engine — Claude Code cost optimization you can verify, not assume.
Each engine handles a different shape of context — code, prose, structured data, and stable references. Together they reduce Claude Code token usage automatically, behind a single Anthropic-compatible endpoint.
A Rust core wrapping decompute 0.22.5 with SmartCrusher and CodeCompressor — built for high-volume, low-latency gateway traffic.
A YSCompress-v1 model on torch 2.12.0 handles logs, free-form tool output, and anything where structure is loose and signal is sparse.
Backed by tree-sitter. Parses the actual syntax tree, keeps signatures, types, and changed nodes — drops what Claude doesn't need.
Replace bulky blobs with opaque references. Originals live in an encrypted store, retrievable via the ccr_retrieve tool when actually needed.
Point Claude Code at Decompute. The gateway compresses, caches, and routes — then forwards a slim payload to Anthropic. Same model. Same workflow. Designed to preserve answer quality, with a smaller bill.
Live savings, by request, by workload, by capability. See exactly what you saved and which engine did the saving.
| workload | requests | raw tokens | sent tokens | savings |
|---|---|---|---|---|
| structured_json | 4 | 131,224 | 29,695 | 77.4% |
| code_diff | 11 | 284,910 | 52,318 | 81.6% |
| test_logs | 6 | 96,440 | 9,210 | 90.5% |
Already using Anthropic prompt caching? Keep it. They solve different halves of the Claude API cost problem.
Helps when repeated prompt prefixes stay stable across requests. Cache reads are billed at a fraction of the base input-token rate, cache writes at a premium — so hit rate is everything.
Works before the request is sent: compresses noisy logs, diffs, JSON, and tool output, and keeps stable context stable — so requests are smaller and cache-friendlier.
Decompute exposes an Anthropic-compatible gateway endpoint. Claude Code already supports routing through it — one ANTHROPIC_BASE_URL and you're done. Prefer the editor? The Decompute Claude Savings extension for VS Code sets this up for you.
Runs locally. Originals stay on your machine; only the compressed request is forwarded to Anthropic or the provider you configure.
Claude Code already supports gateway routing via ANTHROPIC_BASE_URL.
Latest request, rolling median, lifetime savings — by workload, by capability.
Estimate your Claude token savings and monthly Claude API cost reduction from your own numbers.
savings = 1 − sent_tokens ÷ raw_tokens · Removed tokens are priced at your input rate, with cache reads counted at ~0.1× base input. Model prices are editable defaults — verify current rates on Anthropic's pricing page. Output tokens are unaffected by compression and excluded here.A cost-control gateway shouldn't become a data liability. Here is exactly where your context goes.
The gateway runs on 127.0.0.1 by default. Hosted mode exists for teams that want shared dashboards — it's optional, never required.
Blobs replaced by CCR references live in an encrypted store on your side, retrievable via the ccr_retrieve tool only when a request actually needs them.
Compressed requests are forwarded to Anthropic or the Anthropic-compatible provider you configure. Decompute does not host Claude; in local mode, nothing is sent to Decompute's servers.
Your Anthropic API key stays in your environment. Decompute passes requests through to your configured endpoint — billing and account stay with you.
Aligned with cost control. The more we save you, the better the deal gets.
decompute usage prints it in your terminal.Run Decompute locally for a week — or send a week of usage logs — and get a free Claude savings report for your team: token savings, dollar savings, and which workloads compress best.
Prefer to start now? Install the local gateway → · or estimate savings yourself →