Claude Code token savings

Cut Claude Code token usage.
Keep your workflow.

Decompute is a Claude Code cost-control gateway that compresses logs, diffs, tool output, and repeated context before they hit your Anthropic bill.

Less context. Same Claude.

Free VS Code extension · prefer a terminal? Set up via pip

Local-firstAnthropic-compatiblePer-request savingsEncrypted references
decompute.local · gateway · example data
Gateway overview
Example data — five-developer team, 30 days · YSCompress & Yellowstone
all systems operational
Latest request
93.2%
Rolling median
82.1%
Tokens saved · 30d
38.4M
Est. saved · 30d
$187
scroll · how it works ↓

Why Claude Code token usage
grows fast.

Every request carries more than your prompt: large diffs, test output, logs, project files, MCP and tool results, and context repeated turn after turn. Costs scale with context size — so token usage compounds quietly until the bill arrives.

Illustrative anatomy of a long Claude Code session request. The green sliver is the part you typed.

How Decompute
saves Claude tokens.

Four layers of Claude API cost reduction, applied automatically to every request.

01 — COMPRESS

Context compression

Logs, diffs, JSON, and tool output are compressed before they're sent — fewer input tokens, same meaning.

02 — CACHE

Cache reuse optimization

Stable context stays byte-stable, so Anthropic prompt caching hits more often and more of your bill lands at the cheaper cache-read rate.

03 — REFERENCE

Reference replacement

Bulky blobs become opaque references. Originals wait in an encrypted store, retrieved only when Claude actually needs them.

04 — MEASURE

Per-request savings dashboard

Raw versus sent tokens on every request, by workload and engine — Claude Code cost optimization you can verify, not assume.

The stack.
Four engines.

Each engine handles a different shape of context — code, prose, structured data, and stable references. Together they reduce Claude Code token usage automatically, behind a single Anthropic-compatible endpoint.

rust_core active

The hot path. Native speed.

A Rust core wrapping decompute 0.22.5 with SmartCrusher and CodeCompressor — built for high-volume, low-latency gateway traffic.

decompute 0.22.5 · SmartCrusher · CodeCompressor
yscompress-v1 active

For prose and free-form output.

A YSCompress-v1 model on torch 2.12.0 handles logs, free-form tool output, and anything where structure is loose and signal is sparse.

torch 2.12.0 · YSCompress-v1 · prose & logs
ast_code_aware active

Syntax-aware. Not text-aware.

Backed by tree-sitter. Parses the actual syntax tree, keeps signatures, types, and changed nodes — drops what Claude doesn't need.

tree-sitter · multi-language · diff-aware
ccr_active active

Originals on hold, not on the wire.

Replace bulky blobs with opaque references. Originals live in an encrypted store, retrievable via the ccr_retrieve tool when actually needed.

encrypted store · ccr_retrieve tool · reversible

One env var.
That's the setup.

Point Claude Code at Decompute. The gateway compresses, caches, and routes — then forwards a slim payload to Anthropic. Same model. Same workflow. Designed to preserve answer quality, with a smaller bill.

from
Claude Code
your existing CLI
no workflow changes
84,000raw tokens
Decompute
YSCompress · Yellowstone · CCR
−78% on this request
to
Anthropic
same model
same workflow
18,700sent tokens

Every request.
Measured.

Live savings, by request, by workload, by capability. See exactly what you saved and which engine did the saving.

decompute.local — /dashboard · demo data
Decompute Gateway
Claude optimization through YSCompress context compression and Yellowstone routing. demo data — local gateway session
Savings
Latest request
93.2%
most recent request
Rolling median
82.1%
across 4 requests
Lifetime $ saved
$0.0544
101,529 tokens
Lifetime cost
$0.0534
4 requests
Savings by workload
Demo savings by workload: requests, raw tokens, sent tokens, and percent saved
workloadrequestsraw tokenssent tokenssavings
structured_json4131,22429,69577.4%
code_diff11284,91052,31881.6%
test_logs696,4409,21090.5%
Compressor capabilities
rust_coreactive
decompute 0.22.5 importable; SmartCrusher/CodeCompressor active
yscompress-v1active
torch 2.12.0; YSCompress-v1 ML text compression available
ast_code_awareactive
tree_sitter present; source-code AST compression available
ccr_activeactive
CCR reference-replacement active (encrypted store + ccr_retrieve tool)
Local setup
export DECOMPUTE_BASE_URL=http://127.0.0.1:8080 export DECOMPUTE_API_KEY=dc_dev_local export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
How savings are calculated
savings = 1 − sent_tokens ÷ raw_tokens raw_tokens — tokens before compression sent_tokens — tokens after compression + reference replacement $ saved — removed tokens priced at your model's input rate, cache reads at the cache-read rate
All numbers on this dashboard are demo data from a local gateway session. Actual Claude token savings vary by workload — code diffs, test logs, and structured JSON compress differently. Full methodology →

Prompt caching
and Decompute.

Already using Anthropic prompt caching? Keep it. They solve different halves of the Claude API cost problem.

ANTHROPIC

Prompt caching

Helps when repeated prompt prefixes stay stable across requests. Cache reads are billed at a fraction of the base input-token rate, cache writes at a premium — so hit rate is everything.

DECOMPUTE

Context compression

Works before the request is sent: compresses noisy logs, diffs, JSON, and tool output, and keeps stable context stable — so requests are smaller and cache-friendlier.

Use both. Decompute reduces what you send and improves cache reuse — it complements Claude prompt caching, it doesn't replace it. Full comparison →

Claude Code setup.
Two minutes.

Decompute exposes an Anthropic-compatible gateway endpoint. Claude Code already supports routing through it — one ANTHROPIC_BASE_URL and you're done. Prefer the editor? The Decompute Claude Savings extension for VS Code sets this up for you.

01

Install & start the gateway

Runs locally. Originals stay on your machine; only the compressed request is forwarded to Anthropic or the provider you configure.

02

Set three env vars

Claude Code already supports gateway routing via ANTHROPIC_BASE_URL.

03

Open the dashboard

Latest request, rolling median, lifetime savings — by workload, by capability.

~/your-project
$ pip install decompute $ decompute gateway start ✓ gateway listening on 127.0.0.1:8080 ✓ rust_core · yscompress-v1 · ast_code_aware · ccr_active $ export DECOMPUTE_BASE_URL=http://127.0.0.1:8080 $ export DECOMPUTE_API_KEY=dc_dev_local $ export ANTHROPIC_BASE_URL=http://127.0.0.1:8080 $ claude # later... $ decompute usage latest: 93.2% · median: 82.1% · saved today: $6.10

Claude Code
cost calculator.

Estimate your Claude token savings and monthly Claude API cost reduction from your own numbers.

Token savings
1 − sent ÷ raw
Est. saved per month
Of your Claude bill
share of current monthly spend
Estimates only. savings = 1 − sent_tokens ÷ raw_tokens · Removed tokens are priced at your input rate, with cache reads counted at ~0.1× base input. Model prices are editable defaults — verify current rates on Anthropic's pricing page. Output tokens are unaffected by compression and excluded here.

Local-first.
By design.

A cost-control gateway shouldn't become a data liability. Here is exactly where your context goes.

Local gateway

The gateway runs on 127.0.0.1 by default. Hosted mode exists for teams that want shared dashboards — it's optional, never required.

Encrypted reference store

Blobs replaced by CCR references live in an encrypted store on your side, retrievable via the ccr_retrieve tool only when a request actually needs them.

Honest data flow

Compressed requests are forwarded to Anthropic or the Anthropic-compatible provider you configure. Decompute does not host Claude; in local mode, nothing is sent to Decompute's servers.

Your keys stay yours

Your Anthropic API key stays in your environment. Decompute passes requests through to your configured endpoint — billing and account stay with you.

Pricing.
Flat, never usage.

Aligned with cost control. The more we save you, the better the deal gets.

Free
For individual developers.
$0
  • Local gateway
  • YSCompress preview
  • 7 days of usage
  • Community support
Start free
Pro Popular
For serious Claude Code users.
$15 /mo
  • Full local history
  • Advanced dashboard
  • CCR encrypted store
  • Savings reports
Get Pro
Enterprise
For platform teams.
Custom
  • SSO / SAML
  • VPC / on-prem
  • Yellowstone routing
  • Security review
Talk to us
Flat pricing, never metered on usage. If we save you less than your subscription in your first month, we extend your trial.

Questions,
answered.

Does Decompute reduce Claude Code token usage?
Yes. The gateway compresses logs, diffs, JSON, and tool output, and replaces bulky blobs with references before each request reaches the Anthropic API. On the demo workloads above, noisy context compressed 77–90% — your results depend on workload mix.
How does Decompute save Claude tokens?
Four mechanisms: context compression, cache reuse optimization, reference replacement for bulky blobs, and a per-request savings dashboard so you can verify the first three.
Does Decompute replace Claude prompt caching?
No — it complements it. Prompt caching helps when repeated prompt prefixes stay stable. Decompute makes each request smaller before it's sent and keeps stable context byte-stable, which raises your cache hit rate.
Does Decompute work with ANTHROPIC_BASE_URL?
Yes. Claude Code supports routing requests through a gateway via the ANTHROPIC_BASE_URL environment variable, and Decompute exposes an Anthropic-compatible /v1/messages endpoint. Setup is one export.
Does Decompute change Claude's answer?
Compression changes what the model sees, so it can affect output. Decompute is designed to preserve answer quality — syntax-aware code compression keeps signatures, types, and changed nodes, and references are reversible — but you should measure on your own workloads. The dashboard makes that comparison easy.
Can I run Decompute locally?
Yes. Local-first is the default: the gateway runs on 127.0.0.1 and hosted mode is optional.
Where does my code go?
Originals stay on your machine or in your encrypted reference store. Only the compressed request is forwarded to Anthropic or the provider you configure. In local mode, nothing is sent to Decompute's servers.
How do I measure Claude savings?
Every request logs raw tokens versus sent tokens; savings = 1 − sent ÷ raw. The dashboard breaks it down by workload and engine, and decompute usage prints it in your terminal.
Is Decompute affiliated with Anthropic?
No. Decompute is an independent cost-control gateway. Anthropic, Claude, and Claude Code are trademarks of Anthropic; Decompute is not affiliated with or endorsed by Anthropic.

Stop paying Claude
to reread logs.

Run Decompute locally for a week — or send a week of usage logs — and get a free Claude savings report for your team: token savings, dollar savings, and which workloads compress best.

Prefer to start now? Install the local gateway → · or estimate savings yourself →