Claude Token Savings for Claude Code

Why Claude Code token usage
grows fast.

Every request carries more than your prompt: large diffs, test output, logs, project files, MCP and tool results, and context repeated turn after turn. Costs scale with context size — so token usage compounds quietly until the bill arrives.

Illustrative anatomy of a long Claude Code session request. The green sliver is the part you typed.

How Decompute
saves Claude tokens.

Four layers of Claude API cost reduction, applied automatically to every request.

01 — COMPRESS

Context compression

Logs, diffs, JSON, and tool output are compressed before they're sent — fewer input tokens, same meaning.

02 — CACHE

Cache reuse optimization

Stable context stays byte-stable, so Anthropic prompt caching hits more often and more of your bill lands at the cheaper cache-read rate.

03 — REFERENCE

Reference replacement

Bulky blobs become opaque references. Originals wait in an encrypted store, retrieved only when Claude actually needs them.

04 — MEASURE

Per-request savings dashboard

Raw versus sent tokens on every request, by workload and engine — Claude Code cost optimization you can verify, not assume.

The stack.
Four engines.

Each engine handles a different shape of context — code, prose, structured data, and stable references. Together they reduce Claude Code token usage automatically, behind a single Anthropic-compatible endpoint.

rust_core active

The hot path. Native speed.

A Rust core wrapping decompute 0.22.5 with SmartCrusher and CodeCompressor — built for high-volume, low-latency gateway traffic.

decompute 0.22.5 · SmartCrusher · CodeCompressor

yscompress-v1 active

For prose and free-form output.

A YSCompress-v1 model on torch 2.12.0 handles logs, free-form tool output, and anything where structure is loose and signal is sparse.

torch 2.12.0 · YSCompress-v1 · prose & logs

ast_code_aware active

Syntax-aware. Not text-aware.

Backed by tree-sitter. Parses the actual syntax tree, keeps signatures, types, and changed nodes — drops what Claude doesn't need.

tree-sitter · multi-language · diff-aware

ccr_active active

Originals on hold, not on the wire.

Replace bulky blobs with opaque references. Originals live in an encrypted store, retrievable via the ccr_retrieve tool when actually needed.

encrypted store · ccr_retrieve tool · reversible

One env var.
That's the setup.

Point Claude Code at Decompute. The gateway compresses, caches, and routes — then forwards a slim payload to Anthropic. Same model. Same workflow. Designed to preserve answer quality, with a smaller bill.

from

Claude Code

your existing CLI
no workflow changes

84,000raw tokens

Decompute

YSCompress · Yellowstone · CCR

−78% on this request

to

Anthropic

same model
same workflow

18,700sent tokens

Every request.
Measured.

Live savings, by request, by workload, by capability. See exactly what you saved and which engine did the saving.

decompute.local — /dashboard · demo data

Decompute Gateway

Claude optimization through YSCompress context compression and Yellowstone routing. demo data — local gateway session

Savings

Latest request

93.2%

most recent request

Rolling median

82.1%

across 4 requests

Lifetime $ saved

$0.0544

101,529 tokens

Lifetime cost

$0.0534

4 requests

Savings by workload

Demo savings by workload: requests, raw tokens, sent tokens, and percent saved
workload	requests	raw tokens	sent tokens	savings
structured_json	4	131,224	29,695	77.4%
code_diff	11	284,910	52,318	81.6%
test_logs	6	96,440	9,210	90.5%

Compressor capabilities

rust_coreactive

decompute 0.22.5 importable; SmartCrusher/CodeCompressor active

yscompress-v1active

torch 2.12.0; YSCompress-v1 ML text compression available

ast_code_awareactive

tree_sitter present; source-code AST compression available

ccr_activeactive

CCR reference-replacement active (encrypted store + ccr_retrieve tool)

Local setup

export DECOMPUTE_BASE_URL=http://127.0.0.1:8080 export DECOMPUTE_API_KEY=dc_dev_local export ANTHROPIC_BASE_URL=http://127.0.0.1:8080

How savings are calculated

savings = 1 − sent_tokens ÷ raw_tokens raw_tokens — tokens before compression sent_tokens — tokens after compression + reference replacement $ saved — removed tokens priced at your model's input rate, cache reads at the cache-read rate

All numbers on this dashboard are demo data from a local gateway session. Actual Claude token savings vary by workload — code diffs, test logs, and structured JSON compress differently. Full methodology →

Prompt caching
and Decompute.

Already using Anthropic prompt caching? Keep it. They solve different halves of the Claude API cost problem.

ANTHROPIC

Prompt caching

Helps when repeated prompt prefixes stay stable across requests. Cache reads are billed at a fraction of the base input-token rate, cache writes at a premium — so hit rate is everything.

DECOMPUTE

Context compression

Works before the request is sent: compresses noisy logs, diffs, JSON, and tool output, and keeps stable context stable — so requests are smaller and cache-friendlier.

Use both. Decompute reduces what you send and improves cache reuse — it complements Claude prompt caching, it doesn't replace it. Full comparison →

Claude Code setup.
Two minutes.

Decompute exposes an Anthropic-compatible gateway endpoint. Claude Code already supports routing through it — one ANTHROPIC_BASE_URL and you're done. Prefer the editor? The Decompute Claude Savings extension for VS Code sets this up for you.

01

Install & start the gateway

Runs locally. Originals stay on your machine; only the compressed request is forwarded to Anthropic or the provider you configure.

02

Set three env vars

Claude Code already supports gateway routing via ANTHROPIC_BASE_URL.

03

Open the dashboard

Latest request, rolling median, lifetime savings — by workload, by capability.

~/your-project

$ pip install decompute $ decompute gateway start ✓ gateway listening on 127.0.0.1:8080 ✓ rust_core · yscompress-v1 · ast_code_aware · ccr_active $ export DECOMPUTE_BASE_URL=http://127.0.0.1:8080 $ export DECOMPUTE_API_KEY=dc_dev_local $ export ANTHROPIC_BASE_URL=http://127.0.0.1:8080 $ claude # later... $ decompute usage latest: 93.2% · median: 82.1% · saved today: $6.10

Claude Code
cost calculator.

Estimate your Claude token savings and monthly Claude API cost reduction from your own numbers.

Monthly Claude spend ($)

Raw input tokens / mo (M)

Sent input tokens / mo (M)

Cache read share (%)

Model

Input price ($ / MTok)

Token savings

—

1 − sent ÷ raw

Est. saved per month

—

Of your Claude bill

share of current monthly spend

Estimates only. savings = 1 − sent_tokens ÷ raw_tokens · Removed tokens are priced at your input rate, with cache reads counted at ~0.1× base input. Model prices are editable defaults — verify current rates on Anthropic's pricing page. Output tokens are unaffected by compression and excluded here.

Local-first.
By design.

A cost-control gateway shouldn't become a data liability. Here is exactly where your context goes.

Local gateway

The gateway runs on 127.0.0.1 by default. Hosted mode exists for teams that want shared dashboards — it's optional, never required.

Encrypted reference store

Blobs replaced by CCR references live in an encrypted store on your side, retrievable via the ccr_retrieve tool only when a request actually needs them.

Honest data flow

Compressed requests are forwarded to Anthropic or the Anthropic-compatible provider you configure. Decompute does not host Claude; in local mode, nothing is sent to Decompute's servers.

Your keys stay yours

Your Anthropic API key stays in your environment. Decompute passes requests through to your configured endpoint — billing and account stay with you.

Pricing.
Flat, never usage.

Aligned with cost control. The more we save you, the better the deal gets.

Free

For individual developers.

$0

Local gateway
YSCompress preview
7 days of usage
Community support

Start free

Pro Popular

For serious Claude Code users.

$15 /mo

Full local history
Advanced dashboard
CCR encrypted store
Savings reports

Get Pro

Enterprise

For platform teams.

Custom

SSO / SAML
VPC / on-prem
Yellowstone routing
Security review

Talk to us

Flat pricing, never metered on usage. If we save you less than your subscription in your first month, we extend your trial.

Questions,
answered.

Does Decompute reduce Claude Code token usage?

Yes. The gateway compresses logs, diffs, JSON, and tool output, and replaces bulky blobs with references before each request reaches the Anthropic API. On the demo workloads above, noisy context compressed 77–90% — your results depend on workload mix.

How does Decompute save Claude tokens?

Four mechanisms: context compression, cache reuse optimization, reference replacement for bulky blobs, and a per-request savings dashboard so you can verify the first three.

Does Decompute replace Claude prompt caching?

No — it complements it. Prompt caching helps when repeated prompt prefixes stay stable. Decompute makes each request smaller before it's sent and keeps stable context byte-stable, which raises your cache hit rate.

Does Decompute work with ANTHROPIC_BASE_URL?

Yes. Claude Code supports routing requests through a gateway via the ANTHROPIC_BASE_URL environment variable, and Decompute exposes an Anthropic-compatible /v1/messages endpoint. Setup is one export.

Does Decompute change Claude's answer?

Compression changes what the model sees, so it can affect output. Decompute is designed to preserve answer quality — syntax-aware code compression keeps signatures, types, and changed nodes, and references are reversible — but you should measure on your own workloads. The dashboard makes that comparison easy.

Can I run Decompute locally?

Yes. Local-first is the default: the gateway runs on 127.0.0.1 and hosted mode is optional.

Where does my code go?

Originals stay on your machine or in your encrypted reference store. Only the compressed request is forwarded to Anthropic or the provider you configure. In local mode, nothing is sent to Decompute's servers.

How do I measure Claude savings?

Every request logs raw tokens versus sent tokens; savings = 1 − sent ÷ raw. The dashboard breaks it down by workload and engine, and decompute usage prints it in your terminal.

Is Decompute affiliated with Anthropic?

No. Decompute is an independent cost-control gateway. Anthropic, Claude, and Claude Code are trademarks of Anthropic; Decompute is not affiliated with or endorsed by Anthropic.

Cut Claude Code token usage.
Keep your workflow.

Why Claude Code token usage
grows fast.

How Decompute
saves Claude tokens.

Context compression

Cache reuse optimization

Reference replacement

Per-request savings dashboard

The stack.
Four engines.

The hot path. Native speed.

For prose and free-form output.

Syntax-aware. Not text-aware.

Originals on hold, not on the wire.

One env var.
That's the setup.

Every request.
Measured.

Prompt caching
and Decompute.

Prompt caching

Context compression

Claude Code setup.
Two minutes.

Install & start the gateway

Set three env vars

Open the dashboard

Claude Code
cost calculator.

Local-first.
By design.

Local gateway

Encrypted reference store

Honest data flow

Your keys stay yours

Pricing.
Flat, never usage.

Questions,
answered.

Stop paying Claude
to reread logs.

Why Claude Code token usagegrows fast.

How Decomputesaves Claude tokens.

Context compression

Cache reuse optimization

Reference replacement

Per-request savings dashboard

The stack.Four engines.

The hot path. Native speed.

For prose and free-form output.

Syntax-aware. Not text-aware.

Originals on hold, not on the wire.

One env var.That's the setup.

Every request.Measured.

Prompt cachingand Decompute.

Prompt caching

Context compression

Claude Code setup.Two minutes.

Install & start the gateway

Set three env vars

Open the dashboard

Claude Codecost calculator.

Local-first.By design.

Local gateway

Encrypted reference store

Honest data flow

Your keys stay yours

Pricing.Flat, never usage.

Questions,answered.

Stop paying Claudeto reread logs.

Why Claude Code token usage
grows fast.

How Decompute
saves Claude tokens.

The stack.
Four engines.

One env var.
That's the setup.

Every request.
Measured.

Prompt caching
and Decompute.

Claude Code setup.
Two minutes.

Claude Code
cost calculator.

Local-first.
By design.

Pricing.
Flat, never usage.

Questions,
answered.

Stop paying Claude
to reread logs.