A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
One command to cut token usage by up to 50%+
One command to cut token usage by up to 50%+.
English · 简体中文 · 日本語 · Français · Español
Before / After · Quick start · Install · Profiles · Benchmarks & Evidence · Roadmap · Full benchmark guide
npm install -g github:MaxForAI/Tokenless
tokenless repair-hooks --user
tokenless install-commands --user
tokenless launch
Then use Claude Code normally. Switch profiles anytime:
tokenless style chat # readable, shorter replies
tokenless style coding # dense coding output
tokenless style off # full hard-off
Claude Code gets expensive when every log, file read, diff, and long reply keeps getting carried into the next request.
Tokenless fixes that.
It keeps the raw evidence on your machine, sends Claude a compact version, and lets you expand the original output only when you need it.
| Normal Claude Code | Claude Code with Tokenless |
|---|---|
| Reads a large file or log into future context repeatedly. | Stores the raw output locally and sends a compact packet. |
| Verbose final replies become part of the next request history. | chat and coding profiles keep replies short. |
| Agent trajectory can grow through repeated exploration and task-plan history. | Launcher trims Task/Plan tools by default; packets reduce large read context. |
Example large-read replacement:
| Raw context | Tokenless context |
|---|---|
| Full file/log output is carried through API requests. | TOKENLESS-READ-PACKET/0.1 with artifact id, imports, symbols, snippets, nearby files, and exact expansion commands. |
Claude Code sessions can become expensive because tool outputs, file reads, task-plan history, and verbose assistant replies are repeatedly carried through future API requests. Tokenless targets three sources of growth:
chat and coding profiles reduce assistant output tokens.Tokenless has two evidence layers: real Claude Code API-body measurements, and external research showing why shorter, denser context can reduce cost without automatically reducing quality.
These are API-body measurements from actual Claude Code sessions. The main metric is estimated request-body or response-body tokens from raw API logs, not local hook-side savings estimates.
| Scenario | Baseline | Tokenless | Reduction |
|---|---|---|---|
5-turn CRM vibe coding, off vs coding | 4,697,867 request tokens | 2,476,391 | 47.3% |
6-turn natural conversation, off vs chat | 7,223 response tokens | 1,442 | 80.0% |
| Large CSS visual edit | 1,017,642 request tokens | 403,995-473,354 | ~54-60% |
| 10k-line React/TSX edit | 917,137 request tokens | 545,456 | 40.5% |
| Multifile React dashboard | 628,261 request tokens | 512,521 | 18.4% |
| Task/Plan tools enabled vs default launcher | 1,524,894 request tokens | 1,087,753 | 28.7% |
The strongest current product benchmark is the 5-turn CRM vibe-coding run: a non-specialist user gave vague iterative product-polish prompts. The public coding profile reduced request tokens by 47.3%, response tokens by 44.4%, and request count by 39.3% versus clean off.
The clean natural-conversation run isolates chat: no file tools or packet reducers were involved, and response tokens dropped by 80.0%.
Detailed methodology and raw run notes are in docs/benchmarking.md and docs/style-benchmark.md.
The research does not prove Tokenless automatically helps every session. It supports the benchmark premise: context and response length are controllable engineering variables, and less text can sometimes be cheaper, faster, and more accurate.
| Paper | Why it matters for Tokenless |
|---|---|
| Brevity Constraints Reverse Performance Hierarchies in Language Models | Brevity constraints improved large-model accuracy by 26.3 percentage points on inverse-scaling problems. Verbose is not always better. |
| Prompt Compression in the Wild | Prompt compression can deliver real end-to-end speedups when workload, compression ratio, and hardware match; quality can remain statistically unchanged. |
| LLMLingua | Prompt compression can reduce inference cost while preserving semantic integrity under high compression ratios. |
| LongLLMLingua | Long-context compression can improve key-information perception while reducing cost and latency. |
| Selective Context | Pruning redundant context reported 50% context-cost reduction, 36% memory reduction, and 32% inference-time reduction with minor quality loss. |
| Gist Tokens | Learned prompt compression reached up to 26x prompt compression and up to 40% FLOPs reduction. |
Tokenless currently focuses on Claude Code context growth from tool output, file reads, and response verbosity. Next areas:
Install from GitHub:
npm install -g github:MaxForAI/Tokenless
tokenless repair-hooks --user
tokenless launch
Tokenless is currently distributed through GitHub. It has not been published to the public npm registry yet.
For local development from a checkout:
git clone https://github.com/MaxForAI/Tokenless.git
cd Tokenless
npm install
npm link
tokenless repair-hooks --user
tokenless launch
If Claude Code is not available as claude on your PATH, set CLAUDE_BIN:
CLAUDE_BIN=/path/to/claude tokenless launch
Check installation status:
tokenless status --user
Tokenless has three public profiles:
| Profile | Behavior |
|---|---|
chat | Default. Short, readable natural-language responses. Only changes output style. |
coding | Dense structured responses for coding workflows. Only changes output style. |
off | Full Tokenless hard-off. Disables style injection and compression hooks. |
Set a profile:
tokenless style chat
tokenless style coding
tokenless style off
Claude Code slash command shortcuts:
/tokenless-style-chat
/tokenless-style-coding
/tokenless-style-off
The selected profile is stored at ~/.tokenless/style.json by default and persists across Claude Code restarts.
TOKENLESS_MODE=off remains available as an environment-level hard-off switch for benchmark runs.
For noisy commands and large outputs:
Claude requests a noisy tool call
-> Tokenless intercepts it with Claude Code hooks
-> The original command or output is processed locally
-> Raw stdout/stderr or file content is saved as an artifact
-> Claude receives a compact TOKENLESS-* packet
-> Claude expands only the relevant artifact slice if needed
Example read packet:
TOKENLESS-READ-PACKET/0.1
file: /path/to/src/App.tsx
artifact_id: ctx_20260518_abc123
summary: large TSX source packet with imports, declarations, snippets, and nearby files
Expand raw evidence when needed:
tokenless latest --data-dir ~/.tokenless
tokenless expand latest --around "DashboardShell" --data-dir ~/.tokenless
tokenless expand latest --lines 120:170 --data-dir ~/.tokenless
The shorter acc command remains available as a compatibility alias. tokenless is the preferred public command.
Tokenless currently handles:
npm test, pytest, go test, cargo test.npm run build, docker build, kubectl logs.git diff, git log.rg, grep -R, find, tree, ls -R.TOKENLESS-EDIT-PACKET and TOKENLESS-WRITE-PACKET.Small bounded commands, such as rg -m 20, find ... | head, cat file | grep, and tree | head, pass through directly. Small reads are not compressed by default.
Read packets cap large low-risk file reads as TOKENLESS-READ-PACKET/0.1.
Default policy:
.go, .rs, .java, .swift, or .cpp by default.Read packets are indexes, not edit proof. If Claude needs exact code or style, it should expand the relevant lines first:
tokenless expand ctx_abc --around ".target-card" --data-dir ~/.tokenless
tokenless expand ctx_abc --lines 520:535 --data-dir ~/.tokenless
For CSS, SCSS, Less, HTML, HTM, and SVG files, read packets include deterministic editable summaries: variables, color palettes, likely editable selectors, media and animation rules, sections, ids/classes, interactive elements, assets, and headings.
Large-file gate behavior:
tokenless read --agent --data-dir ~/.tokenless <file>.Tokenless can cap large successful Edit, MultiEdit, and low-risk Write tool results.
The policy is intentionally conservative:
Edit / MultiEdit output over roughly 3,000 estimated tokens can become TOKENLESS-EDIT-PACKET/0.1.Write output over roughly 5,000 estimated tokens can become TOKENLESS-WRITE-PACKET/0.1.Write outputs are not compressed by default.Risk signals that pass through unchanged include:
Error
Failed
old_string
not found
multiple matches
ambiguous
permission denied
conflict
No changes
partial
Edit/write packets do not claim semantic correctness. They only confirm the tool completed successfully, preserve the raw artifact, and mark previous read packets for that file as stale.
tokenless launch starts Claude Code with normal read, edit, write, and bash tools available, but disables high-overhead Task/Plan tools by default:
TaskCreate, TaskUpdate, TaskList, TaskGet, EnterPlanMode, ExitPlanMode
This reduces fixed tool-schema overhead and prevents task-list history from being repeatedly carried through API request context.
Opt back into Task/Plan tools for a session:
TOKENLESS_ALLOW_TASK_TOOLS=1 tokenless launch
Install user-level Claude Code slash commands:
tokenless install-commands --user
Installed commands:
/tokenless
/tokenless-style-chat
/tokenless-style-coding
/tokenless-style-off
/tokenless shows hook status, active mode, profile, savings, packet counts, pending gates, and latest artifact.
Restart Claude Code after installing slash commands. If you previously installed older Tokenless commands, clean them up with:
tokenless uninstall-commands --user
tokenless install-commands --user
tokenless --help
tokenless status --user
tokenless latest --data-dir ~/.tokenless
tokenless list --data-dir ~/.tokenless
tokenless stats --data-dir ~/.tokenless
tokenless show latest --data-dir ~/.tokenless
tokenless expand latest --around "Cannot find module" --data-dir ~/.tokenless
tokenless clean --data-dir ~/.tokenless --keep 100 --dry-run
tokenless style status
tokenless style coding
tokenless api-usage --since 24h
tokenless stats separates local savings by source:
hook: real Claude Code hook-path compression.eval: local evaluation fixtures.smoke: manual probes, doctor checks, and ad hoc compression runs.legacy: records created before source tagging.For API-body verification, enable raw Claude Code API body capture:
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_LOG_RAW_API_BODIES="file:<api-body-dir>"
Inspect captured request/response bodies:
node plugins/claude-code/bin/tokenless api-probe stats \
--dir "<api-body-dir>" \
--data-dir ~/.tokenless
A valid clean off run must show:
TOKENLESS-READ-PACKET: request=0
TOKENLESS-EDIT-PACKET: request=0
TOKENLESS-WRITE-PACKET: request=0
request_saved_estimate: 0
Raw API bodies can contain full prompts, tool outputs, and sensitive local context. Keep capture disabled unless you are verifying what enters model context.
Run the main smoke checks:
npm run eval:complex
npm run eval:read
npm run eval:edit
npm run eval:cli-smoke
npm run doctor
Run all synthetic and captured cases:
npm run eval:all
Check hook install state:
npm run tokenless:status
npm run tokenless:install:dry-run
npm run tokenless:repair-hooks:dry-run
npm run tokenless:uninstall:dry-run
Test the actual Claude Code hook path inside Claude Code:
npm run test:complex
Expected behavior:
PreToolUse caps the noisy command
Claude reruns node .../bin/acc run --agent ...
Claude receives TOKENLESS-PACKET/0.1
Raw artifact can be expanded with tokenless expand latest
Artifacts are stored under the selected --data-dir, usually ~/.tokenless/artifacts.
Preview cleanup:
npm run tokenless:clean:dry-run
Delete old artifacts:
tokenless clean --data-dir ~/.tokenless --older-than 7d
Keep only the newest 100 artifacts:
tokenless clean --data-dir ~/.tokenless --keep 100
Tokenless saves tokens and keeps the raw evidence. Star costs zero. Fair trade.
Tokenless is an independent implementation and does not use code from caveman. We still want to acknowledge caveman as part of the broader open-source conversation around shorter, lower-token agent output.
MIT. Free to use, modify, and ship.
A Claude Code skill by Hao (駱君昊) that learns your Facebook voice and auto-posts to FB / IG / Threads / X with a 14-day c
1000+ skills curated from Anthropic, Vercel, Stripe, and other engineering teams
Human + AI music production workflow for Suno - skills, templates, and tools
Claude Code skill for YouTube creators — channel audits, video SEO, retention scripts, thumbnails, content strategy, Sho