
Who Caveman is for
Developers running long daily Claude Code sessions
Developers who spend several hours per day asking Claude Code to debug, refactor, and explain code. Caveman cuts output token usage by an average of 65% and shortens the wait for each response, since less text is generated per turn.
Skip if:
Your sessions are short or infrequent. For a developer doing one 10-minute task a day, the savings are too small to justify the install and activation overhead.
Engineering teams managing a shared API token budget
Teams on a shared Claude or Cursor plan who track monthly token consumption against spend targets. /caveman-stats provides a measured token count to report, and the savings add up across the team when every member activates caveman in daily sessions.
Skip if:
Your team's usage is well within plan limits and API cost is not a concern. The optimization adds a setup step without a clear return.
Developers maintaining large CLAUDE.md or project-notes files
Developers who maintain large context files that load into every AI session. caveman-compress rewrites those files into compact form, cutting their input token count by about 46% on average at the source, so every future session starts with a smaller context.
Skip if:
Your context files are already under 200 words. The compression gain on short files is marginal and may make the files harder for a human to read and maintain.
Engineers building multi-agent agentic workflows
Engineers running cavecrew subagents (investigator, builder, reviewer) in agentic loops, where the main context keeps growing across many sequential calls. Caveman keeps each turn's output compact, so more turns fit before the session hits its context limit.
Skip if:
Your workflow depends on detailed, human-readable agent output logs for auditing or debugging purposes. Heavily compressed agent traces are harder to inspect when something goes wrong.
The problem it solves
AI coding agents are verbose by default. A simple bug-fix explanation that carries 5 tokens of actual information often arrives wrapped in 50-100 tokens of conversational filler: "Sure, I'd be happy to help with that. The issue you're experiencing is most likely caused by..." Every session with an AI coding agent costs real money in output tokens, and those costs climb quickly when agents loop, explain their reasoning, and acknowledge every step.
For teams using Claude Code, Cursor, or Windsurf for hours each day, the gap between what an agent needs to say and what it actually says is a direct line item on the API bill. There is no built-in control for suppressing filler. The usual workaround is a custom system prompt, and its effect degrades inconsistently over the turns of a session.
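To put the filler overhead in money terms, here is a rough Python sketch. The daily token volume, the workday count and the $15-per-million output price are made-up placeholders, not any provider's actual rate. The 65% reduction is the average figure quoted elsewhere in this review.

```python
# Back-of-the-envelope estimate of output-token spend and what a reduction saves.
# All figures below are illustrative assumptions, not quoted prices or measurements.

def monthly_output_cost(tokens_per_day: int, workdays: int, usd_per_million: float) -> float:
    """USD spent on output tokens over a month of workdays."""
    return tokens_per_day * workdays * usd_per_million / 1_000_000


def monthly_savings(cost: float, reduction: float) -> float:
    """USD saved if output volume shrinks by `reduction` (0.65 means 65%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return cost * reduction


if __name__ == "__main__":
    # Hypothetical developer: 200k output tokens/day, 21 workdays, $15 per 1M output tokens.
    cost = monthly_output_cost(200_000, 21, 15.0)
    saved = monthly_savings(cost, 0.65)
    print(f"output spend ${cost:.2f}/month, saved ${saved:.2f}/month")
```

With these invented inputs, the script reports $63.00 of monthly output spend and $40.95 saved. For a shared team plan, multiply both figures by the number of developers.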
How it solves it
Four compression levels
Switch between lite (drop filler only), full (default caveman fragments), ultra (telegraphic), and wenyan (classical Chinese, shortest) with a single command. The chosen level persists until the session ends. The ultra level benchmarks at an 87% token reduction on tasks such as explaining React re-renders.
30+ agent support via one installer
Works with Claude Code, Codex, and Gemini, which auto-activate caveman on every session, as well as Cursor, Windsurf, Cline, GitHub Copilot, and 30+ other AI coding agents. A single curl command detects the installed agents and configures each one. The installer is safe to re-run and skips agents you do not have installed.
caveman-compress for memory file shrinkage
Rewrites context files like CLAUDE.md and project notes into caveman-style compressed form. Across the five real memory files benchmarked in the README, it cut input tokens by an average of 46%. The savings apply to every future session, not just the active one. Code, URLs, and file paths are preserved byte-for-byte.
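The byte-for-byte guarantee suggests a mask-and-restore approach: pull the protected spans out before rewriting, then put them back afterwards. The Python sketch below illustrates that general technique. It is not caveman-compress's code. The regexes, the NUL-delimited placeholder format and the toy `drop_filler` step are all assumptions made for illustration.

```python
import re

# Hypothetical sketch of mask-and-restore; not caveman-compress's actual implementation.
# Spans that must survive unchanged, tried in this order at each position:
PROTECTED = re.compile(
    r"```.*?```"                              # fenced code blocks (non-greedy, may span lines)
    r"|`[^`\n]+`"                             # inline code
    r"|https?://\S+"                          # URLs
    r"|(?:\.{0,2}/)?(?:[\w.-]+/)+[\w.-]+",    # file paths such as src/app.ts or /etc/hosts
    re.DOTALL,
)


def mask(text: str) -> tuple[str, list[str]]:
    """Replace protected spans with NUL-delimited index placeholders (assumes no NUL bytes in input)."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"

    return PROTECTED.sub(stash, text), saved


def unmask(text: str, saved: list[str]) -> str:
    """Put the original protected spans back, byte-for-byte."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], text)


def drop_filler(text: str) -> str:
    """Toy stand-in for the real rewrite step: delete a few filler words."""
    return re.sub(r"\b(?:please|basically|really|just)\s+", "", text, flags=re.IGNORECASE)


def compress(text: str) -> str:
    masked, saved = mask(text)
    return unmask(drop_filler(masked), saved)
```

Because the filler rule only ever sees placeholders where code, URLs and paths used to be, a word like "just" inside a code fence is left alone, even though the same word is removed from the surrounding prose.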
/caveman-stats with lifetime token tracking
Reads the actual Claude Code session log to count tokens saved, displays a running lifetime total, and generates a tweetable summary via the --share flag. Savings appear as a statusline badge in Claude Code showing cumulative tokens saved since install. Gives teams a real number to report against API spend targets.
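As a rough model of what a stats command has to do, the Python sketch below totals output tokens from a JSON Lines log. It assumes that each line is a JSON object and that assistant turns carry a `message.usage.output_tokens` field. Treat that layout as an assumption, not a documented Claude Code format. The `estimated_saved` helper is also hypothetical: it back-calculates the uncompressed output from an assumed reduction ratio, which may not be how /caveman-stats computes its number.

```python
import json
from typing import Iterable


def output_tokens(lines: Iterable[str]) -> int:
    """Sum usage.output_tokens across JSON Lines entries (assumed log layout)."""
    total = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        if not isinstance(message, dict):
            continue  # e.g. entries without a message object
        usage = message.get("usage")
        if isinstance(usage, dict):
            total += int(usage.get("output_tokens", 0))
    return total


def estimated_saved(actual_output: int, reduction: float) -> float:
    """Tokens avoided if `actual_output` is (1 - reduction) of the uncompressed output."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return actual_output * reduction / (1.0 - reduction)
```

Iterating an open file yields its lines, so a file object can be passed straight to `output_tokens`. Keeping a lifetime total then amounts to adding each session's result to a stored counter.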
Companion subagents for multi-agent workflows
cavecrew subagents (investigator, builder, reviewer) apply caveman output compression and use approximately 60% fewer tokens than vanilla agents, per the README. This keeps the main conversation context from filling up in long agentic sessions with many sequential tool calls.
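The context-window benefit comes down to simple division. This Python sketch assumes that every subagent result stays in the main context and that each turn adds a fixed number of tokens. Real sessions vary from turn to turn, so read the numbers as an illustration of the ratio rather than a prediction. The window, overhead and per-turn sizes are invented.

```python
def turns_that_fit(context_window: int, fixed_overhead: int, tokens_per_turn: int) -> int:
    """Sequential turns that fit before the context window is full (whole turns only)."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return max(0, (context_window - fixed_overhead) // tokens_per_turn)


# Illustrative assumptions only; none of these numbers come from the README.
WINDOW = 200_000         # context window size in tokens
OVERHEAD = 20_000        # system prompt, memory files, tool definitions
VANILLA_TURN = 3_000     # tokens a vanilla subagent result adds to the main context
COMPACT_TURN = VANILLA_TURN * 40 // 100  # about 60% fewer, the README's cavecrew figure

if __name__ == "__main__":
    print("vanilla:", turns_that_fit(WINDOW, OVERHEAD, VANILLA_TURN))   # 60
    print("compact:", turns_that_fit(WINDOW, OVERHEAD, COMPACT_TURN))   # 150
```

Under these assumptions, cutting per-turn output by 60% multiplies the number of turns that fit by 1 / 0.4 = 2.5, whatever the window size.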
Strengths and trade-offs
Strengths
- Benchmarked reduction, not claimed. The evaluation harness compares caveman against a terse "Answer concisely." baseline rather than against the verbose default mode, so the measured 65% average token reduction is a conservative, honest delta. The benchmarks cover 10 diverse coding tasks and report 100% task accuracy maintained across all of them.
- MIT license, no subscription or binary. The skill is a plain text file dropped into the agent config directory. No binary, no subscription and no API call are required to run it. It is MIT licensed, so you can modify the compression levels, fork the install script, or redistribute it, provided the license notice is kept.
- Input token savings through memory compression. Unlike a "be brief" system prompt, which only shrinks output, caveman-compress rewrites CLAUDE.md and project notes so the savings recur at the start of every session. Benchmark receipts in the README show 36-59% reductions across five real memory files, with an average of 46%.
- Works across mixed-agent teams without per-agent config. A team running Claude Code, Cursor, and Copilot side by side can configure caveman for all three with the same curl command. The installer detects what is installed and skips what is not, so teams with mixed toolchains avoid separate per-agent configuration work.
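Because the compressed file is reloaded at the start of every session, the saving applies per session and recurs. Here is a quick Python sketch of that multiplication, using the README's 36%, 46% and 59% figures. The 4,000-token file size and the session count are hypothetical.

```python
def input_tokens_saved(file_tokens: int, reduction: float, sessions: int) -> int:
    """Input tokens avoided when a memory file loaded once per session is compressed."""
    per_session = round(file_tokens * reduction)
    return per_session * sessions


if __name__ == "__main__":
    # Hypothetical 4,000-token CLAUDE.md, 10 sessions a day for 22 workdays.
    sessions = 10 * 22
    for label, r in [("low", 0.36), ("average", 0.46), ("high", 0.59)]:
        print(label, input_tokens_saved(4_000, r, sessions))
```

A one-time rewrite therefore keeps paying off for as long as the file stays in use. This is the difference from an output-only "be brief" rule.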
Trade-offs
- Thinking and reasoning tokens are not compressed. Caveman only reduces visible output tokens. Models that spend most of their compute budget on internal reasoning (such as the o1 or o3 series) see smaller overall cost savings, since reasoning tokens are unaffected. The README says this explicitly: the tool makes the "mouth smaller," not the brain.
- Ultra mode reduces readability for non-technical stakeholders. At the ultra level, responses become telegraphic fragments suited to quick solo command-line work. Those responses are harder to share with product managers, designers, or other non-technical collaborators who expect full sentences. Teams with mixed audiences should default to lite or full rather than ultra.
- Per-session trigger required for Cursor, Windsurf, and Cline without --with-init. Claude Code, Codex, and Gemini can auto-activate caveman on session start via a hook. Cursor, Windsurf, and Cline require typing /caveman at the start of each session unless installed with the --with-init flag, which injects always-on rule files instead. Without that flag, it is easy to forget to activate caveman in short sessions.
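The reasoning-token caveat can be quantified with one line of arithmetic. This Python sketch assumes that reasoning tokens are billed as output and are left untouched, while only the visible answer shrinks. The token counts are invented.

```python
def overall_reduction(visible: int, reasoning: int, visible_reduction: float) -> float:
    """Fraction of all generated tokens removed when only the visible answer shrinks.

    Reasoning tokens are assumed to be unaffected by the compression.
    """
    total = visible + reasoning
    if total == 0:
        return 0.0
    return visible * visible_reduction / total


if __name__ == "__main__":
    # Hypothetical turn: 1,000 visible tokens, with and without 4,000 reasoning tokens.
    print(f"{overall_reduction(1_000, 0, 0.65):.0%}")      # no reasoning
    print(f"{overall_reduction(1_000, 4_000, 0.65):.0%}")  # reasoning-heavy model
```

With these numbers, a 65% cut to visible text becomes a 13% cut to everything the model generates.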
Caveman vs alternatives
Caveman vs. Custom Brevity Rules in CLAUDE.md
Most developers who want shorter AI responses add a "be concise" line to their CLAUDE.md file or agent system prompt. Because no paid product dominates this category, that hand-written rule is the most common real-world alternative.
The DIY approach has three gaps that caveman addresses. First, raw "be brief" instructions degrade across session turns as the agent drifts back to verbose defaults. Second, there is no compression level to tune: the instruction either applies or it does not. Third, it only affects output tokens; it does not reduce the input tokens consumed by large context files at session start.
Caveman benchmarks directly against this baseline and reports a 65% average token reduction beyond what the terse instruction achieves. The instruction on its own delivers a smaller and less consistent gain.
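Measuring against the terse arm is conservative because the same caveman output looks like a bigger cut when it is divided by a longer baseline. The token counts in this Python sketch are invented to make the arithmetic visible. They are not the benchmark's data.

```python
from statistics import mean


def reduction(candidate: int, baseline: int) -> float:
    """Fractional token reduction of `candidate` relative to `baseline`."""
    return 1 - candidate / baseline


# Hypothetical output-token counts per task for the three arms: (verbose, terse, caveman).
TASKS = [(1_000, 600, 210), (800, 500, 175)]

vs_terse = mean(reduction(c, t) for _, t, c in TASKS)
vs_verbose = mean(reduction(c, v) for v, _, c in TASKS)

if __name__ == "__main__":
    print(f"vs terse: {vs_terse:.1%}, vs verbose: {vs_verbose:.1%}")
```

In this example, the caveman arm saves 65% against the terse arm but about 78.6% against the verbose arm. Reporting only the first number therefore understates the gap relative to default behavior.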
|  | Caveman | Custom "Be Brief" Rule |
|---|---|---|
| License | MIT | N/A (your own text) |
| Compression measurement | Benchmarked (3-arm eval) | None |
| Compression levels | 4 (lite / full / ultra / wenyan) | None |
| Input token savings | Yes (caveman-compress) | No |
| Agent compatibility | 30+ agents, one installer | Per-agent manual config |
| Stats tracking | Yes (/caveman-stats) | No |
The DIY approach is the right call if you want zero setup and run AI agents infrequently. Caveman is worth installing when sessions are long, multiple agents are in use, or you need reliable, consistent compression without rewriting system prompts for each tool in your stack.
Install and self-host
```
# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
```
What it's built on
- Languages: JavaScript, Python
FAQ
Does Caveman reduce the accuracy or quality of AI responses?
Based on the benchmarks in the README, no. Caveman removes filler phrases and conversational padding, not technical substance. The three-arm evaluation harness compares caveman against a terse "Answer concisely." baseline and shows 65% fewer tokens with equivalent task accuracy across 10 coding prompts. The README also cites a March 2026 paper on brevity constraints in language models, which found that brevity-constrained responses improved accuracy by 26 points on certain benchmarks.
Which AI coding agents does Caveman work with?
Caveman works with Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, GitHub Copilot, and 30+ other agents. The installer detects which agents are on the machine and configures each one automatically. Claude Code, Codex, and Gemini auto-activate on every new session; other agents use a per-session /caveman command unless installed with the --with-init flag.
How do I install Caveman?
Run one curl command: curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash. The installer requires Node 18 or higher, takes about 30 seconds, and skips any agents you do not have installed. Windows users run the equivalent PowerShell command (irm ... | iex). Detailed per-agent instructions and flags are in the INSTALL.md file in the repository.
Does Caveman cost anything?
No. Caveman is MIT licensed and free to install and use. It saves money by reducing output token consumption by an average of 65%, which directly lowers API costs for developers on per-token plans for Claude, GPT-4o, or other LLMs. The caveman-compress feature adds input token savings on top of that.
What are the four compression levels and when should I use each?
"lite" drops filler phrases while keeping mostly complete sentences, which suits documentation you plan to share or reviews with non-technical readers. "full" (the default) produces fragment-based caveman-style responses and offers the best balance of brevity and readability for solo development. "ultra" goes telegraphic for the shortest possible output and works best for quick answers in solo sessions. "wenyan" uses classical Chinese phrasing and is the most token-dense option. Switch levels with /caveman lite, /caveman full, and so on.
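The level semantics in this answer can be modeled as a small state machine: a default of full, explicit switches, and state that lasts until the session ends. The skill itself is a plain text file that the agent interprets, so the Python below is only a hypothetical model of that behavior, not Caveman's code. Treating a bare /caveman as "switch to full" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("lite", "full", "ultra", "wenyan")
DEFAULT_LEVEL = "full"  # the FAQ names "full" as the default


@dataclass
class Session:
    """Per-session state; a new Session starts with caveman inactive."""
    level: Optional[str] = None


def handle(session: Session, message: str) -> bool:
    """Apply a '/caveman [level]' command to the session.

    Returns True if the message was a caveman command, False otherwise.
    A bare '/caveman' is assumed to select the default level.
    """
    parts = message.strip().split()
    if not parts or parts[0] != "/caveman":
        return False
    if len(parts) == 1:
        session.level = DEFAULT_LEVEL
    elif len(parts) == 2 and parts[1] in LEVELS:
        session.level = parts[1]
    else:
        raise ValueError(f"unknown level; expected one of {', '.join(LEVELS)}")
    return True
```

Ordinary messages leave the level unchanged, and a new `Session` object starts with the level reset. This mirrors "persists until the session ends."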
Similar open-source tools
OpenMolt
Build programmatic AI agents in Node.js, open source
AI-Flow
Visually chain AI models and APIs into automated pipelines
Letta
Give your LLM agents persistent memory across every conversation
jcode
Next-gen coding agent harness for efficient workflows
9Router
Smart AI Router with 3-Tier Fallback
Tabby
Self-hosted AI coding assistant server for private team deployment

