Open Source Alternatives

Caveman

Open source alternative to TokenCrush, WrangleAI and Vellum AI

Token-slashing caveman-speak for cheaper, faster AI code.

66.1K stars · JavaScript · MIT · Active this month

Who Caveman is for

Developers running long daily Claude Code sessions

Developers who spend several hours per day asking Claude Code to debug, refactor, and explain code. Caveman cuts output token cost by an average of 65% and shortens per-response wait times, since less text is generated per turn.

Skip if:

Your sessions are short or infrequent. For a developer doing 10-minute tasks once a day, the install and activation overhead does not pay back in meaningful savings.
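
A rough estimate helps when judging whether the install pays back. The sketch below applies the 65% average output reduction quoted above to a daily output-token volume. The token volumes, the price per million tokens, and the 21 working days per month are illustrative assumptions, not figures from the Caveman README or from any provider's price list.

```python
def monthly_output_cost(tokens_per_day, usd_per_million, working_days=21):
    """Monthly spend on output tokens, in USD."""
    return tokens_per_day * working_days * usd_per_million / 1_000_000


def monthly_savings(tokens_per_day, usd_per_million, reduction=0.65, working_days=21):
    """USD saved per month if output tokens shrink by `reduction` (a fraction, 0..1)."""
    baseline = monthly_output_cost(tokens_per_day, usd_per_million, working_days)
    return baseline * reduction


# Hypothetical usage profiles; replace with your own numbers.
heavy_user = monthly_savings(tokens_per_day=400_000, usd_per_million=15.0)
light_user = monthly_savings(tokens_per_day=5_000, usd_per_million=15.0)

print(f"heavy user saves about ${heavy_user:.2f} per month")
print(f"light user saves about ${light_user:.2f} per month")
```

Under these assumed inputs the light profile saves far less than the heavy one, which is the reasoning behind the "skip if" note for short or infrequent sessions.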

Engineering teams managing a shared API token budget

Teams on a shared Claude or Cursor plan who track monthly token consumption against spend targets. /caveman-stats provides a real token count to report, and the savings compound across the whole team if all members activate caveman in their daily sessions.

Skip if:

Your team's usage is well within plan limits and API cost is not a concern. The optimization adds a setup step without a clear return.

Developers maintaining large CLAUDE.md or project-notes files

Developers who maintain large context files that load into every AI session. caveman-compress rewrites those files into compact form, cutting approximately 46% of input tokens at the source so every future session starts with a smaller context.

Skip if:

Your context files are already under 200 words. The compression gain on short files is marginal and may make the files harder for a human to read and maintain.

Engineers building multi-agent agentic workflows

Engineers running cavecrew subagents (investigator, builder, reviewer) in agentic loops where the size of the main context matters across many sequential calls. Caveman keeps each turn's output compact, extending the effective usable context window before hitting token limits.

Skip if:

Your workflow depends on detailed, human-readable agent output logs for auditing or debugging purposes. Heavily compressed agent traces are harder to inspect when something goes wrong.

The problem it solves

AI coding agents are verbose by default. A simple bug fix explanation that contains 5 tokens of actual information often arrives wrapped in 50-100 tokens of conversational filler: "Sure, I'd be happy to help with that. The issue you're experiencing is most likely caused by..." Every session with an AI coding agent costs real money in output tokens, and those costs compound fast when agents loop, explain their reasoning, and acknowledge every step.

For teams using Claude Code, Cursor, or Windsurf for hours each day, the gap between what an agent needs to say and what it actually says is a direct line item on the API bill. There is no built-in control to suppress filler without writing a custom system prompt that degrades inconsistently across session turns.

How it solves it

Four compression levels

Switch between lite (drop filler only), full (default caveman fragments), ultra (telegraphic), and wenyan (classical Chinese, shortest) with one command per session. Levels persist until the session ends. The ultra level benchmarks at 87% token reduction on tasks like React re-render explanations.

30+ agent support via one installer

Works with Claude Code, Codex, and Gemini (all three auto-activate on every session), plus Cursor, Windsurf, Cline, GitHub Copilot, and 30+ other AI coding agents. One curl command detects installed agents and configures each one. The installer is safe to re-run and skips agents you do not have installed.

caveman-compress for memory file shrinkage

Rewrites context files like CLAUDE.md and project notes into caveman-style compressed form, cutting input tokens by an average of 46% according to benchmark receipts covering five real memory files. Savings apply to every future session, not just the active one. Code, URLs, and file paths are preserved byte-for-byte.
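
Because the context file loads into every session, the saving recurs. The sketch below multiplies a per-session saving by a session count, using the 46% average and the 36-59% range reported elsewhere on this page. The file size of 3,000 tokens and the 200 sessions are made-up inputs for illustration.

```python
def input_tokens_saved(file_tokens, sessions, reduction=0.46):
    """Input tokens avoided when a compressed context file loads once per session."""
    saved_per_session = round(file_tokens * reduction)
    return saved_per_session * sessions


# Hypothetical context file of 3,000 tokens loaded in 200 sessions.
FILE_TOKENS = 3_000
SESSIONS = 200

low = input_tokens_saved(FILE_TOKENS, SESSIONS, reduction=0.36)
average = input_tokens_saved(FILE_TOKENS, SESSIONS, reduction=0.46)
high = input_tokens_saved(FILE_TOKENS, SESSIONS, reduction=0.59)

print(f"input tokens saved: {low:,} (low) / {average:,} (average) / {high:,} (high)")
```

The same arithmetic shows why very short files are not worth compressing: the per-session saving scales with the file size.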

/caveman-stats with lifetime token tracking

Reads the actual Claude Code session log to count tokens saved, displays a running lifetime total, and generates a tweetable summary via the --share flag. Savings appear as a statusline badge in Claude Code showing cumulative tokens saved since install. Gives teams a real number to report against API spend targets.
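
To make the idea of log-based counting concrete, here is a minimal sketch. It assumes a JSON Lines log in which each record carries an `output_tokens` field, and it estimates savings by back-calculating an uncompressed baseline from an assumed reduction. Both the schema and the estimation method are hypothetical; this page does not describe the real Claude Code log format or how /caveman-stats computes its totals.

```python
import json

# Hypothetical log excerpt: one JSON object per line.
SAMPLE_LOG = "\n".join([
    '{"turn": 1, "output_tokens": 120}',
    '{"turn": 2, "output_tokens": 80}',
    '{"turn": 3, "output_tokens": 150}',
])


def total_output_tokens(jsonl_text):
    """Sum the assumed `output_tokens` field over all non-empty log lines."""
    total = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        total += int(record.get("output_tokens", 0))
    return total


def estimated_tokens_saved(actual_tokens, reduction=0.65):
    """Estimate savings, assuming actual = baseline * (1 - reduction)."""
    baseline = actual_tokens / (1 - reduction)
    return round(baseline - actual_tokens)


actual = total_output_tokens(SAMPLE_LOG)
print(f"output tokens used: {actual}, estimated saved: {estimated_tokens_saved(actual)}")
```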

Companion subagents for multi-agent workflows

cavecrew subagents (investigator, builder, reviewer) use caveman output compression and consume approximately 60% fewer tokens than vanilla agents, per the README. This keeps the main conversation context from filling up in long agentic sessions with multiple sequential tool calls.
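
The effect on context depth can be approximated with simple division. The sketch below counts how many subagent turns fit in a context window before it fills, with and without the roughly 60% reduction. The window size, the starting context, and the 2,000 tokens per vanilla turn are assumed values, and the model is deliberately simplified: it counts only the output each turn adds to the main context.

```python
def turns_until_full(context_window, starting_tokens, tokens_per_turn):
    """Whole turns that fit before the context window is exhausted."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return max(0, (context_window - starting_tokens) // tokens_per_turn)


# Hypothetical numbers for illustration only.
CONTEXT_WINDOW = 200_000
STARTING_TOKENS = 20_000
VANILLA_PER_TURN = 2_000
COMPACT_PER_TURN = round(VANILLA_PER_TURN * (1 - 0.60))  # ~60% fewer tokens

vanilla_turns = turns_until_full(CONTEXT_WINDOW, STARTING_TOKENS, VANILLA_PER_TURN)
compact_turns = turns_until_full(CONTEXT_WINDOW, STARTING_TOKENS, COMPACT_PER_TURN)

print(f"vanilla: {vanilla_turns} turns, compact: {compact_turns} turns")
```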

Strengths and trade-offs

Strengths

Benchmarked reduction, not claimed

The evaluation harness compares caveman against a terse "Answer concisely." baseline, not against verbose default mode, so the measured 65% average token reduction is a conservative and honest delta. Benchmarks cover 10 diverse coding tasks with a reported 100% task accuracy maintained across all of them.

MIT license, no subscription or binary

The skill is a plain text file dropped into the agent config directory. No binary, no subscription, no API call required to run it. MIT licensed, so you can modify the compression levels, fork the install script, or redistribute it without restriction.

Input token savings through memory compression

Unlike a "be brief" system prompt that only shrinks output, caveman-compress rewrites CLAUDE.md and project notes so the savings recur at the start of every session. Benchmark receipts in the README show 36-59% reductions across five real memory files, with an average of 46%.

Works across mixed-agent teams without per-agent config

A team running Claude Code, Cursor, and Copilot simultaneously can configure caveman across all three with the same curl command. The installer detects what is installed and skips what is not. No separate per-agent configuration work for teams with mixed toolchains.
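
For readers who want to reproduce this kind of comparison on their own prompts, the sketch below shows one way to compute an average reduction against a terse baseline. The token counts are invented for illustration, and averaging per-task percentages is an assumption; this page does not say how the README's harness aggregates its results.

```python
from statistics import mean


def percent_reduction(baseline_tokens, candidate_tokens):
    """Percentage by which the candidate output is shorter than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100 * (baseline_tokens - candidate_tokens) / baseline_tokens


def average_reduction(pairs):
    """Mean of per-task reductions over (baseline, candidate) token pairs."""
    return mean(percent_reduction(base, cand) for base, cand in pairs)


# Made-up (terse baseline, caveman) output token counts for three tasks.
TASKS = [(200, 60), (400, 160), (100, 35)]

print(f"average reduction: {average_reduction(TASKS):.1f}%")
```

Measuring against a terse baseline rather than the verbose default is the stricter test, since the baseline has already had its most obvious filler removed.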

Trade-offs

Thinking and reasoning tokens are not compressed

Caveman only reduces visible output tokens. Models that spend most of their compute budget on internal reasoning steps (such as o1 or o3 series) see smaller overall cost savings since reasoning tokens are not affected. The README notes this explicitly: the tool makes the "mouth smaller," not the brain.

Ultra mode reduces readability for non-technical stakeholders

At the ultra compression level, responses become telegraphic fragments suited for quick solo command-line work. Sharing those responses with product managers, designers, or other non-technical collaborators who expect full sentences is harder. Teams with mixed audiences should default to lite or full rather than ultra.

Per-session trigger required for Cursor, Windsurf, and Cline without --with-init

Claude Code, Codex, and Gemini can auto-activate caveman on session start via a hook. Cursor, Windsurf, and Cline require typing /caveman at the start of each session unless installed using the --with-init flag, which injects always-on rule files instead. Without that flag it is easy to forget to activate on short sessions.
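
The first trade-off is easy to quantify. If only the visible part of the output shrinks, the overall saving scales with the visible share of billed output tokens. The sketch below assumes reasoning tokens are billed at the same rate as visible output and uses the 65% average reduction; the visible shares are hypothetical.

```python
def overall_reduction(visible_share, visible_reduction=0.65):
    """Fraction of all billed output tokens saved when only visible output shrinks.

    visible_share: fraction (0..1) of output tokens that are visible text
    rather than internal reasoning.
    """
    if not 0 <= visible_share <= 1:
        raise ValueError("visible_share must be between 0 and 1")
    return visible_share * visible_reduction


# Hypothetical splits between visible text and reasoning tokens.
for share in (1.0, 0.5, 0.2):
    print(f"visible share {share:.0%}: overall saving {overall_reduction(share):.0%}")
```

With these assumptions, a model that spends most of its output budget on reasoning keeps only a small slice of the headline saving.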

Caveman vs alternatives

Caveman vs. Custom Brevity Rules in CLAUDE.md

Most developers who want shorter AI responses add a "be concise" line to their CLAUDE.md file or agent system prompt. This is the most common real-world alternative, not a paid product, because there is no dominant paid equivalent in this category.

The DIY approach has three gaps that caveman addresses. First, raw "be brief" instructions degrade across session turns as the agent drifts back to verbose defaults. Second, there is no compression level to tune: the instruction either applies or it does not. Third, it only affects output tokens; it does not reduce the input tokens consumed by large context files at session start.

Caveman compares directly against this baseline in its own benchmarks and shows a 65% average token reduction vs. the terse instruction's smaller, less consistent gain.

Feature	Caveman	Custom "Be Brief" Rule
License	MIT	N/A (your own text)
Compression measurement	Benchmarked (3-arm eval)	None
Compression levels	4 (lite / full / ultra / wenyan)	None
Input token savings	Yes (caveman-compress)	No
Agent compatibility	30+ agents, one installer	Per-agent manual config
Stats tracking	Yes (/caveman-stats)	No

The DIY approach is the right call if you want zero setup and run AI agents infrequently. Caveman is worth installing when sessions are long, multiple agents are in use, or you need reliable, consistent compression without rewriting system prompts for each tool in your stack.

Install and self-host

bash

# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

What it's built on (detected from GitHub)

Languages: JavaScript, Python

FAQ

Does Caveman reduce the accuracy or quality of AI responses?

Based on the benchmarks in the README, no. Caveman removes filler phrases and conversational padding, not technical substance. The three-arm evaluation harness compares caveman against a terse "Answer concisely." baseline and shows 65% fewer tokens with equivalent task accuracy across 10 coding prompts. The README also cites a March 2026 paper on brevity constraints in language models, which found that constrained-output responses improved accuracy by 26 points on certain benchmarks.

Which AI coding agents does Caveman work with?

Caveman works with Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, GitHub Copilot, and 30+ other agents. The installer detects which agents are on the machine and configures each one automatically. Claude Code, Codex, and Gemini auto-activate on every new session; other agents use a per-session /caveman command unless installed with the --with-init flag.

How do I install Caveman?

Run one curl command: curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash. Requires Node 18 or higher. Takes about 30 seconds and skips any agents you do not have installed. Windows users run the equivalent PowerShell command (irm ... | iex). Detailed per-agent instructions and flags are in the INSTALL.md file in the repository.
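
Since the installer requires Node 18 or higher, it can be worth checking the local version first. The sketch below parses the output of `node --version` (which prints a string such as `v18.17.0`) and compares the major version to the minimum. The helper names are hypothetical and are not part of Caveman or its installer.

```python
import re
import shutil
import subprocess


def parse_node_major(version_output):
    """Extract the major version from `node --version` output, e.g. 'v18.17.0' -> 18."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {version_output!r}")
    return int(match.group(1))


def meets_minimum(version_output, minimum_major=18):
    """True if the reported Node version satisfies the minimum major version."""
    return parse_node_major(version_output) >= minimum_major


def installed_node_version():
    """Return the local `node --version` output, or None if node is not on PATH."""
    if shutil.which("node") is None:
        return None
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Checks on literal strings, so the example runs without Node installed.
print(meets_minimum("v16.20.2"))
print(meets_minimum("v20.11.1"))
```

Call `installed_node_version()` and pass its result to `meets_minimum` to check the machine you are about to install on.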

Does Caveman cost anything?

No. Caveman is MIT licensed and free to install and use. It saves money by reducing output token consumption by an average of 65%, which directly lowers API costs for developers on per-token plans for Claude, GPT-4o, or other LLMs. The caveman-compress feature adds input token savings on top of that.

What are the four compression levels and when should I use each?

"lite" drops filler phrases while keeping mostly complete sentences, good for documentation you will share or reviews with non-technical readers. "full" (the default) produces fragment-based caveman-style responses, the best balance of brevity and readability for solo development. "ultra" goes telegraphic, shortest possible output, best for quick answers in solo sessions. "wenyan" uses classical Chinese phrasing, the most token-dense option. Switch levels with /caveman lite, /caveman full, etc.
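
The guidance above can be condensed into a small lookup. The sketch below maps each level to its description and builds the slash command for it. Only /caveman lite and /caveman full are spelled out in the text; extending the same pattern to ultra and wenyan is an assumption, and the `suggest_level` rule is a hypothetical simplification of the advice given here.

```python
LEVELS = {
    "lite": "drops filler, keeps mostly complete sentences",
    "full": "fragment-based caveman style (the default)",
    "ultra": "telegraphic, shortest English output",
    "wenyan": "classical Chinese phrasing, most token-dense",
}


def level_command(level="full"):
    """Build the slash command for a level, following the /caveman <level> pattern."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return f"/caveman {level}"


def suggest_level(shared_with_non_technical, quick_solo_answers=False):
    """Rough rule of thumb derived from the level descriptions."""
    if shared_with_non_technical:
        return "lite"
    if quick_solo_answers:
        return "ultra"
    return "full"


print(level_command(suggest_level(shared_with_non_technical=True)))
print(level_command(suggest_level(shared_with_non_technical=False)))
```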

Similar open-source tools

OpenMolt

Build programmatic AI agents in Node.js, open source

34 · TypeScript · MIT

AI-Flow

Visually chain AI models and APIs into automated pipelines

283 · TypeScript · MIT

Letta

Give your LLM agents persistent memory across every conversation

2.5K · TypeScript · Apache-2.0

jcode

Next-gen coding agent harness for efficient workflows

6K · Rust · MIT

9Router

Smart AI Router with 3-Tier Fallback

9.8K · JavaScript · MIT

Tabby

Self-hosted AI coding assistant server for private team deployment

33.6K · Rust · Apache-2.0
