Open Source Alternatives
Handpicked Open Source Alternatives to Paid Software

Product
  • Search
  • Categories
  • Tag
  • Sign In
Resources
  • Blog
  • Collection
  • Submit
  • Advertise your tool
Company
  • Privacy Policy
  • Terms of Service
  • Refund Policy
  • Sitemap
Copyright © 2026 All Rights Reserved.

Caveman

Open source alternative to TokenCrush, WrangleAI and Vellum AI

Token-slashing caveman-speak for cheaper, faster AI code.

66.1K stars · JavaScript · MIT · Active this month
Contents
  1. Who Caveman is for
  2. The problem it solves
  3. How it solves it
  4. Strengths and trade-offs
  5. Caveman vs alternatives
  6. Install and self-host
  7. Tech stack
  8. FAQ
  9. Similar open-source tools
TL;DR

Caveman is an MIT-licensed skill that cuts the output tokens of Claude Code and other AI coding agents by an average of 65% by switching them to compressed, fragment-based responses.

Who Caveman is for

Developers running long daily Claude Code sessions

Developers who spend several hours a day asking Claude Code to debug, refactor, and explain code. Caveman cuts output tokens by an average of 65%, which lowers running token costs and shortens per-response wait times, since less text is generated per turn.

Skip if:

Your sessions are short or infrequent. For a developer doing 10-minute tasks once a day, the install and activation overhead is unlikely to pay for itself in meaningful savings.

Engineering teams managing a shared API token budget

Teams on a shared Claude or Cursor plan that track monthly token consumption against spend targets. /caveman-stats provides a real token count to report, and the savings add up across the whole team when every member activates caveman in their daily sessions.

Skip if:

Your team's usage is well within plan limits and API cost is not a concern. The optimization adds a setup step without a clear return.

Developers maintaining large CLAUDE.md or project-notes files

Developers who maintain large context files that load into every AI session. caveman-compress rewrites those files into a compact form, cutting their input tokens by approximately 46% at the source, so every future session starts with a smaller context.

Skip if:

Your context files are already under 200 words. The compression gain on short files is marginal and may make the files harder for a human to read and maintain.

Engineers building multi-agent agentic workflows

Engineers running cavecrew subagents (investigator, builder, reviewer) in agentic loops, where the main context has to last across many sequential calls. Caveman keeps each turn's output compact, so more turns fit in the usable context window before token limits are hit.

Skip if:

Your workflow depends on detailed, human-readable agent output logs for auditing or debugging purposes. Heavily compressed agent traces are harder to inspect when something goes wrong.

The problem it solves

AI coding agents are verbose by default. A simple bug-fix explanation that carries 5 tokens of actual information often arrives wrapped in 50-100 tokens of conversational filler: "Sure, I'd be happy to help with that. The issue you're experiencing is most likely caused by..." Every session with an AI coding agent costs real money in output tokens, and those costs add up fast when agents loop, explain their reasoning, and acknowledge every step.

For teams using Claude Code, Cursor, or Windsurf for hours each day, the gap between what an agent needs to say and what it actually says is a direct line item on the API bill. There is no built-in control to suppress filler; the usual workaround is a custom system prompt, whose effect fades inconsistently as a session goes on.

How it solves it

Four compression levels

Switch between lite (drops filler only), full (default caveman fragments), ultra (telegraphic), and wenyan (classical Chinese, the shortest) with a single command per session. The selected level persists until the session ends. The ultra level benchmarks at an 87% token reduction on tasks such as explaining React re-renders.

30+ agent support via one installer

Works with Claude Code, Codex, and Gemini (all three auto-activate on every session), plus Cursor, Windsurf, Cline, GitHub Copilot, and 30+ other AI coding agents. A single curl command detects the installed agents and configures each one. The installer is safe to re-run and skips agents you do not have installed.
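The "detect what is installed, configure it, stay safe to re-run" behavior follows a common shell pattern. The sketch below illustrates that pattern only; it is not caveman's install.sh, and the agent command names (`claude`, `codex`, `gemini`), the helper names, and the rule text are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an idempotent "detect, then configure" installer.
# Command names, helper names, and rule text are placeholders, NOT what
# caveman's install.sh actually uses.

# Print each candidate command that exists on PATH, one per line.
detect_agents() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
    fi
  done
}

# Append a line to a file only if that exact line is not already present,
# so running the installer twice does not duplicate the entry.
ensure_line() {
  local line="$1" file="$2"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example run: only agents actually found on PATH are "configured".
for agent in $(detect_agents claude codex gemini); do
  echo "would configure $agent"  # a real installer would write its rule file here
done
```

The `grep -qxF` check is what makes re-running harmless: a second run finds the exact line already present and skips the append.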

caveman-compress for memory file shrinkage

Rewrites context files such as CLAUDE.md and project notes into a compressed, caveman-style form, cutting input tokens by an average of 46% across the five real memory files benchmarked in the README. The savings apply to every future session, not just the active one. Code, URLs, and file paths are preserved byte-for-byte.
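One simple way to get a "code is preserved byte-for-byte" guarantee is to compress only prose lines and copy fenced code blocks through untouched. The toy sketch below shows that idea with a fixed filler-word list; it does not reproduce how caveman-compress rewrites text, and the function name and word list are arbitrary assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Toy sketch only, NOT caveman-compress: drops a few filler words from
# prose lines and copies fenced code blocks (between triple-backtick
# lines) through unchanged. The filler list is an arbitrary assumption.
toy_compress() {
  awk '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }   # 96 = backtick
    substr($0, 1, 3) == fence { in_code = !in_code; print; next }
    in_code { print; next }                            # code: copy verbatim
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        if (w == "the" || w == "a" || w == "an" || w == "just" ||
            w == "really" || w == "basically" || w == "please") continue
        out = (out == "" ? $i : out " " $i)
      }
      print out                                        # prose: filler removed
    }
  ' "$@"
}

printf '%s\n' "Please just restart the server." | toy_compress  # restart server.
```

Because the fence check runs before any rewriting, every line inside a code block reaches the output exactly as it was read.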

/caveman-stats with lifetime token tracking

Reads the actual Claude Code session log to count tokens saved, displays a running lifetime total, and generates a tweetable summary with the --share flag. A statusline badge in Claude Code shows cumulative tokens saved since install, giving teams a real number to report against API spend targets.
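A lifetime total is a running sum of per-response savings. The sketch below shows that arithmetic over a hypothetical tab-separated log with one line per response (baseline tokens, actual tokens); the real Claude Code session log uses a different format, and this is not how /caveman-stats reads it. The `lifetime_saved` name is illustrative.

```bash
#!/usr/bin/env bash
# Sketch of lifetime-savings arithmetic over a HYPOTHETICAL log format:
#   <baseline_tokens><TAB><actual_tokens>   (one line per response)
# This is not the real Claude Code session-log format.
lifetime_saved() {
  awk -F'\t' '
    { base += $1; saved += $1 - $2 }
    END {
      pct = (base > 0) ? 100 * saved / base : 0
      printf "%d tokens saved (%.1f%%)\n", saved, pct
    }
  ' "$@"
}

printf '100\t35\n200\t70\n' | lifetime_saved   # 195 tokens saved (65.0%)
```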

Companion subagents for multi-agent workflows

cavecrew subagents (investigator, builder, reviewer) use caveman output compression and, according to the README, consume approximately 60% fewer tokens than vanilla agents. This keeps the main conversation context from filling up in long agentic sessions with many sequential tool calls.
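A rough way to see why compact turns extend a session: if each turn's output stays in the conversation, the number of turns before a fixed context budget fills scales inversely with per-turn output. This is a simplified model under stated assumptions (it ignores input, tool results, and reasoning tokens); the 200,000-token budget and 2,000-token turn below are illustrative numbers, not measurements, and `turns_until_full` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Simplified model: how many turns fit in a context budget if only
# per-turn output accumulates. Ignores inputs, tool results, and
# reasoning tokens; all numbers are illustrative.
turns_until_full() {
  local budget="$1" per_turn="$2" reduction_pct="$3"
  local compact=$(( per_turn * (100 - reduction_pct) / 100 ))
  (( compact > 0 )) || compact=1      # avoid dividing by zero at 100%
  echo $(( budget / compact ))
}

turns_until_full 200000 2000 0    # 100 turns at full verbosity
turns_until_full 200000 2000 60   # 250 turns with ~60% smaller output
```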

Strengths and trade-offs

Strengths

  • Benchmarked reduction, not just claimed. The evaluation harness compares caveman against a terse "Answer concisely." baseline rather than the verbose default, so the measured 65% average token reduction is a conservative delta. The benchmarks cover 10 diverse coding tasks, with 100% task accuracy reported across all of them.
  • MIT license, no subscription or binary. The skill is a plain text file dropped into the agent's config directory; running it needs no binary, no subscription, and no API call. Because it is MIT licensed, you can modify the compression levels, fork the install script, or redistribute it freely.
  • Input token savings through memory compression. Unlike a "be brief" system prompt, which only shrinks output, caveman-compress rewrites CLAUDE.md and project notes so the savings recur at the start of every session. Benchmark receipts in the README show 36-59% reductions across five real memory files, averaging 46%.
  • Works across mixed-agent teams without per-agent config. A team running Claude Code, Cursor, and Copilot side by side can configure caveman in all three with the same curl command. The installer detects what is installed and skips what is not, so mixed toolchains need no separate per-agent setup.
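Why a terse baseline makes the 65% figure conservative is easiest to see with numbers. The sketch below computes percentage reduction against each arm of a three-arm comparison; the arm labels (verbose default, terse instruction, caveman) and all token counts are hypothetical, chosen only to show that the same caveman output reads as a larger reduction when measured against the verbose default.

```bash
#!/usr/bin/env bash
# Percentage reduction of `actual` relative to `baseline`, one decimal place.
reduction_pct() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b > 0) ? 100 * (b - a) / b : 0 }'
}

# Hypothetical token counts for one prompt in a three-arm comparison.
verbose=1500   # default agent output
terse=1000     # "Answer concisely." baseline
caveman=350    # caveman output

echo "vs terse:   $(reduction_pct "$terse" "$caveman")%"    # 65.0%
echo "vs verbose: $(reduction_pct "$verbose" "$caveman")%"  # 76.7%
```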

Trade-offs

  • Thinking and reasoning tokens are not compressed. Caveman only reduces visible output tokens. Models that spend much of their compute budget on internal reasoning (such as OpenAI's o1 or o3 series) see smaller overall cost savings, because reasoning tokens are unaffected. The README says this explicitly: the tool makes the "mouth smaller," not the brain.
  • Ultra mode is hard for non-technical stakeholders to read. At the ultra level, responses become telegraphic fragments suited to quick solo command-line work, which makes them harder to share with product managers, designers, or other collaborators who expect full sentences. Teams with mixed audiences should default to lite or full rather than ultra.
  • Per-session trigger required for Cursor, Windsurf, and Cline without --with-init. Claude Code, Codex, and Gemini can auto-activate caveman at session start via a hook. Cursor, Windsurf, and Cline require typing /caveman at the start of each session unless installed with the --with-init flag, which injects always-on rule files instead. Without that flag, it is easy to forget to activate caveman in short sessions.

Caveman vs alternatives

Caveman vs. Custom Brevity Rules in CLAUDE.md

Most developers who want shorter AI responses add a "be concise" line to their CLAUDE.md file or agent system prompt. Because no paid product dominates this category, that DIY rule is the most common real-world alternative.

The DIY approach has three gaps that caveman addresses. First, raw "be brief" instructions degrade across session turns as the agent drifts back to verbose defaults. Second, there is no compression level to tune: the instruction either applies or it does not. Third, it only affects output tokens; it does not reduce the input tokens consumed by large context files at session start.

Caveman's own benchmarks use this kind of terse instruction as their baseline, so the reported 65% average token reduction is measured on top of what a "be concise" rule already achieves, not against the verbose default.

| Feature | Caveman | Custom "Be Brief" Rule |
|---|---|---|
| License | MIT | N/A (your own text) |
| Compression measurement | Benchmarked (3-arm eval) | None |
| Compression levels | 4 (lite / full / ultra / wenyan) | None |
| Input token savings | Yes (caveman-compress) | No |
| Agent compatibility | 30+ agents, one installer | Per-agent manual config |
| Stats tracking | Yes (/caveman-stats) | No |

The DIY approach is the right call if you want zero setup and run AI agents infrequently. Caveman is worth installing when sessions are long, multiple agents are in use, or you need reliable, consistent compression without rewriting system prompts for each tool in your stack.

Install and self-host

```bash
# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows (PowerShell 5.1+)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
```

What it's built on

Languages
JavaScript, Python

FAQ

Does Caveman reduce the accuracy or quality of AI responses?

Based on the benchmarks in the README, no. Caveman removes filler phrases and conversational padding, not technical substance. The three-arm evaluation harness compares caveman against a terse "Answer concisely." baseline and reports 65% fewer tokens with equivalent task accuracy across 10 coding prompts. The README also cites a March 2026 paper on brevity constraints in language models, which found that constrained-output responses improved accuracy by 26 points on certain benchmarks.

Which AI coding agents does Caveman work with?

Caveman works with Claude Code, Codex, Gemini, Cursor, Windsurf, Cline, GitHub Copilot, and 30+ other agents. The installer detects which agents are on the machine and configures each one automatically. Claude Code, Codex, and Gemini auto-activate on every new session; other agents use a per-session /caveman command unless installed with the --with-init flag.

How do I install Caveman?

Run one curl command: curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash. The installer requires Node 18 or higher, takes about 30 seconds, and skips any agents you do not have installed. Windows users run the equivalent PowerShell command (irm ... | iex). Detailed per-agent instructions and flags are in the INSTALL.md file in the repository.
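Because the installer needs Node 18 or higher, a quick pre-flight check can save a failed run. This is a minimal sketch, not part of caveman's installer; `node_version_ok` is a hypothetical helper that only parses the string printed by `node --version` (for example `v20.11.1`).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed if a Node.js version string
# such as "v20.11.1" has a major version of 18 or newer.
node_version_ok() {
  local v="${1#v}"          # strip the leading "v"
  local major="${v%%.*}"    # keep the text before the first "."
  [[ "$major" =~ ^[0-9]+$ ]] && (( major >= 18 ))
}

if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "Node $(node --version) is new enough"
else
  echo "Node 18+ not found; install it before running the caveman installer" >&2
fi
```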

Does Caveman cost anything?

No. Caveman is MIT licensed and free to install and use. It saves money by cutting output tokens by an average of 65%, which directly lowers API costs for developers on per-token plans for Claude, GPT-4o, or other LLMs; the effect on the total bill is smaller, since input and reasoning tokens are billed too. The caveman-compress feature adds input token savings on top of that.
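A worked example helps separate "65% fewer output tokens" from "65% lower bill". The sketch below computes monthly cost before and after an output-token cut; the token volumes and per-million-token prices are hypothetical placeholders, not any provider's actual rates, and `cost_before_after` is an illustrative helper.

```bash
#!/usr/bin/env bash
# Monthly cost before/after reducing output tokens by a given percentage.
# Args: input_tokens output_tokens input_price_per_M output_price_per_M reduction_pct
# Prices are hypothetical placeholders in dollars per million tokens.
cost_before_after() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_p="$3" -v out_p="$4" -v r="$5" 'BEGIN {
    in_cost = in_tok * in_p / 1000000
    before  = in_cost + out_tok * out_p / 1000000
    after   = in_cost + out_tok * (100 - r) / 100 * out_p / 1000000
    printf "before=%.2f after=%.2f\n", before, after
  }'
}

# 10M input + 2M output tokens, $3/M input, $15/M output, 65% output cut
cost_before_after 10000000 2000000 3 15 65   # before=60.00 after=40.50
```

In this example output tokens fall by 65%, but the total falls by 32.5% (from $60.00 to $40.50), because the input side is unchanged; shrinking context files with caveman-compress is what addresses that remaining input cost.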

What are the four compression levels and when should I use each?

"lite" drops filler phrases but keeps mostly complete sentences; use it for documentation you will share or for reviews with non-technical readers. "full" (the default) produces fragment-based caveman-style responses and is the best balance of brevity and readability for solo development. "ultra" is telegraphic, cutting output further, and suits quick answers in solo sessions. "wenyan" uses classical Chinese phrasing and is the most token-dense option. Switch levels with /caveman lite, /caveman full, and so on.

Similar open-source tools

OpenMolt

Build programmatic AI agents in Node.js, open source

34 stars · TypeScript · MIT

AI-Flow

Visually chain AI models and APIs into automated pipelines

283 stars · TypeScript · MIT

Letta

Give your LLM agents persistent memory across every conversation

2.5K stars · TypeScript · Apache-2.0

jcode

Next-gen coding agent harness for efficient workflows

6K stars · Rust · MIT

9Router

Smart AI Router with 3-Tier Fallback

9.8K stars · JavaScript · MIT

Tabby

Self-hosted AI coding assistant server for private team deployment

33.6K stars · Rust · Apache-2.0

Repository

Stars: 66.1K
Forks: 3.7K
License: MIT
Latest: v1.8.2
Last commit: 11 days ago
Last verified: May 29, 2026
Repo: JuliusBrussee/caveman

Additional details

Language: JavaScript
Open issues: 244
Contributors: 29
First release: 2026

Categories

AI & Machine Learning · LLMOps & AI Tooling · Developer Tools

Tags

LLM · Prompt Engineering · AI Coding Assistant · Developer Tools · AI Agents