Voicebox

Voicebox is an open-source, local-first voice synthesis studio for cloning voices, generating speech, and building voice-powered apps.

23.6K stars · 2.8K forks · TypeScript · MIT · Active this month

What Voicebox does

What is Voicebox?

Voicebox is a local-first, open-source voice synthesis studio built as a free alternative to ElevenLabs. It runs entirely on your machine — your voice data, your models, your privacy. Whether you need to clone a voice from a few seconds of audio, generate speech across 23 languages, or compose multi-voice narratives for podcasts and audiobooks, Voicebox handles it all without sending a single audio sample to the cloud.

Who it's for

Voicebox is designed for creators, developers, and privacy-conscious teams who need professional-grade voice synthesis without subscription fees or cloud dependency. Podcasters, game developers, accessibility tool builders, and indie content creators will find it particularly useful. Developers can integrate it into their own projects via the built-in REST API.

Key capabilities

  • 7 TTS engines: Qwen3-TTS, Qwen CustomVoice, LuxTTS, Chatterbox Multilingual, Chatterbox Turbo, HumeAI TADA, and Kokoro
  • Zero-shot voice cloning from a short reference audio sample
  • 50+ preset voices via Kokoro, plus 9 Qwen CustomVoice presets
  • 23 languages including English, Japanese, Hindi, Arabic, and Swahili
  • Expressive speech with paralinguistic tags like [laugh], [sigh], and [gasp]
  • Post-processing effects: pitch shift, reverb, delay, chorus, compression, and filters
  • Stories editor: a multi-track timeline for conversations and podcasts
  • Native performance: built with Tauri (Rust), not Electron — fast and lightweight
  • Runs on macOS (MLX/Metal), Windows (CUDA), Linux, AMD ROCm, Intel Arc, and Docker
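
The paralinguistic tags listed above are written inline with the text to be spoken. The sketch below illustrates that idea only: `splitTaggedScript`, `Segment`, and `KNOWN_TAGS` are hypothetical names, not part of Voicebox, and the tag set is limited to the three tags named in the list. How each engine actually parses tags may differ.

```typescript
// Hypothetical helper, not part of Voicebox. It only illustrates how inline
// tags such as [laugh] can sit alongside ordinary text in a script.
type Segment =
  | { kind: "text"; value: string }
  | { kind: "tag"; value: string };

// Tags named in the capability list; real engines may accept others.
const KNOWN_TAGS: readonly string[] = ["laugh", "sigh", "gasp"];

function splitTaggedScript(script: string): Segment[] {
  const segments: Segment[] = [];
  const pattern = /\[([a-z]+)\]/g;
  let cursor = 0;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(script)) !== null) {
    const name = match[1];
    // Unknown bracketed words are left in place as plain text.
    if (KNOWN_TAGS.indexOf(name) === -1) continue;

    // Emit any spoken text that precedes this tag.
    const before = script.slice(cursor, match.index).trim();
    if (before) segments.push({ kind: "text", value: before });

    segments.push({ kind: "tag", value: name });
    cursor = match.index + match[0].length;
  }

  // Emit whatever text remains after the last recognized tag.
  const rest = script.slice(cursor).trim();
  if (rest) segments.push({ kind: "text", value: rest });
  return segments;
}

console.log(splitTaggedScript("Well [sigh] that was close. [laugh]"));
```

Splitting a script this way makes it easy to check a line for unsupported tags before sending it to a TTS engine.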

Why choose it over ElevenLabs?

ElevenLabs and similar platforms (Murf.ai, Play.ht, Speechify) charge per character generated and keep your voice data on their servers. Voicebox removes both concerns. There are no usage limits: you can generate as much audio as your GPU can handle. Your voice clones are stored locally, which makes it well suited to teams handling sensitive or proprietary audio content. With more than 23,000 GitHub stars and active development, it is already one of the most capable open-source voice tools available.

GitHub Activity

Last commit

14 days ago

Last synced

Apr 27, 2026

23.6K Stars
2.8K Forks
285 Open Issues
MIT License

Tech Stack

Detected via GitHub

Languages

Python, Rust, TypeScript

Frameworks

Next.js, React
