BLOOM

BLOOM is a 176-billion-parameter open-access multilingual language model trained on 46 natural languages. It was the first model at GPT-3 scale to ship openly downloadable weights with broad non-English coverage.

What BLOOM does

BLOOM is a 176-billion-parameter open-access language model created by the BigScience research workshop. It was the first model at GPT-3 scale released with openly downloadable weights. It covers 46 natural languages, including Arabic, Chinese, French, Hindi, Spanish, and a range of African and Indic languages, plus 13 programming languages.

The Problem

GPT-3 and GPT-4 are predominantly English-centric. While they handle common European languages adequately, coverage for lower-resource languages is shallow. More critically, their weights are closed: you cannot download them, audit their training data, run inference locally, or fine-tune them for a language or domain that OpenAI does not support. Research into multilingual NLP or controlled LLM behavior requires weights you can actually inspect and modify.

How BLOOM Solves It

BLOOM was trained collaboratively by over 1,000 researchers across 70 countries using the ROOTS corpus, a collection of 498 datasets totaling roughly 1.6 TB of curated multilingual text with documented provenance. The model architecture is a decoder-only transformer comparable in scale to GPT-3, usable for text generation, summarization, translation, and classification. Weights are hosted on Hugging Face and load with the transformers library. Smaller BLOOM variants (560M to 7.1B) and instruction-tuned models from the same project (BLOOMZ, fine-tuned from BLOOM, and mT0, fine-tuned from mT5) are available for lower-resource setups.
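As a rough illustration of loading BLOOM weights with the transformers library, the sketch below maps short size labels to Hugging Face Hub checkpoint IDs and generates text greedily. The `BLOOM_CHECKPOINTS` table, the size labels, and the `resolve_checkpoint` and `generate` helpers are illustrative names, not part of BLOOM or transformers. The intermediate checkpoints (1.1B, 1.7B, 3B) are not listed on this page and are included on the assumption that their Hub IDs are unchanged. The sketch also assumes `transformers` and PyTorch are installed.

```python
# Illustrative mapping from a short size label to a Hub checkpoint ID.
# Assumption: these IDs match the checkpoints published under "bigscience".
BLOOM_CHECKPOINTS = {
    "560m": "bigscience/bloom-560m",
    "1b1": "bigscience/bloom-1b1",
    "1b7": "bigscience/bloom-1b7",
    "3b": "bigscience/bloom-3b",
    "7b1": "bigscience/bloom-7b1",
    "176b": "bigscience/bloom",
}


def resolve_checkpoint(size: str) -> str:
    """Return the Hub ID for a size label, or raise ValueError if unknown."""
    try:
        return BLOOM_CHECKPOINTS[size.lower()]
    except KeyError:
        known = ", ".join(sorted(BLOOM_CHECKPOINTS))
        raise ValueError(f"unknown BLOOM size {size!r}; expected one of: {known}")


def generate(prompt: str, size: str = "560m", max_new_tokens: int = 30) -> str:
    """Download the checkpoint (on first use) and greedily continue the prompt."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = resolve_checkpoint(size)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    # do_sample=False selects greedy decoding, so repeated runs give the same text.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("La capitale de la France est")` would download the 560M checkpoint on first use and return the prompt followed by the model's continuation. The smallest checkpoint is the default here because the full 176B model does not fit on a single consumer GPU.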

Key Features

  • 176B parameters trained on 46 natural languages and 13 programming languages
  • Open, downloadable weights on Hugging Face — no API key needed to run inference
  • ROOTS training corpus with documented data provenance, parts of which are publicly available
  • BLOOM variants from 560M to 7.1B parameters for lower-resource deployments
  • Decoder-only architecture compatible with standard Hugging Face generation pipelines
  • Instruction-tuned BLOOMZ (and the related mT5-based mT0) available for zero-shot task following
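To make the "lower-resource deployments" point concrete, here is a back-of-the-envelope estimate of the memory needed just to hold each variant's weights. It is a sketch under stated assumptions: the parameter counts are nominal, the intermediate sizes (1.1B, 1.7B, 3B) are not listed on this page, and the helper names `weight_memory_gb` and `largest_fitting` are hypothetical. Activations, the KV cache, and framework overhead are ignored, so real requirements are higher.

```python
# Nominal parameter counts (assumption: rounded figures, not exact tensor sizes).
PARAM_COUNTS = {
    "bloom-560m": 0.56e9,
    "bloom-1b1": 1.1e9,
    "bloom-1b7": 1.7e9,
    "bloom-3b": 3.0e9,
    "bloom-7b1": 7.1e9,
    "bloom": 176e9,
}

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "bfloat16") -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def largest_fitting(budget_gb: float, dtype: str = "bfloat16"):
    """Name of the largest variant whose weights fit in budget_gb, or None."""
    fitting = [
        (n_params, name)
        for name, n_params in PARAM_COUNTS.items()
        if weight_memory_gb(n_params, dtype) <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, n_params in PARAM_COUNTS.items():
        print(f"{name:>10}: {weight_memory_gb(n_params):7.1f} GB in bfloat16")
```

Under these assumptions the full 176B model needs about 352 GB for its weights in bfloat16, which means multiple accelerators, while the 7.1B variant needs about 14 GB and the 560M variant about 1 GB.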

Who It's For

BLOOM is best for multilingual NLP researchers who need open weights across non-English languages, organizations building text applications for underserved languages that proprietary models handle poorly, and AI safety researchers who require transparent, inspectable model weights with documented training data provenance.

Compared to GPT-3

Unlike the weights of GPT-3 and GPT-4, BLOOM's weights are openly downloadable: you can run inference locally, fine-tune on your own data, and inspect the provenance of the training corpus. BLOOM also covers 46 languages, including many that proprietary models handle only shallowly.

License

BLOOM is released under the BigScience RAIL (Responsible AI License), which is not a standard open source license. Weights are open-access and freely downloadable, but the license restricts specific harmful use cases. Commercial use is permitted subject to those restrictions. Review the license terms before deploying BLOOM in production.

GitHub Activity

  • Last commit: 654 days ago
  • Last synced: May 13, 2026
  • Stars: 1K
  • Forks: 101
  • Open issues: 20
  • License (as detected by GitHub): NOASSERTION

Tech Stack

Languages (detected via GitHub): Python
