Firecrawl

Firecrawl is an open-source web scraping API for AI that converts any website into clean markdown or structured JSON, purpose-built for feeding LLMs and RAG pipelines. Licensed under AGPL-3.0.

111K stars, 7.1K forks, TypeScript, AGPL-3.0, active this week

What is Firecrawl?

Firecrawl is an open-source web scraping API that turns any URL into clean, LLM-ready content. It handles JavaScript rendering, HTML-to-markdown conversion, and structured data extraction, so your AI pipeline receives signal instead of raw DOM noise.

The Problem

Feeding live web data to an LLM is harder than it looks. A raw HTML page can be around 80% navigation, ads, footers, and scripts: context tokens wasted on boilerplate. BeautifulSoup parses static HTML but does not execute JavaScript, so it misses client-rendered content. Headless Playwright renders those pages but requires building and maintaining per-site scrapers. Neither approach scales when you need to crawl hundreds of URLs on a schedule or extract structured data from inconsistent page layouts.

How Firecrawl Solves It

Firecrawl abstracts all of that. Submit a URL and receive clean markdown. Submit a domain and receive a full sitemap plus crawled content for every page. The extraction API accepts a JSON schema and returns typed data from any page, with no prompt engineering required. Because extraction is LLM-based, it copes with inconsistent layouts that fixed CSS selectors cannot handle.
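To make the "submit a URL, optionally with a JSON schema" flow concrete, here is a minimal sketch that builds such a request without sending it. The base URL, the /v1/scrape path, and the formats and jsonOptions field names are assumptions about the hosted API and may differ between versions, so check the current Firecrawl API reference before relying on them. PRODUCT_SCHEMA and build_scrape_request are hypothetical names used only for illustration.

```python
import json
import urllib.request

# Assumption: hosted base URL and the /v1/scrape path; verify against the
# current Firecrawl API reference.
DEFAULT_BASE_URL = "https://api.firecrawl.dev"


def build_scrape_request(url, api_key, schema=None, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST request asking for one page as markdown."""
    payload = {"url": url, "formats": ["markdown"]}
    if schema is not None:
        # Assumed request shape for schema-driven extraction.
        payload["formats"].append("json")
        payload["jsonOptions"] = {"schema": schema}
    return urllib.request.Request(
        f"{base_url}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical schema: fields a product page might expose.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

if __name__ == "__main__":
    req = build_scrape_request("https://example.com", "YOUR-API-KEY", PRODUCT_SCHEMA)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```

Passing a different base_url would point the same request at a self-hosted instance, assuming it exposes the same path.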

Key Features
  • Single-URL scrape: returns clean markdown stripped of navigation and boilerplate
  • Site crawl: recursively crawls a domain and returns all pages as structured documents
  • Structured extraction: define a JSON schema and Firecrawl returns typed data from any page
  • JavaScript rendering via headless Chrome for SPAs and dynamic content
  • Change detection: re-scrape on a schedule and diff to track content updates
  • LLM-ready output: markdown, HTML, links, and metadata in a single response object
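The change-detection idea in the list above (re-scrape, then diff) can be illustrated on the client side with the Python standard library. This is only a sketch of the general technique, not Firecrawl's own implementation: it assumes you have stored the markdown from a previous scrape and want to compare it with a fresh one. The function names are hypothetical.

```python
import difflib
import hashlib


def content_fingerprint(markdown):
    """Hash a snapshot so unchanged pages can be skipped cheaply."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def diff_snapshots(old_markdown, new_markdown):
    """Return unified-diff lines between two scrapes (empty list if identical)."""
    if content_fingerprint(old_markdown) == content_fingerprint(new_markdown):
        return []
    return list(
        difflib.unified_diff(
            old_markdown.splitlines(),
            new_markdown.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


if __name__ == "__main__":
    previous = "# Pricing\n\nStarter: $10/month\n"
    current = "# Pricing\n\nStarter: $12/month\n"
    for line in diff_snapshots(previous, current):
        print(line)
```

Diffing the markdown rather than the raw HTML keeps the comparison focused on content, since markup-only changes such as rotated class names or injected scripts never reach the snapshot.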

Self-Hosting

Clone the repository, copy .env.example, and run docker compose up. The Compose file includes Redis and a Playwright-capable headless Chrome environment. No separate infrastructure setup is required beyond Docker.

License

AGPL-3.0. Free to use and modify. If you offer a modified version to users over a network, you must make your modified source code available to them under AGPL-3.0. A commercial license is available from Mendable for teams that need to avoid AGPL obligations.

Who It's For

Firecrawl is best for AI engineers building RAG systems who need clean, up-to-date web content; developers replacing brittle custom scrapers with an LLM-ready API; and teams building AI products that monitor or index third-party website content.

Compared to BeautifulSoup

Unlike BeautifulSoup or custom scrapers, Firecrawl handles JavaScript-rendered pages, strips boilerplate automatically, and returns LLM-ready markdown or typed JSON, with no per-site scraper maintenance and no raw HTML parsing.

GitHub Activity

Stars: 111K
Forks: 7.1K
Open Issues: 281
License: AGPL-3.0

Tech Stack

Language: TypeScript