Firecrawl is an open-source web scraping API that turns any URL into clean, LLM-ready content. It handles JavaScript rendering, HTML-to-markdown conversion, and structured data extraction so your AI pipeline receives signal instead of raw DOM noise.
The Problem
Feeding live web data to an LLM is harder than it looks. A raw HTML page can be as much as 80% navigation, ads, footers, and scripts, which means context tokens are wasted on boilerplate. BeautifulSoup handles static pages but cannot see content that is rendered by JavaScript in the browser. Headless Playwright works but requires building and maintaining per-site scrapers. Neither approach scales when you need to crawl hundreds of URLs on a schedule or extract structured data from inconsistent page layouts.
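The JavaScript limitation is easy to reproduce. The sketch below uses Python's standard-library html.parser as a stand-in for any static parser such as BeautifulSoup; the page markup and the names SPA_HTML, VisibleText, and static_text are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty mount point and a script
# that would fill it in a browser. A static parser never runs that script.
SPA_HTML = """
<html><body>
  <nav>Home | Pricing | Docs</nav>
  <div id="app"></div>
  <script>document.getElementById("app").innerText = "Actual article text";</script>
  <footer>Copyright Example Inc.</footer>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def static_text(html):
    """Return the text a static (non-rendering) parser can see."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Calling static_text(SPA_HTML) yields only the navigation and footer strings. The article text never appears, because it exists only after the script runs in a browser.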
How Firecrawl Solves It
Firecrawl abstracts all of that. Submit a URL and receive clean markdown. Submit a domain and receive a full sitemap plus crawled content for every page. The extraction API accepts a JSON schema and returns typed data from any page, with no prompt engineering required. LLM-based extraction handles inconsistent layouts that CSS selectors cannot.
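A single-URL scrape might look roughly like this from Python, using only the standard library. This is a sketch under assumptions that the text above does not confirm: the hosted base URL, the /v1/scrape path, bearer-token authentication, the "formats" field, and the response shape may all differ by version, so check the current API reference before relying on them. The function names are illustrative.

```python
import json
import urllib.request

# Assumed, not confirmed by the text above: base URL, endpoint path,
# auth scheme, and payload field names.
DEFAULT_BASE_URL = "https://api.firecrawl.dev"


def build_scrape_request(page_url, api_key, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST request asking for markdown output."""
    payload = {"url": page_url, "formats": ["markdown"]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(page_url, api_key, base_url=DEFAULT_BASE_URL, timeout=60):
    """Send the request and return the markdown field of the response."""
    request = build_scrape_request(page_url, api_key, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed response shape: {"success": true, "data": {"markdown": "..."}}
    return body["data"]["markdown"]
```

Keeping request construction separate from sending makes the payload easy to inspect and test offline, and passing a different base_url would point the same code at a self-hosted instance.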
Key Features
- Single-URL scrape: returns clean markdown stripped of navigation and boilerplate
- Site crawl: recursively crawls a domain and returns all pages as structured documents
- Structured extraction: define a JSON schema and Firecrawl returns typed data from any page
- JavaScript rendering: headless Chrome for SPAs and dynamic content
- Change detection: re-scrape on a schedule and diff to track content updates
- LLM-ready output: markdown, HTML, links, and metadata in a single response object
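The change-detection idea can be sketched on the client side: keep the markdown from the previous scrape, compare it with the new one, and produce a diff only when something changed. This is an illustration using Python's hashlib and difflib, not a description of how Firecrawl implements the feature; fingerprint and diff_snapshots are hypothetical helper names.

```python
import difflib
import hashlib


def fingerprint(markdown):
    """Stable hash of a markdown snapshot, cheap to store and compare."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Return unified-diff lines, or an empty list when nothing changed."""
    if fingerprint(previous) == fingerprint(current):
        return []
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )
```

Storing only the fingerprint between runs is enough to detect that a page changed; the full previous snapshot is needed only if you also want to show what changed.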
Self-Hosting
Clone the repository, copy .env.example to .env, and run docker compose up. The Compose file includes Redis and a Playwright-capable headless Chrome environment. No separate infrastructure setup is required beyond Docker.
License
AGPL-3.0. Free to use and modify; if you offer a modified version to users as a network service, you must make the source of your changes available to them under the AGPL. A commercial license is available from Mendable for teams that need to avoid AGPL obligations.
Who It's For
Firecrawl is best for AI engineers building RAG systems who need clean, up-to-date web content, developers replacing brittle custom scrapers with an LLM-ready API, and teams building AI products that monitor or index third-party website content.
Compared to BeautifulSoup
Unlike BeautifulSoup or custom scrapers, Firecrawl handles JavaScript-rendered pages, strips boilerplate automatically, and returns LLM-ready markdown or typed JSON, with no per-site scraper maintenance and no raw HTML parsing.

