Ploomber is an open-source Python framework for building reproducible data science and ML pipelines. It treats Jupyter notebooks as pipeline tasks and deploys them to cloud or on-premises infrastructure without a dedicated workflow orchestrator.
The Problem
Apache Airflow requires its own deployment (a scheduler, a web server, and a metadata database), operator code, and infrastructure configuration, which can add weeks of setup before a data scientist runs a first pipeline in production. Prefect and Dagster improve on Airflow's ergonomics but still require teams to write DAG configuration alongside their analysis code. Data scientists working in Jupyter notebooks are left with a gap between exploration and production deployment.
How Ploomber Solves It
Ploomber treats Jupyter notebooks and Python scripts as first-class pipeline tasks. Developers declare tasks and their outputs in a YAML pipeline file and name each task's upstream dependencies in the task's own source. Ploomber then handles execution order, incremental builds (skipping completed tasks on re-runs), parameter injection, and output storage. It deploys to AWS Lambda, Google Cloud Run, Kubernetes, or on-premises servers, and it is released under the Apache-2.0 license.
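The execution order and incremental builds described above can be illustrated with a short, standard-library-only sketch. This is not Ploomber's implementation: the `build` function, the task names, and the hash values are hypothetical, and the only assumption is that a task is re-run when its own source has changed or when one of its upstream tasks has just run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build(dependencies, source_hashes, stored_hashes, run_task):
    """Run tasks in dependency order, skipping those that are up to date.

    dependencies:  task name -> set of upstream task names
    source_hashes: task name -> hash of the task's current source
    stored_hashes: task name -> hash saved at the last successful run (updated in place)
    run_task:      callable invoked with the name of each task that must run
    """
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        source_changed = stored_hashes.get(name) != source_hashes[name]
        upstream_ran = any(up in executed for up in dependencies.get(name, ()))
        if source_changed or upstream_ran:
            run_task(name)
            stored_hashes[name] = source_hashes[name]
            executed.append(name)
    return executed


# Hypothetical four-task pipeline: load -> clean -> train, and load -> report.
deps = {"load": set(), "clean": {"load"}, "train": {"clean"}, "report": {"load"}}
hashes = {"load": "a1", "clean": "b1", "train": "c1", "report": "d1"}
stored = {}

first = build(deps, hashes, stored, run_task=lambda name: None)   # nothing stored yet: all four tasks run
hashes["clean"] = "b2"                                            # simulate editing the "clean" task
second = build(deps, hashes, stored, run_task=lambda name: None)  # only "clean" and its downstream "train" run
third = build(deps, hashes, stored, run_task=lambda name: None)   # nothing changed: nothing runs
```

In this sketch the second build leaves `load` and `report` untouched because neither their source nor anything upstream of them changed, which is the saving that incremental builds aim for.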
Key Features
- Notebook-first pipeline development: each Jupyter notebook is a pipeline task with injected parameters
- DAG-based execution with incremental builds that skip completed tasks on re-runs
- Cloud deployment to AWS Lambda, Google Cloud Run, Kubernetes, or SLURM clusters
- Automatic pipeline plotting and dependency graph visualization for documentation
- Integrated testing with task-level output validation hooks
- Apache-2.0 licensed; works with existing notebooks without refactoring analysis code
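To make the idea of injected parameters concrete, here is a minimal sketch of the general technique: a new cell holding the parameter values is inserted right after the notebook cell tagged "parameters". It operates on a plain dictionary shaped like a notebook file and uses only the standard library. The `inject_parameters` function, the `injected-parameters` tag, and the file paths are illustrative assumptions, not Ploomber's actual code.

```python
import copy


def inject_parameters(notebook, params):
    """Return a copy of `notebook` with a new code cell that assigns `params`.

    The new cell is placed right after the first cell tagged "parameters",
    or at the top of the notebook if no such cell exists.
    """
    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    source = "\n".join(f"{key} = {value!r}" for key, value in params.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    position = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb["cells"].insert(position, injected)
    return nb


# Hypothetical two-cell notebook with placeholder values in the tagged cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "upstream = None\nproduct = None",
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(product)",
            "outputs": [],
            "execution_count": None,
        },
    ],
}

result = inject_parameters(
    notebook,
    {"upstream": {"load": "output/load.csv"}, "product": "output/clean.csv"},
)
```

Because the injected cell comes after the placeholder cell, its assignments override the placeholders when the notebook runs from top to bottom, so the analysis cells need no changes.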
Who It's For
Ploomber is best for data scientists and ML engineers who need to productionize Jupyter notebook-based workflows without learning Airflow's operator model or rewriting analysis code into DAG task functions. It suits teams that want cloud deployment from existing notebooks with minimal framework-specific code.
Compared to Apache Airflow
Unlike Apache Airflow, Ploomber does not require a dedicated orchestration cluster or operator code. Airflow excels at scheduling complex, long-running production pipelines with extensive retry and alerting logic; Ploomber excels at moving data science notebooks into reproducible, versioned pipelines with lower overhead and cloud deployment built in.

