Installation
Use the package manager pip to install data-pipelines-cli (requires Python 3.9-3.12).
You need to provide different flags in square brackets depending on the systems you want to integrate with. To install several at once, provide a comma-separated list of flags.
Required Flags
A dbt adapter must be installed (it provides dbt-core as a transitive dependency). Depending on your data storage, use one of:
bigquery - Google BigQuery
snowflake - Snowflake
redshift - Amazon Redshift
postgres - PostgreSQL
databricks - Databricks
Example:
pip install data-pipelines-cli[bigquery]
To pin a specific dbt-core version:
pip install data-pipelines-cli[snowflake] 'dbt-core>=1.8.0,<1.9.0'
Optional Flags
If you need git integration to load packages published by other projects or to publish your own:
git
If you want to deploy created artifacts (Docker images and DataHub metadata), add the following flags (these are not usually needed by individual users):
docker
datahub
If you need Business Intelligence integration:
looker
For cloud storage deployment:
gcs - Google Cloud Storage
s3 - AWS S3
Example with Multiple Flags
pip install data-pipelines-cli[bigquery,docker,datahub,gcs]
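When the set of flags varies between environments, the bracketed list can be assembled in a script instead of typed by hand. The following is a minimal sketch, not part of data-pipelines-cli; build_spec is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: join flag names with commas and wrap them in the
# square-bracket syntax pip expects. The body runs in a subshell, so the
# IFS change does not leak into the caller.
build_spec() (
    IFS=,
    printf 'data-pipelines-cli[%s]\n' "$*"
)

spec=$(build_spec bigquery docker datahub gcs)
echo "$spec"

# Quote the result when installing, so shells that treat square brackets
# as glob characters (zsh, for example) pass it to pip unchanged:
# pip install "$spec"
```

The echo line prints data-pipelines-cli[bigquery,docker,datahub,gcs], which matches the example above.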
Troubleshooting
Pre-release dbt versions: data-pipelines-cli requires stable dbt-core releases. If you encounter errors with beta or RC versions, reinstall with stable versions:
pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'
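Before reinstalling, it can help to confirm that the installed dbt-core really is a pre-release. The sketch below reads the version from pip show and applies a rough pattern check; is_prerelease is a hypothetical helper, and the pattern assumes version strings such as 1.8.0b2 or 1.8.0rc1 rather than implementing full version parsing.

```shell
# Hypothetical helper: succeeds when a version string looks like a
# pre-release (alpha, beta, release candidate, or dev build).
is_prerelease() {
    case "$1" in
        *[0-9]a[0-9]*|*[0-9]b[0-9]*|*[0-9]rc[0-9]*|*dev*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the installed version; left empty if pip or dbt-core is missing.
installed=$(pip show dbt-core 2>/dev/null | awk '/^Version:/ {print $2}') || installed=""

if is_prerelease "$installed"; then
    echo "dbt-core $installed is a pre-release; reinstall a stable version"
fi
```

If the message appears, the reinstall command above replaces the pre-release with a stable version.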