53 changes: 46 additions & 7 deletions AGENTS.md
@@ -6,7 +6,17 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

**data-pipelines-cli** (`dp`) is a CLI tool for managing data platform workflows. It orchestrates dbt projects, cloud deployments, Docker builds, and multi-service integrations (Airbyte, DataHub, Looker). Projects are created from templates using copier, compiled with environment-specific configs, and deployed to cloud storage (GCS, S3).

**Version:** 0.31.0 | **Python:** 3.9-3.12 | **License:** Apache 2.0
**Version:** 0.32.0 (unreleased) | **Python:** 3.9-3.12 | **License:** Apache 2.0

## Documentation Style

Write concise, technical, minimal descriptions. Developer-to-developer communication:
- State facts, no verbose explanations
- Focus on what changed, not why it matters
- Example: "Expanded dbt-core support: `>=1.7.3,<2.0.0`" (good) vs "We expanded dbt support to allow users more flexibility..." (bad)
- CHANGELOG: List changes only, no context or justification
- Code comments: Describe implementation, not rationale
- Commit messages: Precise technical changes

## Quick Command Reference

@@ -33,6 +43,15 @@ flake8 data_pipelines_cli tests
mypy data_pipelines_cli
```

### Installation

Must install with an adapter extra:
```bash
pip install data-pipelines-cli[snowflake] # Snowflake (primary)
pip install data-pipelines-cli[bigquery] # BigQuery
pip install data-pipelines-cli[snowflake,docker,datahub,gcs] # Multiple extras
```

### CLI Workflow
```bash
# Initialize global config
@@ -177,6 +196,7 @@ run_dbt_command(("run",), env, profiles_path)
|------|-------|---------|
| **cli_commands/compile.py** | 160+ | Orchestrates compilation: file copying, config merging, dbt compile, Docker build |
| **cli_commands/deploy.py** | 240+ | Orchestrates deployment: Docker, DataHub, Airbyte, Looker, cloud storage |
| **cli_commands/publish.py** | 140+ | Publish dbt package to Git; parses manifest.json as plain JSON (no dbt Python API) |
| **config_generation.py** | 175+ | Config merging logic, profiles.yml generation |
| **dbt_utils.py** | 95+ | dbt subprocess execution with variable aggregation |
| **filesystem_utils.py** | 75+ | LocalRemoteSync class for cloud storage (uses fsspec) |
@@ -190,7 +210,7 @@ run_dbt_command(("run",), env, profiles_path)
### Core (always installed)
- **click** (8.1.3): CLI framework
- **copier** (7.0.1): Project templating
- **dbt-core** (1.7.3): Data build tool
- **dbt-core** (>=1.7.3,<2.0.0): Data build tool - supports 1.7.x through 1.10.x
- **fsspec** (>=2024.6.0,<2025.0.0): Cloud filesystem abstraction
- **jinja2** (3.1.2): Template rendering
- **pyyaml** (6.0.1): Config parsing
@@ -200,11 +220,11 @@ run_dbt_command(("run",), env, profiles_path)

### Optional Extras
```bash
# dbt adapters
pip install data-pipelines-cli[bigquery] # dbt-bigquery==1.7.2
pip install data-pipelines-cli[snowflake] # dbt-snowflake==1.7.1
pip install data-pipelines-cli[postgres] # dbt-postgres==1.7.3
pip install data-pipelines-cli[databricks] # dbt-databricks-factory
# dbt adapters (version ranges support 1.7.x through 1.10.x)
pip install data-pipelines-cli[snowflake] # dbt-snowflake>=1.7.1,<2.0.0 (PRIMARY)
pip install data-pipelines-cli[bigquery] # dbt-bigquery>=1.7.2,<2.0.0
pip install data-pipelines-cli[postgres] # dbt-postgres>=1.7.3,<2.0.0
pip install data-pipelines-cli[databricks] # dbt-databricks-factory>=0.1.1
pip install data-pipelines-cli[dbt-all] # All adapters

# Cloud/integrations
@@ -332,6 +352,25 @@ my_pipeline/ # Created by dp create
- **Code generation** requires compilation first (needs manifest.json)
- **Test mocking:** S3 uses moto, GCS uses gcp-storage-emulator

## Recent Changes (v0.32.0 - Unreleased)

**dbt Version Support Expanded**
- All adapters: version ranges `>=1.7.x,<2.0.0` (was exact pins)
- dbt-core removed from INSTALL_REQUIREMENTS (adapters provide it)
- Snowflake added to test suite (primary adapter)
- **CRITICAL:** `cli_commands/publish.py` refactored to parse `manifest.json` as plain JSON instead of using dbt Python API (fixes dbt 1.8+ compatibility)
- All other commands use subprocess calls to dbt CLI
- No dependency on unstable `dbt.contracts.*` modules
- Works across dbt 1.7.x through 1.10.x (verified with 70 test executions)
- See `design/001-dbt-manifest-api-migration.md` for full details

**dbt Pre-release Installation Edge Case**
- Stable `dbt-snowflake==1.10.3` declares `dbt-core>=1.10.0rc0` dependency
- The `rc0` constraint allows pip to install beta versions (e.g., `dbt-core==1.11.0b4`)
- This is PEP 440 standard behavior, not a bug
- Added troubleshooting documentation: `pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'`
- No code changes needed (rare edge case, self-correcting when stable releases update)
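
A quick check-and-recover flow for the edge case above (a sketch; the reinstall command is the one from the troubleshooting note, and `dbt --version` is only used to inspect what pip resolved):

```bash
# Inspect which dbt-core / adapter versions pip actually installed
dbt --version

# If a beta/RC of dbt-core slipped in, force a stable release back
pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'
```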

## Recent Changes (v0.31.0)

**Python 3.11/3.12 Support**
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,23 @@

## [Unreleased]

### Changed

- Expanded all dbt adapter version ranges to `>=1.7.x,<2.0.0` (Snowflake, BigQuery, Postgres, Redshift, Glue)
- Added Snowflake adapter to test suite (tox.ini)
- Removed dbt-core from base requirements (all adapters provide it as dependency)
- Jinja2 version constraint: `==3.1.2` → `>=3.1.3,<4`

### Fixed

- `dp publish` compatibility with dbt 1.8+ (removed dependency on unstable Python API)
- CLI import failure when GitPython not installed

### Removed

- MarkupSafe pin (managed by Jinja2)
- Werkzeug dependency (unused)

## [0.31.0] - 2025-11-03

## [0.30.0] - 2023-12-08
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -10,6 +10,8 @@ pip install -r requirements-dev.txt
pre-commit install
```

**Note:** A dbt adapter extra (e.g., `bigquery`, `snowflake`) is required, since dbt-core is only provided as a transitive dependency of the adapters. Any adapter can be used for development.
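
A minimal sketch of satisfying this in a development checkout (the editable install and the `snowflake` choice are illustrative; any adapter extra works):

```bash
# Editable install with one adapter extra so dbt-core is pulled in transitively
pip install -e ".[snowflake]"
```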

## Running Tests

```bash
36 changes: 34 additions & 2 deletions README.md
@@ -4,7 +4,7 @@
[![PyPI Version](https://badge.fury.io/py/data-pipelines-cli.svg)](https://pypi.org/project/data-pipelines-cli/)
[![Downloads](https://pepy.tech/badge/data-pipelines-cli)](https://pepy.tech/project/data-pipelines-cli)
[![Maintainability](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/maintainability)](https://codeclimate.com/github/getindata/data-pipelines-cli/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/test_coverage)](https://codeclimate.com/github/getindata/data-pipelines-cli/test_coverage)
[![Test Coverage](https://img.shields.io/badge/test%20coverage-95%25-brightgreen.svg)](https://github.com/getindata/data-pipelines-cli)
[![Documentation Status](https://readthedocs.org/projects/data-pipelines-cli/badge/?version=latest)](https://data-pipelines-cli.readthedocs.io/en/latest/?badge=latest)

CLI for data platform
@@ -14,12 +14,44 @@ CLI for data platform
Read the full documentation at [https://data-pipelines-cli.readthedocs.io/](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)

## Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install [dp (data-pipelines-cli)](https://pypi.org/project/data-pipelines-cli/):

**Requirements:** Python 3.9-3.12

### Required

A dbt adapter extra must be installed:

```bash
pip install data-pipelines-cli[snowflake] # Snowflake
pip install data-pipelines-cli[bigquery] # BigQuery
pip install data-pipelines-cli[postgres] # PostgreSQL
pip install data-pipelines-cli[databricks] # Databricks
```

To pin a specific dbt-core version:

```bash
pip install data-pipelines-cli[snowflake] 'dbt-core>=1.8.0,<1.9.0'
```

### Optional

Additional integrations: `docker`, `datahub`, `looker`, `gcs`, `s3`, `git`

### Example

```bash
pip install data-pipelines-cli[bigquery,docker,datahub,gcs]
```

### Troubleshooting

**Pre-release dbt versions**: data-pipelines-cli requires stable dbt-core releases. If you encounter errors with beta or RC versions, reinstall with stable versions:

```bash
pip install --force-reinstall 'dbt-core>=1.7.3,<2.0.0'
```

## Usage
First, create a repository with a global configuration file that you or your organization will be using. The repository
should contain `dp.yml.tmpl` file looking similar to this:
78 changes: 43 additions & 35 deletions data_pipelines_cli/cli_commands/publish.py
@@ -1,12 +1,12 @@
from __future__ import annotations

import json
import pathlib
import shutil
from typing import Any, Dict, List, Tuple, cast
from typing import Any, Dict, List, Tuple

import click
import yaml
from dbt.contracts.graph.manifest import Manifest
from dbt.contracts.graph.nodes import ColumnInfo, ManifestNode

from ..cli_constants import BUILD_DIR
from ..cli_utils import echo_info, echo_warning
@@ -29,43 +29,52 @@ def _get_project_name_and_version() -> Tuple[str, str]:
    return dbt_project_config["name"], dbt_project_config["version"]


def _get_database_and_schema_name(manifest: Manifest) -> Tuple[str, str]:
    try:
        model = next(
            node
            for node in (cast(ManifestNode, n) for n in manifest.nodes.values())
            if node.resource_type == "model"
        )
        return model.database, model.schema
    except StopIteration:
        raise DataPipelinesError("There is no model in 'manifest.json' file.")
def _get_database_and_schema_name(manifest_dict: Dict[str, Any]) -> Tuple[str, str]:
    nodes = manifest_dict.get("nodes")
    if not nodes:
        raise DataPipelinesError("Invalid manifest.json: missing 'nodes' key")

    for node_id, node in nodes.items():
        if node.get("resource_type") == "model":
            database = node.get("database")
            schema = node.get("schema")
            if not database or not schema:
                raise DataPipelinesError(
                    f"Model {node.get('name', node_id)} missing database or schema"
                )
            return database, schema

    raise DataPipelinesError("There is no model in 'manifest.json' file.")


def _parse_columns_dict_into_table_list(columns: Dict[str, ColumnInfo]) -> List[DbtTableColumn]:
def _parse_columns_dict_into_table_list(columns: Dict[str, Any]) -> List[DbtTableColumn]:
    return [
        DbtTableColumn(
            name=column.name,
            description=column.description,
            meta=column.meta,
            quote=column.quote,
            tags=column.tags,
            name=col_data.get("name", ""),
            description=col_data.get("description", ""),
            meta=col_data.get("meta", {}),
            quote=col_data.get("quote"),
            tags=col_data.get("tags", []),
        )
        for column in columns.values()
        for col_data in columns.values()
    ]


def _parse_models_schema(manifest: Manifest) -> List[DbtModel]:
    return [
        DbtModel(
            name=node.name,
            description=node.description,
            tags=node.tags,
            meta=node.meta,
            columns=_parse_columns_dict_into_table_list(node.columns),
        )
        for node in (cast(ManifestNode, n) for n in manifest.nodes.values())
        if node.resource_type == "model"
    ]
def _parse_models_schema(manifest_dict: Dict[str, Any]) -> List[DbtModel]:
    nodes = manifest_dict.get("nodes", {})
    models = []
    for node_id, node in nodes.items():
        if node.get("resource_type") == "model":
            models.append(
                DbtModel(
                    name=node.get("name", ""),
                    description=node.get("description", ""),
                    tags=node.get("tags", []),
                    meta=node.get("meta", {}),
                    columns=_parse_columns_dict_into_table_list(node.get("columns", {})),
                )
            )
    return models


def _get_dag_id() -> str:
@@ -76,15 +85,14 @@ def _get_dag_id() -> str:
def _create_source(project_name: str) -> DbtSource:
    with open(pathlib.Path.cwd().joinpath("target", "manifest.json"), "r") as manifest_json:
        manifest_dict = json.load(manifest_json)
        manifest = Manifest.from_dict(manifest_dict)

    database_name, schema_name = _get_database_and_schema_name(manifest)
    database_name, schema_name = _get_database_and_schema_name(manifest_dict)

    return DbtSource(
        name=project_name,
        database=database_name,
        schema=schema_name,
        tables=_parse_models_schema(manifest),
        tables=_parse_models_schema(manifest_dict),
        meta={"dag": _get_dag_id()},
        tags=[f"project:{project_name}"],
    )
18 changes: 16 additions & 2 deletions data_pipelines_cli/looker_utils.py
@@ -1,3 +1,5 @@
from __future__ import annotations

import glob
import os
import pathlib
@@ -6,10 +8,17 @@

import requests
import yaml
from git import Repo

from .cli_constants import BUILD_DIR
from .cli_utils import echo_info, subprocess_run
from .cli_utils import echo_info, echo_warning, subprocess_run

try:
    from git import Repo

    GIT_EXISTS = True
except ImportError:
    echo_warning("Git support not installed.")
    GIT_EXISTS = False
from .config_generation import (
    generate_profiles_yml,
    read_dictionary_from_config_directory,
@@ -48,6 +57,11 @@ def deploy_lookML_model(key_path: str, env: str) -> None:
    :param env: Name of the environment
    :type env: str
    """
    if not GIT_EXISTS:
        from .errors import DependencyNotInstalledError

        raise DependencyNotInstalledError("git")

    profiles_path = generate_profiles_yml(env, False)
    run_dbt_command(("docs", "generate"), env, profiles_path)

Expand Down