From e0cda01137c6760a688494bbeb835942e66df9ee Mon Sep 17 00:00:00 2001 From: Cat Areias Date: Thu, 19 May 2022 16:19:50 +0000 Subject: [PATCH 1/5] Added STS sections and tutorial --- docs/macros.md | 114 +++++++++++++++++++++++++++++++++++++++ docs/metadata.md | 56 +++++++++++++++++++ docs/tutorial/tut_sts.md | 99 ++++++++++++++++++++++++++++++++++ mkdocs.yml | 1 + 4 files changed, 270 insertions(+) create mode 100644 docs/tutorial/tut_sts.md diff --git a/docs/macros.md b/docs/macros.md index a028863..d79f1e8 100644 --- a/docs/macros.md +++ b/docs/macros.md @@ -1636,6 +1636,120 @@ use [hashdiff aliasing](best_practices.md#hashdiff-aliasing) ___ +### sts + +###### view source: +[![Snowflake](./assets/images/platform_icons/snowflake.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/snowflake/sts.sql) +[![BigQuery](./assets/images/platform_icons/bigquery.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/bigquery/sts.sql) +[![SQLServer](./assets/images/platform_icons/sqlserver.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/sqlserver/sts.sql) + +Generates SQL to build a Status Tracking Satellite table using the provided parameters. + +#### Usage + +``` jinja +{{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, + src_status=src_status, source_model=source_model }} +``` + +#### Parameters + +| Parameter | Description | Type | Required? 
| +|--------------|----------------------------------------------|---------|-----------------------------------------------| +| src_pk | Source primary key column | String | :fontawesome-solid-check-circle:{ .required } | +| src_ldts | Source load date timestamp column | String | :fontawesome-solid-check-circle:{ .required } | +| src_source | Name of the column containing the source ID | String | :fontawesome-solid-check-circle:{ .required } | +| src_status | Source data status column | String | :fontawesome-solid-check-circle:{ .required } | +| source_model | Staging model name | String | :fontawesome-solid-check-circle:{ .required } | + +!!! tip + [Read the tutorial](tutorial/tut_sts.md) for more details + +#### Example Metadata + +[See examples](metadata.md#status-tracking-satellites) + +#### Example Output + +=== "Snowflake" + + === "Single-Source" + + ```sql + WITH source_data AS ( + SELECT a.CUSTOMER_PK, a.LOAD_DATE, a.SOURCE + FROM DBTVAULT.TEST.STG_CUSTOMER AS a + WHERE a.CUSTOMER_PK IS NOT NULL + ), + + latest_records AS ( + SELECT b.CUSTOMER_PK, b.LOAD_DATE, b.SOURCE, b.STATUS + FROM ( + SELECT current_records.CUSTOMER_PK, current_records.LOAD_DATE, current_records.SOURCE, current_records.STATUS, + RANK() OVER ( + PARTITION BY current_records.CUSTOMER_PK + ORDER BY current_records.LOAD_DATE DESC + ) AS rank + FROM DBTVAULT.TEST.STS AS current_records + ) AS b + WHERE b.rank = 1 + ), + + records_to_insert AS ( + SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + 'I' AS "STATUS" + FROM source_data AS stage + WHERE NOT EXISTS ( + SELECT 1 + FROM latest_records + WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + AND latest_records.STATUS != 'D') + ) + + UNION ALL + + SELECT DISTINCT latest_records.CUSTOMER_PK, stage.LOAD_DATE, latest_records.SOURCE, + 'D' AS "STATUS" + FROM source_data AS stage, latest_records + WHERE NOT EXISTS ( + SELECT 1 + FROM source_data AS stage + WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + AND 
latest_records.SOURCE IS NOT NULL) + ) + + UNION ALL + + SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + 'U' AS "STATUS" + FROM source_data AS stage + WHERE EXISTS ( + SELECT 1 + FROM latest_records + WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + AND latest_records."STATUS" != 'D') + ) + ) + + SELECT * FROM records_to_insert + ``` + +=== "Google BigQuery" + + === "Single-Source" + + ```sql + + ``` + +=== "MS SQL Server" + + === "Single-Source" + + ```sql + + ``` + ### eff_sat ###### view source: diff --git a/docs/metadata.md b/docs/metadata.md index c059866..ba9a726 100644 --- a/docs/metadata.md +++ b/docs/metadata.md @@ -700,6 +700,62 @@ Hashdiff aliasing allows you to set an alias for the `HASHDIFF` column. #### Metadata +=== "Per-model - YAML strings" + + ```jinja + {%- set yaml_metadata -%} + source_model: v_stg_order_customer + src_pk: ORDER_CUSTOMER_HK + src_dfk: + - ORDER_HK + src_sfk: CUSTOMER_HK + src_start_date: START_DATE + src_end_date: END_DATE + src_eff: EFFECTIVE_FROM + src_ldts: LOAD_DATETIME + src_source: RECORD_SOURCE + {%- endset -%} + + {% set metadata_dict = fromyaml(yaml_metadata) %} + + {{ dbtvault.eff_sat(src_pk=metadata_dict["src_pk"], + src_dfk=metadata_dict["src_dfk"], + src_sfk=metadata_dict["src_sfk"], + src_start_date=metadata_dict["src_start_date"], + src_end_date=metadata_dict["src_end_date"], + src_eff=metadata_dict["src_eff"], + src_ldts=metadata_dict["src_ldts"], + src_source=metadata_dict["src_source"], + source_model=metadata_dict["source_model"]) }} + ``` + +=== "Per-Model - Variables" + + ```jinja + {%- set source_model = "v_stg_order_customer" -%} + {%- set src_pk = "ORDER_CUSTOMER_HK" -%} + {%- set src_dfk = ["ORDER_HK"] -%} + {%- set src_sfk = "CUSTOMER_HK" -%} + {%- set src_start_date = "START_DATE" -%} + {%- set src_end_date = "END_DATE" -%} + {%- set src_eff = "EFFECTIVE_FROM" -%} + {%- set src_ldts = "LOAD_DATETIME" -%} + {%- set src_source = "RECORD_SOURCE" -%} + + {{ 
dbtvault.eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk, + src_start_date=src_start_date, src_end_date=src_end_date, + src_eff=src_eff, src_ldts=src_ldts, src_source=src_source, + source_model=source_model) }} + ``` + +### Status Tracking Satellites + +#### Parameters + +[eff_sat macro parameters](macros.md#eff_sat) + +#### Metadata + === "Per-model - YAML strings" ```jinja diff --git a/docs/tutorial/tut_sts.md b/docs/tutorial/tut_sts.md new file mode 100644 index 0000000..e0a50ae --- /dev/null +++ b/docs/tutorial/tut_sts.md @@ -0,0 +1,99 @@ +> OVERVIEW OF STRUCTURE HERE - REPLACE ME +Status Tracking Satellite theory +A Status Tracking Satellite (STS) ... +### Structure + +A Status Tracking Satellite contains: + +#### Primary Key (src_pk) +A primary key (or surrogate key) which is usually a hashed representation of the natural key. +For a Status Tracking Satellite, this should be the same as the corresponding link's PK. + +#### Load date (src_ldts) +A load date or load date timestamp. This identifies when the record was first loaded into the database. + +#### Record Source (src_source) +The source for the record. This can be a code which is assigned to a source name in an external lookup table, +or a string directly naming the source system. + +#### Status (src_status) +Status of the record, can have value Insert(I), Update(U), Delete(D) +Insert: +Update: +Delete: + +### Creating Status Tracking Satellite models + +Create a new dbt model as before. We'll call this one `sts_customer`. + +=== "sts_customer.sql" + + ```jinja + {{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, + src_status=src_status, source_model=source_model }} + ``` + +To create an STS model, we simply copy and paste the above template into a model named after the STS we +are creating. dbtvault will generate an STS using parameters provided in the next steps. 
+ +#### Materialisation + +The recommended materialisation for **Status Tracking Satellites** is `incremental`, as we load and add new records to the existing data set. + +### Adding the metadata + +Let's look at the metadata we need to provide to the [sts macro](../macros.md#sts). + +We provide the column names which we would like to select from the staging area (`source_model`). + +Using our [knowledge](#structure) of what columns we need in our `sts_customer` STS, we can identify columns in our +staging layer which map to them: + +| Parameter | Value | +|--------------|----------------| +| source_model | v_stg_customer | +| src_pk | CUSTOMER_PK | +| src_ldts | LOAD_DATE | +| src_source | SOURCE | +| src_status | STATUS | + +When we provide the metadata above, our model should look like the following: + +=== "sts_customer.sql" + + ```jinja + {{ config(materialized='incremental') }} + + {%- set source_model = "v_stg_customer" -%} + {%- set src_pk = "CUSTOMER_PK" -%} + {%- set src_ldts = "LOAD_DATE" -%} + {%- set src_source = "SOURCE" -%} + {%- set src_status = "STATUS" -%} + + {{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, + src_status=src_status, + source_model=source_model }} + ``` + +!!! Note + See our [metadata reference](../metadata.md#status-tracking-satellites) for more detail on how to provide metadata to Status Tracking Satellites. + +### Running dbt + +With our metadata provided and our model complete, we can run dbt to create our `sts_customer` Status Tracking Satellite, as follows: + +=== "< dbt v0.20.x" + `dbt run -m +sts_customer` +=== "> dbt v0.21.0" + `dbt run -s +sts_customer` + +The resulting Status Tracking Satellite will look like this: + +| CUSTOMER_PK | LOAD_DATE | SOURCE | STATUS | +|-------------|-------------|--------|--------| +| B8C37E... | 1993-01-01 | * | I | +| . | . | . | . | +| . | . | . | . | +| FED333... 
| 1993-01-01 | * | U | + +--8<-- "includes/abbreviations.md" \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 0c24b72..9468bce 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -37,6 +37,7 @@ nav: - Links: 'tutorial/tut_links.md' - Transactional Links: 'tutorial/tut_t_links.md' - Satellites: 'tutorial/tut_satellites.md' + - Status Tracking Satellites: 'tutorial/tut_sts.md' - Effectivity Satellites: 'tutorial/tut_eff_satellites.md' - Multi-Active Satellites: 'tutorial/tut_multi_active_satellites.md' - Extended Tracking Satellites: 'tutorial/tut_xts.md' From b33c872e764eeaba285cf93362ec38e5154e772e Mon Sep 17 00:00:00 2001 From: Tim Wilson Date: Wed, 20 Jul 2022 15:50:49 +0100 Subject: [PATCH 2/5] WIP sts documentation --- docs/macros.md | 81 ++++++++++++++++++++------------------------------ 1 file changed, 32 insertions(+), 49 deletions(-) diff --git a/docs/macros.md b/docs/macros.md index d79f1e8..0313ded 100644 --- a/docs/macros.md +++ b/docs/macros.md @@ -1673,7 +1673,7 @@ Generates SQL to build a Status Tracking Satellite table using the provided para === "Snowflake" - === "Single-Source" + === "Base Load" ```sql WITH source_data AS ( @@ -1681,62 +1681,39 @@ Generates SQL to build a Status Tracking Satellite table using the provided para FROM DBTVAULT.TEST.STG_CUSTOMER AS a WHERE a.CUSTOMER_PK IS NOT NULL ), - - latest_records AS ( - SELECT b.CUSTOMER_PK, b.LOAD_DATE, b.SOURCE, b.STATUS - FROM ( - SELECT current_records.CUSTOMER_PK, current_records.LOAD_DATE, current_records.SOURCE, current_records.STATUS, - RANK() OVER ( - PARTITION BY current_records.CUSTOMER_PK - ORDER BY current_records.LOAD_DATE DESC - ) AS rank - FROM DBTVAULT.TEST.STS AS current_records - ) AS b - WHERE b.rank = 1 - ), - - records_to_insert AS ( + + records_with_status AS ( SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, - 'I' AS "STATUS" + 'I' AS STATUS FROM source_data AS stage - WHERE NOT EXISTS ( - SELECT 1 - FROM latest_records - WHERE 
(latest_records.CUSTOMER_PK = stage.CUSTOMER_PK - AND latest_records.STATUS != 'D') - ) - - UNION ALL - - SELECT DISTINCT latest_records.CUSTOMER_PK, stage.LOAD_DATE, latest_records.SOURCE, - 'D' AS "STATUS" - FROM source_data AS stage, latest_records - WHERE NOT EXISTS ( - SELECT 1 - FROM source_data AS stage - WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK - AND latest_records.SOURCE IS NOT NULL) - ) + ), - UNION ALL + records_with_status_and_hashdiff AS ( + SELECT d.CUSTOMER_PK, d.LOAD_DATE, d.SOURCE, d.STATUS, + CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF + FROM records_with_status AS d + ), - SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, - 'U' AS "STATUS" - FROM source_data AS stage - WHERE EXISTS ( - SELECT 1 - FROM latest_records - WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK - AND latest_records."STATUS" != 'D') - ) + records_to_insert AS ( + SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, stage.STATUS, stage.STATUS_HASHDIFF + FROM records_with_status_and_hashdiff AS stage ) - - SELECT * FROM records_to_insert + + SELECT * FROM records_to_insert ``` + === "Subsequent Loads" + + === "Google BigQuery" - === "Single-Source" + === "Base Load" + + ```sql + + ``` + + === "Subsequent Loads" ```sql @@ -1744,7 +1721,13 @@ Generates SQL to build a Status Tracking Satellite table using the provided para === "MS SQL Server" - === "Single-Source" + === "Base Load" + + ```sql + + ``` + + === "Subsequent Loads" ```sql From dbef3fbc5e8800ce923e4bf86f71550c201a772a Mon Sep 17 00:00:00 2001 From: Tim Wilson Date: Wed, 20 Jul 2022 16:01:45 +0100 Subject: [PATCH 3/5] WIP sts documentation --- docs/macros.md | 101 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 92 insertions(+), 9 deletions(-) diff --git a/docs/macros.md b/docs/macros.md index 0313ded..4fb1a87 100644 --- a/docs/macros.md +++ b/docs/macros.md @@ -1640,8 +1640,6 @@ ___ ###### view 
source: [![Snowflake](./assets/images/platform_icons/snowflake.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/snowflake/sts.sql) -[![BigQuery](./assets/images/platform_icons/bigquery.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/bigquery/sts.sql) -[![SQLServer](./assets/images/platform_icons/sqlserver.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/sqlserver/sts.sql) Generates SQL to build a Status Tracking Satellite table using the provided parameters. @@ -1654,13 +1652,14 @@ Generates SQL to build a Status Tracking Satellite table using the provided para #### Parameters -| Parameter | Description | Type | Required? | -|--------------|----------------------------------------------|---------|-----------------------------------------------| -| src_pk | Source primary key column | String | :fontawesome-solid-check-circle:{ .required } | -| src_ldts | Source load date timestamp column | String | :fontawesome-solid-check-circle:{ .required } | -| src_source | Name of the column containing the source ID | String | :fontawesome-solid-check-circle:{ .required } | -| src_status | Source data status column | String | :fontawesome-solid-check-circle:{ .required } | -| source_model | Staging model name | String | :fontawesome-solid-check-circle:{ .required } | +| Parameter | Description | Type | Required? 
| +|--------------|---------------------------------------------|---------|-----------------------------------------------| +| src_pk | Source primary key column | String | :fontawesome-solid-check-circle:{ .required } | +| src_ldts | Source load date timestamp column | String | :fontawesome-solid-check-circle:{ .required } | +| src_source | Name of the column containing the source ID | String | :fontawesome-solid-check-circle:{ .required } | +| src_status | Source data status column | String | :fontawesome-solid-check-circle:{ .required } | +| src_hashdiff | Alias name for status hashdiff column | String | :fontawesome-solid-check-circle:{ .required } | +| source_model | Staging model name | String | :fontawesome-solid-check-circle:{ .required } | !!! tip [Read the tutorial](tutorial/tut_sts.md) for more details @@ -1704,6 +1703,90 @@ Generates SQL to build a Status Tracking Satellite table using the provided para === "Subsequent Loads" + ```sql + WITH source_data AS ( + SELECT a.CUSTOMER_PK, a.LOAD_DATE, a.SOURCE + FROM DBTVAULT.TEST.STG_CUSTOMER AS a + WHERE a.CUSTOMER_PK IS NOT NULL + ), + + stage_datetime AS ( + SELECT MAX(b.LOAD_DATE) AS LOAD_DATETIME + FROM source_data AS b + ), + + latest_records AS ( + SELECT c.CUSTOMER_PK, c.LOAD_DATE, c.SOURCE, c.STATUS, c.STATUS_HASHDIFF + FROM ( + SELECT current_records.CUSTOMER_PK, current_records.LOAD_DATE, current_records.SOURCE, current_records.STATUS, current_records.STATUS_HASHDIFF, + RANK() OVER ( + PARTITION BY current_records.CUSTOMER_PK + ORDER BY current_records.LOAD_DATE DESC + ) AS rank + FROM DBTVAULT_DEV.TEST_TIM_WILSON.STS AS current_records + ) AS c + WHERE c.rank = 1 + ), + + records_with_status AS ( + SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + 'I' AS STATUS + FROM source_data AS stage + WHERE NOT EXISTS ( + SELECT 1 + FROM latest_records + WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + AND latest_records.STATUS != 'D') + ) + + UNION ALL + + SELECT DISTINCT 
latest_records.CUSTOMER_PK, + stage_datetime.LOAD_DATETIME AS LOAD_DATE, + latest_records.SOURCE, + 'D' AS STATUS + FROM latest_records + INNER JOIN stage_datetime + ON 1 = 1 + WHERE NOT EXISTS ( + SELECT 1 + FROM source_data AS stage + WHERE latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + ) + AND latest_records.STATUS != 'D' + AND stage_datetime.LOAD_DATETIME IS NOT NULL + + UNION ALL + + SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + 'U' AS STATUS + FROM source_data AS stage + WHERE EXISTS ( + SELECT 1 + FROM latest_records + WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + AND latest_records.STATUS != 'D' + AND stage.LOAD_DATE != latest_records.LOAD_DATE) + ) + ), + + records_with_status_and_hashdiff AS ( + SELECT d.CUSTOMER_PK, d.LOAD_DATE, d.SOURCE, d.STATUS, + CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF + FROM records_with_status AS d + ), + + records_to_insert AS ( + SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, stage.STATUS, stage.STATUS_HASHDIFF + FROM records_with_status_and_hashdiff AS stage + LEFT JOIN latest_records + ON latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + WHERE latest_records.STATUS_HASHDIFF != stage.STATUS_HASHDIFF + OR latest_records.STATUS_HASHDIFF IS NULL + ) + + SELECT * FROM records_to_insert + ``` === "Google BigQuery" From 58bb4a80179b1b4f12dcdd2bd10ec065b72cc3e1 Mon Sep 17 00:00:00 2001 From: Tim Wilson Date: Wed, 20 Jul 2022 16:18:18 +0100 Subject: [PATCH 4/5] WIP sts documentation --- docs/macros.md | 40 +++++++++---------- docs/metadata.md | 102 +++++++++++++++++++++-------------------------- 2 files changed, 66 insertions(+), 76 deletions(-) diff --git a/docs/macros.md b/docs/macros.md index 4fb1a87..024857f 100644 --- a/docs/macros.md +++ b/docs/macros.md @@ -1676,25 +1676,25 @@ Generates SQL to build a Status Tracking Satellite table using the provided para ```sql WITH source_data AS ( - SELECT a.CUSTOMER_PK, 
a.LOAD_DATE, a.SOURCE + SELECT a.CUSTOMER_HK, a.LOAD_DATE, a.RECORD_SOURCE FROM DBTVAULT.TEST.STG_CUSTOMER AS a - WHERE a.CUSTOMER_PK IS NOT NULL + WHERE a.CUSTOMER_HK IS NOT NULL ), records_with_status AS ( - SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, 'I' AS STATUS FROM source_data AS stage ), records_with_status_and_hashdiff AS ( - SELECT d.CUSTOMER_PK, d.LOAD_DATE, d.SOURCE, d.STATUS, + SELECT d.CUSTOMER_HK, d.LOAD_DATE, d.RECORD_SOURCE, d.STATUS, CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF FROM records_with_status AS d ), records_to_insert AS ( - SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, stage.STATUS, stage.STATUS_HASHDIFF + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, stage.STATUS, stage.STATUS_HASHDIFF FROM records_with_status_and_hashdiff AS stage ) @@ -1705,9 +1705,9 @@ Generates SQL to build a Status Tracking Satellite table using the provided para ```sql WITH source_data AS ( - SELECT a.CUSTOMER_PK, a.LOAD_DATE, a.SOURCE + SELECT a.CUSTOMER_HK, a.LOAD_DATE, a.RECORD_SOURCE FROM DBTVAULT.TEST.STG_CUSTOMER AS a - WHERE a.CUSTOMER_PK IS NOT NULL + WHERE a.CUSTOMER_HK IS NOT NULL ), stage_datetime AS ( @@ -1716,11 +1716,11 @@ Generates SQL to build a Status Tracking Satellite table using the provided para ), latest_records AS ( - SELECT c.CUSTOMER_PK, c.LOAD_DATE, c.SOURCE, c.STATUS, c.STATUS_HASHDIFF + SELECT c.CUSTOMER_HK, c.LOAD_DATE, c.RECORD_SOURCE, c.STATUS, c.STATUS_HASHDIFF FROM ( - SELECT current_records.CUSTOMER_PK, current_records.LOAD_DATE, current_records.SOURCE, current_records.STATUS, current_records.STATUS_HASHDIFF, + SELECT current_records.CUSTOMER_HK, current_records.LOAD_DATE, current_records.RECORD_SOURCE, current_records.STATUS, current_records.STATUS_HASHDIFF, RANK() OVER ( - PARTITION BY current_records.CUSTOMER_PK + PARTITION BY 
current_records.CUSTOMER_HK ORDER BY current_records.LOAD_DATE DESC ) AS rank FROM DBTVAULT_DEV.TEST_TIM_WILSON.STS AS current_records @@ -1729,21 +1729,21 @@ Generates SQL to build a Status Tracking Satellite table using the provided para ), records_with_status AS ( - SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, 'I' AS STATUS FROM source_data AS stage WHERE NOT EXISTS ( SELECT 1 FROM latest_records - WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + WHERE (latest_records.CUSTOMER_HK = stage.CUSTOMER_HK AND latest_records.STATUS != 'D') ) UNION ALL - SELECT DISTINCT latest_records.CUSTOMER_PK, + SELECT DISTINCT latest_records.CUSTOMER_HK, stage_datetime.LOAD_DATETIME AS LOAD_DATE, - latest_records.SOURCE, + latest_records.RECORD_SOURCE, 'D' AS STATUS FROM latest_records INNER JOIN stage_datetime @@ -1751,36 +1751,36 @@ Generates SQL to build a Status Tracking Satellite table using the provided para WHERE NOT EXISTS ( SELECT 1 FROM source_data AS stage - WHERE latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + WHERE latest_records.CUSTOMER_HK = stage.CUSTOMER_HK ) AND latest_records.STATUS != 'D' AND stage_datetime.LOAD_DATETIME IS NOT NULL UNION ALL - SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, 'U' AS STATUS FROM source_data AS stage WHERE EXISTS ( SELECT 1 FROM latest_records - WHERE (latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + WHERE (latest_records.CUSTOMER_HK = stage.CUSTOMER_HK AND latest_records.STATUS != 'D' AND stage.LOAD_DATE != latest_records.LOAD_DATE) ) ), records_with_status_and_hashdiff AS ( - SELECT d.CUSTOMER_PK, d.LOAD_DATE, d.SOURCE, d.STATUS, + SELECT d.CUSTOMER_HK, d.LOAD_DATE, d.RECORD_SOURCE, d.STATUS, CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF FROM records_with_status AS d ), 
records_to_insert AS ( - SELECT DISTINCT stage.CUSTOMER_PK, stage.LOAD_DATE, stage.SOURCE, stage.STATUS, stage.STATUS_HASHDIFF + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, stage.STATUS, stage.STATUS_HASHDIFF FROM records_with_status_and_hashdiff AS stage LEFT JOIN latest_records - ON latest_records.CUSTOMER_PK = stage.CUSTOMER_PK + ON latest_records.CUSTOMER_HK = stage.CUSTOMER_HK WHERE latest_records.STATUS_HASHDIFF != stage.STATUS_HASHDIFF OR latest_records.STATUS_HASHDIFF IS NULL ) diff --git a/docs/metadata.md b/docs/metadata.md index ba9a726..08ffd06 100644 --- a/docs/metadata.md +++ b/docs/metadata.md @@ -700,62 +700,6 @@ Hashdiff aliasing allows you to set an alias for the `HASHDIFF` column. #### Metadata -=== "Per-model - YAML strings" - - ```jinja - {%- set yaml_metadata -%} - source_model: v_stg_order_customer - src_pk: ORDER_CUSTOMER_HK - src_dfk: - - ORDER_HK - src_sfk: CUSTOMER_HK - src_start_date: START_DATE - src_end_date: END_DATE - src_eff: EFFECTIVE_FROM - src_ldts: LOAD_DATETIME - src_source: RECORD_SOURCE - {%- endset -%} - - {% set metadata_dict = fromyaml(yaml_metadata) %} - - {{ dbtvault.eff_sat(src_pk=metadata_dict["src_pk"], - src_dfk=metadata_dict["src_dfk"], - src_sfk=metadata_dict["src_sfk"], - src_start_date=metadata_dict["src_start_date"], - src_end_date=metadata_dict["src_end_date"], - src_eff=metadata_dict["src_eff"], - src_ldts=metadata_dict["src_ldts"], - src_source=metadata_dict["src_source"], - source_model=metadata_dict["source_model"]) }} - ``` - -=== "Per-Model - Variables" - - ```jinja - {%- set source_model = "v_stg_order_customer" -%} - {%- set src_pk = "ORDER_CUSTOMER_HK" -%} - {%- set src_dfk = ["ORDER_HK"] -%} - {%- set src_sfk = "CUSTOMER_HK" -%} - {%- set src_start_date = "START_DATE" -%} - {%- set src_end_date = "END_DATE" -%} - {%- set src_eff = "EFFECTIVE_FROM" -%} - {%- set src_ldts = "LOAD_DATETIME" -%} - {%- set src_source = "RECORD_SOURCE" -%} - - {{ dbtvault.eff_sat(src_pk=src_pk, 
src_dfk=src_dfk, src_sfk=src_sfk, - src_start_date=src_start_date, src_end_date=src_end_date, - src_eff=src_eff, src_ldts=src_ldts, src_source=src_source, - source_model=source_model) }} - ``` - -### Status Tracking Satellites - -#### Parameters - -[eff_sat macro parameters](macros.md#eff_sat) - -#### Metadata - === "Per-model - YAML strings" ```jinja @@ -1013,6 +957,52 @@ derived_columns: ___ +### Status Tracking Satellites + +#### Parameters + +[sts macro parameters](macros.md#sts) + +#### Metadata + +=== "Per-model - YAML strings" + + ```jinja + {%- set yaml_metadata -%} + source_model: v_stg_customer + src_pk: CUSTOMER_HK + src_ldts: LOAD_DATE + src_source: RECORD_SOURCE + src_status: STATUS + src_hashdiff: STATUS_HASHDIFF + {%- endset -%} + + {% set metadata_dict = fromyaml(yaml_metadata) %} + + {{ dbtvault.sts(src_pk=metadata_dict["src_pk"], + src_ldts=metadata_dict["src_ldts"], + src_source=metadata_dict["src_source"], + src_status=metadata_dict["src_status"], + src_hashdiff=metadata_dict["src_hashdiff"], + source_model=metadata_dict["source_model"]) }} + ``` + +=== "Per-Model - Variables" + + ```jinja + {%- set source_model = "v_stg_customer" -%} + {%- set src_pk = "CUSTOMER_HK" -%} + {%- set src_ldts = "LOAD_DATE" -%} + {%- set src_source = "RECORD_SOURCE" -%} + {%- set src_status = "STATUS" -%} + {%- set src_hashdiff = "STATUS_HASHDIFF" -%} + + {{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, + src_status=src_status, src_hashdiff=src_hashdiff, + source_model=source_model) }} + ``` +___ + ### Point-In-Time (PIT) Tables #### Parameters From efa98eba396be2bca52fe03bc7062cf93fe47716 Mon Sep 17 00:00:00 2001 From: Tim Wilson Date: Wed, 20 Jul 2022 17:08:05 +0100 Subject: [PATCH 5/5] Documentation complete --- docs/macros.md | 336 ++++++++++++++++++--------------------- docs/tutorial/tut_sts.md | 65 ++++---- 2 files changed, 194 insertions(+), 207 deletions(-) diff --git a/docs/macros.md b/docs/macros.md index 024857f..fe10da9
100644 --- a/docs/macros.md +++ b/docs/macros.md @@ -1636,186 +1636,6 @@ use [hashdiff aliasing](best_practices.md#hashdiff-aliasing) ___ -### sts - -###### view source: -[![Snowflake](./assets/images/platform_icons/snowflake.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/snowflake/sts.sql) - -Generates SQL to build a Status Tracking Satellite table using the provided parameters. - -#### Usage - -``` jinja -{{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, - src_status=src_status, source_model=source_model }} -``` - -#### Parameters - -| Parameter | Description | Type | Required? | -|--------------|---------------------------------------------|---------|-----------------------------------------------| -| src_pk | Source primary key column | String | :fontawesome-solid-check-circle:{ .required } | -| src_ldts | Source load date timestamp column | String | :fontawesome-solid-check-circle:{ .required } | -| src_source | Name of the column containing the source ID | String | :fontawesome-solid-check-circle:{ .required } | -| src_status | Source data status column | String | :fontawesome-solid-check-circle:{ .required } | -| src_hashdiff | Alias name for status hashdiff column | String | :fontawesome-solid-check-circle:{ .required } | -| source_model | Staging model name | String | :fontawesome-solid-check-circle:{ .required } | - -!!! 
tip - [Read the tutorial](tutorial/tut_sts.md) for more details - -#### Example Metadata - -[See examples](metadata.md#status-tracking-satellites) - -#### Example Output - -=== "Snowflake" - - === "Base Load" - - ```sql - WITH source_data AS ( - SELECT a.CUSTOMER_HK, a.LOAD_DATE, a.RECORD_SOURCE - FROM DBTVAULT.TEST.STG_CUSTOMER AS a - WHERE a.CUSTOMER_HK IS NOT NULL - ), - - records_with_status AS ( - SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, - 'I' AS STATUS - FROM source_data AS stage - ), - - records_with_status_and_hashdiff AS ( - SELECT d.CUSTOMER_HK, d.LOAD_DATE, d.RECORD_SOURCE, d.STATUS, - CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF - FROM records_with_status AS d - ), - - records_to_insert AS ( - SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, stage.STATUS, stage.STATUS_HASHDIFF - FROM records_with_status_and_hashdiff AS stage - ) - - SELECT * FROM records_to_insert - ``` - - === "Subsequent Loads" - - ```sql - WITH source_data AS ( - SELECT a.CUSTOMER_HK, a.LOAD_DATE, a.RECORD_SOURCE - FROM DBTVAULT.TEST.STG_CUSTOMER AS a - WHERE a.CUSTOMER_HK IS NOT NULL - ), - - stage_datetime AS ( - SELECT MAX(b.LOAD_DATE) AS LOAD_DATETIME - FROM source_data AS b - ), - - latest_records AS ( - SELECT c.CUSTOMER_HK, c.LOAD_DATE, c.RECORD_SOURCE, c.STATUS, c.STATUS_HASHDIFF - FROM ( - SELECT current_records.CUSTOMER_HK, current_records.LOAD_DATE, current_records.RECORD_SOURCE, current_records.STATUS, current_records.STATUS_HASHDIFF, - RANK() OVER ( - PARTITION BY current_records.CUSTOMER_HK - ORDER BY current_records.LOAD_DATE DESC - ) AS rank - FROM DBTVAULT_DEV.TEST_TIM_WILSON.STS AS current_records - ) AS c - WHERE c.rank = 1 - ), - - records_with_status AS ( - SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, - 'I' AS STATUS - FROM source_data AS stage - WHERE NOT EXISTS ( - SELECT 1 - FROM latest_records - WHERE 
(latest_records.CUSTOMER_HK = stage.CUSTOMER_HK - AND latest_records.STATUS != 'D') - ) - - UNION ALL - - SELECT DISTINCT latest_records.CUSTOMER_HK, - stage_datetime.LOAD_DATETIME AS LOAD_DATE, - latest_records.RECORD_SOURCE, - 'D' AS STATUS - FROM latest_records - INNER JOIN stage_datetime - ON 1 = 1 - WHERE NOT EXISTS ( - SELECT 1 - FROM source_data AS stage - WHERE latest_records.CUSTOMER_HK = stage.CUSTOMER_HK - ) - AND latest_records.STATUS != 'D' - AND stage_datetime.LOAD_DATETIME IS NOT NULL - - UNION ALL - - SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, - 'U' AS STATUS - FROM source_data AS stage - WHERE EXISTS ( - SELECT 1 - FROM latest_records - WHERE (latest_records.CUSTOMER_HK = stage.CUSTOMER_HK - AND latest_records.STATUS != 'D' - AND stage.LOAD_DATE != latest_records.LOAD_DATE) - ) - ), - - records_with_status_and_hashdiff AS ( - SELECT d.CUSTOMER_HK, d.LOAD_DATE, d.RECORD_SOURCE, d.STATUS, - CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF - FROM records_with_status AS d - ), - - records_to_insert AS ( - SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, stage.STATUS, stage.STATUS_HASHDIFF - FROM records_with_status_and_hashdiff AS stage - LEFT JOIN latest_records - ON latest_records.CUSTOMER_HK = stage.CUSTOMER_HK - WHERE latest_records.STATUS_HASHDIFF != stage.STATUS_HASHDIFF - OR latest_records.STATUS_HASHDIFF IS NULL - ) - - SELECT * FROM records_to_insert - ``` - -=== "Google BigQuery" - - === "Base Load" - - ```sql - - ``` - - === "Subsequent Loads" - - ```sql - - ``` - -=== "MS SQL Server" - - === "Base Load" - - ```sql - - ``` - - === "Subsequent Loads" - - ```sql - - ``` - ### eff_sat ###### view source: @@ -3233,6 +3053,162 @@ Generates SQL to build an Extended Tracking Satellite table using the provided p SELECT * FROM records_to_insert ``` +--- + +### sts + +###### view source: 
+[![Snowflake](./assets/images/platform_icons/snowflake.png)](https://github.com/Datavault-UK/dbtvault/blob/release/0.8.3/macros/tables/snowflake/sts.sql) + +Generates SQL to build a Status Tracking Satellite table using the provided parameters. + +#### Usage + +``` jinja +{{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, + src_status=src_status, src_hashdiff=src_hashdiff, source_model=source_model) }} +``` + +#### Parameters + +| Parameter | Description | Type | Required? | +|--------------|---------------------------------------------|---------|-----------------------------------------------| +| src_pk | Source primary key column | String | :fontawesome-solid-check-circle:{ .required } | +| src_ldts | Source load date timestamp column | String | :fontawesome-solid-check-circle:{ .required } | +| src_source | Name of the column containing the source ID | String | :fontawesome-solid-check-circle:{ .required } | +| src_status | Name of the generated status column (I, U or D) | String | :fontawesome-solid-check-circle:{ .required } | +| src_hashdiff | Name of the status hashdiff column | String | :fontawesome-solid-check-circle:{ .required } | +| source_model | Staging model name | String | :fontawesome-solid-check-circle:{ .required } | + +!!!
tip + [Read the tutorial](tutorial/tut_sts.md) for more details + +#### Example Metadata + +[See examples](metadata.md#status-tracking-satellites) + +#### Example Output + +=== "Snowflake" + + === "Base Load" + + ```sql + WITH source_data AS ( + SELECT a.CUSTOMER_HK, a.LOAD_DATE, a.RECORD_SOURCE + FROM DBTVAULT.TEST.STG_CUSTOMER AS a + WHERE a.CUSTOMER_HK IS NOT NULL + ), + + records_with_status AS ( + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, + 'I' AS STATUS + FROM source_data AS stage + ), + + records_with_status_and_hashdiff AS ( + SELECT d.CUSTOMER_HK, d.LOAD_DATE, d.RECORD_SOURCE, d.STATUS, + CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF + FROM records_with_status AS d + ), + + records_to_insert AS ( + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, stage.STATUS, stage.STATUS_HASHDIFF + FROM records_with_status_and_hashdiff AS stage + ) + + SELECT * FROM records_to_insert + ``` + + === "Subsequent Loads" + + ```sql + WITH source_data AS ( + SELECT a.CUSTOMER_HK, a.LOAD_DATE, a.RECORD_SOURCE + FROM DBTVAULT.TEST.STG_CUSTOMER AS a + WHERE a.CUSTOMER_HK IS NOT NULL + ), + + stage_datetime AS ( + SELECT MAX(b.LOAD_DATE) AS LOAD_DATETIME + FROM source_data AS b + ), + + latest_records AS ( + SELECT c.CUSTOMER_HK, c.LOAD_DATE, c.RECORD_SOURCE, c.STATUS, c.STATUS_HASHDIFF + FROM ( + SELECT current_records.CUSTOMER_HK, current_records.LOAD_DATE, current_records.RECORD_SOURCE, current_records.STATUS, current_records.STATUS_HASHDIFF, + RANK() OVER ( + PARTITION BY current_records.CUSTOMER_HK + ORDER BY current_records.LOAD_DATE DESC + ) AS rank + FROM DBTVAULT.TEST.STS AS current_records + ) AS c + WHERE c.rank = 1 + ), + + records_with_status AS ( + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, + 'I' AS STATUS + FROM source_data AS stage + WHERE NOT EXISTS ( + SELECT 1 + FROM latest_records + WHERE
(latest_records.CUSTOMER_HK = stage.CUSTOMER_HK + AND latest_records.STATUS != 'D') + ) + + UNION ALL + + SELECT DISTINCT latest_records.CUSTOMER_HK, + stage_datetime.LOAD_DATETIME AS LOAD_DATE, + latest_records.RECORD_SOURCE, + 'D' AS STATUS + FROM latest_records + INNER JOIN stage_datetime + ON 1 = 1 + WHERE NOT EXISTS ( + SELECT 1 + FROM source_data AS stage + WHERE latest_records.CUSTOMER_HK = stage.CUSTOMER_HK + ) + AND latest_records.STATUS != 'D' + AND stage_datetime.LOAD_DATETIME IS NOT NULL + + UNION ALL + + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, + 'U' AS STATUS + FROM source_data AS stage + WHERE EXISTS ( + SELECT 1 + FROM latest_records + WHERE (latest_records.CUSTOMER_HK = stage.CUSTOMER_HK + AND latest_records.STATUS != 'D' + AND stage.LOAD_DATE != latest_records.LOAD_DATE) + ) + ), + + records_with_status_and_hashdiff AS ( + SELECT d.CUSTOMER_HK, d.LOAD_DATE, d.RECORD_SOURCE, d.STATUS, + CAST((MD5_BINARY(NULLIF(UPPER(TRIM(CAST(STATUS AS VARCHAR))), ''))) AS BINARY(16)) AS STATUS_HASHDIFF + FROM records_with_status AS d + ), + + records_to_insert AS ( + SELECT DISTINCT stage.CUSTOMER_HK, stage.LOAD_DATE, stage.RECORD_SOURCE, stage.STATUS, stage.STATUS_HASHDIFF + FROM records_with_status_and_hashdiff AS stage + LEFT JOIN latest_records + ON latest_records.CUSTOMER_HK = stage.CUSTOMER_HK + WHERE latest_records.STATUS_HASHDIFF != stage.STATUS_HASHDIFF + OR latest_records.STATUS_HASHDIFF IS NULL + ) + + SELECT * FROM records_to_insert + ``` + +--- + ### pit ###### view source: diff --git a/docs/tutorial/tut_sts.md b/docs/tutorial/tut_sts.md index e0a50ae..8f9dfaf 100644 --- a/docs/tutorial/tut_sts.md +++ b/docs/tutorial/tut_sts.md @@ -1,13 +1,21 @@ -> OVERVIEW OF STRUCTURE HERE - REPLACE ME -Status Tracking Satellite theory -A Status Tracking Satellite (STS) ... 
+Status Tracking Satellites are an optional Data Vault 2.0 entity. +They can be maintained where there is no change data capture (CDC) system operating on the source, +in order to track the status of the source business entity. +This status can then be used downstream: +for instance, records with a deleted status can be excluded from business reporting. + +!!! Note + Unlike other raw vault loads, the source data provided must be a full snapshot of the source entity's business keys, + staged in the normal manner to add the primary key, load date and record source. + + ### Structure -A Status Tracking Satellite contains: +In general, Status Tracking Satellites consist of 5 columns, described below. #### Primary Key (src_pk) A primary key (or surrogate key) which is usually a hashed representation of the natural key. -For a Status Tracking Satellite, this should be the same as the corresponding link's PK. +For a Status Tracking Satellite, this should be the same as the corresponding Hub's PK. #### Load date (src_ldts) A load date or load date timestamp. This identifies when the record was first loaded into the database. @@ -17,10 +25,10 @@ The source for the record. This can be a code which is assigned to a source name or a string directly naming the source system. #### Status (src_status) -Status of the record, can have value Insert(I), Update(U), Delete(D) -Insert: -Update: -Delete: +The status of the record, calculated during processing: Insert (I) for a key in the staged snapshot that has no current, non-deleted record in the satellite; Update (U) for a key that already has a current, non-deleted record and is still present in the snapshot; or Delete (D) for a key whose current record is not deleted but which is no longer present in the snapshot. + +#### Hashdiff (src_hashdiff) +The name of the column that stores the hash of the calculated status. A record is only inserted when this hash differs from that of the latest record for the same key, or when the key has no previous record. ### Creating Status Tracking Satellite models @@ -30,7 +38,7 @@ Create a new dbt model as before. We'll call this one `sts_customer`.
```jinja {{ dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, - src_status=src_status, source_model=source_model }} + src_status=src_status, src_hashdiff=src_hashdiff, source_model=source_model) }} ``` To create an STS model, we simply copy and paste the above template into a model named after the STS we @@ -38,7 +46,8 @@ are creating. dbtvault will generate an STS using parameters provided in the nex #### Materialisation -The recommended materialisation for **Status Tracking Satellites** is `incremental`, as we load and add new records to the existing data set. +The materialisation for **Status Tracking Satellites** must be `incremental`, as each run only adds new records +to the existing data set, based on a snapshot of the source taken at a single point in time. ### Adding the metadata @@ -49,13 +58,14 @@ We provide the column names which we would like to select from the staging area Using our [knowledge](#structure) of what columns we need in our `sts_customer` STS, we can identify columns in our staging layer which map to them: -| Parameter | Value | -|--------------|----------------| -| source_model | v_stg_customer | -| src_pk | CUSTOMER_PK | -| src_ldts | LOAD_DATE | -| src_source | SOURCE | -| src_status | STATUS | +| Parameter | Value | +|--------------|-----------------| +| source_model | v_stg_customer | +| src_pk | CUSTOMER_HK | +| src_ldts | LOAD_DATE | +| src_source | RECORD_SOURCE | +| src_status | STATUS | +| src_hashdiff | STATUS_HASHDIFF | When we provide the metadata above, our model should look like the following: @@ -65,13 +75,14 @@ When we provide the metadata above, our model should look like the following: {{ config(materialized='incremental') }} {%- set source_model = "v_stg_customer" -%} - {%- set src_pk = "CUSTOMER_PK" -%} + {%- set src_pk = "CUSTOMER_HK" -%} {%- set src_ldts = "LOAD_DATE" -%} - {%- set src_source = "SOURCE" -%} + {%- set src_source = "RECORD_SOURCE" -%} {%- set src_status = "STATUS" -%} + {%- set src_hashdiff = "STATUS_HASHDIFF" -%} {{
dbtvault.sts(src_pk=src_pk, src_ldts=src_ldts, src_source=src_source, - src_status=src_status, + src_status=src_status, src_hashdiff=src_hashdiff, source_model=source_model) }} ``` @@ -89,11 +100,11 @@ With our metadata provided and our model complete, we can run dbt to create our The resulting Status Tracking Satellite will look like this: -| CUSTOMER_PK | LOAD_DATE | SOURCE | STATUS | -|-------------|-------------|--------|--------| -| B8C37E... | 1993-01-01 | * | I | -| . | . | . | . | -| . | . | . | . | -| FED333... | 1993-01-01 | * | U | +| CUSTOMER_HK | LOAD_DATE | RECORD_SOURCE | STATUS | STATUS_HASHDIFF | +|-------------|-------------|---------------|--------|-----------------| +| B8C37E... | 1993-01-01 | * | I | DD7536... | +| . | . | . | . | . | +| . | . | . | . | . | +| FED333... | 1993-01-01 | * | U | 4C6143... | --8<-- "includes/abbreviations.md" \ No newline at end of file