Commit 28ac2b2

Merge pull request #1151 from janmatzek/SVS-1245-add-custom-fields-to-gooddata-pipelines

2 parents 880b8de + 67d1d9b, commit 28ac2b2

22 files changed: +2160 −1 lines changed

Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
---
title: "LDM Extension"
linkTitle: "LDM Extension"
weight: 3
no_list: true
---

Child workspaces inherit the [Logical Data Model](https://www.gooddata.com/docs/cloud/model-data/concepts/logical-data-model/) (LDM) from their parent. You can use GoodData Pipelines to extend a child workspace's LDM with extra datasets specific to that tenant's requirements.

{{% alert color="info" %}} See [Set Up Multiple Tenants](https://www.gooddata.com/docs/cloud/workspaces/) to learn more about leveraging multitenancy in GoodData.{{% /alert %}}

This documentation uses the terms *custom datasets* and *custom fields*. In this context, *custom* refers to extending the LDM beyond the inherited datasets.

## Usage

Start by initializing the `LdmExtensionManager`:

```python
from gooddata_pipelines import LdmExtensionManager

host = "http://localhost:3000"
token = "some_user_token"

ldm_extension_manager = LdmExtensionManager.create(host=host, token=token)
```
To extend the LDM, define the custom datasets and the fields they should contain. The script also checks the validity of analytical objects before and after the update, and updates that introduce new invalid relations are automatically rolled back. You can opt out of this behavior by setting the `check_relations` parameter to `False`.

### Custom Dataset Definitions

A custom dataset represents a new dataset appended to the child workspace's LDM. It is defined by the following parameters:

| name | type | description |
|------|------|-------------|
| workspace_id | string | ID of the child workspace. |
| dataset_id | string | ID of the custom dataset. |
| dataset_name | string | Name of the custom dataset. |
| dataset_datasource_id | string | ID of the data source. |
| dataset_source_table | string | Name of the table in the Physical Data Model. |
| dataset_source_sql | string \| None | SQL query defining the dataset. |
| parent_dataset_reference | string \| None | ID of the parent dataset to which the custom dataset will be connected. |
| parent_dataset_reference_attribute_id | string | ID of the attribute used for creating the relationship in the parent dataset. |
| dataset_reference_source_column | string | Name of the column used for creating the relationship in the custom dataset. |
| dataset_reference_source_column_data_type | [ColumnDataType](#columndatatype) | Column data type. |
| workspace_data_filter_id | string | ID of the workspace data filter to use. |
| workspace_data_filter_column_name | string | Name of the column in the custom dataset used for filtering. |

#### Validity constraints

Exactly one of `dataset_source_table` and `dataset_source_sql` must be specified with a truthy value. An exception is raised if both parameters are falsy or if both are truthy.
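For example, a custom dataset can be backed by a SQL query instead of a physical table. The following sketch is illustrative only: the IDs, the query, and the column names are placeholders, and it assumes the unused `dataset_source_table` parameter accepts `None`.

```python
from gooddata_pipelines import ColumnDataType, CustomDatasetDefinition

# Illustrative sketch: a SQL-backed custom dataset. All IDs, the query, and
# the column names are placeholders. Assumes `dataset_source_table` accepts
# None when `dataset_source_sql` is provided.
sql_backed_dataset = CustomDatasetDefinition(
    workspace_id="child_workspace_id",
    dataset_id="returns_custom_dataset_id",
    dataset_name="Custom Returns Dataset",
    dataset_datasource_id="gdc_datasource_id",
    dataset_source_table=None,
    dataset_source_sql="SELECT * FROM returns WHERE is_processed = true",
    parent_dataset_reference="products",
    parent_dataset_reference_attribute_id="products.product_id",
    dataset_reference_source_column="product_id",
    dataset_reference_source_column_data_type=ColumnDataType.INT,
    workspace_data_filter_id="wdf_id",
    workspace_data_filter_column_name="wdf_column",
)
```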
### Custom Field Definitions

Custom fields define the individual fields of the custom datasets defined above. Each custom field is specified with the following parameters:

| name | type | description |
|------|------|-------------|
| workspace_id | string | ID of the child workspace. |
| dataset_id | string | ID of the custom dataset. |
| custom_field_id | string | ID of the custom field. |
| custom_field_name | string | Name of the custom field. |
| custom_field_type | [CustomFieldType](#customfieldtype) | Indicates whether the field represents an attribute, a date, or a fact. |
| custom_field_source_column | string | Name of the column in the Physical Data Model. |
| custom_field_source_column_data_type | [ColumnDataType](#columndatatype) | Data type of the field. |

#### Validity constraints

The custom field definitions must comply with the following criteria:

- Each attribute and fact must have a unique combination of `workspace_id` and `custom_field_id` values.
- Each date must have a unique combination of `dataset_id` and `custom_field_id` values (a date field is sketched below).
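For illustration, a date field on the custom dataset might be defined as follows. This is a minimal sketch: the IDs, the source column, and the choice of `TIMESTAMP` as the source data type are placeholders.

```python
from gooddata_pipelines import (
    ColumnDataType,
    CustomFieldDefinition,
    CustomFieldType,
)

# Illustrative sketch: a DATE custom field. The IDs, the source column, and
# the TIMESTAMP source data type are placeholders.
created_at_field = CustomFieldDefinition(
    workspace_id="child_workspace_id",
    dataset_id="products_custom_dataset_id",
    custom_field_id="created_at",
    custom_field_name="Created At",
    custom_field_type=CustomFieldType.DATE,
    custom_field_source_column="created_at",
    custom_field_source_column_data_type=ColumnDataType.TIMESTAMP,
)
```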
### Enumerations

Some parameters of the custom field and dataset definitions are specified using the `CustomFieldType` and `ColumnDataType` enums; a short usage sketch follows the tables below.

#### CustomFieldType

The following field types are supported:

| name | value |
|------|-------|
| ATTRIBUTE | "attribute" |
| FACT | "fact" |
| DATE | "date" |

#### ColumnDataType

The following data types are supported:

| name | value |
|------|-------|
| INT | "INT" |
| STRING | "STRING" |
| DATE | "DATE" |
| NUMERIC | "NUMERIC" |
| TIMESTAMP | "TIMESTAMP" |
| TIMESTAMP_TZ | "TIMESTAMP_TZ" |
| BOOLEAN | "BOOLEAN" |
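Both enums are exposed at the package root, so definitions reference enum members rather than raw strings. A minimal sketch:

```python
from gooddata_pipelines import ColumnDataType, CustomFieldType

# The enum members wrap the string values listed in the tables above.
assert CustomFieldType.FACT.value == "fact"
assert ColumnDataType.NUMERIC.value == "NUMERIC"
```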
### Relations Check

Because changes to the LDM may impact existing analytical objects, the script performs checks to prevent potentially undesirable changes.

{{% alert color="warning" %}} Changes to the LDM can invalidate existing objects. For example, removing a previously added custom field will break any analytical objects using that field. {{% /alert %}}

To prevent this, the script will:

1. Store the current workspace layout (analytical objects and LDM).
1. Check whether the relations of metrics, visualizations, and dashboards are valid. A set of current objects with invalid relations is created.
1. Push the updated LDM to GoodData Cloud.
1. Check object relations again. A new set of objects with invalid relations is created.
1. Compare the sets (the decision rule is sketched below):
    - If the new set is a subset of the old one, the update is considered successful.
    - Otherwise, the update is rolled back: the initially stored workspace layout is pushed to GoodData Cloud again, reverting the changes to the workspace.
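The comparison in the final step amounts to a subset check on the sets of objects with invalid relations. The following standalone sketch only illustrates that decision rule; it is not the library's implementation, and the object IDs are made up.

```python
# Illustration of the decision rule only, not the library's implementation.
# Hypothetical IDs of objects reported as having invalid relations.
invalid_before_update = {"metric_revenue", "dashboard_overview"}
invalid_after_update = {"metric_revenue"}

if invalid_after_update.issubset(invalid_before_update):
    # No new invalid relations were introduced, so the LDM update is kept.
    print("Update successful.")
else:
    # New invalid relations appeared, so the stored layout is restored.
    print("Update rolled back.")
```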
You can opt out of this check-and-rollback behavior by setting the `check_relations` parameter to `False` when using the `LdmExtensionManager`:

```python
# Setting `check_relations` to False bypasses the default checks and rollback
# mechanism. Note that this may invalidate existing objects.
ldm_extension_manager.process(
    custom_datasets=custom_dataset_definitions,
    custom_fields=custom_field_definitions,
    check_relations=False,
)
```
## Example

Here is a complete example of extending a child workspace's LDM:

```python
import logging

from gooddata_pipelines import (
    ColumnDataType,
    CustomDatasetDefinition,
    CustomFieldDefinition,
    CustomFieldType,
    LdmExtensionManager,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

host = "http://localhost:3000"
token = "some_user_token"

# Initialize the manager
ldm_extension_manager = LdmExtensionManager.create(host=host, token=token)

# Optionally, you can subscribe to the logger object to receive log messages
ldm_extension_manager.logger.subscribe(logger)

# Prepare the definitions
custom_dataset_definitions = [
    CustomDatasetDefinition(
        workspace_id="child_workspace_id",
        dataset_id="products_custom_dataset_id",
        dataset_name="Custom Products Dataset",
        dataset_datasource_id="gdc_datasource_id",
        dataset_source_table="products_custom",
        dataset_source_sql=None,
        parent_dataset_reference="products",
        parent_dataset_reference_attribute_id="products.product_id",
        dataset_reference_source_column="product_id",
        dataset_reference_source_column_data_type=ColumnDataType.INT,
        workspace_data_filter_id="wdf_id",
        workspace_data_filter_column_name="wdf_column",
    )
]

custom_field_definitions = [
    CustomFieldDefinition(
        workspace_id="child_workspace_id",
        dataset_id="products_custom_dataset_id",
        custom_field_id="is_sold_out",
        custom_field_name="Sold Out",
        custom_field_type=CustomFieldType.ATTRIBUTE,
        custom_field_source_column="is_sold_out",
        custom_field_source_column_data_type=ColumnDataType.BOOLEAN,
    ),
    CustomFieldDefinition(
        workspace_id="child_workspace_id",
        dataset_id="products_custom_dataset_id",
        custom_field_id="category_detail",
        custom_field_name="Category (Detail)",
        custom_field_type=CustomFieldType.ATTRIBUTE,
        custom_field_source_column="category_detail",
        custom_field_source_column_data_type=ColumnDataType.STRING,
    ),
]

# Call the process method to extend the LDM
ldm_extension_manager.process(
    custom_datasets=custom_dataset_definitions,
    custom_fields=custom_field_definitions,
)
```

gooddata-pipelines/README.md

Lines changed: 9 additions & 1 deletion
@@ -57,4 +57,12 @@ full_load_data: list[UserFullLoad] = UserFullLoad.from_list_of_dicts(
 provisioner.full_load(full_load_data)
 ```
 
-Ready-made scripts covering the basic use cases can be found here in the [GoodData Productivity Tools](https://github.com/gooddata/gooddata-productivity-tools) repository
+## Bugs & Requests
+
+Please use the [GitHub issue tracker](https://github.com/gooddata/gooddata-python-sdk/issues) to submit bugs
+or request features.
+
+## Changelog
+
+See [Github releases](https://github.com/gooddata/gooddata-python-sdk/releases) for released versions
+and a list of changes.

gooddata-pipelines/gooddata_pipelines/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -13,6 +13,15 @@
 from .backup_and_restore.storage.local_storage import LocalStorage
 from .backup_and_restore.storage.s3_storage import S3Storage
 
+# -------- LDM Extension --------
+from .ldm_extension.ldm_extension_manager import LdmExtensionManager
+from .ldm_extension.models.custom_data_object import (
+    ColumnDataType,
+    CustomDatasetDefinition,
+    CustomFieldDefinition,
+    CustomFieldType,
+)
+
 # -------- Provisioning --------
 from .provisioning.entities.user_data_filters.models.udf_models import (
     UserDataFilterFullLoad,
@@ -65,5 +74,10 @@
     "UserDataFilterProvisioner",
     "UserDataFilterFullLoad",
     "EntityType",
+    "LdmExtensionManager",
+    "CustomDatasetDefinition",
+    "CustomFieldDefinition",
+    "ColumnDataType",
+    "CustomFieldType",
     "__version__",
 ]

gooddata-pipelines/gooddata_pipelines/api/gooddata_api.py

Lines changed: 50 additions & 0 deletions
@@ -174,6 +174,44 @@ def get_automations(self, workspace_id: str) -> requests.Response:
         )
         return self._get(endpoint)
 
+    def get_all_metrics(self, workspace_id: str) -> requests.Response:
+        """Get all metrics from the specified workspace.
+
+        Args:
+            workspace_id (str): The ID of the workspace to retrieve metrics from.
+        Returns:
+            requests.Response: The response containing the metrics.
+        """
+        endpoint = f"/entities/workspaces/{workspace_id}/metrics"
+        headers = {**self.headers, "X-GDC-VALIDATE-RELATIONS": "true"}
+        return self._get(endpoint, headers=headers)
+
+    def get_all_visualization_objects(
+        self, workspace_id: str
+    ) -> requests.Response:
+        """Get all visualizations from the specified workspace.
+
+        Args:
+            workspace_id (str): The ID of the workspace to retrieve visualizations from.
+        Returns:
+            requests.Response: The response containing the visualizations.
+        """
+        endpoint = f"/entities/workspaces/{workspace_id}/visualizationObjects"
+        headers = {**self.headers, "X-GDC-VALIDATE-RELATIONS": "true"}
+        return self._get(endpoint, headers=headers)
+
+    def get_all_dashboards(self, workspace_id: str) -> requests.Response:
+        """Get all dashboards from the specified workspace.
+
+        Args:
+            workspace_id (str): The ID of the workspace to retrieve dashboards from.
+        Returns:
+            requests.Response: The response containing the dashboards.
+        """
+        endpoint = f"/entities/workspaces/{workspace_id}/analyticalDashboards"
+        headers = {**self.headers, "X-GDC-VALIDATE-RELATIONS": "true"}
+        return self._get(endpoint, headers=headers)
+
     def _get(
         self, endpoint: str, headers: dict[str, str] | None = None
     ) -> requests.Response:
@@ -253,3 +291,15 @@ def _delete(
         url = self._get_url(endpoint)
 
         return requests.delete(url, headers=self.headers, timeout=TIMEOUT)
+
+    @staticmethod
+    def raise_if_response_not_ok(*responses: requests.Response) -> None:
+        """Check if responses from API calls are OK.
+
+        Raises ValueError if any response is not OK (status code not 2xx).
+        """
+        for response in responses:
+            if not response.ok:
+                raise ValueError(
+                    f"Request to {response.url} failed with status code {response.status_code}: {response.text}"
+                )
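A possible way these additions fit together, purely as a sketch: the wrapper class's name and construction are not shown in this diff, so the `api` parameter below is an untyped stand-in for an instance of it.

```python
import requests


def fetch_analytics(api, workspace_id: str) -> tuple[requests.Response, ...]:
    """Fetch metrics, visualizations, and dashboards with relation validation.

    `api` is assumed to be an instance of the API wrapper extended above.
    """
    responses = (
        api.get_all_metrics(workspace_id),
        api.get_all_visualization_objects(workspace_id),
        api.get_all_dashboards(workspace_id),
    )
    # raise_if_response_not_ok is a staticmethod, so calling it on the
    # instance raises ValueError on any non-2xx response.
    api.raise_if_response_not_ok(*responses)
    return responses
```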
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# (C) 2025 GoodData Corporation
