-
Notifications
You must be signed in to change notification settings - Fork 37
IFC-1405: Update restricted words for generics attributes and relationships #7287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
WalkthroughThe changes introduce several improvements to the schema handling and validation process:
Overall, these changes improve the schema handling and validation process, particularly when dealing with hierarchical data structures and schemas loaded from the API. Alterations to the declarations of exported or public entities
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (2)backend/tests/**/*📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Files:
backend/**/*📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
CodSpeed Performance ReportMerging #7287 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
backend/infrahub/core/constants/__init__.py (1)
42-47: Make reserved names immutable (tuple) and consider export clarityUse a tuple for immutability and type clarity. Optionally add to all if you rely on star-imports.
-RESERVED_ATTR_REL_HIERARCHICAL_NAMES = [ - "parent", - "children", - "ancestors", - "descendants", -] +RESERVED_ATTR_REL_HIERARCHICAL_NAMES: tuple[str, ...] = ( + "parent", + "children", + "ancestors", + "descendants", +)backend/tests/unit/core/schema_manager/test_manager_schema.py (2)
701-788: Nice parametric table for name validationsGood refactor into a dataclass + table-driven cases; improves coverage and readability. Consider freezing the dataclass to avoid accidental mutation in future edits.
-@dataclass +@dataclass(frozen=True) class SchemaBranchValidateNamesTestCaseData:
805-873: Add positive control and extension/update scenarios for hierarchical restrictionsTests correctly fail on reserved names when loading_from_api=True for both relationships and attributes. Please add:
- A positive control where the auto-generated 'parent'/'children' (kind=Hierarchy) on hierarchical nodes do not fail when not using loading_from_api.
- A case where a reserved relationship name is added via SchemaRoot.extensions.nodes to an existing hierarchical node to ensure it is also rejected.
- A case updating an existing node (already present in the branch) via API to add a reserved relationship name, ensuring it's caught.
I can draft these tests if helpful.
Also applies to: 874-931
backend/infrahub/core/schema/schema_branch.py (1)
469-474: Docstring: document loading_from_api semanticsPlease add an Args section documenting loading_from_api and its effect (marking API-sourced relationships for validation).
backend/infrahub/core/schema/relationship_schema.py (1)
26-35: Duplicate retains PrivateAttr; add accessor docstrings
model_copy(deep=True) (invoked by duplicate()) preserves private attributes by design (docs.pydantic.dev). Add brief docstrings to the loaded_from_api getter and setter for clarity:@property def loaded_from_api(self) -> bool: + """True when this relationship schema originated from API input.""" return self._loaded_from_api @loaded_from_api.setter def loaded_from_api(self, value: bool) -> None: + """Mark this schema as loaded from API to adjust validation logic.""" self._loaded_from_api = value
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/infrahub/api/schema.py(1 hunks)backend/infrahub/core/constants/__init__.py(1 hunks)backend/infrahub/core/schema/relationship_schema.py(2 hunks)backend/infrahub/core/schema/schema_branch.py(5 hunks)backend/tests/unit/core/schema_manager/test_manager_schema.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with
pytestor viainvoketasks
Files:
backend/infrahub/api/schema.pybackend/infrahub/core/constants/__init__.pybackend/infrahub/core/schema/schema_branch.pybackend/tests/unit/core/schema_manager/test_manager_schema.pybackend/infrahub/core/schema/relationship_schema.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
**/*.py: Use type hints for all function parameters and return values in Python code
Prefer asynchronous implementations when feasible (use Async whenever possible)
Define asynchronous functions withasync def
Useawaitfor asynchronous calls
Use Pydantic models instead of standard library dataclasses
All Python functions must include a docstring
Use Ruff and MyPy for linting and type checkingUse ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Useasync deffor asynchronous functions
Useawaitfor asynchronous calls within async functions
Use Pydantic models instead of dataclasses for data modeling
All Python functions must have docstrings
Docstrings must use triple quotes (""")
Follow Google-style docstring format
Docstrings should include, when applicable: brief description, detailed description, Args/Parameters (without types), Returns, Raises, and Examples
Validate and lint Python with ruff and mypy
Format Python code using the project’s formatter (invoke format)
Files:
backend/infrahub/api/schema.pybackend/infrahub/core/constants/__init__.pybackend/infrahub/core/schema/schema_branch.pybackend/tests/unit/core/schema_manager/test_manager_schema.pybackend/infrahub/core/schema/relationship_schema.py
backend/tests/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Place backend tests in
backend/tests/
Files:
backend/tests/unit/core/schema_manager/test_manager_schema.py
🧬 Code graph analysis (3)
backend/infrahub/api/schema.py (1)
backend/infrahub/core/schema/schema_branch.py (1)
load_schema(469-502)
backend/infrahub/core/schema/schema_branch.py (3)
backend/infrahub/core/schema/relationship_schema.py (2)
loaded_from_api(29-30)loaded_from_api(33-34)backend/infrahub/core/schema/generic_schema.py (1)
GenericSchema(14-54)backend/infrahub/core/schema/node_schema.py (1)
NodeSchema(15-147)
backend/tests/unit/core/schema_manager/test_manager_schema.py (2)
backend/infrahub/core/schema/schema_branch.py (2)
load_schema(469-502)validate_names(975-996)backend/infrahub/core/schema/__init__.py (1)
SchemaRoot(47-99)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: E2E-testing-playwright
- GitHub Check: backend-benchmark
- GitHub Check: backend-tests-functional
- GitHub Check: backend-tests-unit
- GitHub Check: validate-generated-documentation
- GitHub Check: backend-docker-integration
- GitHub Check: backend-tests-integration
🔇 Additional comments (4)
backend/infrahub/api/schema.py (1)
159-160: LGTM: propagate API-origin to schema loadPassing loading_from_api=True ensures downstream validations can distinguish API-sourced relationships.
backend/tests/unit/core/schema_manager/test_manager_schema.py (2)
5-7: LGTM: test dependencies and constants importImports look correct and scoped.
Also applies to: 16-17
791-803: LGTM: validates duplicate and reserved names via tableSolid targeted assertion with clear messages.
backend/infrahub/core/schema/schema_branch.py (1)
998-1019: LGTM: targeted hierarchical-name enforcementThe logic cleanly scopes enforcement to hierarchical nodes and only API-sourced relationships, avoiding collisions with system-generated hierarchy edges.
Run the following to list all uses of reserved names and manually confirm each is paired with
kind='Hierarchy':#!/bin/bash rg -n --type=py "name\s*=\s*['\"](parent|children|ancestors|descendants)['\"]" -C5Please review the results to ensure none occur under a non-
Hierarchykind.
| if loading_from_api: | ||
| for relationship in item.relationships: | ||
| relationship.loaded_from_api = True | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: loaded_from_api set on the wrong object; extensions not flagged
Currently, you set relationship.loaded_from_api on item (incoming), but for updates you store new_item in the cache, so the flag isn’t present on the stored relationships. Also, relationships introduced via schema.extensions.nodes are never flagged, allowing a bypass.
Fix by marking the relationships on the stored object, limited to those provided by the incoming payload; and handle extensions similarly.
for item in schema.nodes + schema.generics:
try:
if item.id:
new_item = self.get_by_id(id=item.id)
if new_item.kind != item.kind:
self.delete(name=new_item.kind)
else:
new_item = self.get(name=item.kind)
if (new_item.is_node_schema and not item.is_node_schema) or (
new_item.is_generic_schema and not item.is_generic_schema
):
current_node_type = "Node" if new_item.is_node_schema else "Generic"
raise ValidationError(
f"{item.kind} already exist in the schema as a {current_node_type}. Either rename it or delete the existing one."
)
- new_item.update(item)
- self.set(name=item.kind, schema=new_item)
+ new_item.update(item)
+ if loading_from_api:
+ incoming_rel_names = {r.name for r in item.relationships}
+ for rel in new_item.relationships:
+ if rel.name in incoming_rel_names:
+ rel.loaded_from_api = True
+ self.set(name=item.kind, schema=new_item)
except SchemaNotFoundError:
- self.set(name=item.kind, schema=item)
-
- if loading_from_api:
- for relationship in item.relationships:
- relationship.loaded_from_api = True
+ if loading_from_api:
+ for relationship in item.relationships:
+ relationship.loaded_from_api = True
+ self.set(name=item.kind, schema=item)
for node_extension in schema.extensions.nodes:
new_item = self.get(name=node_extension.kind)
- new_item.update(node_extension)
- self.set(name=node_extension.kind, schema=new_item)
+ new_item.update(node_extension)
+ if loading_from_api and node_extension.relationships:
+ ext_rel_names = {r.name for r in node_extension.relationships}
+ for rel in new_item.relationships:
+ if rel.name in ext_rel_names:
+ rel.loaded_from_api = True
+ self.set(name=node_extension.kind, schema=new_item)Also applies to: 499-503
🤖 Prompt for AI Agents
In backend/infrahub/core/schema/schema_branch.py around lines 495-498 (and
similarly 499-503), the code currently sets loaded_from_api on the incoming
item.relationships which is never the object stored in the cache; change it to
set loaded_from_api on the relationships of the stored object (new_item)
instead, and only for relationships that were present in the incoming payload
(match by relationship name or id from the payload). Additionally, detect any
relationships added via schema.extensions.nodes and mark those corresponding
relationships on the stored object as loaded_from_api as well so extensions
cannot bypass the flagging.
47bf95e to
60e5619
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/tests/fixtures/schemas/restricted_names_01.json(1 hunks)backend/tests/unit/api/test_40_schema.py(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/tests/fixtures/schemas/restricted_names_01.json
🧰 Additional context used
📓 Path-based instructions (3)
backend/tests/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Place backend tests in
backend/tests/
Files:
backend/tests/unit/api/test_40_schema.py
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with
pytestor viainvoketasks
Files:
backend/tests/unit/api/test_40_schema.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
**/*.py: Use type hints for all function parameters and return values in Python code
Prefer asynchronous implementations when feasible (use Async whenever possible)
Define asynchronous functions withasync def
Useawaitfor asynchronous calls
Use Pydantic models instead of standard library dataclasses
All Python functions must include a docstring
Use Ruff and MyPy for linting and type checkingUse ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Useasync deffor asynchronous functions
Useawaitfor asynchronous calls within async functions
Use Pydantic models instead of dataclasses for data modeling
All Python functions must have docstrings
Docstrings must use triple quotes (""")
Follow Google-style docstring format
Docstrings should include, when applicable: brief description, detailed description, Args/Parameters (without types), Returns, Raises, and Examples
Validate and lint Python with ruff and mypy
Format Python code using the project’s formatter (invoke format)
Files:
backend/tests/unit/api/test_40_schema.py
🧬 Code graph analysis (1)
backend/tests/unit/api/test_40_schema.py (2)
backend/tests/unit/api/conftest.py (1)
admin_headers(30-31)backend/tests/conftest.py (1)
schema_file(1063-1067)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: backend-benchmark
- GitHub Check: E2E-testing-playwright
- GitHub Check: validate-generated-documentation
- GitHub Check: backend-docker-integration
- GitHub Check: backend-tests-functional
- GitHub Check: backend-tests-integration
- GitHub Check: backend-tests-unit
- GitHub Check: scan
- GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
backend/tests/unit/api/test_40_schema.py (1)
7-7: LGTM!The import correctly adds the new constant needed for the parameterized test.
| @pytest.mark.parametrize( | ||
| "reserved_name", | ||
| [pytest.param(reserved_name, id=reserved_name) for reserved_name in RESERVED_ATTR_REL_HIERARCHICAL_NAMES], | ||
| ) | ||
| async def test_schema_load_restricted_names( | ||
| db: InfrahubDatabase, | ||
| client: TestClient, | ||
| admin_headers, | ||
| default_branch: Branch, | ||
| prefect_test_fixture, | ||
| workflow_local, | ||
| authentication_base, | ||
| helper, | ||
| reserved_name, | ||
| ): | ||
| schema = helper.schema_file("restricted_names_01.json") | ||
| schema["nodes"][1]["relationships"][0]["name"] = reserved_name | ||
| with client: | ||
| response = client.post( | ||
| "/api/schema/load", | ||
| headers=admin_headers, | ||
| json={"schemas": [schema]}, | ||
| ) | ||
|
|
||
| assert response.status_code == 422 | ||
| assert ( | ||
| response.json()["errors"][0]["message"] | ||
| == f"TestingParent: {reserved_name} isn't allowed as a relationship name on hierarchical nodes." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add a docstring to the test function.
The test logic and assertions are correct, but the function is missing a required docstring. All Python functions must include a docstring in Google-style format.
As per coding guidelines.
Apply this diff to add a docstring:
@pytest.mark.parametrize(
"reserved_name",
[pytest.param(reserved_name, id=reserved_name) for reserved_name in RESERVED_ATTR_REL_HIERARCHICAL_NAMES],
)
async def test_schema_load_restricted_names(
db: InfrahubDatabase,
client: TestClient,
admin_headers,
default_branch: Branch,
prefect_test_fixture,
workflow_local,
authentication_base,
helper,
reserved_name,
):
+ """Test that loading a schema with reserved hierarchical names fails with proper error.
+
+ Verifies that relationship names matching RESERVED_ATTR_REL_HIERARCHICAL_NAMES
+ (parent, children, ancestors, descendants) are rejected for hierarchical nodes
+ when loading schemas via the API.
+ """
schema = helper.schema_file("restricted_names_01.json")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @pytest.mark.parametrize( | |
| "reserved_name", | |
| [pytest.param(reserved_name, id=reserved_name) for reserved_name in RESERVED_ATTR_REL_HIERARCHICAL_NAMES], | |
| ) | |
| async def test_schema_load_restricted_names( | |
| db: InfrahubDatabase, | |
| client: TestClient, | |
| admin_headers, | |
| default_branch: Branch, | |
| prefect_test_fixture, | |
| workflow_local, | |
| authentication_base, | |
| helper, | |
| reserved_name, | |
| ): | |
| schema = helper.schema_file("restricted_names_01.json") | |
| schema["nodes"][1]["relationships"][0]["name"] = reserved_name | |
| with client: | |
| response = client.post( | |
| "/api/schema/load", | |
| headers=admin_headers, | |
| json={"schemas": [schema]}, | |
| ) | |
| assert response.status_code == 422 | |
| assert ( | |
| response.json()["errors"][0]["message"] | |
| == f"TestingParent: {reserved_name} isn't allowed as a relationship name on hierarchical nodes." | |
| ) | |
| @pytest.mark.parametrize( | |
| "reserved_name", | |
| [pytest.param(reserved_name, id=reserved_name) for reserved_name in RESERVED_ATTR_REL_HIERARCHICAL_NAMES], | |
| ) | |
| async def test_schema_load_restricted_names( | |
| db: InfrahubDatabase, | |
| client: TestClient, | |
| admin_headers, | |
| default_branch: Branch, | |
| prefect_test_fixture, | |
| workflow_local, | |
| authentication_base, | |
| helper, | |
| reserved_name, | |
| ): | |
| """Test that loading a schema with reserved hierarchical names fails with proper error. | |
| Verifies that relationship names matching RESERVED_ATTR_REL_HIERARCHICAL_NAMES | |
| (parent, children, ancestors, descendants) are rejected for hierarchical nodes | |
| when loading schemas via the API. | |
| """ | |
| schema = helper.schema_file("restricted_names_01.json") | |
| schema["nodes"][1]["relationships"][0]["name"] = reserved_name | |
| with client: | |
| response = client.post( | |
| "/api/schema/load", | |
| headers=admin_headers, | |
| json={"schemas": [schema]}, | |
| ) | |
| assert response.status_code == 422 | |
| assert ( | |
| response.json()["errors"][0]["message"] | |
| == f"TestingParent: {reserved_name} isn't allowed as a relationship name on hierarchical nodes." | |
| ) |
🤖 Prompt for AI Agents
In backend/tests/unit/api/test_40_schema.py around lines 448 to 476, the test
function test_schema_load_restricted_names is missing the required Google-style
docstring; add a docstring as the first statement in the function that
succinctly describes what the test verifies (that loading a schema with reserved
relationship names on hierarchical nodes returns a 422 with the expected error),
and include a brief Google-style Args section listing key fixtures/parameters
used by the test (e.g., db, client, admin_headers, default_branch, helper,
reserved_name).
ajtmccarty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate that you are following the existing patterns that we use for schema validation and that you are restricting values as minimally as possible, but I think we want a slightly different approach for this logic.
I suspect that we cannot add this restriction logic anywhere in SchemaBranch.process b/c we could be loading an existing schema that has valid parent, children, etc. relationships on it
I think we probably need to add this restriction either somewhere in the load_schema endpoint or on the SchemasLoadAPI object. I think the most correct approach is to add a new SchemaApiValidator class (or some other name) that runs this type of API-level validation on an incoming schema and can be extended to add more validations in the future, but for this small check, maybe something on SchemasLoadAPI would be okay too
also, can you add fixes #7287 to the PR description so that it is correctly linked in GH please?
| _loaded_from_api: bool = PrivateAttr(default=False) | ||
|
|
||
| @property | ||
| def loaded_from_api(self) -> bool: | ||
| return self._loaded_from_api | ||
|
|
||
| @loaded_from_api.setter | ||
| def loaded_from_api(self, value: bool) -> None: | ||
| self._loaded_from_api = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this violates the Single Responsibility Principle, which is something we do all over the place, but we are trying not to add any more
the RelationshipSchema should not need data indicating whether it was loaded from the API or not
if we need to do validation of data from the API before it is instantiated into a RelationshipSchema, then that should happen in the API endpoint somewhere
| is_hierarchical_node = (isinstance(node, GenericSchema) and node.hierarchical) or ( | ||
| isinstance(node, NodeSchema) and node.hierarchy | ||
| ) | ||
|
|
||
| if not is_hierarchical_node or node.kind in INTERNAL_SCHEMA_NODE_KINDS: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we want this restriction to be more general and prevent using any of the RESERVED_ATTR_REL_HIERARCHICAL_NAMES names on any attribute or relationship that is defined by the user, not just on hierarchical nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we stick to restricting RESERVED_ATTR_REL_HIERARCHICAL_NAMES globally that would be the easiest solution as we can just iterate over the input from the user and restrict these attribute or relationship names from entering the system. My main concern would be if there are users out there that have already used these names anywhere.
I.e. today this is a perfectly valid schema:
# yaml-language-server: $schema=https://schema.infrahub.app/infrahub/schema/latest.json
---
version: '1.0'
nodes:
- name: Person
namespace: Test
label: "Person"
default_filter: name__value
display_labels:
- name__value
attributes:
- name: name
kind: Text
- name: children
kind: Number
- name: description
kind: Text
optional: trueIt's fine to use "children" as the name of an attribute as long as the node itself isn't hierarchical. It might be perfectly fine but it could also be that some installations face problems due to this.
I think it would be problematic to only have a validation within the |
Fixes: #6125
Summary by CodeRabbit
New Features
Tests