-
Notifications
You must be signed in to change notification settings - Fork 37
no duplicated edges during merge involving migrated-kind nodes #7536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request addresses issues with duplicate edges created during branch merges when nodes have kind or inheritance migrations. It modifies core diff query logic, particularly deletion operations and merge resolution for migrated peers. A new migration (Migration041) is introduced to remove duplicate edges created after node deletions and kind migrations. Query logic for time-filtering and peer resolution is refined to handle branch-aware and cross-branch scenarios. Tests are added to validate the migration behavior and the updated merge and diff calculation logic for migrated node kinds. Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
CodSpeed Performance ReportMerging #7536 will not alter performanceComparing Summary
|
f48b374 to
9383e27
Compare
| CALL (diff_thing) { | ||
| DETACH DELETE diff_thing | ||
| } IN TRANSACTIONS | ||
| DETACH DELETE d_root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reduce memory usage of this query so that it can delete a very large diff (over 1 million database vertices)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixes to make sure we use the latest Node with a given UUID in case it had its kind/inheritance updated in the past, resulting in multiple Nodes with the same UUID
| // skip paths where nodes/attrs/rels are updated after $from_time, those are handled in other queries | ||
| AND ( | ||
| r_root.from <= $from_time AND (r_root.to IS NULL OR r_root.branch <> diff_rel.branch OR r_root.to <= $from_time) | ||
| AND r_node.from <= $from_time AND (r_node.to IS NULL OR r_node.branch <> diff_rel.branch OR r_node.to <= $from_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are three diff calculation queries that work together: one for node-level adds/deletes, one for attribute/relationship-level changes, and one for property-level changes. each query excludes changes covered by any higher-level queries. for example, the attribute/relationship query excludes changes covered by the node query because we don't need to track which individual attributes and relationships were deleted for a given node if the whole node was deleted
this filter in the property-level query worked almost all of the time to exclude properties handled at the node or attribute/relationship levels, but it did not work correctly if a peer node had its kind/inheritance updated. in this case, the query would erroneously include some paths to the deleted version of the migrated node. making this chunk of the query (and the matching one below) more specific resolved the issue
| undeleted_car = await NodeManager.get_one(db=db, branch=default_branch, id=e_car_2.id) | ||
| assert undeleted_car.get_kind() == "TestElectricCar" | ||
|
|
||
| async def test_diff_and_merge_with_migrated_node_kind_peer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test for merging a branch that includes relationships to a migrated-kind node
| assert is_protected_prop.previous_value is False | ||
|
|
||
|
|
||
| async def test_migrated_kind_on_main_then_relationship_update_on_branch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test for calculating the diff correctly when it includes relationships to a migrated-kind node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataset created by the infrahub db selected-export command using the old code that created the illegal edges after a merge including relationships to a migrated-kind peer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
backend/tests/unit/core/migrations/graph/test_041/data_export/edges.csvis excluded by!**/*.csvbackend/tests/unit/core/migrations/graph/test_041/data_export/vertices.csvis excluded by!**/*.csv
📒 Files selected for processing (11)
backend/infrahub/core/diff/query/delete_query.py(1 hunks)backend/infrahub/core/diff/query/merge.py(4 hunks)backend/infrahub/core/migrations/graph/__init__.py(2 hunks)backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py(1 hunks)backend/infrahub/core/query/diff.py(2 hunks)backend/tests/unit/core/diff/test_diff_and_merge.py(1 hunks)backend/tests/unit/core/diff/test_diff_calculator.py(1 hunks)backend/tests/unit/core/migrations/graph/test_041/data_export/README.md(1 hunks)backend/tests/unit/core/migrations/graph/test_041/test_041.py(1 hunks)backend/tests/unit/core/test_branch_merge.py(1 hunks)changelog/7432.fixed.md(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{md,mdx}
📄 CodeRabbit inference engine (CLAUDE.md)
Lint Markdown/MDX files with markdownlint using
.markdownlint.yaml
Files:
changelog/7432.fixed.mdbackend/tests/unit/core/migrations/graph/test_041/data_export/README.md
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with
pytestor viainvoketasks
Files:
backend/infrahub/core/query/diff.pybackend/tests/unit/core/migrations/graph/test_041/data_export/README.mdbackend/infrahub/core/diff/query/merge.pybackend/tests/unit/core/diff/test_diff_calculator.pybackend/tests/unit/core/test_branch_merge.pybackend/infrahub/core/migrations/graph/__init__.pybackend/tests/unit/core/diff/test_diff_and_merge.pybackend/infrahub/core/migrations/graph/m041_deleted_dup_edges.pybackend/infrahub/core/diff/query/delete_query.pybackend/tests/unit/core/migrations/graph/test_041/test_041.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpfulUse ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions withasync def
Useawaitfor asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions withasync def
Await asynchronous calls withawait
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy
Files:
backend/infrahub/core/query/diff.pybackend/infrahub/core/diff/query/merge.pybackend/tests/unit/core/diff/test_diff_calculator.pybackend/tests/unit/core/test_branch_merge.pybackend/infrahub/core/migrations/graph/__init__.pybackend/tests/unit/core/diff/test_diff_and_merge.pybackend/infrahub/core/migrations/graph/m041_deleted_dup_edges.pybackend/infrahub/core/diff/query/delete_query.pybackend/tests/unit/core/migrations/graph/test_041/test_041.py
backend/tests/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Place backend tests in
backend/tests/
Files:
backend/tests/unit/core/migrations/graph/test_041/data_export/README.mdbackend/tests/unit/core/diff/test_diff_calculator.pybackend/tests/unit/core/test_branch_merge.pybackend/tests/unit/core/diff/test_diff_and_merge.pybackend/tests/unit/core/migrations/graph/test_041/test_041.py
🧠 Learnings (1)
📚 Learning: 2025-08-05T00:42:31.994Z
Learnt from: ajtmccarty
Repo: opsmill/infrahub PR: 6914
File: backend/infrahub/cli/db_commands/check_inheritance.py:189-284
Timestamp: 2025-08-05T00:42:31.994Z
Learning: In Infrahub's migration system, NodeDuplicateQuery migrations are idempotent and do not require transaction management. Using transactions could cause out-of-memory errors when processing large numbers of nodes.
Applied to files:
backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.pybackend/infrahub/core/diff/query/delete_query.py
🧬 Code graph analysis (6)
backend/tests/unit/core/diff/test_diff_calculator.py (7)
backend/infrahub/core/path.py (1)
SchemaPath(123-172)backend/infrahub/core/constants/__init__.py (2)
SchemaPathType(252-255)DiffAction(213-217)backend/infrahub/core/branch/tasks.py (1)
create_branch(341-388)backend/infrahub/core/manager.py (1)
get_many(1111-1218)backend/infrahub/core/diff/calculator.py (2)
DiffCalculator(35-236)calculate_diff(123-236)backend/infrahub/core/branch/models.py (1)
get_branched_from(89-92)backend/infrahub/core/constants/database.py (1)
DatabaseEdgeType(6-15)
backend/tests/unit/core/test_branch_merge.py (1)
backend/infrahub/core/manager.py (1)
NodeManager(135-1363)
backend/infrahub/core/migrations/graph/__init__.py (1)
backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (1)
Migration041(111-149)
backend/tests/unit/core/diff/test_diff_and_merge.py (7)
backend/infrahub/core/schema/manager.py (3)
update_schema_branch(170-200)load_schema_from_db(643-726)set_schema_branch(161-164)backend/infrahub/core/path.py (1)
SchemaPath(123-172)backend/infrahub/core/constants/__init__.py (1)
SchemaPathType(252-255)backend/infrahub/core/branch/tasks.py (1)
create_branch(341-388)backend/infrahub/core/diff/merger/merger.py (2)
merge_graph(43-111)rollback(113-117)backend/infrahub/exceptions.py (2)
SchemaNotFoundError(266-279)NodeNotFoundError(203-219)backend/tests/helpers/db_validation.py (1)
verify_no_duplicate_paths(107-132)
backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (6)
backend/infrahub/core/diff/repository/repository.py (2)
DiffRepository(51-591)delete_all_diff_roots(379-381)backend/infrahub/core/initialization.py (1)
get_root_node(49-63)backend/infrahub/core/migrations/shared.py (2)
MigrationResult(29-38)ArbitraryMigration(205-217)backend/infrahub/core/query/__init__.py (2)
QueryType(140-142)add_to_query(415-425)backend/infrahub/dependencies/registry.py (2)
build_component_registry(30-58)get_component_registry(61-62)backend/infrahub/core/branch/models.py (1)
get_by_name(136-153)
backend/tests/unit/core/migrations/graph/test_041/test_041.py (5)
backend/infrahub/cli/db.py (1)
load_export(732-801)backend/infrahub/core/manager.py (1)
get_many(1111-1218)backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (2)
Migration041(111-149)execute(127-149)backend/infrahub/core/schema/schema_branch.py (2)
SchemaBranch(72-2322)node_names(103-104)backend/tests/helpers/db_validation.py (1)
verify_no_duplicate_paths(107-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: E2E-testing-invoke-demo-start
- GitHub Check: backend-tests-functional
- GitHub Check: backend-benchmark
- GitHub Check: backend-docker-integration
- GitHub Check: E2E-testing-playwright
- GitHub Check: backend-tests-unit
- GitHub Check: backend-tests-integration
- GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
backend/tests/unit/core/diff/test_diff_and_merge.py (1)
1223-1317: Test implementation is thorough and correct.This test effectively validates the scenario where:
- A node kind migration occurs on the default branch
- A feature branch is created post-migration
- Relationship updates are made on the feature branch
- Changes are merged to the default branch
- Rollback correctly preserves the migration while reverting branch-specific changes
The test properly asserts that after rollback (line 1311), the car remains
Test2NewCarsince the migration occurred on main, while the color and owner changes are correctly reverted. This is the expected behavior for this scenario.
| WITH n, delete_branch, delete_time, added_e, peer | ||
| WHERE delete_branch = added_e.branch | ||
| OR delete_time < added_e_branched_from | ||
| DELETE added_e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep added_e_branched_from in scope before filtering.
Line 65 runs WHERE delete_time < added_e_branched_from, but the preceding WITH only forwards n, delete_branch, delete_time, added_e, peer. Cypher drops any variable not listed in the WITH, so added_e_branched_from is undefined and the query will fail at runtime. Please include it (and anything else you need later) in the WITH projection before the WHERE. (neo4j.com)
-WITH n, delete_branch, delete_time, added_e, peer
+WITH n, delete_branch, delete_time, added_e, peer, added_e_branched_from
WHERE delete_branch = added_e.branch
OR delete_time < added_e_branched_from
DELETE added_e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| WITH n, delete_branch, delete_time, added_e, peer | |
| WHERE delete_branch = added_e.branch | |
| OR delete_time < added_e_branched_from | |
| DELETE added_e | |
| WITH n, delete_branch, delete_time, added_e, peer, added_e_branched_from | |
| WHERE delete_branch = added_e.branch | |
| OR delete_time < added_e_branched_from | |
| DELETE added_e |
🤖 Prompt for AI Agents
In backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py around lines
63 to 66, the WITH currently forwards n, delete_branch, delete_time, added_e,
peer but the subsequent WHERE references added_e_branched_from which is not in
scope; update the WITH projection to include added_e_branched_from (and any
other downstream variables you will need) so the variable is carried into the
WHERE/DELETE portion of the query.
| async def test_diff_and_merge_with_migrated_node_kind_peer( | ||
| self, | ||
| db: InfrahubDatabase, | ||
| default_branch: Branch, | ||
| register_internal_models_schema: SchemaBranch, | ||
| register_core_models_schema: SchemaBranch, | ||
| car_person_schema: SchemaBranch, | ||
| car_accord_main: Node, | ||
| car_camry_main: Node, | ||
| person_jane_main: Node, | ||
| person_john_main: Node, | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a docstring to document the test's purpose.
This test method lacks a docstring. Per coding guidelines, all Python functions should include a Google-style docstring with a brief description.
Apply this diff to add a docstring:
async def test_diff_and_merge_with_migrated_node_kind_peer(
self,
db: InfrahubDatabase,
default_branch: Branch,
register_internal_models_schema: SchemaBranch,
register_core_models_schema: SchemaBranch,
car_person_schema: SchemaBranch,
car_accord_main: Node,
car_camry_main: Node,
person_jane_main: Node,
person_john_main: Node,
):
+ """Test merging a branch with relationship updates to nodes migrated on the default branch.
+
+ This test validates that when a node kind migration occurs on the default branch
+ before a feature branch is created, subsequent changes to relationships on the
+ feature branch are correctly merged and rolled back. The key difference from
+ test_diff_and_merge_with_migrated_node_kind is that the migration happens on
+ the default branch first, then the feature branch is created and modified.
+ """Based on coding guidelines
🤖 Prompt for AI Agents
In backend/tests/unit/core/diff/test_diff_and_merge.py around lines 1211 to
1222, the test method test_diff_and_merge_with_migrated_node_kind_peer is
missing a docstring; add a Google-style triple-quoted docstring immediately
below the def that briefly describes the test purpose (e.g., verifies diff and
merge behavior when a node kind has been migrated and a peer exists), one or two
concise sentences, and optional short "Args" only if needed — keep it minimal
and aligned with project docstring style.
| if schema_name == "TestCar": | ||
| car_schema = car_person_schema.get(name=schema_name, duplicate=False) | ||
| car_schema.name = "NewCar" | ||
| car_schema.namespace = "Test2" | ||
| db.schema.set(name="Test2NewCar", schema=car_schema) | ||
| elif schema_name == "TestPerson": | ||
| person_schema = car_person_schema.get(name="TestPerson", duplicate=False) | ||
| cars_rel = person_schema.get_relationship("cars") | ||
| cars_rel.peer = "Test2NewCar" | ||
| cars_driven_rel = person_schema.get_relationship("cars_driven") | ||
| cars_driven_rel.peer = "Test2NewCar" | ||
| db.schema.set(name="Test2NewPerson", schema=person_schema) | ||
| else: | ||
| db.schema.set(name=schema_name, schema=car_person_schema.get(name=schema_name, duplicate=False)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix AttributeError when updating schema fixtures
InfrahubDatabase does not expose a schema attribute, so every db.schema.set(...) call will raise an AttributeError before the migration ever runs. Please update the test to use the schema registry (and pass the branch so the schema lands on the right namespace). You can do that by adding the registry import, pulling in the default_branch fixture, and swapping the assignments, e.g.:
-from infrahub.core.manager import NodeManager
+from infrahub.core import registry
+from infrahub.core.manager import NodeManager
@@
- async def test_migration_041(self, db: InfrahubDatabase, load_bad_data, car_person_schema: SchemaBranch):
+ async def test_migration_041(
+ self,
+ db: InfrahubDatabase,
+ load_bad_data,
+ car_person_schema: SchemaBranch,
+ default_branch: Branch,
+ ):
@@
- db.schema.set(name="Test2NewCar", schema=car_schema)
+ registry.schema.set(name="Test2NewCar", schema=car_schema, branch=default_branch.name)
@@
- db.schema.set(name="Test2NewPerson", schema=person_schema)
+ registry.schema.set(name="Test2NewPerson", schema=person_schema, branch=default_branch.name)
@@
- db.schema.set(name=schema_name, schema=car_person_schema.get(name=schema_name, duplicate=False))
+ registry.schema.set(
+ name=schema_name,
+ schema=car_person_schema.get(name=schema_name, duplicate=False),
+ branch=default_branch.name,
+ )Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/tests/unit/core/migrations/graph/test_041/test_041.py around lines 93
to 107, the test calls db.schema.set(...) but InfrahubDatabase does not expose a
schema attribute which raises AttributeError; update the test to import and use
the shared schema registry and the default_branch fixture, replace every
db.schema.set(...) call with registry.set(name=..., schema=...,
branch=default_branch) (ensuring you pass the mutated schema object obtained via
car_person_schema.get(..., duplicate=False)), and add the registry import and
default_branch parameter to the test signature so schemas are registered on the
correct branch.
a little background, schema updates that include a change to name, namespace, or inheritance are handled in a special way at the database level. we create duplicate vertices of all Nodes using the updated schema, update the label(s) on the new version, and soft-delete the old version. the end result is two nodes with the same UUID linked to all of the same attribute and relationship vertices. and some of our cypher queries have not been updated to handle this case.
Diff merge bug
there was a bug in the diff merge logic that affected relationship updates that included peers that had their kind/inheritance updated on the default branch by creating duplicated edges to the older, deleted versions of migrated nodes. this PR fixes that bug and adds a test for it.
this PR also includes a migration and test for removing these duplicated/illegal edges on the default branch.
Diff calculation bug
while working on this, I also found that the diff calculation logic included a related bug that would cause edges linked to the older versions of migrated kind nodes to be included in the diff when they should be excluded. this PR includes a fix and test for that bug as well.
Diff delete bug for huge diffs
and one final thing, while testing this against a very large diff, I found that the cypher query can run out of memory and crash, so there is one final update to reduce the memory usage of the query to delete a diff
Summary by CodeRabbit
Release Notes