Skip to content

Conversation

@ajtmccarty
Copy link
Contributor

@ajtmccarty ajtmccarty commented Nov 1, 2025

a little background, schema updates that include a change to name, namespace, or inheritance are handled in a special way at the database level. we create duplicate vertices of all Nodes using the updated schema, update the label(s) on the new version, and soft-delete the old version. the end result is two nodes with the same UUID linked to all of the same attribute and relationship vertices. and some of our cypher queries have not been updated to handle this case.

Diff merge bug

there was a bug in the diff merge logic that affected relationship updates that included peers that had their kind/inheritance updated on the default branch by creating duplicated edges to the older, deleted versions of migrated nodes. this PR fixes that bug and adds a test for it.

this PR also includes a migration and test for removing these duplicated/illegal edges on the default branch.

Diff calculation bug

while working on this, I also found that the diff calculation logic included a related bug that would cause edges linked to the older versions of migrated kind nodes to be included in the diff when they should be excluded. this PR includes a fix and test for that bug as well.

Diff delete bug for huge diffs

and one final thing, while testing this against a very large diff, I found that the cypher query can run out of memory and crash, so there is one final update to reduce the memory usage of the query to delete a diff

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Fixed branch merge logic that could create duplicate edges when node kinds or inheritance were updated, preventing relationship constraint violations.
    • Fixed diff calculation to properly handle nodes with migrated kinds or inheritance when determining branch-specific changes.

@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Nov 1, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 1, 2025

Walkthrough

This pull request addresses issues with duplicate edges created during branch merges when nodes have kind or inheritance migrations. It modifies core diff query logic, particularly deletion operations and merge resolution for migrated peers. A new migration (Migration041) is introduced to remove duplicate edges created after node deletions and kind migrations. Query logic for time-filtering and peer resolution is refined to handle branch-aware and cross-branch scenarios. Tests are added to validate the migration behavior and the updated merge and diff calculation logic for migrated node kinds.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main fix: preventing duplicated edges during merge operations involving nodes whose kind or inheritance was migrated.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ajtm-halloween-dup-edges-merge

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codspeed-hq
Copy link

codspeed-hq bot commented Nov 1, 2025

CodSpeed Performance Report

Merging #7536 will not alter performance

Comparing ajtm-halloween-dup-edges-merge (9a12ba3) with stable (61d6162)

Summary

✅ 10 untouched

@ajtmccarty ajtmccarty force-pushed the ajtm-halloween-dup-edges-merge branch from f48b374 to 9383e27 Compare November 3, 2025 21:42
@ajtmccarty ajtmccarty marked this pull request as ready for review November 4, 2025 18:07
@ajtmccarty ajtmccarty requested a review from a team as a code owner November 4, 2025 18:07
CALL (diff_thing) {
DETACH DELETE diff_thing
} IN TRANSACTIONS
DETACH DELETE d_root
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce memory usage of this query so that it can delete a very large diff (over 1 million database vertices)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes to make sure we use the latest Node with a given UUID in case it had its kind/inheritance updated in the past, resulting in multiple Nodes with the same UUID

// skip paths where nodes/attrs/rels are updated after $from_time, those are handled in other queries
AND (
r_root.from <= $from_time AND (r_root.to IS NULL OR r_root.branch <> diff_rel.branch OR r_root.to <= $from_time)
AND r_node.from <= $from_time AND (r_node.to IS NULL OR r_node.branch <> diff_rel.branch OR r_node.to <= $from_time)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are three diff calculation queries that work together: one for node-level adds/deletes, one for attribute/relationship-level changes, and one for property-level changes. each query excludes changes covered by any higher-level queries. for example, the attribute/relationship query excludes changes covered by the node query because we don't need to track which individual attributes and relationships were deleted for a given node if the whole node was deleted

this filter in the property-level query worked almost all of the time to exclude properties handled at the node or attribute/relationship levels, but it did not work correctly if a peer node had its kind/inheritance updated. in this case, the query would erroneously include some paths to the deleted version of the migrated node. making this chunk of the query (and the matching one below) more specific resolved the issue

undeleted_car = await NodeManager.get_one(db=db, branch=default_branch, id=e_car_2.id)
assert undeleted_car.get_kind() == "TestElectricCar"

async def test_diff_and_merge_with_migrated_node_kind_peer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test for merging a branch that includes relationships to a migrated-kind node

assert is_protected_prop.previous_value is False


async def test_migrated_kind_on_main_then_relationship_update_on_branch(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test for calculating the diff correctly when it includes relationships to a migrated-kind node

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataset created by the infrahub db selected-export command using the old code that created the illegal edges after a merge including relationships to a migrated-kind peer

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61d6162 and 9a12ba3.

⛔ Files ignored due to path filters (2)
  • backend/tests/unit/core/migrations/graph/test_041/data_export/edges.csv is excluded by !**/*.csv
  • backend/tests/unit/core/migrations/graph/test_041/data_export/vertices.csv is excluded by !**/*.csv
📒 Files selected for processing (11)
  • backend/infrahub/core/diff/query/delete_query.py (1 hunks)
  • backend/infrahub/core/diff/query/merge.py (4 hunks)
  • backend/infrahub/core/migrations/graph/__init__.py (2 hunks)
  • backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (1 hunks)
  • backend/infrahub/core/query/diff.py (2 hunks)
  • backend/tests/unit/core/diff/test_diff_and_merge.py (1 hunks)
  • backend/tests/unit/core/diff/test_diff_calculator.py (1 hunks)
  • backend/tests/unit/core/migrations/graph/test_041/data_export/README.md (1 hunks)
  • backend/tests/unit/core/migrations/graph/test_041/test_041.py (1 hunks)
  • backend/tests/unit/core/test_branch_merge.py (1 hunks)
  • changelog/7432.fixed.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Lint Markdown/MDX files with markdownlint using .markdownlint.yaml

Files:

  • changelog/7432.fixed.md
  • backend/tests/unit/core/migrations/graph/test_041/data_export/README.md
backend/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run backend tests with pytest or via invoke tasks

Files:

  • backend/infrahub/core/query/diff.py
  • backend/tests/unit/core/migrations/graph/test_041/data_export/README.md
  • backend/infrahub/core/diff/query/merge.py
  • backend/tests/unit/core/diff/test_diff_calculator.py
  • backend/tests/unit/core/test_branch_merge.py
  • backend/infrahub/core/migrations/graph/__init__.py
  • backend/tests/unit/core/diff/test_diff_and_merge.py
  • backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py
  • backend/infrahub/core/diff/query/delete_query.py
  • backend/tests/unit/core/migrations/graph/test_041/test_041.py
**/*.py

📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)

**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful

Use ruff and mypy to validate and lint Python files

**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with async def
Use await for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking

**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with async def
Await asynchronous calls with await
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy

Files:

  • backend/infrahub/core/query/diff.py
  • backend/infrahub/core/diff/query/merge.py
  • backend/tests/unit/core/diff/test_diff_calculator.py
  • backend/tests/unit/core/test_branch_merge.py
  • backend/infrahub/core/migrations/graph/__init__.py
  • backend/tests/unit/core/diff/test_diff_and_merge.py
  • backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py
  • backend/infrahub/core/diff/query/delete_query.py
  • backend/tests/unit/core/migrations/graph/test_041/test_041.py
backend/tests/**/*

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Place backend tests in backend/tests/

Files:

  • backend/tests/unit/core/migrations/graph/test_041/data_export/README.md
  • backend/tests/unit/core/diff/test_diff_calculator.py
  • backend/tests/unit/core/test_branch_merge.py
  • backend/tests/unit/core/diff/test_diff_and_merge.py
  • backend/tests/unit/core/migrations/graph/test_041/test_041.py
🧠 Learnings (1)
📚 Learning: 2025-08-05T00:42:31.994Z
Learnt from: ajtmccarty
Repo: opsmill/infrahub PR: 6914
File: backend/infrahub/cli/db_commands/check_inheritance.py:189-284
Timestamp: 2025-08-05T00:42:31.994Z
Learning: In Infrahub's migration system, NodeDuplicateQuery migrations are idempotent and do not require transaction management. Using transactions could cause out-of-memory errors when processing large numbers of nodes.

Applied to files:

  • backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py
  • backend/infrahub/core/diff/query/delete_query.py
🧬 Code graph analysis (6)
backend/tests/unit/core/diff/test_diff_calculator.py (7)
backend/infrahub/core/path.py (1)
  • SchemaPath (123-172)
backend/infrahub/core/constants/__init__.py (2)
  • SchemaPathType (252-255)
  • DiffAction (213-217)
backend/infrahub/core/branch/tasks.py (1)
  • create_branch (341-388)
backend/infrahub/core/manager.py (1)
  • get_many (1111-1218)
backend/infrahub/core/diff/calculator.py (2)
  • DiffCalculator (35-236)
  • calculate_diff (123-236)
backend/infrahub/core/branch/models.py (1)
  • get_branched_from (89-92)
backend/infrahub/core/constants/database.py (1)
  • DatabaseEdgeType (6-15)
backend/tests/unit/core/test_branch_merge.py (1)
backend/infrahub/core/manager.py (1)
  • NodeManager (135-1363)
backend/infrahub/core/migrations/graph/__init__.py (1)
backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (1)
  • Migration041 (111-149)
backend/tests/unit/core/diff/test_diff_and_merge.py (7)
backend/infrahub/core/schema/manager.py (3)
  • update_schema_branch (170-200)
  • load_schema_from_db (643-726)
  • set_schema_branch (161-164)
backend/infrahub/core/path.py (1)
  • SchemaPath (123-172)
backend/infrahub/core/constants/__init__.py (1)
  • SchemaPathType (252-255)
backend/infrahub/core/branch/tasks.py (1)
  • create_branch (341-388)
backend/infrahub/core/diff/merger/merger.py (2)
  • merge_graph (43-111)
  • rollback (113-117)
backend/infrahub/exceptions.py (2)
  • SchemaNotFoundError (266-279)
  • NodeNotFoundError (203-219)
backend/tests/helpers/db_validation.py (1)
  • verify_no_duplicate_paths (107-132)
backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (6)
backend/infrahub/core/diff/repository/repository.py (2)
  • DiffRepository (51-591)
  • delete_all_diff_roots (379-381)
backend/infrahub/core/initialization.py (1)
  • get_root_node (49-63)
backend/infrahub/core/migrations/shared.py (2)
  • MigrationResult (29-38)
  • ArbitraryMigration (205-217)
backend/infrahub/core/query/__init__.py (2)
  • QueryType (140-142)
  • add_to_query (415-425)
backend/infrahub/dependencies/registry.py (2)
  • build_component_registry (30-58)
  • get_component_registry (61-62)
backend/infrahub/core/branch/models.py (1)
  • get_by_name (136-153)
backend/tests/unit/core/migrations/graph/test_041/test_041.py (5)
backend/infrahub/cli/db.py (1)
  • load_export (732-801)
backend/infrahub/core/manager.py (1)
  • get_many (1111-1218)
backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py (2)
  • Migration041 (111-149)
  • execute (127-149)
backend/infrahub/core/schema/schema_branch.py (2)
  • SchemaBranch (72-2322)
  • node_names (103-104)
backend/tests/helpers/db_validation.py (1)
  • verify_no_duplicate_paths (107-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E-testing-version-upgrade / From 1.3.0
  • GitHub Check: E2E-testing-invoke-demo-start
  • GitHub Check: backend-tests-functional
  • GitHub Check: backend-benchmark
  • GitHub Check: backend-docker-integration
  • GitHub Check: E2E-testing-playwright
  • GitHub Check: backend-tests-unit
  • GitHub Check: backend-tests-integration
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
backend/tests/unit/core/diff/test_diff_and_merge.py (1)

1223-1317: Test implementation is thorough and correct.

This test effectively validates the scenario where:

  1. A node kind migration occurs on the default branch
  2. A feature branch is created post-migration
  3. Relationship updates are made on the feature branch
  4. Changes are merged to the default branch
  5. Rollback correctly preserves the migration while reverting branch-specific changes

The test properly asserts that after rollback (line 1311), the car remains Test2NewCar since the migration occurred on main, while the color and owner changes are correctly reverted. This is the expected behavior for this scenario.

Comment on lines +63 to +66
WITH n, delete_branch, delete_time, added_e, peer
WHERE delete_branch = added_e.branch
OR delete_time < added_e_branched_from
DELETE added_e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Keep added_e_branched_from in scope before filtering.

Line 65 runs WHERE delete_time < added_e_branched_from, but the preceding WITH only forwards n, delete_branch, delete_time, added_e, peer. Cypher drops any variable not listed in the WITH, so added_e_branched_from is undefined and the query will fail at runtime. Please include it (and anything else you need later) in the WITH projection before the WHERE. (neo4j.com)

-WITH n, delete_branch, delete_time, added_e, peer
+WITH n, delete_branch, delete_time, added_e, peer, added_e_branched_from
 WHERE delete_branch = added_e.branch
 OR delete_time < added_e_branched_from
 DELETE added_e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
WITH n, delete_branch, delete_time, added_e, peer
WHERE delete_branch = added_e.branch
OR delete_time < added_e_branched_from
DELETE added_e
WITH n, delete_branch, delete_time, added_e, peer, added_e_branched_from
WHERE delete_branch = added_e.branch
OR delete_time < added_e_branched_from
DELETE added_e
🤖 Prompt for AI Agents
In backend/infrahub/core/migrations/graph/m041_deleted_dup_edges.py around lines
63 to 66, the WITH currently forwards n, delete_branch, delete_time, added_e,
peer but the subsequent WHERE references added_e_branched_from which is not in
scope; update the WITH projection to include added_e_branched_from (and any
other downstream variables you will need) so the variable is carried into the
WHERE/DELETE portion of the query.

Comment on lines +1211 to +1222
async def test_diff_and_merge_with_migrated_node_kind_peer(
self,
db: InfrahubDatabase,
default_branch: Branch,
register_internal_models_schema: SchemaBranch,
register_core_models_schema: SchemaBranch,
car_person_schema: SchemaBranch,
car_accord_main: Node,
car_camry_main: Node,
person_jane_main: Node,
person_john_main: Node,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a docstring to document the test's purpose.

This test method lacks a docstring. Per coding guidelines, all Python functions should include a Google-style docstring with a brief description.

Apply this diff to add a docstring:

     async def test_diff_and_merge_with_migrated_node_kind_peer(
         self,
         db: InfrahubDatabase,
         default_branch: Branch,
         register_internal_models_schema: SchemaBranch,
         register_core_models_schema: SchemaBranch,
         car_person_schema: SchemaBranch,
         car_accord_main: Node,
         car_camry_main: Node,
         person_jane_main: Node,
         person_john_main: Node,
     ):
+        """Test merging a branch with relationship updates to nodes migrated on the default branch.
+
+        This test validates that when a node kind migration occurs on the default branch
+        before a feature branch is created, subsequent changes to relationships on the
+        feature branch are correctly merged and rolled back. The key difference from
+        test_diff_and_merge_with_migrated_node_kind is that the migration happens on
+        the default branch first, then the feature branch is created and modified.
+        """

Based on coding guidelines

🤖 Prompt for AI Agents
In backend/tests/unit/core/diff/test_diff_and_merge.py around lines 1211 to
1222, the test method test_diff_and_merge_with_migrated_node_kind_peer is
missing a docstring; add a Google-style triple-quoted docstring immediately
below the def that briefly describes the test purpose (e.g., verifies diff and
merge behavior when a node kind has been migrated and a peer exists), one or two
concise sentences, and optional short "Args" only if needed — keep it minimal
and aligned with project docstring style.

Comment on lines +93 to +107
if schema_name == "TestCar":
car_schema = car_person_schema.get(name=schema_name, duplicate=False)
car_schema.name = "NewCar"
car_schema.namespace = "Test2"
db.schema.set(name="Test2NewCar", schema=car_schema)
elif schema_name == "TestPerson":
person_schema = car_person_schema.get(name="TestPerson", duplicate=False)
cars_rel = person_schema.get_relationship("cars")
cars_rel.peer = "Test2NewCar"
cars_driven_rel = person_schema.get_relationship("cars_driven")
cars_driven_rel.peer = "Test2NewCar"
db.schema.set(name="Test2NewPerson", schema=person_schema)
else:
db.schema.set(name=schema_name, schema=car_person_schema.get(name=schema_name, duplicate=False))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix AttributeError when updating schema fixtures
InfrahubDatabase does not expose a schema attribute, so every db.schema.set(...) call will raise an AttributeError before the migration ever runs. Please update the test to use the schema registry (and pass the branch so the schema lands on the right namespace). You can do that by adding the registry import, pulling in the default_branch fixture, and swapping the assignments, e.g.:

-from infrahub.core.manager import NodeManager
+from infrahub.core import registry
+from infrahub.core.manager import NodeManager
@@
-    async def test_migration_041(self, db: InfrahubDatabase, load_bad_data, car_person_schema: SchemaBranch):
+    async def test_migration_041(
+        self,
+        db: InfrahubDatabase,
+        load_bad_data,
+        car_person_schema: SchemaBranch,
+        default_branch: Branch,
+    ):
@@
-                db.schema.set(name="Test2NewCar", schema=car_schema)
+                registry.schema.set(name="Test2NewCar", schema=car_schema, branch=default_branch.name)
@@
-                db.schema.set(name="Test2NewPerson", schema=person_schema)
+                registry.schema.set(name="Test2NewPerson", schema=person_schema, branch=default_branch.name)
@@
-                db.schema.set(name=schema_name, schema=car_person_schema.get(name=schema_name, duplicate=False))
+                registry.schema.set(
+                    name=schema_name,
+                    schema=car_person_schema.get(name=schema_name, duplicate=False),
+                    branch=default_branch.name,
+                )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In backend/tests/unit/core/migrations/graph/test_041/test_041.py around lines 93
to 107, the test calls db.schema.set(...) but InfrahubDatabase does not expose a
schema attribute which raises AttributeError; update the test to import and use
the shared schema registry and the default_branch fixture, replace every
db.schema.set(...) call with registry.set(name=..., schema=...,
branch=default_branch) (ensuring you pass the mutated schema object obtained via
car_person_schema.get(..., duplicate=False)), and add the registry import and
default_branch parameter to the test signature so schemas are registered on the
correct branch.

@ajtmccarty ajtmccarty merged commit cab20ef into stable Nov 5, 2025
38 checks passed
@ajtmccarty ajtmccarty deleted the ajtm-halloween-dup-edges-merge branch November 5, 2025 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

group/backend Issue related to the backend (API Server, Git Agent)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants