Skip to content

Conversation

lbhm
Copy link
Collaborator

@lbhm lbhm commented Sep 9, 2025

What changes are proposed in this pull request?

This PR follows up on #1266 and adds support for reading the row index metadata column to the default engine. The implementation directly follows the approach proposed in #920 and slightly modifies it to match the new metadata column API.

Quoting from #920

Deletion vectors (and row tracking, eventually) rely on accurate file-level row indexes. But they're not implemented in the kernel's default parquet reader. That means we must rely on the position of rows in data batches returned by each read, and we cannot apply optimizations such as stats-based row group skipping (see #860).

Add row index support to the default Parquet reader, in the form of a new RowIndex variant of ReorderIndexTransform. [...] The default parquet reader recognizes (the RowIndex metadata) column and injects a transform to generate row indexes (with appropriate adjustments for any row group skipping that might occur).

Fixes #919

NOTE: If/when arrow-rs parquet reader gains native support for row indexes, e.g. apache/arrow-rs#7307, we should switch to using that. Our solution here is not robust to advanced parquet reader features like page-level skipping. row-level predicate pushdown, etc.

This PR affects the following public APIs

None - the breaking changes were introduced in #1266.

How was this change tested?

New UT.

@lbhm lbhm marked this pull request as draft September 9, 2025 09:54
Copy link

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 80.55556% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.28%. Comparing base (459e832) to head (35cadc6).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/engine/arrow_utils.rs 86.48% 15 Missing and 5 partials ⚠️
kernel/src/engine/default/parquet.rs 42.85% 7 Missing and 1 partial ⚠️
kernel/src/engine/parquet_row_group_skipping.rs 69.23% 1 Missing and 3 partials ⚠️
kernel/src/engine/sync/parquet.rs 40.00% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1272      +/-   ##
==========================================
+ Coverage   84.23%   84.28%   +0.04%     
==========================================
  Files         111      111              
  Lines       27497    27641     +144     
  Branches    27497    27641     +144     
==========================================
+ Hits        23163    23296     +133     
- Misses       3203     3215      +12     
+ Partials     1131     1130       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Sep 9, 2025
@lbhm lbhm marked this pull request as ready for review September 10, 2025 18:51
@lbhm lbhm force-pushed the row-index-metadata-column branch from 58651a0 to b66e562 Compare September 11, 2025 07:57
@lbhm lbhm requested a review from scovich September 11, 2025 07:59
Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach LGTM. All the blockers are lower in the PR stack, I believe.

@lbhm lbhm force-pushed the row-index-metadata-column branch from b66e562 to 38d4963 Compare September 12, 2025 14:19
Comment on lines +572 to +579
Some(MetadataColumnSpec::RowIndex) => {
debug!("Inserting a row index column: {}", field.name());
reorder_indices.push(ReorderIndex::row_index(
requested_position,
Arc::new(field.try_into_arrow()?),
));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the other PR discussions, it seems like we have two possible approaches for resolving name collisions with metadata:

  1. Forbid name collisions (at least vs. table schema). Note that this can still lead to name collisions at file level, because there's no rule forbidding the parquet file to have extra columns in it.
  2. Treat metadata columns as special, and don't even look at the parquet schema when resolving them.

If we go with approach 2/, then this loop even need to check for metadata columns because the requested schema never contains them. If we go with approach 1/... not sure. Do we blow up with an "incompatible parquet file" error of some kind?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with approach 2/, my understanding is that this loop can stay as is because it loops over the fields from the requested schema that couldn't be found in the parquet columns.
Since we modify match_parquet_fields to always consider metadata columns as not matched/found in #1266, this loop should be good as is?

Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! would just like to make the row computation optional

@lbhm
Copy link
Collaborator Author

lbhm commented Sep 14, 2025

@nicklan @scovich, I made row index computation optional based on @nicklan's suggestion. Could you take another look and LMK if you agree with the approach?

@lbhm lbhm requested review from nicklan and scovich September 14, 2025 08:38
@zachschuermann zachschuermann self-requested a review September 16, 2025 17:34
@lbhm lbhm force-pushed the row-index-metadata-column branch 3 times, most recently from 333bd45 to 8eceb51 Compare September 18, 2025 08:58
Address review comments

Address 2nd review comments

Make row indexes optional
@lbhm lbhm force-pushed the row-index-metadata-column branch from 8eceb51 to 465784a Compare September 18, 2025 20:01
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thanks, excited to have this support

Copy link
Member

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i can stamp to unblock and let's take follow-ups but main concern is lacking a test for whenever we have a predicate? (lmk if i've overlooked it)

let starting_offsets = match self.row_group_ordinals {
Some(ordinals) => ordinals
.iter()
.map(|i| self.row_group_row_index_ranges[*i].clone())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need clone?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I think raw indexing can cause panics - can we see if we can do better here either docs for invariant or add bounds check etc.?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about panicking in raw indexing! I changed this to get() to ignore invalid indices in optimized builds and added a debug_assert! that should catch improper use of RowIndexBuilder::select_row_groups (i.e., out-of-bounds indices in self.row_group_ordinals) during tests.

As for the clone: I think clone is a reasonable trade-off here, since (1) we are only cloning Ranges (i.e., two i64s) and (2) preventing the clone would require us to modify the underlying vector (costly) or designing something like a FilteredRowIndexIterator that does not collect intermediate results.

pub(crate) fn fixup_parquet_read<T>(
batch: RecordBatch,
requested_ordering: &[ReorderIndex],
row_indexes: Option<&mut <RowIndexBuilder as IntoIterator>::IntoIter>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the <Struct as Trait>::AssociatedType feels a little hard to parse (I understand writing the full type above would be worse) - wonder if we can improve just with type alias?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced type FlattenedRangeIterator<T>.

])?);

let snapshot = Snapshot::builder_for(location).build(engine.as_ref())?;
let scan = snapshot.scan_builder().with_schema(schema).build()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a test with a predicate to test the skipping logic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added unit tests for row group skipping in arrow_utils.rs. As for integration tests, we first need to address #860 to support predicate push-down in scans, right?

@zachschuermann
Copy link
Member

I quickly vibe-coded a test to derisk merge + release. seems like everything is fine, will follow up with it shortly after release cc @nicklan

@zachschuermann zachschuermann merged commit 9880c2c into delta-io:main Sep 19, 2025
20 of 21 checks passed
@zachschuermann zachschuermann removed the breaking-change Change that require a major version bump label Sep 19, 2025
@zachschuermann zachschuermann changed the title feat!: Introduce row index metadata column feat: Introduce row index metadata column Sep 19, 2025
zachschuermann added a commit that referenced this pull request Sep 25, 2025
…Builder (#1334)

## What changes are proposed in this pull request?

This PR follows up on #1272 and addresses open review comments.

- It removes raw pointer indexing that can theoretically lead to panics.
- We refactor `impl IntoIter` to a fallible `RowInderBuilder::build`
method that verifies that row group ordinals are unique and within
bounds.
- It replaces complex type signatures with more intuitive type aliases.
- It adds more unit tests for `RowIndexBuilder`.

Note: We cannot add integration tests for row index reads with filter
predicates before addressing #860 and implementing predicate pushdown.

## How was this change tested?

New UT.

---------

Co-authored-by: Zach Schuermann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add row index support to the default parquet reader
4 participants