Skip to content

Conversation

lbhm
Copy link
Collaborator

@lbhm lbhm commented Sep 12, 2025

What changes are proposed in this pull request?

This PR cleans up some nits as a follow-up to #1239.

How was this change tested?

Mostly refactoring. I added one more UT to verify the changes to the RowTrackingVisitor.

Comment on lines +927 to +934
arrow::array::{
Array, BooleanArray, Int32Array, Int64Array, ListArray, ListBuilder, MapBuilder,
MapFieldNames, RecordBatch, StringArray, StringBuilder, StructArray,
},
arrow::datatypes::{DataType as ArrowDataType, Field, Schema},
arrow::json::ReaderBuilder,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 91.48936% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.12%. Comparing base (8089117) to head (a83f2f0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/row_tracking.rs 94.28% 0 Missing and 2 partials ⚠️
kernel/src/transaction/mod.rs 83.33% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1291      +/-   ##
==========================================
- Coverage   84.12%   84.12%   -0.01%     
==========================================
  Files         111      111              
  Lines       26893    26915      +22     
  Branches    26893    26915      +22     
==========================================
+ Hits        22625    22641      +16     
  Misses       3161     3161              
- Partials     1107     1113       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +224 to +225
.chain(add_actions)
.chain(set_transaction_actions);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 385 to 386
let commit_versions: Vec<i64> =
vec![commit_version.try_into()?; base_row_ids.len()];
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +360 to +361
let mut row_tracking_visitor =
RowTrackingVisitor::new(row_id_high_water_mark, Some(self.add_files_metadata.len()));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +84 to +85
/// Computed base row IDs of the visited actions, organized by batch
pub(crate) base_row_id_batches: Vec<Vec<i64>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the overall memory usage the same? or are we 'keeping' more than we need to with this approach? old one was just tracking one layer up right?

Copy link
Collaborator Author

@lbhm lbhm Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in both cases we need to materialize the entire Vec<Vec<i64>> because we need to know the final row ID row high watermark before we return the (lazily evaluated) add actions iterator. With this approach, we are even saving a clone() on the inner Vec (see lines 360/367 in the old transaction/mod.rs).

@lbhm lbhm force-pushed the row-tracking-write-cleanup branch from 82f9949 to 7617594 Compare September 17, 2025 07:14
@lbhm lbhm requested a review from zachschuermann September 17, 2025 07:26
@lbhm lbhm force-pushed the row-tracking-write-cleanup branch from 7617594 to 6855979 Compare September 18, 2025 12:04
Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lbhm lbhm force-pushed the row-tracking-write-cleanup branch from 6855979 to 52102eb Compare September 18, 2025 19:03
Copy link
Member

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@zachschuermann zachschuermann merged commit 764cb9d into delta-io:main Sep 18, 2025
21 checks passed
@lbhm lbhm deleted the row-tracking-write-cleanup branch September 18, 2025 19:50
aleksandarskrbic pushed a commit to aleksandarskrbic/delta-kernel-rs that referenced this pull request Sep 21, 2025
## What changes are proposed in this pull request?

This PR cleans up some nits as a follow-up to delta-io#1239.

## How was this change tested?

Mostly refactoring. I added one more UT to verify the changes to the
`RowTrackingVisitor`.

Co-authored-by: Zach Schuermann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants