-
Notifications
You must be signed in to change notification settings - Fork 107
refactor: Row tracking write cleanup #1291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Row tracking write cleanup #1291
Conversation
arrow::array::{ | ||
Array, BooleanArray, Int32Array, Int64Array, ListArray, ListBuilder, MapBuilder, | ||
MapFieldNames, RecordBatch, StringArray, StringBuilder, StructArray, | ||
}, | ||
arrow::datatypes::{DataType as ArrowDataType, Field, Schema}, | ||
arrow::json::ReaderBuilder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1239 (comment)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1291 +/- ##
==========================================
- Coverage 84.12% 84.12% -0.01%
==========================================
Files 111 111
Lines 26893 26915 +22
Branches 26893 26915 +22
==========================================
+ Hits 22625 22641 +16
Misses 3161 3161
- Partials 1107 1113 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
.chain(add_actions) | ||
.chain(set_transaction_actions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1239 (comment)
kernel/src/transaction/mod.rs
Outdated
let commit_versions: Vec<i64> = | ||
vec![commit_version.try_into()?; base_row_ids.len()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1239 (comment)
let mut row_tracking_visitor = | ||
RowTrackingVisitor::new(row_id_high_water_mark, Some(self.add_files_metadata.len())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1239 (comment)
/// Computed base row IDs of the visited actions, organized by batch | ||
pub(crate) base_row_id_batches: Vec<Vec<i64>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the overall memory usage the same? or are we 'keeping' more than we need to with this approach? old one was just tracking one layer up right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, in both cases we need to materialize the entire Vec<Vec<i64>>
because we need to know the final row ID row high watermark before we return the (lazily evaluated) add actions iterator. With this approach, we are even saving a clone()
on the inner Vec
(see lines 360/367 in the old transaction/mod.rs
).
82f9949
to
7617594
Compare
7617594
to
6855979
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
6855979
to
52102eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
## What changes are proposed in this pull request? This PR cleans up some nits as a follow-up to delta-io#1239. ## How was this change tested? Mostly refactoring. I added one more UT to verify the changes to the `RowTrackingVisitor`. Co-authored-by: Zach Schuermann <[email protected]>
What changes are proposed in this pull request?
This PR cleans up some nits as a follow-up to #1239.
How was this change tested?
Mostly refactoring. I added one more UT to verify the changes to the
RowTrackingVisitor
.