Skip to content

Conversation

zeevm
Copy link
Contributor

@zeevm zeevm commented Aug 31, 2025

Description

add deletion_vector_descriptor method to get the DV descriptor

@Copilot Copilot AI review requested due to automatic review settings August 31, 2025 04:50
@zeevm zeevm requested review from roeap, rtyler and hntd187 as code owners August 31, 2025 04:50
@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Aug 31, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new public method deletion_vector_descriptor to the LogicalFileView struct that provides access to the deletion vector descriptor. This enhances the API by allowing external consumers to retrieve deletion vector descriptors without directly accessing the internal deletion vector implementation.

  • Adds a new public method deletion_vector_descriptor that returns Option<DeletionVectorDescriptor>

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

.and_then(|c| Scalar::from_array(c.as_ref(), self.index))
.map(|s| round_ms_datetimes(s, &ceil_datetime))
}

Copy link
Preview

Copilot AI Aug 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new public method deletion_vector_descriptor lacks documentation. Consider adding a doc comment explaining what the method returns and when it would be None.

Copilot uses AI. Check for mistakes.

@rtyler
Copy link
Member

rtyler commented Sep 2, 2025

@zeevm Would you share more about the use-case on bubbling this transaction log information further up to the consumer?

@zeevm
Copy link
Contributor Author

zeevm commented Sep 2, 2025

@zeevm Would you share more about the use-case on bubbling this transaction log information further up to the consumer?

@rtyler sure, our query engine has several "Delta query" modes, one of the modes builds an internal representation of the Delta transaction log to accelerate queries, this internal representation also includes the deletion vectors of course, hence we need this exposed.

@roeap
Copy link
Collaborator

roeap commented Sep 3, 2025

qq: @zeevm - is this by any chance related to caching deletion vectors read from disk? If so there may be better ways to achieve this - i.e. doing this on the kernel engine side. As it so happens this is something I am working on right now as well.

@zeevm
Copy link
Contributor Author

zeevm commented Sep 3, 2025

@roeap we're caching the entire transaction log, including the DVs in a proprietary internal format.
Don't understand what you mean by doing it on the delta-kernel side, the delta-rs crate doesn't (currently, hope it changes...) expose the internal delta-kernel snapshot, so it has to be a delta-rs exposed functionality.

@roeap
Copy link
Collaborator

roeap commented Sep 3, 2025

@zeevm - yeah, if you are caching the whole log data this is maybe a bit different.

what you mean by doing it on the delta-kernel side

I should have been more specific. this would mean having a custom engine implementation for kernel which all IO goes through. I am currently exploring a datafusion engine and will be adding some caching here soon. I can ping you once this gets wired up to see where one could inject custom caches etc ...

@zeevm
Copy link
Contributor Author

zeevm commented Sep 3, 2025

@roeap again, I don't understand how a custom kernel engine would be used here, delta-rs doesn't exposed the underlying delta-kernel implementation so there's no way for me to "inject" any customer engine into delta-kernel.
Or ware you suggesting I drop delta-rs and move to delta-kernel completely?

@roeap
Copy link
Collaborator

roeap commented Sep 3, 2025

@zeevm - certainly not. And you are right, right now, you can only customize logstore / objectstore. However with the ongoing work around moving the scans to leverage the kernel metadata, we should also allow a custom engine implementation and for the caching efforts also a custom cache which is used in conjunction the default engine used in delta-rs.

w.r.t. to this specific PR, while we are hoping to hide more and more delta specific things, right now we already expose the raw deletion vector data, so we should be able to expose the descriptor. We are however hoping that we can eventually just expose all of this as arrow.

In a way we do this already since much of this is just a view into the arrow data.

@zeevm
Copy link
Contributor Author

zeevm commented Sep 3, 2025

@roeap yep, it's just nicer to already get a parsed descriptor than the raw view and duplicate the code on my end to get the nicely structured descriptor, so for time being until more low level functionality is exposed can we merge this?

Copy link
Member

@rtyler rtyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some rustdoc, I am comfortable adding this now with the understanding that we can/will yoink it in the future once we're further along

Copy link

codecov bot commented Sep 6, 2025

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.36%. Comparing base (747804b) to head (7dccd3c).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/kernel/snapshot/iterators.rs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3721      +/-   ##
==========================================
- Coverage   75.38%   75.36%   -0.02%     
==========================================
  Files         144      145       +1     
  Lines       43661    43949     +288     
  Branches    43661    43949     +288     
==========================================
+ Hits        32912    33123     +211     
- Misses       9157     9222      +65     
- Partials     1592     1604      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@zeevm
Copy link
Contributor Author

zeevm commented Sep 15, 2025

@rtyler anything blocking this from being merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants