-
Notifications
You must be signed in to change notification settings - Fork 107
feat: support writing domain metadata (2/2) #1275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1275 +/- ##
==========================================
+ Coverage 84.55% 84.67% +0.11%
==========================================
Files 113 113
Lines 28009 28337 +328
Branches 28009 28337 +328
==========================================
+ Hits 23683 23993 +310
- Misses 3189 3206 +17
- Partials 1137 1138 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fd368f5
to
dece4cd
Compare
a568124
to
2c50fd0
Compare
2c50fd0
to
f7319a6
Compare
f7319a6
to
05ae907
Compare
## What changes are proposed in this pull request? <!-- Please clarify what changes you are proposing and why the changes are needed. The purpose of this section is to outline the changes, why they are needed, and how this PR fixes the issue. If the reason for the change is already explained clearly in an issue, then it does not need to be restated here. 1. If you propose a new API or feature, clarify the use case for a new API or feature. 2. If you fix a bug, you can clarify why it is a bug. --> This PR is (1/2) to support writing domain metadata. 1. (this PR) support adding or updating a domain metadata configuration 2. (follow up) support _removing_ a domain metadata, #1275. - resolves #1270 Writing domain metadata, as per the [Delta protocol](https://arc.net/l/quote/fwjwtumy) requires: - The table must be on writer version 7, and a feature named `domainMetadata` must exist in the table's `writerFeatures`. We satisfy this via an explicit check in `Transaction::commit`. - Two overlapping transactions conflict if they both contain a domain metadata action for the same metadata domain. We satisfy this with Kernel's coarse grained conflict detection based on the fact that the `delta_log` dir already has a log file with the name this txn was going to write. No new logic needed in this PR. ## How was this change tested? <!-- Please make sure to add test cases that check the changes thoroughly including negative and positive cases if possible. If it was tested in a way different from regular unit tests, please clarify how you tested, ideally via a reproducible test documented in the PR description. --> Added new integration tests in `write.rs`.
f4bfcd5
to
ef7ca8a
Compare
ef7ca8a
to
7897691
Compare
kernel/src/actions/mod.rs
Outdated
pub(crate) fn remove(domain: String) -> Self { | ||
Self { | ||
domain, | ||
configuration: String::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this how existing implementations do removes? I kinda figured it would retain the old configuration but haven't dug into this yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great call out! i went back and looked at the protocol again. "Writers should preserve an accurate pre-image of the configuration" -> so we should keep it around actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retaining the old configuration would require log replay though, right? We could accept configuration: Option<String>
as an additional parameter to let the caller provide the old value if already known, but TBH I'm not sure if that improves the current solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that it requires log replay. But as long as we do that once it should be okay. I checked the Java version of the delta kernel for inspiration and they follow this pattern where remove takes only the domain
and they internally determine the pre-image configuration value to preserve. Code ptr: https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionImpl.java#L230C15-L230C35.
The other thing is if we let users pass it in, we likely should validate that the pre-image they provide is correct. Which would require log replay anyways.
Co-authored-by: Zach Schuermann <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. I left a few nits and one suggestion for improving test coverage.
kernel/src/actions/mod.rs
Outdated
pub(crate) fn remove(domain: String) -> Self { | ||
Self { | ||
domain, | ||
configuration: String::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retaining the old configuration would require log replay though, right? We could accept configuration: Option<String>
as an additional parameter to let the caller provide the old value if already known, but TBH I'm not sure if that improves the current solution.
kernel/src/transaction/mod.rs
Outdated
} | ||
|
||
/// Remove domain metadata from the Delta log. | ||
/// This creates a tombstone to logically delete the specified domain. We don't check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IMO it is good style to have a paragraph break (i.e., empty line) after the one-line summary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I'm not following what you mean. Do you mean the "We don't check" should start on the next line perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant
/// Remove domain metadata from the Delta log.
///
/// This creates a tombstone to logically delete the specified domain. We don't check
See this clippy lint for reference.
kernel/src/actions/mod.rs
Outdated
} | ||
|
||
// Create a new DomainMetadata action to remove a domain. | ||
pub(crate) fn remove(domain: String) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like remove
is a somewhat misleading name for a method that is actually a constructor. I don't have a strong opinion here, but maybe something like new_tombstone
communicates the function's intent more clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe also test that domain_metadata::domain_metadata_configuration
correctly returns None
when called with a domain that has a tombstone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out. The last test verifies this case. It sets the domain metadata in txn1 then removes it in txn2. Then we assert that:
- The log contains the tombstone record
- Calling get domain metadata for that domain returns
None
What changes are proposed in this pull request?
This PR is (2/2) to support writing domain metadata. It adds support for removing metadata for user-specified domains. As per the Delta specification, for a removed domain metadata "writers should preserve an accurate pre-image of the configuration". Thus for any removals, we need to perform a log replay and restore the original configuration of the domain with the
removed
flag set totrue
. Furthermore, for any removal where the domain does not already exist in the log, we treat this as a no-op and do not write any record to the Delta Log.How was this change tested?
Added new integration tests in
write.rs
.