Skip to content

Conversation

MannDP
Copy link
Collaborator

@MannDP MannDP commented Sep 9, 2025

What changes are proposed in this pull request?

This PR is (2/2) to support writing domain metadata. It adds support for removing metadata for user-specified domains. As per the Delta specification, for a removed domain metadata "writers should preserve an accurate pre-image of the configuration". Thus for any removals, we need to perform a log replay and restore the original configuration of the domain with the removed flag set to true. Furthermore, for any removal where the domain does not already exist in the log, we treat this as a no-op and do not write any record to the Delta Log.

How was this change tested?

Added new integration tests in write.rs.

Copy link

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 97.72727% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.67%. Comparing base (05e6698) to head (eae4394).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/transaction/mod.rs 96.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1275      +/-   ##
==========================================
+ Coverage   84.55%   84.67%   +0.11%     
==========================================
  Files         113      113              
  Lines       28009    28337     +328     
  Branches    28009    28337     +328     
==========================================
+ Hits        23683    23993     +310     
- Misses       3189     3206      +17     
- Partials     1137     1138       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from fd368f5 to dece4cd Compare September 11, 2025 00:23
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch 2 times, most recently from a568124 to 2c50fd0 Compare September 11, 2025 19:03
@MannDP MannDP changed the title <WIP> feat: support writing domain metadata (2/2) feat: support writing domain metadata (2/2) Sep 11, 2025
@github-actions github-actions bot added the breaking-change Change that require a major version bump label Sep 11, 2025
@MannDP MannDP marked this pull request as ready for review September 11, 2025 19:05
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from 2c50fd0 to f7319a6 Compare September 17, 2025 16:13
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from f7319a6 to 05ae907 Compare September 17, 2025 16:18
MannDP added a commit that referenced this pull request Sep 24, 2025
## What changes are proposed in this pull request?
<!--
Please clarify what changes you are proposing and why the changes are
needed.
The purpose of this section is to outline the changes, why they are
needed, and how this PR fixes the issue.
If the reason for the change is already explained clearly in an issue,
then it does not need to be restated here.
1. If you propose a new API or feature, clarify the use case for a new
API or feature.
  2. If you fix a bug, you can clarify why it is a bug.
-->
This PR is (1/2) to support writing domain metadata.
1. (this PR) support adding or updating a domain metadata configuration
2. (follow up) support _removing_ a domain metadata,
#1275.

- resolves #1270

Writing domain metadata, as per the [Delta
protocol](https://arc.net/l/quote/fwjwtumy) requires:
- The table must be on writer version 7, and a feature named
`domainMetadata` must exist in the table's `writerFeatures`. We satisfy
this via an explicit check in `Transaction::commit`.
- Two overlapping transactions conflict if they both contain a domain
metadata action for the same metadata domain. We satisfy this with
Kernel's coarse grained conflict detection based on the fact that the
`delta_log` dir already has a log file with the name this txn was going
to write. No new logic needed in this PR.

## How was this change tested?
<!--
Please make sure to add test cases that check the changes thoroughly
including negative and positive cases if possible.
If it was tested in a way different from regular unit tests, please
clarify how you tested, ideally via a reproducible test documented in
the PR description.
-->
Added new integration tests in `write.rs`.
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch 2 times, most recently from f4bfcd5 to ef7ca8a Compare September 24, 2025 20:35
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from ef7ca8a to 7897691 Compare September 24, 2025 20:37
pub(crate) fn remove(domain: String) -> Self {
Self {
domain,
configuration: String::new(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how existing implementations do removes? I kinda figured it would retain the old configuration but haven't dug into this yet

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great call out! i went back and looked at the protocol again. "Writers should preserve an accurate pre-image of the configuration" -> so we should keep it around actually.

Copy link
Collaborator

@lbhm lbhm Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retaining the old configuration would require log replay though, right? We could accept configuration: Option<String> as an additional parameter to let the caller provide the old value if already known, but TBH I'm not sure if that improves the current solution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that it requires log replay. But as long as we do that once it should be okay. I checked the Java version of the delta kernel for inspiration and they follow this pattern where remove takes only the domain and they internally determine the pre-image configuration value to preserve. Code ptr: https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionImpl.java#L230C15-L230C35.

The other thing is if we let users pass it in, we likely should validate that the pre-image they provide is correct. Which would require log replay anyways.

Copy link
Collaborator

@lbhm lbhm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I left a few nits and one suggestion for improving test coverage.

pub(crate) fn remove(domain: String) -> Self {
Self {
domain,
configuration: String::new(),
Copy link
Collaborator

@lbhm lbhm Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retaining the old configuration would require log replay though, right? We could accept configuration: Option<String> as an additional parameter to let the caller provide the old value if already known, but TBH I'm not sure if that improves the current solution.

}

/// Remove domain metadata from the Delta log.
/// This creates a tombstone to logically delete the specified domain. We don't check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IMO it is good style to have a paragraph break (i.e., empty line) after the one-line summary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not following what you mean. Do you mean the "We don't check" should start on the next line perhaps?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant

/// Remove domain metadata from the Delta log.
///
/// This creates a tombstone to logically delete the specified domain. We don't check

See this clippy lint for reference.

}

// Create a new DomainMetadata action to remove a domain.
pub(crate) fn remove(domain: String) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like remove is a somewhat misleading name for a method that is actually a constructor. I don't have a strong opinion here, but maybe something like new_tombstone communicates the function's intent more clearly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe also test that domain_metadata::domain_metadata_configuration correctly returns None when called with a domain that has a tombstone?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out. The last test verifies this case. It sets the domain metadata in txn1 then removes it in txn2. Then we assert that:

  1. The log contains the tombstone record
  2. Calling get domain metadata for that domain returns None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change that require a major version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support domain metadata writes
3 participants