-
Couldn't load subscription status.
- Fork 2.8k
Preserve DV-specific fields for deletion vectors #14351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Show resolved
Hide resolved
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
...k/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java
Outdated
Show resolved
Hide resolved
|
Thanks for pointing me to this PR, @nastra! As I see there is going to be a Anyway, @adawrapub I think something went sideways with the formatting for the older Spark tests, as there are a lot of lines changing the indentation, and also the build fails because of that. |
|
I'd like to echo an observation I write to the other PR: the runtime of this test suite seems long, introducing test dimensions could make it worse. I'd be careful here and only cover different format version where it matters and in fact adds to the test coverage. Another thing: @adawrapub I saw that some of the tests aren't run with the new params (like testTableWithManyPartitionStatisticFile). If I'm not mistaken these are the ones that create the test table not by the @beforeeach functionality but by calling
|
|
Thank you for the feedback @nastra and @gaborkaszab. I have updated the PR based on your comments. |
| return TestHelpers.V2_AND_ABOVE; | ||
| } | ||
|
|
||
| @Parameter private int formatVersion = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @Parameter private int formatVersion = 2; | |
| @Parameter private int formatVersion; |
|
@adawrapub there are still a bunch of tests that have |
|
I took another look and I'm still hesitant about the testing part:
My proposal is to identify the subset of tests where the format version is relevant now, practically the ones having delete files, and add test param for them, but not for the entire suite. Unless I miss something :) |
|
@gaborkaszab It takes me 1min 17 secs to run
This is because calls to @adawrapub also just preserving the DVs won't make the test suite pass automatically, because there's another issue that needs to be fixed so that rewriting table paths fully works with v3, which I mentioned in #14226 (comment)
The scope of #13671 is to fix DVs with rewriting table paths and it might seem ok to only parameterize the test methods that write DVs but it's possible that we're missing other subtle bugs around rewriting (since we're always doing this with v2 only), hence I'm advocating to parameterize the format version at the class level to make sure we're not missing anything. However, I understand the concern around long running test suites with a large test matrix, so we still need to keep this in mind and maybe selectively skip tests for certain format versions. @adawrapub can you please apply the following diff? This should make all test run with v2 and v3 and give you only a handful of failing tests that don't work with DVs |
|
Thanks for the explanation, @nastra ! Sure, let's add the dimension on the suite level! Another takeaway, that I have to check why these run that slow on my MBP :) |
|
@nastra Can you please review again. Updated based on feedback |
|
@adawrapub I'm a little confused on why you don't have failing tests with v3. Did you apply the diff I mentioned fully? |
This is to fix problem reported in #13671 by preserving DV specific fields. Addressed all PR comments
Original PR got closed because of issues with branch. Please let me know if there is a better way to handle this
cc @anuragmantri @nastra and @amogh-jahagirdar