Implement limit push down for IcebergTableProvider
#1673
base: main
record_batch_stream_builder.with_row_groups(selected_row_group_indices);
}

if let Some(limit) = task.limit {
Should we enable with_page_index, as suggested by the docs? https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.ArrowReaderBuilder.html#method.with_limit
Thanks! I extended the should_load_page_index logic, and ArrowReaderOptions is now initialized with with_page_index(should_load_page_index).
Thanks @krinart
Happy to help @ZENOTME and thanks for the feedback!
Original PR: #19 Upstream PR: apache#1673
Hi. Is there anything else I can do on my side to get this merged?
Which issue does this PR close?
N/A

What changes are included in this PR?
Previously, _limit was ignored in IcebergTableProvider::scan:
iceberg-rust/crates/integrations/datafusion/src/table/mod.rs
Lines 149 to 163 in aad9e2e

This PR propagates limit all the way to the ArrowReaderBuilder.

Note: limit push down is only applied to each batch, which means that IcebergTableProvider::scan may return more records than specified by limit.
This is acceptable according to the TableProvider::scan documentation:

Are these changes tested?
Unit tests
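
The note above about scan possibly returning more records than limit can be illustrated with a small standalone sketch. It does not use the iceberg-rust or parquet APIs. The function names (scan_task, scan_all, apply_global_limit) and the idea of one independent reader per file are assumptions made for illustration only. The sketch shows that when a limit is applied independently to each unit of scanning, the combined output can exceed the limit, so the caller still has to trim the result.

```rust
/// Hypothetical stand-in for one scan unit (for example, one data file).
/// It returns the unit's rows, truncated to `limit` when one is given,
/// which is roughly what a per-reader limit does.
fn scan_task(rows: &[i64], limit: Option<usize>) -> Vec<i64> {
    match limit {
        Some(n) => rows.iter().copied().take(n).collect(),
        None => rows.to_vec(),
    }
}

/// Concatenates the output of every scan unit.
/// The limit is applied per unit, not across the whole scan,
/// so the result may hold more than `limit` rows.
fn scan_all(files: &[Vec<i64>], limit: Option<usize>) -> Vec<i64> {
    files.iter().flat_map(|f| scan_task(f, limit)).collect()
}

/// What the caller (assumed here to be the query engine) still has to do:
/// enforce the limit over the combined output.
fn apply_global_limit(rows: Vec<i64>, limit: usize) -> Vec<i64> {
    rows.into_iter().take(limit).collect()
}
```

With three hypothetical files of 3, 3, and 1 rows and a limit of 2, scan_all returns 5 rows (2 + 2 + 1), and apply_global_limit reduces them to the final 2. The push down still pays off because each reader stops early instead of decoding its whole file.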