-
Notifications
You must be signed in to change notification settings - Fork 1.6k
WIP: Upgrade to arrow 56.1.0 #17275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WIP: Upgrade to arrow 56.1.0 #17275
Conversation
+-----------------------------------+-----------------+---------------------+------+------------------+ | ||
| filename | file_size_bytes | metadata_size_bytes | hits | extra | | ||
+-----------------------------------+-----------------+---------------------+------+------------------+ | ||
| alltypes_plain.parquet | 1851 | 10181 | 2 | page_index=false | | ||
| alltypes_tiny_pages.parquet | 454233 | 881634 | 2 | page_index=true | | ||
| alltypes_tiny_pages.parquet | 454233 | 881418 | 2 | page_index=true | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know why the in-memory size of the ParquetMetadata has decreased, but it seems like a good improvement to me
@@ -535,8 +535,8 @@ async fn load_page_index<T: AsyncFileReader>( | |||
if missing_column_index || missing_offset_index { | |||
let m = Arc::try_unwrap(Arc::clone(parquet_metadata)) | |||
.unwrap_or_else(|e| e.as_ref().clone()); | |||
let mut reader = | |||
ParquetMetaDataReader::new_with_metadata(m).with_page_indexes(true); | |||
let mut reader = ParquetMetaDataReader::new_with_metadata(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to this change from @kczimm
datafusion-spark = { path = "datafusion/spark", version = "49.0.0" } | ||
datafusion-sql = { path = "datafusion/sql", version = "49.0.0" } | ||
datafusion-substrait = { path = "datafusion/substrait", version = "49.0.0" } | ||
datafusion = { path = "datafusion/core", version = "49.0.1", default-features = false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by change to update all versions in Cargo.toml to the latest
@@ -1314,7 +1314,7 @@ physical_plan | |||
11)┌─────────────┴─────────────┐┌─────────────┴─────────────┐ | |||
12)│ DataSourceExec ││ DataSourceExec │ | |||
13)│ -------------------- ││ -------------------- │ | |||
14)│ bytes: 6040 ││ bytes: 6040 │ | |||
14)│ bytes: 5932 ││ bytes: 5932 │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the in memory size may have improved due to
And the Vec doesn't have the same minimum alignment / size that the builders had
@@ -724,7 +724,7 @@ mod tests { | |||
.unwrap(); | |||
|
|||
let size = get_record_batch_memory_size(&batch); | |||
assert_eq!(size, 8320); | |||
assert_eq!(size, 8208); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Also due to Use
Vec
directly in builders arrow-rs#7984
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
|
🤖 |
🤖: Benchmark completed Details
|
Which issue does this PR close?
56.1.0
(August 2025) arrow-rs#7837Rationale for this change
Upgrade to the latest arrow release
What changes are included in this PR?
56.1.0
release arrow-rs#8202)TODO:
Are these changes tested?
Functionally By CI
I will also run benchmarks against this branch
Are there any user-facing changes?