WIP: Upgrade to arrow 56.1.0 #17275

alamb · 2025-08-21T18:42:00Z

Which issue does this PR close?

Related to Release arrow-rs / parquet Minor version 56.1.0 (August 2025) arrow-rs#7837

Rationale for this change

Upgrade to the latest arrow release

What changes are included in this PR?

Upgrade to 56.1.0 (preview in Prepare for 56.1.0 release arrow-rs#8202)
Update code to remove uses of deprecated APIs

TODO:

Add a new Parquet option to control the size of the predicate cache
File a follow-on ticket to hook in the new Parquet statistics

Are these changes tested?

Functionally, by CI
I will also run benchmarks against this branch

Are there any user-facing changes?

alamb · 2025-08-21T19:07:06Z

datafusion-cli/src/main.rs

        +-----------------------------------+-----------------+---------------------+------+------------------+
        | filename                          | file_size_bytes | metadata_size_bytes | hits | extra            |
        +-----------------------------------+-----------------+---------------------+------+------------------+
        | alltypes_plain.parquet            | 1851            | 10181               | 2    | page_index=false |
-        | alltypes_tiny_pages.parquet       | 454233          | 881634              | 2    | page_index=true  |
+        | alltypes_tiny_pages.parquet       | 454233          | 881418              | 2    | page_index=true  |


I am not sure why the in-memory size of the ParquetMetaData decreased (from 881634 to 881418 bytes), but it seems like a good improvement to me.

alamb · 2025-08-21T19:07:47Z

datafusion/datasource-parquet/src/opener.rs

@@ -535,8 +535,8 @@ async fn load_page_index<T: AsyncFileReader>(
    if missing_column_index || missing_offset_index {
        let m = Arc::try_unwrap(Arc::clone(parquet_metadata))
            .unwrap_or_else(|e| e.as_ref().clone());
-        let mut reader =
-            ParquetMetaDataReader::new_with_metadata(m).with_page_indexes(true);
+        let mut reader = ParquetMetaDataReader::new_with_metadata(m)


This diff is due to this change from @kczimm:

Optionally read parquet page indexes arrow-rs#8070
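
For readers unfamiliar with the surrounding code: load_page_index takes ownership of the metadata with Arc::try_unwrap(...).unwrap_or_else(|e| e.as_ref().clone()). Below is a minimal std-only sketch of that pattern, with a hypothetical Meta struct standing in for ParquetMetaData. Arc::try_unwrap only succeeds when the strong count is exactly one, so when the caller passes Arc::clone of a pointer it still holds, the fallback clone always runs:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for ParquetMetaData; only `Clone` matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Meta {
    pub num_row_groups: usize,
}

/// Take an owned `Meta` out of a shared `Arc`, cloning the pointee only
/// when the allocation is still shared with another owner.
pub fn take_or_clone(shared: Arc<Meta>) -> Meta {
    // `try_unwrap` returns `Err(arc)` if other strong references exist;
    // in that case fall back to cloning the data behind the pointer.
    Arc::try_unwrap(shared).unwrap_or_else(|arc| arc.as_ref().clone())
}
```

On Rust 1.76 or newer, Arc::unwrap_or_clone expresses the same thing in a single call.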

alamb · 2025-08-21T19:08:34Z

Cargo.toml

-datafusion-spark = { path = "datafusion/spark", version = "49.0.0" }
-datafusion-sql = { path = "datafusion/sql", version = "49.0.0" }
-datafusion-substrait = { path = "datafusion/substrait", version = "49.0.0" }
+datafusion = { path = "datafusion/core", version = "49.0.1", default-features = false }


Drive-by change: update all versions in Cargo.toml to the latest (49.0.1).

alamb · 2025-08-21T19:12:00Z

datafusion/sqllogictest/test_files/explain_tree.slt

@@ -1314,7 +1314,7 @@ physical_plan
 11)┌─────────────┴─────────────┐┌─────────────┴─────────────┐
 12)│       DataSourceExec      ││       DataSourceExec      │
 13)│    --------------------   ││    --------------------   │
-14)│        bytes: 6040        ││        bytes: 6040        │
+14)│        bytes: 5932        ││        bytes: 5932        │


I believe the in-memory size decreased due to

Use Vec directly in builders arrow-rs#7984

A plain Vec does not have the same minimum alignment / allocation size that the builders had.
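
To illustrate why switching from builder buffers to plain Vecs can shrink the reported sizes, here is a simplified model comparing an exact Vec-style allocation with one rounded up to a multiple of 64 bytes. The 64-byte rounding is an assumption about how the old builder buffers sized their allocations (arrow's MutableBuffer rounds reserved capacity to 64-byte multiples), not a measurement of arrow-rs#7984; the function names are hypothetical.

```rust
/// Bytes reserved by a Vec holding exactly `len` elements of `elem_size`
/// bytes each (assuming capacity == len, e.g. after `Vec::with_capacity`).
pub fn exact_bytes(len: usize, elem_size: usize) -> usize {
    len * elem_size
}

/// Bytes reserved when the allocation is rounded up to a multiple of 64,
/// a simplified model of a 64-byte-granular builder buffer.
pub fn rounded_to_64_bytes(len: usize, elem_size: usize) -> usize {
    let bytes = len * elem_size;
    // 64 is a power of two, so adding 63 and clearing the low 6 bits
    // rounds up to the next multiple of 64.
    (bytes + 63) & !63
}
```

For small columns the padding can be a large fraction of the total, which is consistent with the roughly 100-byte differences in the expected test output here.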

alamb · 2025-08-21T19:12:45Z

datafusion/physical-plan/src/spill/mod.rs

@@ -724,7 +724,7 @@ mod tests {
        .unwrap();

        let size = get_record_batch_memory_size(&batch);
-        assert_eq!(size, 8320);
+        assert_eq!(size, 8208);


This is also due to Use Vec directly in builders arrow-rs#7984
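
A quick arithmetic check on the two expected values in spill/mod.rs: the old size, 8320, is an exact multiple of 64 (130 × 64), while the new size, 8208, is not (128 × 64 + 16). This is consistent with, though it does not by itself prove, the explanation that the old buffers were padded to 64-byte multiples. The helper below is hypothetical:

```rust
/// Split a byte count into (whole 64-byte blocks, leftover bytes).
pub fn blocks_of_64(bytes: usize) -> (usize, usize) {
    (bytes / 64, bytes % 64)
}
```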

alamb · 2025-08-21T21:16:29Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: clickbench_pushdown
Results will be posted here when complete

alamb · 2025-08-21T21:46:38Z

🤖: Benchmark completed

Comparing HEAD and alamb_update_arrow
--------------------
Benchmark clickbench_pushdown.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.12 ms │            2.16 ms │     no change │
│ QQuery 1     │    53.64 ms │           53.34 ms │     no change │
│ QQuery 2     │   137.06 ms │          139.44 ms │     no change │
│ QQuery 3     │   162.87 ms │          166.12 ms │     no change │
│ QQuery 4     │  1064.08 ms │         1032.53 ms │     no change │
│ QQuery 5     │  1486.94 ms │         1487.45 ms │     no change │
│ QQuery 6     │     2.17 ms │            2.15 ms │     no change │
│ QQuery 7     │    76.67 ms │           73.51 ms │     no change │
│ QQuery 8     │  1433.99 ms │         1457.77 ms │     no change │
│ QQuery 9     │  1795.39 ms │         1791.48 ms │     no change │
│ QQuery 10    │   420.27 ms │          488.87 ms │  1.16x slower │
│ QQuery 11    │   500.34 ms │          555.10 ms │  1.11x slower │
│ QQuery 12    │  1789.42 ms │         1521.70 ms │ +1.18x faster │
│ QQuery 13    │  2698.81 ms │         2426.41 ms │ +1.11x faster │
│ QQuery 14    │  1898.26 ms │         1644.37 ms │ +1.15x faster │
│ QQuery 15    │  1211.08 ms │         1179.20 ms │     no change │
│ QQuery 16    │  2643.67 ms │         2615.41 ms │     no change │
│ QQuery 17    │  2617.48 ms │         2620.64 ms │     no change │
│ QQuery 18    │  5366.43 ms │         4887.05 ms │ +1.10x faster │
│ QQuery 19    │   125.17 ms │          149.12 ms │  1.19x slower │
│ QQuery 20    │  2109.40 ms │         1932.01 ms │ +1.09x faster │
│ QQuery 21    │  2442.36 ms │         2322.44 ms │     no change │
│ QQuery 22    │  5457.70 ms │         4063.91 ms │ +1.34x faster │
│ QQuery 23    │  2056.23 ms │         1470.65 ms │ +1.40x faster │
│ QQuery 24    │   291.14 ms │          252.50 ms │ +1.15x faster │
│ QQuery 25    │  1032.66 ms │          649.43 ms │ +1.59x faster │
│ QQuery 26    │   549.09 ms │          380.68 ms │ +1.44x faster │
│ QQuery 27    │  4127.01 ms │         2982.51 ms │ +1.38x faster │
│ QQuery 28    │ 26766.22 ms │        24180.99 ms │ +1.11x faster │
│ QQuery 29    │   971.85 ms │          956.54 ms │     no change │
│ QQuery 30    │  2164.88 ms │         2106.25 ms │     no change │
│ QQuery 31    │  2079.48 ms │         2061.44 ms │     no change │
│ QQuery 32    │  4410.45 ms │         4578.39 ms │     no change │
│ QQuery 33    │  5717.87 ms │         5584.02 ms │     no change │
│ QQuery 34    │  5719.25 ms │         5811.24 ms │     no change │
│ QQuery 35    │  1978.73 ms │         1989.30 ms │     no change │
│ QQuery 36    │    26.85 ms │           26.37 ms │     no change │
│ QQuery 37    │    25.78 ms │           26.11 ms │     no change │
│ QQuery 38    │    25.72 ms │           25.21 ms │     no change │
│ QQuery 39    │    25.86 ms │           25.17 ms │     no change │
│ QQuery 40    │    26.79 ms │           26.81 ms │     no change │
│ QQuery 41    │    26.15 ms │           25.77 ms │     no change │
│ QQuery 42    │    25.53 ms │           25.14 ms │     no change │
└──────────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 93542.84ms │
│ Total Time (alamb_update_arrow)   │ 85796.69ms │
│ Average Time (HEAD)               │  2175.41ms │
│ Average Time (alamb_update_arrow) │  1995.27ms │
│ Queries Faster                    │         12 │
│ Queries Slower                    │          3 │
│ Queries with No Change            │         28 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
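
The Change column can be reproduced from the two timings. A minimal sketch, assuming the comparison script reports the ratio of the slower to the faster time to two decimals and treats differences within 5% as noise. The 5% threshold is an assumption inferred from the rows (for example, 2.12 ms to 2.16 ms is reported as "no change"), not read from the script itself:

```rust
/// Assumed noise threshold: differences within 5% are "no change".
const NOISE_THRESHOLD: f64 = 0.05;

/// Classify a result from the HEAD and branch times in milliseconds,
/// mimicking the labels in the benchmark table.
pub fn classify(head_ms: f64, branch_ms: f64) -> String {
    if branch_ms > head_ms * (1.0 + NOISE_THRESHOLD) {
        // Branch took longer: report how many times slower it is.
        format!("{:.2}x slower", branch_ms / head_ms)
    } else if head_ms > branch_ms * (1.0 + NOISE_THRESHOLD) {
        // Branch was quicker: report the speedup with a leading '+'.
        format!("+{:.2}x faster", head_ms / branch_ms)
    } else {
        "no change".to_string()
    }
}
```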

alamb · 2025-08-21T21:46:41Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: tpch_topk
Results will be posted here when complete

alamb · 2025-08-21T21:46:44Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_update_arrow
Results will be posted here when complete

alamb · 2025-08-21T22:46:44Z

🤖: Benchmark completed

group                                         alamb_update_arrow                     main
-----                                         ------------------                     ----
logical_aggregate_with_join                   1.00    625.5±9.69µs        ? ?/sec    1.02    638.9±5.74µs        ? ?/sec
logical_select_all_from_1000                  1.01     11.4±0.08ms        ? ?/sec    1.00     11.2±0.05ms        ? ?/sec
logical_select_one_from_700                   1.00    410.3±2.69µs        ? ?/sec    1.03    422.5±1.78µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    369.5±1.82µs        ? ?/sec    1.03    380.9±3.30µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00    353.6±1.68µs        ? ?/sec    1.03    365.4±3.88µs        ? ?/sec
physical_intersection                         1.00    825.2±3.33µs        ? ?/sec    1.02    839.6±4.30µs        ? ?/sec
physical_join_consider_sort                   1.00  1372.4±11.75µs        ? ?/sec    1.03  1408.5±14.20µs        ? ?/sec
physical_join_distinct                        1.00    344.1±0.94µs        ? ?/sec    1.03    356.1±4.54µs        ? ?/sec
physical_many_self_joins                      1.00     10.0±0.03ms        ? ?/sec    1.05     10.5±0.03ms        ? ?/sec
physical_plan_clickbench_all                  1.00    188.9±2.79ms        ? ?/sec    1.01    190.0±2.58ms        ? ?/sec
physical_plan_clickbench_q1                   1.00      2.5±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_clickbench_q10                  1.03      3.5±0.20ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_clickbench_q11                  1.01      3.6±0.06ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
physical_plan_clickbench_q12                  1.01      3.8±0.05ms        ? ?/sec    1.00      3.7±0.03ms        ? ?/sec
physical_plan_clickbench_q13                  1.00      3.4±0.03ms        ? ?/sec    1.00      3.4±0.10ms        ? ?/sec
physical_plan_clickbench_q14                  1.00      3.6±0.05ms        ? ?/sec    1.00      3.6±0.08ms        ? ?/sec
physical_plan_clickbench_q15                  1.02      3.5±0.05ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
physical_plan_clickbench_q16                  1.00      3.3±0.03ms        ? ?/sec    1.01      3.3±0.03ms        ? ?/sec
physical_plan_clickbench_q17                  1.01      3.4±0.03ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
physical_plan_clickbench_q18                  1.02      3.0±0.02ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
physical_plan_clickbench_q19                  1.00      3.8±0.03ms        ? ?/sec    1.00      3.8±0.03ms        ? ?/sec
physical_plan_clickbench_q2                   1.01      3.0±0.03ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q20                  1.00      2.6±0.02ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
physical_plan_clickbench_q21                  1.00      3.0±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
physical_plan_clickbench_q22                  1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.02ms        ? ?/sec
physical_plan_clickbench_q23                  1.00      3.9±0.04ms        ? ?/sec    1.01      3.9±0.02ms        ? ?/sec
physical_plan_clickbench_q24                  1.01      4.4±0.05ms        ? ?/sec    1.00      4.4±0.03ms        ? ?/sec
physical_plan_clickbench_q25                  1.01      3.2±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q26                  1.01      3.0±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_clickbench_q27                  1.01      3.2±0.04ms        ? ?/sec    1.00      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q28                  1.00      3.9±0.03ms        ? ?/sec    1.00      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q29                  1.01      4.7±0.08ms        ? ?/sec    1.00      4.6±0.05ms        ? ?/sec
physical_plan_clickbench_q3                   1.00      2.9±0.03ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
physical_plan_clickbench_q30                  1.01     13.2±0.25ms        ? ?/sec    1.00     13.2±0.09ms        ? ?/sec
physical_plan_clickbench_q31                  1.00      3.9±0.04ms        ? ?/sec    1.00      3.9±0.03ms        ? ?/sec
physical_plan_clickbench_q32                  1.00      3.9±0.05ms        ? ?/sec    1.01      3.9±0.04ms        ? ?/sec
physical_plan_clickbench_q33                  1.00      3.4±0.03ms        ? ?/sec    1.01      3.4±0.04ms        ? ?/sec
physical_plan_clickbench_q34                  1.01      3.1±0.05ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
physical_plan_clickbench_q35                  1.00      3.2±0.02ms        ? ?/sec    1.00      3.2±0.03ms        ? ?/sec
physical_plan_clickbench_q36                  1.01      3.9±0.06ms        ? ?/sec    1.00      3.9±0.03ms        ? ?/sec
physical_plan_clickbench_q37                  1.00      3.9±0.04ms        ? ?/sec    1.03      4.0±0.10ms        ? ?/sec
physical_plan_clickbench_q38                  1.00      3.9±0.03ms        ? ?/sec    1.01      4.0±0.06ms        ? ?/sec
physical_plan_clickbench_q39                  1.00      3.7±0.04ms        ? ?/sec    1.01      3.8±0.09ms        ? ?/sec
physical_plan_clickbench_q4                   1.00      2.6±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
physical_plan_clickbench_q40                  1.00      4.4±0.03ms        ? ?/sec    1.02      4.4±0.07ms        ? ?/sec
physical_plan_clickbench_q41                  1.00      4.0±0.06ms        ? ?/sec    1.01      4.0±0.04ms        ? ?/sec
physical_plan_clickbench_q42                  1.00      3.9±0.05ms        ? ?/sec    1.01      3.9±0.07ms        ? ?/sec
physical_plan_clickbench_q43                  1.01      4.2±0.07ms        ? ?/sec    1.00      4.2±0.04ms        ? ?/sec
physical_plan_clickbench_q44                  1.00      2.7±0.04ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_clickbench_q45                  1.00      2.7±0.03ms        ? ?/sec    1.02      2.8±0.05ms        ? ?/sec
physical_plan_clickbench_q46                  1.00      3.2±0.02ms        ? ?/sec    1.01      3.2±0.04ms        ? ?/sec
physical_plan_clickbench_q47                  1.00      3.8±0.05ms        ? ?/sec    1.01      3.9±0.05ms        ? ?/sec
physical_plan_clickbench_q48                  1.00      4.5±0.07ms        ? ?/sec    1.00      4.5±0.07ms        ? ?/sec
physical_plan_clickbench_q49                  1.01      4.8±0.09ms        ? ?/sec    1.00      4.8±0.08ms        ? ?/sec
physical_plan_clickbench_q5                   1.00      2.8±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_clickbench_q50                  1.00      4.2±0.04ms        ? ?/sec    1.01      4.3±0.06ms        ? ?/sec
physical_plan_clickbench_q51                  1.00      3.3±0.03ms        ? ?/sec    1.00      3.3±0.04ms        ? ?/sec
physical_plan_clickbench_q6                   1.01      2.8±0.03ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_clickbench_q7                   1.02      2.6±0.02ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_clickbench_q8                   1.01      3.5±0.05ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
physical_plan_clickbench_q9                   1.01      3.3±0.02ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
physical_plan_tpcds_all                       1.00   1020.7±4.89ms        ? ?/sec    1.00   1016.2±4.29ms        ? ?/sec
physical_plan_tpch_all                        1.00     62.1±0.18ms        ? ?/sec    1.00     61.8±0.27ms        ? ?/sec
physical_plan_tpch_q1                         1.00      2.0±0.03ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
physical_plan_tpch_q10                        1.01      3.8±0.03ms        ? ?/sec    1.00      3.8±0.02ms        ? ?/sec
physical_plan_tpch_q11                        1.01      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
physical_plan_tpch_q12                        1.00  1811.5±11.50µs        ? ?/sec    1.00  1811.6±19.56µs        ? ?/sec
physical_plan_tpch_q13                        1.00   1446.9±7.19µs        ? ?/sec    1.00   1442.2±8.85µs        ? ?/sec
physical_plan_tpch_q14                        1.00  1955.0±12.84µs        ? ?/sec    1.00  1952.8±11.32µs        ? ?/sec
physical_plan_tpch_q16                        1.02      2.5±0.06ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
physical_plan_tpch_q17                        1.01      2.4±0.05ms        ? ?/sec    1.00      2.4±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.00      2.7±0.00ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
physical_plan_tpch_q19                        1.01      3.2±0.04ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
physical_plan_tpch_q2                         1.00      5.5±0.06ms        ? ?/sec    1.00      5.5±0.01ms        ? ?/sec
physical_plan_tpch_q20                        1.00      3.1±0.00ms        ? ?/sec    1.01      3.1±0.06ms        ? ?/sec
physical_plan_tpch_q21                        1.00      4.1±0.01ms        ? ?/sec    1.00      4.1±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.00      2.5±0.01ms        ? ?/sec    1.01      2.6±0.00ms        ? ?/sec
physical_plan_tpch_q4                         1.00   1501.8±2.72µs        ? ?/sec    1.01   1519.2±5.89µs        ? ?/sec
physical_plan_tpch_q5                         1.00      3.1±0.01ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
physical_plan_tpch_q6                         1.01   868.0±11.63µs        ? ?/sec    1.00    863.3±3.82µs        ? ?/sec
physical_plan_tpch_q7                         1.00      4.3±0.01ms        ? ?/sec    1.01      4.3±0.09ms        ? ?/sec
physical_plan_tpch_q8                         1.01      5.1±0.01ms        ? ?/sec    1.00      5.1±0.01ms        ? ?/sec
physical_plan_tpch_q9                         1.00      4.1±0.01ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
physical_select_aggregates_from_200           1.01     16.8±0.06ms        ? ?/sec    1.00     16.7±0.03ms        ? ?/sec
physical_select_all_from_1000                 1.00     24.7±0.15ms        ? ?/sec    1.00     24.7±0.07ms        ? ?/sec
physical_select_one_from_700                  1.00   1053.0±5.55µs        ? ?/sec    1.06  1116.6±11.98µs        ? ?/sec
physical_sorted_union_orderby                 1.00     41.2±0.13ms        ? ?/sec    1.01     41.4±0.13ms        ? ?/sec
physical_theta_join_consider_sort             1.00  1744.8±74.52µs        ? ?/sec    1.01  1770.8±13.60µs        ? ?/sec
physical_unnest_to_join                       1.00   1289.4±3.11µs        ? ?/sec    1.02   1317.8±6.54µs        ? ?/sec
with_param_values_many_columns                1.00    142.8±1.14µs        ? ?/sec    1.00    143.4±1.66µs        ? ?/sec
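
Each row lists, per branch, a ratio against the faster branch (1.00 for the faster one), the mean time ± its reported spread, and a throughput column that reads ? ?/sec, presumably because these benchmarks do not configure a throughput. A minimal sketch of recomputing the ratio column from the mean times; parse_duration_us and relative are hypothetical helpers that only handle the µs and ms suffixes seen in this output:

```rust
/// Parse a `mean±errUNIT` cell (e.g. "625.5±9.69µs") into
/// (mean, err) in microseconds. Only `µs` and `ms` are handled.
pub fn parse_duration_us(cell: &str) -> Option<(f64, f64)> {
    let (mean, rest) = cell.split_once('±')?;
    // The unit suffix is attached to the error term, not the mean.
    let (err, scale) = if let Some(v) = rest.strip_suffix("µs") {
        (v, 1.0)
    } else if let Some(v) = rest.strip_suffix("ms") {
        (v, 1_000.0)
    } else {
        return None;
    };
    let mean: f64 = mean.trim().parse().ok()?;
    let err: f64 = err.trim().parse().ok()?;
    Some((mean * scale, err * scale))
}

/// Ratios of two mean times relative to the faster of the two.
pub fn relative(a_us: f64, b_us: f64) -> (f64, f64) {
    let fastest = a_us.min(b_us);
    (a_us / fastest, b_us / fastest)
}
```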

alamb · 2025-08-21T22:46:47Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb · 2025-08-21T23:39:42Z

🤖: Benchmark completed

Comparing HEAD and alamb_update_arrow
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2665.48 ms │         2563.47 ms │ no change │
│ QQuery 1     │  1314.17 ms │         1287.20 ms │ no change │
│ QQuery 2     │  2542.19 ms │         2476.30 ms │ no change │
│ QQuery 3     │  1165.07 ms │         1191.78 ms │ no change │
│ QQuery 4     │  2216.35 ms │         2193.16 ms │ no change │
│ QQuery 5     │ 27207.42 ms │        27046.80 ms │ no change │
│ QQuery 6     │  4248.15 ms │         4133.04 ms │ no change │
│ QQuery 7     │  3311.04 ms │         3321.23 ms │ no change │
└──────────────┴─────────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 44669.87ms │
│ Total Time (alamb_update_arrow)   │ 44212.98ms │
│ Average Time (HEAD)               │  5583.73ms │
│ Average Time (alamb_update_arrow) │  5526.62ms │
│ Queries Faster                    │          0 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │          8 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.04 ms │            2.17 ms │  1.06x slower │
│ QQuery 1     │    49.04 ms │           48.92 ms │     no change │
│ QQuery 2     │   134.51 ms │          136.80 ms │     no change │
│ QQuery 3     │   153.85 ms │          167.46 ms │  1.09x slower │
│ QQuery 4     │   997.19 ms │         1019.02 ms │     no change │
│ QQuery 5     │  1496.89 ms │         1452.65 ms │     no change │
│ QQuery 6     │     2.10 ms │            2.09 ms │     no change │
│ QQuery 7     │    53.44 ms │           54.92 ms │     no change │
│ QQuery 8     │  1433.45 ms │         1439.05 ms │     no change │
│ QQuery 9     │  1815.52 ms │         1771.95 ms │     no change │
│ QQuery 10    │   398.01 ms │          375.42 ms │ +1.06x faster │
│ QQuery 11    │   452.48 ms │          423.68 ms │ +1.07x faster │
│ QQuery 12    │  1363.05 ms │         1349.15 ms │     no change │
│ QQuery 13    │  2141.34 ms │         2105.27 ms │     no change │
│ QQuery 14    │  1279.64 ms │         1225.10 ms │     no change │
│ QQuery 15    │  1166.89 ms │         1152.90 ms │     no change │
│ QQuery 16    │  2584.06 ms │         2609.31 ms │     no change │
│ QQuery 17    │  2565.90 ms │         2633.82 ms │     no change │
│ QQuery 18    │  4863.11 ms │         4815.32 ms │     no change │
│ QQuery 19    │   126.29 ms │          125.43 ms │     no change │
│ QQuery 20    │  2027.63 ms │         1948.73 ms │     no change │
│ QQuery 21    │  2346.63 ms │         2269.79 ms │     no change │
│ QQuery 22    │  4015.64 ms │         3892.22 ms │     no change │
│ QQuery 23    │ 14288.53 ms │        13655.02 ms │     no change │
│ QQuery 24    │   275.52 ms │          244.57 ms │ +1.13x faster │
│ QQuery 25    │   534.68 ms │          501.65 ms │ +1.07x faster │
│ QQuery 26    │   284.20 ms │          244.40 ms │ +1.16x faster │
│ QQuery 27    │  2855.71 ms │         2775.80 ms │     no change │
│ QQuery 28    │ 24690.09 ms │        22698.76 ms │ +1.09x faster │
│ QQuery 29    │   974.05 ms │          971.38 ms │     no change │
│ QQuery 30    │  1347.63 ms │         1284.08 ms │     no change │
│ QQuery 31    │  1344.82 ms │         1302.20 ms │     no change │
│ QQuery 32    │  4207.96 ms │         4407.11 ms │     no change │
│ QQuery 33    │  5550.52 ms │         5492.29 ms │     no change │
│ QQuery 34    │  5776.52 ms │         5820.94 ms │     no change │
│ QQuery 35    │  1970.81 ms │         2010.33 ms │     no change │
│ QQuery 36    │   124.23 ms │          120.61 ms │     no change │
│ QQuery 37    │    53.64 ms │           55.53 ms │     no change │
│ QQuery 38    │   119.37 ms │          121.20 ms │     no change │
│ QQuery 39    │   197.41 ms │          193.47 ms │     no change │
│ QQuery 40    │    41.85 ms │           41.69 ms │     no change │
│ QQuery 41    │    39.44 ms │           40.15 ms │     no change │
│ QQuery 42    │    32.61 ms │           31.27 ms │     no change │
└──────────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 96178.30ms │
│ Total Time (alamb_update_arrow)   │ 93033.61ms │
│ Average Time (HEAD)               │  2236.70ms │
│ Average Time (alamb_update_arrow) │  2163.57ms │
│ Queries Faster                    │          6 │
│ Queries Slower                    │          2 │
│ Queries with No Change            │         35 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 170.34 ms │          171.29 ms │ no change │
│ QQuery 2     │  25.47 ms │           26.72 ms │ no change │
│ QQuery 3     │  44.85 ms │           43.89 ms │ no change │
│ QQuery 4     │  26.37 ms │           26.51 ms │ no change │
│ QQuery 5     │  72.01 ms │           72.53 ms │ no change │
│ QQuery 6     │  19.48 ms │           19.69 ms │ no change │
│ QQuery 7     │ 144.55 ms │          143.25 ms │ no change │
│ QQuery 8     │  32.47 ms │           33.31 ms │ no change │
│ QQuery 9     │  83.75 ms │           82.22 ms │ no change │
│ QQuery 10    │  59.78 ms │           58.36 ms │ no change │
│ QQuery 11    │  40.66 ms │           41.71 ms │ no change │
│ QQuery 12    │  50.52 ms │           51.92 ms │ no change │
│ QQuery 13    │  44.99 ms │           45.74 ms │ no change │
│ QQuery 14    │  12.98 ms │           13.18 ms │ no change │
│ QQuery 15    │  23.66 ms │           24.09 ms │ no change │
│ QQuery 16    │  24.28 ms │           23.39 ms │ no change │
│ QQuery 17    │ 142.57 ms │          145.82 ms │ no change │
│ QQuery 18    │ 316.30 ms │          322.67 ms │ no change │
│ QQuery 19    │  36.37 ms │           35.75 ms │ no change │
│ QQuery 20    │  47.57 ms │           47.76 ms │ no change │
│ QQuery 21    │ 220.02 ms │          218.34 ms │ no change │
│ QQuery 22    │  19.60 ms │           18.80 ms │ no change │
└──────────────┴───────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 1658.59ms │
│ Total Time (alamb_update_arrow)   │ 1666.95ms │
│ Average Time (HEAD)               │   75.39ms │
│ Average Time (alamb_update_arrow) │   75.77ms │
│ Queries Faster                    │         0 │
│ Queries Slower                    │         0 │
│ Queries with No Change            │        22 │
│ Queries with Failure              │         0 │
└───────────────────────────────────┴───────────┘

alamb · 2025-08-22T13:12:59Z

Comparing HEAD and alamb_update_arrow

Benchmark clickbench_pushdown.json

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 10    │   420.27 ms │          488.87 ms │  1.16x slower │
│ QQuery 11    │   500.34 ms │          555.10 ms │  1.11x slower │
│ QQuery 12    │  1789.42 ms │         1521.70 ms │ +1.18x faster │
│ QQuery 13    │  2698.81 ms │         2426.41 ms │ +1.11x faster │
│ QQuery 14    │  1898.26 ms │         1644.37 ms │ +1.15x faster │
│ QQuery 18    │  5366.43 ms │         4887.05 ms │ +1.10x faster │
│ QQuery 19    │   125.17 ms │          149.12 ms │  1.19x slower │
│ QQuery 20    │  2109.40 ms │         1932.01 ms │ +1.09x faster │
│ QQuery 22    │  5457.70 ms │         4063.91 ms │ +1.34x faster │
│ QQuery 23    │  2056.23 ms │         1470.65 ms │ +1.40x faster │
│ QQuery 24    │   291.14 ms │          252.50 ms │ +1.15x faster │
│ QQuery 25    │  1032.66 ms │          649.43 ms │ +1.59x faster │
│ QQuery 26    │   549.09 ms │          380.68 ms │ +1.44x faster │
│ QQuery 27    │  4127.01 ms │         2982.51 ms │ +1.38x faster │
│ QQuery 28    │ 26766.22 ms │        24180.99 ms │ +1.11x faster │
└──────────────┴─────────────┴────────────────────┴───────────────┘

I believe these speedups are directly attributable to the predicate cache @XiangpengHao added in Speed up Parquet filter pushdown with predicate cache arrow-rs#8203
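
Roughly speaking, a predicate cache keeps the columns decoded while evaluating a pushed-down filter, so they do not have to be decoded a second time when the same columns also appear in the output projection. The sketch below is a hypothetical, std-only illustration of that idea; the PredicateCache type, its byte budget, and the decode counter are illustrative names, not the arrow-rs API. The byte budget loosely mirrors the TODO about adding an option to control the cache size.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical cache of decoded columns, keyed by column index.
/// Values are `i64` only, to keep the sketch self-contained.
pub struct PredicateCache {
    max_bytes: usize,
    used_bytes: usize,
    columns: HashMap<usize, Arc<[i64]>>,
    /// How many times a column was actually decoded (for illustration).
    pub decodes: usize,
}

impl PredicateCache {
    pub fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, columns: HashMap::new(), decodes: 0 }
    }

    /// Return column `col`, calling `decode` only on a cache miss.
    /// Columns that would exceed the byte budget are returned but not cached.
    pub fn get_or_decode(&mut self, col: usize, decode: impl FnOnce() -> Vec<i64>) -> Arc<[i64]> {
        if let Some(values) = self.columns.get(&col) {
            // Cache hit: sharing the Arc is cheap and avoids re-decoding.
            return Arc::clone(values);
        }
        self.decodes += 1;
        let values: Arc<[i64]> = decode().into();
        let size = values.len() * std::mem::size_of::<i64>();
        if self.used_bytes + size <= self.max_bytes {
            self.used_bytes += size;
            self.columns.insert(col, Arc::clone(&values));
        }
        values
    }
}
```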

alamb · 2025-08-22T14:11:45Z

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/update_arrow (75c255e) to 02a7472 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb · 2025-08-22T15:08:15Z

🤖: Benchmark completed

Comparing HEAD and alamb_update_arrow
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2692.59 ms │         2666.68 ms │ no change │
│ QQuery 1     │  1321.98 ms │         1271.38 ms │ no change │
│ QQuery 2     │  2495.28 ms │         2458.95 ms │ no change │
│ QQuery 3     │  1172.25 ms │         1144.66 ms │ no change │
│ QQuery 4     │  2247.47 ms │         2245.24 ms │ no change │
│ QQuery 5     │ 27484.95 ms │        27616.36 ms │ no change │
│ QQuery 6     │  4282.47 ms │         4134.84 ms │ no change │
│ QQuery 7     │  3709.81 ms │         3637.91 ms │ no change │
└──────────────┴─────────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 45406.80ms │
│ Total Time (alamb_update_arrow)   │ 45176.03ms │
│ Average Time (HEAD)               │  5675.85ms │
│ Average Time (alamb_update_arrow) │  5647.00ms │
│ Queries Faster                    │          0 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │          8 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_update_arrow ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.07 ms │            2.17 ms │     no change │
│ QQuery 1     │    50.88 ms │           50.54 ms │     no change │
│ QQuery 2     │   134.35 ms │          135.69 ms │     no change │
│ QQuery 3     │   168.01 ms │          165.49 ms │     no change │
│ QQuery 4     │  1052.98 ms │         1098.17 ms │     no change │
│ QQuery 5     │  1503.44 ms │         1507.13 ms │     no change │
│ QQuery 6     │     2.16 ms │            2.16 ms │     no change │
│ QQuery 7     │    54.93 ms │           54.34 ms │     no change │
│ QQuery 8     │  1417.15 ms │         1511.29 ms │  1.07x slower │
│ QQuery 9     │  1793.97 ms │         1889.36 ms │  1.05x slower │
│ QQuery 10    │   394.83 ms │          377.25 ms │     no change │
│ QQuery 11    │   458.12 ms │          432.90 ms │ +1.06x faster │
│ QQuery 12    │  1332.79 ms │         1432.08 ms │  1.07x slower │
│ QQuery 13    │  2151.43 ms │         2141.78 ms │     no change │
│ QQuery 14    │  1300.96 ms │         1284.70 ms │     no change │
│ QQuery 15    │  1208.29 ms │         1285.02 ms │  1.06x slower │
│ QQuery 16    │  2688.10 ms │         2708.07 ms │     no change │
│ QQuery 17    │  2633.73 ms │         2690.02 ms │     no change │
│ QQuery 18    │  5156.57 ms │         5011.25 ms │     no change │
│ QQuery 19    │   130.35 ms │          125.98 ms │     no change │
│ QQuery 20    │  2067.29 ms │         1934.95 ms │ +1.07x faster │
│ QQuery 21    │  2362.03 ms │         2275.48 ms │     no change │
│ QQuery 22    │  4118.76 ms │         3881.38 ms │ +1.06x faster │
│ QQuery 23    │ 20288.59 ms │        13781.94 ms │ +1.47x faster │
│ QQuery 24    │   265.30 ms │          245.44 ms │ +1.08x faster │
│ QQuery 25    │   529.99 ms │          496.27 ms │ +1.07x faster │
│ QQuery 26    │   279.59 ms │          262.16 ms │ +1.07x faster │
│ QQuery 27    │  2934.27 ms │         2818.28 ms │     no change │
│ QQuery 28    │ 24861.76 ms │        22844.16 ms │ +1.09x faster │
│ QQuery 29    │   972.69 ms │          946.02 ms │     no change │
│ QQuery 30    │  1365.65 ms │         1328.64 ms │     no change │
│ QQuery 31    │  1387.62 ms │         1312.99 ms │ +1.06x faster │
│ QQuery 32    │  4623.95 ms │         4420.68 ms │     no change │
│ QQuery 33    │  5800.89 ms │         5702.78 ms │     no change │
│ QQuery 34    │  5839.49 ms │         5929.93 ms │     no change │
│ QQuery 35    │  2056.08 ms │         2097.38 ms │     no change │
│ QQuery 36    │   120.68 ms │          120.81 ms │     no change │
│ QQuery 37    │    52.93 ms │           54.11 ms │     no change │
│ QQuery 38    │   121.11 ms │          119.85 ms │     no change │
│ QQuery 39    │   200.98 ms │          199.32 ms │     no change │
│ QQuery 40    │    44.11 ms │           45.91 ms │     no change │
│ QQuery 41    │    40.86 ms │           37.86 ms │ +1.08x faster │
│ QQuery 42    │    33.09 ms │           33.24 ms │     no change │
└──────────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 104002.83ms │
│ Total Time (alamb_update_arrow)   │  94794.99ms │
│ Average Time (HEAD)               │   2418.67ms │
│ Average Time (alamb_update_arrow) │   2204.53ms │
│ Queries Faster                    │          10 │
│ Queries Slower                    │           4 │
│ Queries with No Change            │          29 │
│ Queries with Failure              │           0 │
└───────────────────────────────────┴─────────────┘
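The Change column is consistent with a ±5% noise band. Every row labelled faster or slower differs from HEAD by more than 5%. Rows such as QQuery 4 (1052.98 ms → 1098.17 ms, about 4.3% slower) are reported as no change. The Rust sketch below reproduces the labels under that assumption. Both the 5% threshold and the name `classify_change` are inferred from these rows for illustration. They are not taken from DataFusion's benchmark comparison script.

```rust
/// Relative noise band (assumed): timing changes smaller than 5% in either
/// direction are reported as "no change".
const NOISE_THRESHOLD: f64 = 0.05;

/// Label one query's timing change between the baseline (HEAD) and the
/// candidate branch, both measured in milliseconds.
fn classify_change(baseline_ms: f64, candidate_ms: f64) -> String {
    // ratio > 1.0 means the candidate branch took longer than HEAD
    let ratio = candidate_ms / baseline_ms;
    if ratio > 1.0 + NOISE_THRESHOLD {
        format!("{:.2}x slower", ratio)
    } else if ratio < 1.0 - NOISE_THRESHOLD {
        // Report speedups as baseline / candidate so the factor reads >= 1.0
        format!("+{:.2}x faster", baseline_ms / candidate_ms)
    } else {
        String::from("no change")
    }
}
```

With this rule, `classify_change(20288.59, 13781.94)` returns `"+1.47x faster"` (QQuery 23) and `classify_change(1793.97, 1889.36)` returns `"1.05x slower"` (QQuery 9). You could instead test the faster side as baseline / candidate > 1.05. That alternative would label every row shown here the same way.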
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ alamb_update_arrow ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 167.49 ms │          168.72 ms │ no change │
│ QQuery 2     │  27.33 ms │           26.48 ms │ no change │
│ QQuery 3     │  44.74 ms │           44.68 ms │ no change │
│ QQuery 4     │  26.49 ms │           26.91 ms │ no change │
│ QQuery 5     │  73.09 ms │           73.93 ms │ no change │
│ QQuery 6     │  19.41 ms │           19.79 ms │ no change │
│ QQuery 7     │ 142.35 ms │          140.77 ms │ no change │
│ QQuery 8     │  32.60 ms │           31.96 ms │ no change │
│ QQuery 9     │  82.16 ms │           84.85 ms │ no change │
│ QQuery 10    │  57.53 ms │           57.88 ms │ no change │
│ QQuery 11    │  40.64 ms │           41.08 ms │ no change │
│ QQuery 12    │  50.99 ms │           51.21 ms │ no change │
│ QQuery 13    │  45.86 ms │           45.45 ms │ no change │
│ QQuery 14    │  13.10 ms │           13.49 ms │ no change │
│ QQuery 15    │  23.62 ms │           24.13 ms │ no change │
│ QQuery 16    │  23.29 ms │           23.84 ms │ no change │
│ QQuery 17    │ 143.40 ms │          144.07 ms │ no change │
│ QQuery 18    │ 324.45 ms │          313.69 ms │ no change │
│ QQuery 19    │  36.25 ms │           36.56 ms │ no change │
│ QQuery 20    │  48.43 ms │           48.81 ms │ no change │
│ QQuery 21    │ 220.49 ms │          221.06 ms │ no change │
│ QQuery 22    │  19.30 ms │           18.88 ms │ no change │
└──────────────┴───────────┴────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 1663.01ms │
│ Total Time (alamb_update_arrow)   │ 1658.24ms │
│ Average Time (HEAD)               │   75.59ms │
│ Average Time (alamb_update_arrow) │   75.37ms │
│ Queries Faster                    │         0 │
│ Queries Slower                    │         0 │
│ Queries with No Change            │        22 │
│ Queries with Failure              │         0 │
└───────────────────────────────────┴───────────┘
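The Benchmark Summary rows are plain aggregates. The average is the total divided by the number of queries: 43 for clickbench_partitioned and 22 for tpch_mem_sf1. The ratio of the two totals gives the suite-level speedup of the branch. A small sketch of that arithmetic follows. `RunSummary`, `summarize`, and `overall_speedup` are illustrative names, not part of DataFusion's benchmark tooling.

```rust
/// Aggregate timings for one benchmark run, mirroring the
/// "Total Time" and "Average Time" rows of the summary tables.
struct RunSummary {
    total_ms: f64,
    average_ms: f64,
}

/// Sum per-query times (milliseconds) and divide by the query count.
/// Assumes every query succeeded, as the "Queries with Failure" rows show 0.
fn summarize(times_ms: &[f64]) -> RunSummary {
    let total_ms: f64 = times_ms.iter().sum();
    let average_ms = if times_ms.is_empty() {
        0.0
    } else {
        total_ms / times_ms.len() as f64
    };
    RunSummary { total_ms, average_ms }
}

/// Whole-suite speedup of the candidate over the baseline:
/// values above 1.0 mean the candidate finished the suite faster.
fn overall_speedup(baseline: &RunSummary, candidate: &RunSummary) -> f64 {
    baseline.total_ms / candidate.total_ms
}
```

Feed the reported totals back in and the numbers line up. For clickbench_partitioned, 104002.83 ms / 43 queries gives the 2418.67 ms HEAD average, and 104002.83 / 94794.99 ≈ 1.10 is the suite-level speedup. For tpch_mem_sf1 the same ratio, 1663.01 / 1658.24, is about 1.00.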

alamb added 3 commits August 21, 2025 14:40

WIP: Test upgrade to arrow 56.1.0 (64d0c46)

Adjust for new parquet sizes (edb86ae)

Update for deprecated API (a604f03)

github-actions bot added the sqllogictest and datasource labels Aug 21, 2025

alamb mentioned this pull request Aug 21, 2025 in Release arrow-rs / parquet Minor version 56.1.0 (August 2025) apache/arrow-rs#7837 (open)

alamb added 2 commits August 21, 2025 14:59

Update metadata size (7a0cafd)

update (5b12a47)

alamb commented Aug 21, 2025


update test (75c255e)

github-actions bot added the physical-plan label Aug 21, 2025

alamb commented Aug 21, 2025


alamb changed the title ~~WIP: Test upgrade to arrow 56.1.0~~ WIP: Uupgrade to arrow 56.1.0 Aug 21, 2025

alamb changed the title ~~WIP: Uupgrade to arrow 56.1.0~~ WIP: Upgrade to arrow 56.1.0 Aug 21, 2025

alamb mentioned this pull request Aug 22, 2025 in Testing: Try test optimize performance for coalesce #17193 (open)
