Testing: Try test optimize performance for coalesce #17193
base: main
Conversation
…eam_arrow_coalesce
Sounds good to me. I am sorry I have somewhat lost track of the current status. Shall we polish up this PR then?
Thank you @zhuqi-lucas
// Limit not reached, push the entire batch
self.total_rows += batch.num_rows();

if batch.num_rows() >= self.biggest_coalesce_size {
or maybe we should migrate the `biggest_coalesce_size` setting to the upstream coalescer 🤔
Good suggestion!
Thank you @alamb, polished the upstream PR to support it now:
Thank you @alamb, right, let me polish the upstream PR before this PR.
Polished the upstream PR to support it now:
Very cool! Added a comment on the upstream PR; I think it makes sense to see if we can avoid the (small) regressions.
Thank you @Dandandan for review!
…row-datafusion into test_optimize_performance
Updated to use the latest upstream code. Maybe we can trigger a new benchmark to compare the performance results. cc @alamb @Dandandan, thanks!
Done. I really need to find some better way to get the benchmarks triggered other than having to do it manually.
🤖: Benchmark completed
Thank you @alamb, I agree, if we can run it via CI it will be perfect!
The result is similar for the latest change, @alamb @Dandandan: the average time is better, the Q1 and Q18 improvements are good, and some queries show a very small regression.
Results look good to me -- I think the small changes are likely measurement noise. Thank you for pushing this through.
… batch support (#8146)

# Which issue does this PR close?

needed for: apache/datafusion#17193

# Rationale for this change

```rust
// Large batch bypass optimization:
// When biggest_coalesce_batch_size is configured and a batch exceeds this limit,
// we can avoid expensive split-and-merge operations by passing it through directly.
//
// IMPORTANT: This optimization is OPTIONAL and only active when biggest_coalesce_batch_size
// is explicitly set via with_biggest_coalesce_batch_size(Some(limit)).
// If not set (None), ALL batches follow normal coalescing behavior regardless of size.

// =============================================================================
// CASE 1: No buffer + large batch → Direct bypass
// =============================================================================
// Example scenario (target_batch_size=1000, biggest_coalesce_batch_size=Some(500)):
// Input sequence: [600, 1200, 300]
//
// With biggest_coalesce_batch_size=Some(500) (optimization enabled):
//   600  → large batch detected! buffered_rows=0 → Case 1: direct bypass
//        → output: [600] (bypass, preserves large batch)
//   1200 → large batch detected! buffered_rows=0 → Case 1: direct bypass
//        → output: [1200] (bypass, preserves large batch)
//   300  → normal batch, buffer: [300]
// Result: [600], [1200], [300] - large batches preserved, mixed sizes

// =============================================================================
// CASE 2: Buffer too large + large batch → Flush first, then bypass
// =============================================================================
// This case prevents creating extremely large merged batches that would
// significantly exceed both target_batch_size and biggest_coalesce_batch_size.
//
// Example 1: Buffer exceeds limit before large batch arrives
//   target_batch_size=1000, biggest_coalesce_batch_size=Some(400)
//   Input: [350, 200, 800]
//
//   Step 1: push_batch([350])
//     → batch_size=350 <= 400, normal path
//     → buffer: [350], buffered_rows=350
//
//   Step 2: push_batch([200])
//     → batch_size=200 <= 400, normal path
//     → buffer: [350, 200], buffered_rows=550
//
//   Step 3: push_batch([800])
//     → batch_size=800 > 400, large batch path
//     → buffered_rows=550 > 400 → Case 2: flush first
//     → flush: output [550] (combined [350, 200])
//     → then bypass: output [800]
//   Result: [550], [800] - buffer flushed to prevent oversized merge
//
// Example 2: Multiple small batches accumulate before large batch
//   target_batch_size=1000, biggest_coalesce_batch_size=Some(300)
//   Input: [150, 100, 80, 900]
//
//   Steps 1-3: Accumulate small batches
//     150 → buffer: [150], buffered_rows=150
//     100 → buffer: [150, 100], buffered_rows=250
//     80  → buffer: [150, 100, 80], buffered_rows=330
//
//   Step 4: push_batch([900])
//     → batch_size=900 > 300, large batch path
//     → buffered_rows=330 > 300 → Case 2: flush first
//     → flush: output [330] (combined [150, 100, 80])
//     → then bypass: output [900]
//   Result: [330], [900] - prevents merge into [1230] which would be too large

// =============================================================================
// CASE 3: Small buffer + large batch → Normal coalescing (no bypass)
// =============================================================================
// When the buffer is small enough, we still merge to maintain efficiency.
// Example: target_batch_size=1000, biggest_coalesce_batch_size=Some(500)
//   Input: [300, 1200]
//
//   Step 1: push_batch([300])
//     → batch_size=300 <= 500, normal path
//     → buffer: [300], buffered_rows=300
//
//   Step 2: push_batch([1200])
//     → batch_size=1200 > 500, large batch path
//     → buffered_rows=300 <= 500 → Case 3: normal merge
//     → buffer: [300, 1200] (1500 total)
//     → 1500 > target_batch_size → split: output [1000], buffer [500]
//   Result: [1000], [500] - normal split/merge behavior maintained

// =============================================================================
// Comparison: Default vs Optimized Behavior
// =============================================================================
// target_batch_size=1000, biggest_coalesce_batch_size=Some(500)
// Input: [600, 1200, 300]
//
// DEFAULT BEHAVIOR (biggest_coalesce_batch_size=None):
//   600  → buffer: [600]
//   1200 → buffer: [600, 1200] (1800 rows total)
//        → split: output [1000 rows], buffer [800 rows remaining]
//   300  → buffer: [800, 300] (1100 rows total)
//        → split: output [1000 rows], buffer [100 rows remaining]
//   Result: [1000], [1000], [100] - all outputs respect target_batch_size
//
// OPTIMIZED BEHAVIOR (biggest_coalesce_batch_size=Some(500)):
//   600  → Case 1: direct bypass → output: [600]
//   1200 → Case 1: direct bypass → output: [1200]
//   300  → normal path → buffer: [300]
//   Result: [600], [1200], [300] - large batches preserved

// =============================================================================
// Benefits and Trade-offs
// =============================================================================
// Benefits of the optimization:
// - Large batches stay intact (better for downstream vectorized processing)
// - Fewer split/merge operations (better CPU performance)
// - More predictable memory usage patterns
// - Maintains streaming efficiency while preserving batch boundaries
//
// Trade-offs:
// - Output batch sizes become variable (not always target_batch_size)
// - May produce smaller partial batches when flushing before large batches
// - Requires tuning the biggest_coalesce_batch_size parameter for optimal performance

// TODO: for unsorted batches, maybe we can filter out all large batches and
// coalesce all the small batches together?
```

# What changes are included in this PR?

Adds more public API, which is needed for Apache DataFusion.

# Are these changes tested?

Yes, added a unit test.

# Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <[email protected]>
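As a rough, self-contained sketch of how the new option might be exercised (illustrative code, not taken from either PR; it assumes the arrow-rs `BatchCoalescer` API with the `with_biggest_coalesce_batch_size` builder method described above, and that `BatchCoalescer` is importable from `arrow::compute`):

```rust
use std::sync::Arc;

use arrow::array::{Int32Array, RecordBatch};
// Assumption: BatchCoalescer is re-exported here; it may also be reachable
// via the arrow-select crate as arrow_select::coalesce::BatchCoalescer.
use arrow::compute::BatchCoalescer;
use arrow::datatypes::{DataType, Field, Schema};

/// Build a single-column Int32 batch with `n` rows.
fn batch_of(schema: &Arc<Schema>, n: usize) -> RecordBatch {
    let values = Int32Array::from_iter_values(0..n as i32);
    RecordBatch::try_new(schema.clone(), vec![Arc::new(values)]).unwrap()
}

fn main() {
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int32, false)]));

    // target_batch_size = 1000, bypass limit = 500, matching the CASE 1
    // walkthrough above. with_biggest_coalesce_batch_size is the builder
    // method added in apache/arrow-rs#8146.
    let mut coalescer = BatchCoalescer::new(schema.clone(), 1000)
        .with_biggest_coalesce_batch_size(Some(500));

    // 600 rows: exceeds the 500-row limit with an empty buffer, so per
    // CASE 1 it should be emitted as-is instead of being split and merged.
    coalescer.push_batch(batch_of(&schema, 600)).unwrap();
    // 300 rows: below the limit, buffered normally.
    coalescer.push_batch(batch_of(&schema, 300)).unwrap();
    // Flush whatever is still buffered.
    coalescer.finish_buffered_batch().unwrap();

    // Per the description above, this should print a 600-row batch
    // followed by a 300-row batch.
    while let Some(batch) = coalescer.next_completed_batch() {
        println!("output batch with {} rows", batch.num_rows());
    }
}
```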
I have updated my benchmark machine on GCP so it supposedly is more consistent -- I am going to rerun the benchmarks on this PR to see if it looks better. Also, I think once the following PR is merged, we can do this one:
Thank you @alamb, it makes sense, I will refactor after the PR is merged.
🤖: Benchmark completed
Thank you @alamb, the result is amazing, 16 queries faster for ClickBench! But one query has a regression:

│ QQuery 23 │ 14761.76 ms │ 19908.31 ms │ 1.35x slower │

I need to investigate this one. 🤔
I merged upstream/main into this branch first, before investigating.
I also made some changes to my benchmark machine that hopefully will result in less noise. I'll rerun the benchmarks for this one.
🤖: Benchmark completed
Thank you @alamb, it seems there is a regression on ClickBench for this PR. 🤔 I updated the branch again now, since some PRs were merged to the main branch.
Which issue does this PR close?
Try test optimize performance for coalesce.

This is follow-up testing for #17105.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?