Testing: Try test optimize performance for coalesce #17193

Open: wants to merge 40 commits into base: main


zhuqi-lucas
Contributor

Which issue does this PR close?

Try test optimize performance for coalesce

This is follow-up testing for #17105.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@alamb
Contributor

alamb commented Aug 15, 2025

> I will polish code and doc if we think this is the right direction.

Sounds good to me.

I am sorry I have somewhat lost track of the current status

Shall we polish up this PR then?

Contributor

@alamb left a comment

Thank you @zhuqi-lucas

// Limit not reached, push the entire batch
self.total_rows += batch.num_rows();

if batch.num_rows() >= self.biggest_coalesce_size {
Contributor

Or maybe we should migrate the `biggest_coalesce_size` setting to the upstream coalescer 🤔

Contributor Author

Good suggestion!

Contributor Author

Thank you @alamb, I polished the upstream PR to support it now:

apache/arrow-rs#8146

@zhuqi-lucas
Contributor Author

> > I will polish code and doc if we think this is the right direction.
>
> Sounds good to me.
>
> I am sorry I have somewhat lost track of the current status
>
> Shall we polish up this PR then?

Thank you @alamb. Right, let me polish the upstream PR before this one.

@zhuqi-lucas
Contributor Author

Polished the upstream PR to support it now:

apache/arrow-rs#8146

@Dandandan
Contributor

Very cool! I added a comment on the upstream PR; I think it makes sense to see if we can avoid the (small) regressions.

@zhuqi-lucas
Contributor Author

> Very cool! Added a comment on the upstream PR, I think it makes sense to see if we can avoid the (small) regressions.

Thank you @Dandandan for the review!

@zhuqi-lucas
Contributor Author

Updated to use the latest upstream code:
apache/arrow-rs#8146

Maybe we can trigger a new benchmark run to compare the performance results. cc @alamb @Dandandan, thanks!

@alamb
Contributor

alamb commented Aug 18, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (5f660da) to 420a862 diff using: tpch_mem
Results will be posted here when complete

@alamb
Contributor

alamb commented Aug 18, 2025

> Updated to use the latest upstream code: apache/arrow-rs#8146
>
> Maybe we can trigger a new benchmark run to compare the performance results. cc @alamb @Dandandan, thanks!

Done. I really need to find a better way to trigger the benchmarks than having to do it manually.

@alamb
Contributor

alamb commented Aug 18, 2025

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 100.44 ms │                  76.49 ms │ +1.31x faster │
│ QQuery 2     │  20.50 ms │                  22.42 ms │  1.09x slower │
│ QQuery 3     │  32.56 ms │                  33.05 ms │     no change │
│ QQuery 4     │  18.49 ms │                  19.12 ms │     no change │
│ QQuery 5     │  49.28 ms │                  54.63 ms │  1.11x slower │
│ QQuery 6     │  11.89 ms │                  11.62 ms │     no change │
│ QQuery 7     │  86.73 ms │                  87.34 ms │     no change │
│ QQuery 8     │  23.99 ms │                  25.18 ms │     no change │
│ QQuery 9     │  53.77 ms │                  55.13 ms │     no change │
│ QQuery 10    │  40.00 ms │                  41.56 ms │     no change │
│ QQuery 11    │  37.86 ms │                  38.47 ms │     no change │
│ QQuery 12    │  29.81 ms │                  31.41 ms │  1.05x slower │
│ QQuery 13    │  25.98 ms │                  25.54 ms │     no change │
│ QQuery 14    │   9.86 ms │                  10.21 ms │     no change │
│ QQuery 15    │  18.85 ms │                  19.83 ms │  1.05x slower │
│ QQuery 16    │  17.29 ms │                  17.60 ms │     no change │
│ QQuery 17    │  92.83 ms │                  97.29 ms │     no change │
│ QQuery 18    │ 178.74 ms │                 153.65 ms │ +1.16x faster │
│ QQuery 19    │  24.08 ms │                  23.88 ms │     no change │
│ QQuery 20    │  31.44 ms │                  32.21 ms │     no change │
│ QQuery 21    │ 136.42 ms │                 137.31 ms │     no change │
│ QQuery 22    │  13.68 ms │                  14.62 ms │  1.07x slower │
└──────────────┴───────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 1054.49ms │
│ Total Time (test_optimize_performance)   │ 1028.57ms │
│ Average Time (HEAD)                      │   47.93ms │
│ Average Time (test_optimize_performance) │   46.75ms │
│ Queries Faster                           │         2 │
│ Queries Slower                           │         5 │
│ Queries with No Change                   │        15 │
│ Queries with Failure                     │         0 │
└──────────────────────────────────────────┴───────────┘

@zhuqi-lucas
Contributor Author

> > Updated to use the latest upstream code: apache/arrow-rs#8146
> > Maybe we can trigger a new benchmark run to compare the performance results. cc @alamb @Dandandan, thanks!
>
> Done. I really need to find a better way to trigger the benchmarks than having to do it manually.

Thank you @alamb, I agree. If CI could run the benchmarks, that would be perfect!

@zhuqi-lucas
Contributor Author

@alamb @Dandandan the results for the latest change are similar: the average time is better, the Q1 and Q18 improvements are good, and some queries show very small regressions.

@alamb
Contributor

alamb commented Aug 18, 2025

Results look good to me -- I think the small changes are likely measurement noise.

Thank you for pushing this through
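
As a reading aid for the "Change" column in the benchmark tables above, here is a minimal sketch of how such a label could be derived from the two timings. The function name `describe_change` and the 5% band are assumptions: the thread never states the threshold, but 5% is consistent with the rows shown (for example, Q17 at about 4.8% is "no change" while Q12 at about 5.4% is "1.05x slower").

```rust
/// Hypothetical helper that labels a timing pair in the style of the "Change" column.
/// `noise_threshold` is an assumed relative band (0.05 = 5%) reported as "no change".
fn describe_change(head_ms: f64, branch_ms: f64, noise_threshold: f64) -> String {
    // Relative run time of the branch: below 1.0 means the branch is faster.
    let change = branch_ms / head_ms;
    if change < 1.0 - noise_threshold {
        // Report the speedup as HEAD time divided by branch time.
        format!("+{:.2}x faster", head_ms / branch_ms)
    } else if change > 1.0 + noise_threshold {
        format!("{:.2}x slower", change)
    } else {
        "no change".to_string()
    }
}
```

With the Q1 timings, `describe_change(100.44, 76.49, 0.05)` gives `+1.31x faster`; with the Q5 timings, `describe_change(49.28, 54.63, 0.05)` gives `1.11x slower`.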

alamb added a commit to apache/arrow-rs that referenced this pull request Aug 19, 2025
… batch support (#8146)

# Which issue does this PR close?

needed for:
apache/datafusion#17193

# Rationale for this change
```rust
        // Large batch bypass optimization:
        // When biggest_coalesce_batch_size is configured and a batch exceeds this limit,
        // we can avoid expensive split-and-merge operations by passing it through directly.
        //
        // IMPORTANT: This optimization is OPTIONAL and only active when biggest_coalesce_batch_size
        // is explicitly set via with_biggest_coalesce_batch_size(Some(limit)).
        // If not set (None), ALL batches follow normal coalescing behavior regardless of size.

        // =============================================================================
        // CASE 1: No buffer + large batch → Direct bypass
        // =============================================================================
        // Example scenario (target_batch_size=1000, biggest_coalesce_batch_size=Some(500)):
        // Input sequence: [600, 1200, 300]
        //
        // With biggest_coalesce_batch_size=Some(500) (optimization enabled):
        //   600 → large batch detected! buffered_rows=0 → Case 1: direct bypass
        //        → output: [600] (bypass, preserves large batch)
        //   1200 → large batch detected! buffered_rows=0 → Case 1: direct bypass
        //         → output: [1200] (bypass, preserves large batch)
        //   300 → normal batch, buffer: [300]
        //   Result: [600], [1200], [300] - large batches preserved, mixed sizes

        // =============================================================================
        // CASE 2: Buffer too large + large batch → Flush first, then bypass
        // =============================================================================
        // This case prevents creating extremely large merged batches that would
        // significantly exceed both target_batch_size and biggest_coalesce_batch_size.
        //
        // Example 1: Buffer exceeds limit before large batch arrives
        // target_batch_size=1000, biggest_coalesce_batch_size=Some(400)
        // Input: [350, 200, 800]
        //
        // Step 1: push_batch([350])
        //   → batch_size=350 <= 400, normal path
        //   → buffer: [350], buffered_rows=350
        //
        // Step 2: push_batch([200])
        //   → batch_size=200 <= 400, normal path
        //   → buffer: [350, 200], buffered_rows=550
        //
        // Step 3: push_batch([800])
        //   → batch_size=800 > 400, large batch path
        //   → buffered_rows=550 > 400 → Case 2: flush first
        //   → flush: output [550] (combined [350, 200])
        //   → then bypass: output [800]
        //   Result: [550], [800] - buffer flushed to prevent oversized merge
        //
        // Example 2: Multiple small batches accumulate before large batch
        // target_batch_size=1000, biggest_coalesce_batch_size=Some(300)
        // Input: [150, 100, 80, 900]
        //
        // Step 1-3: Accumulate small batches
        //   150 → buffer: [150], buffered_rows=150
        //   100 → buffer: [150, 100], buffered_rows=250
        //   80  → buffer: [150, 100, 80], buffered_rows=330
        //
        // Step 4: push_batch([900])
        //   → batch_size=900 > 300, large batch path
        //   → buffered_rows=330 > 300 → Case 2: flush first
        //   → flush: output [330] (combined [150, 100, 80])
        //   → then bypass: output [900]
        //   Result: [330], [900] - prevents merge into [1230] which would be too large

        // =============================================================================
        // CASE 3: Small buffer + large batch → Normal coalescing (no bypass)
        // =============================================================================
        // When buffer is small enough, we still merge to maintain efficiency
        // Example: target_batch_size=1000, biggest_coalesce_batch_size=Some(500)
        // Input: [300, 1200]
        //
        // Step 1: push_batch([300])
        //   → batch_size=300 <= 500, normal path
        //   → buffer: [300], buffered_rows=300
        //
        // Step 2: push_batch([1200])
        //   → batch_size=1200 > 500, large batch path
        //   → buffered_rows=300 <= 500 → Case 3: normal merge
        //   → buffer: [300, 1200] (1500 total)
        //   → 1500 > target_batch_size → split: output [1000], buffer [500]
        //   Result: [1000], [500] - normal split/merge behavior maintained

        // =============================================================================
        // Comparison: Default vs Optimized Behavior
        // =============================================================================
        // target_batch_size=1000, biggest_coalesce_batch_size=Some(500)
        // Input: [600, 1200, 300]
        //
        // DEFAULT BEHAVIOR (biggest_coalesce_batch_size=None):
        //   600 → buffer: [600]
        //   1200 → buffer: [600, 1200] (1800 rows total)
        //         → split: output [1000 rows], buffer [800 rows remaining]
        //   300 → buffer: [800, 300] (1100 rows total)
        //        → split: output [1000 rows], buffer [100 rows remaining]
        //   Result: [1000], [1000], [100] - all outputs respect target_batch_size
        //
        // OPTIMIZED BEHAVIOR (biggest_coalesce_batch_size=Some(500)):
        //   600 → Case 1: direct bypass → output: [600]
        //   1200 → Case 1: direct bypass → output: [1200]
        //   300 → normal path → buffer: [300]
        //   Result: [600], [1200], [300] - large batches preserved

        // =============================================================================
        // Benefits and Trade-offs
        // =============================================================================
        // Benefits of the optimization:
        // - Large batches stay intact (better for downstream vectorized processing)
        // - Fewer split/merge operations (better CPU performance)
        // - More predictable memory usage patterns
        // - Maintains streaming efficiency while preserving batch boundaries
        //
        // Trade-offs:
        // - Output batch sizes become variable (not always target_batch_size)
        // - May produce smaller partial batches when flushing before large batches
        // - Requires tuning biggest_coalesce_batch_size parameter for optimal performance

        // TODO: for unsorted batches, we may be able to filter out all large batches and
        // coalesce all small batches together?
```
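
The three cases in the comment above can be checked with a small simulation that tracks only row counts. This is a sketch, not the arrow-rs implementation: `SimCoalescer`, `run`, and the field layout are hypothetical names chosen for illustration, and only the decision rules (direct bypass, flush then bypass, normal merge) are taken from the comment.

```rust
/// Hypothetical row-count model of the coalescer described above.
struct SimCoalescer {
    target_batch_size: usize,
    biggest_coalesce_batch_size: Option<usize>,
    buffered_rows: usize,
    completed: Vec<usize>,
}

impl SimCoalescer {
    fn new(target_batch_size: usize, biggest_coalesce_batch_size: Option<usize>) -> Self {
        Self {
            target_batch_size,
            biggest_coalesce_batch_size,
            buffered_rows: 0,
            completed: Vec::new(),
        }
    }

    /// Emit whatever is buffered as one (possibly small) batch.
    fn flush(&mut self) {
        if self.buffered_rows > 0 {
            self.completed.push(self.buffered_rows);
            self.buffered_rows = 0;
        }
    }

    fn push_batch(&mut self, rows: usize) {
        if rows == 0 {
            return;
        }
        if let Some(limit) = self.biggest_coalesce_batch_size {
            if rows > limit {
                if self.buffered_rows == 0 {
                    // Case 1: nothing buffered, pass the large batch through.
                    self.completed.push(rows);
                    return;
                }
                if self.buffered_rows > limit {
                    // Case 2: buffer already exceeds the limit, flush it first.
                    self.flush();
                    self.completed.push(rows);
                    return;
                }
                // Case 3: small buffer, fall through to normal coalescing.
            }
        }
        // Normal path: accumulate and emit full target-sized batches.
        self.buffered_rows += rows;
        while self.buffered_rows >= self.target_batch_size {
            self.completed.push(self.target_batch_size);
            self.buffered_rows -= self.target_batch_size;
        }
    }
}

/// Push every input size, flush the remainder, and return the output sizes.
fn run(target: usize, biggest: Option<usize>, input: &[usize]) -> Vec<usize> {
    let mut c = SimCoalescer::new(target, biggest);
    for &rows in input {
        c.push_batch(rows);
    }
    c.flush();
    c.completed
}
```

For the inputs used in the comment, `run(1000, Some(500), &[600, 1200, 300])` yields `[600, 1200, 300]`, while the default `run(1000, None, &[600, 1200, 300])` yields `[1000, 1000, 100]` once the final 100 buffered rows are flushed.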

# What changes are included in this PR?

Adds the public API needed by Apache DataFusion.

# Are these changes tested?

Yes, added a unit test.

# Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <[email protected]>
@alamb
Contributor

alamb commented Aug 19, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (5f660da) to 420a862 diff using: topk_tpch
Results will be posted here when complete

@alamb
Contributor

alamb commented Aug 22, 2025

I have updated my benchmark machine on gcp so it supposedly is more consistent -- I am going to rerun the benchmarks on this PR to see if it looks better

Also, I think once the following PR is merged, we can do this one:

@alamb
Contributor

alamb commented Aug 22, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (5f660da) to 420a862 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@zhuqi-lucas
Contributor Author

> I have updated my benchmark machine on gcp so it supposedly is more consistent -- I am going to rerun the benchmarks on this PR to see if it looks better
>
> Also, I think once the following PR is merged, we can do this one:

Thank you @alamb, that makes sense. I will refactor after that PR is merged.

@alamb
Contributor

alamb commented Aug 22, 2025

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2685.21 ms │                2664.70 ms │ no change │
│ QQuery 1     │  1348.85 ms │                1290.72 ms │ no change │
│ QQuery 2     │  2477.94 ms │                2473.30 ms │ no change │
│ QQuery 3     │  1170.40 ms │                1195.86 ms │ no change │
│ QQuery 4     │  2242.24 ms │                2301.25 ms │ no change │
│ QQuery 5     │ 27334.81 ms │               27544.94 ms │ no change │
│ QQuery 6     │  4301.92 ms │                4136.17 ms │ no change │
│ QQuery 7     │  3723.66 ms │                3594.94 ms │ no change │
└──────────────┴─────────────┴───────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 45285.04ms │
│ Total Time (test_optimize_performance)   │ 45201.88ms │
│ Average Time (HEAD)                      │  5660.63ms │
│ Average Time (test_optimize_performance) │  5650.24ms │
│ Queries Faster                           │          0 │
│ Queries Slower                           │          0 │
│ Queries with No Change                   │          8 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.08 ms │                   2.13 ms │     no change │
│ QQuery 1     │    48.56 ms │                  48.97 ms │     no change │
│ QQuery 2     │   137.52 ms │                 134.41 ms │     no change │
│ QQuery 3     │   167.57 ms │                 155.62 ms │ +1.08x faster │
│ QQuery 4     │  1042.74 ms │                1055.35 ms │     no change │
│ QQuery 5     │  1572.99 ms │                1507.22 ms │     no change │
│ QQuery 6     │     2.12 ms │                   2.10 ms │     no change │
│ QQuery 7     │    57.13 ms │                  55.29 ms │     no change │
│ QQuery 8     │  1474.59 ms │                1426.73 ms │     no change │
│ QQuery 9     │  1860.90 ms │                1777.15 ms │     no change │
│ QQuery 10    │   397.56 ms │                 362.40 ms │ +1.10x faster │
│ QQuery 11    │   450.28 ms │                 411.08 ms │ +1.10x faster │
│ QQuery 12    │  1412.17 ms │                1297.97 ms │ +1.09x faster │
│ QQuery 13    │  2168.02 ms │                2072.30 ms │     no change │
│ QQuery 14    │  1292.44 ms │                1229.62 ms │     no change │
│ QQuery 15    │  1199.12 ms │                1225.69 ms │     no change │
│ QQuery 16    │  2724.09 ms │                2661.54 ms │     no change │
│ QQuery 17    │  2715.72 ms │                2649.10 ms │     no change │
│ QQuery 18    │  5400.64 ms │                4958.93 ms │ +1.09x faster │
│ QQuery 19    │   129.37 ms │                 125.23 ms │     no change │
│ QQuery 20    │  2092.01 ms │                1916.74 ms │ +1.09x faster │
│ QQuery 21    │  2416.15 ms │                2245.82 ms │ +1.08x faster │
│ QQuery 22    │  4138.71 ms │                3914.74 ms │ +1.06x faster │
│ QQuery 23    │ 14761.76 ms │               19908.31 ms │  1.35x slower │
│ QQuery 24    │   753.68 ms │                 696.32 ms │ +1.08x faster │
│ QQuery 25    │   534.43 ms │                 464.47 ms │ +1.15x faster │
│ QQuery 26    │   750.69 ms │                 681.99 ms │ +1.10x faster │
│ QQuery 27    │  2972.04 ms │                2792.18 ms │ +1.06x faster │
│ QQuery 28    │ 23351.69 ms │               23764.67 ms │     no change │
│ QQuery 29    │   990.11 ms │                 986.79 ms │     no change │
│ QQuery 30    │  1369.87 ms │                1309.31 ms │     no change │
│ QQuery 31    │  1343.90 ms │                1372.71 ms │     no change │
│ QQuery 32    │  4746.87 ms │                5037.21 ms │  1.06x slower │
│ QQuery 33    │  5944.18 ms │                5809.44 ms │     no change │
│ QQuery 34    │  6205.65 ms │                5776.77 ms │ +1.07x faster │
│ QQuery 35    │  2109.77 ms │                2004.97 ms │     no change │
│ QQuery 36    │   122.97 ms │                 116.79 ms │ +1.05x faster │
│ QQuery 37    │    55.51 ms │                  51.52 ms │ +1.08x faster │
│ QQuery 38    │   123.02 ms │                 120.74 ms │     no change │
│ QQuery 39    │   200.24 ms │                 191.99 ms │     no change │
│ QQuery 40    │    46.58 ms │                  43.01 ms │ +1.08x faster │
│ QQuery 41    │    40.79 ms │                  39.86 ms │     no change │
│ QQuery 42    │    33.04 ms │                  33.02 ms │     no change │
└──────────────┴─────────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │  99359.23ms │
│ Total Time (test_optimize_performance)   │ 102438.21ms │
│ Average Time (HEAD)                      │   2310.68ms │
│ Average Time (test_optimize_performance) │   2382.28ms │
│ Queries Faster                           │          16 │
│ Queries Slower                           │           2 │
│ Queries with No Change                   │          25 │
│ Queries with Failure                     │           0 │
└──────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 173.89 ms │                 141.20 ms │ +1.23x faster │
│ QQuery 2     │  26.92 ms │                  28.28 ms │  1.05x slower │
│ QQuery 3     │  44.98 ms │                  45.73 ms │     no change │
│ QQuery 4     │  26.92 ms │                  28.25 ms │     no change │
│ QQuery 5     │  72.93 ms │                  83.34 ms │  1.14x slower │
│ QQuery 6     │  19.37 ms │                  19.34 ms │     no change │
│ QQuery 7     │ 148.24 ms │                 153.04 ms │     no change │
│ QQuery 8     │  30.91 ms │                  31.91 ms │     no change │
│ QQuery 9     │  80.96 ms │                  85.43 ms │  1.06x slower │
│ QQuery 10    │  58.14 ms │                  60.40 ms │     no change │
│ QQuery 11    │  40.39 ms │                  41.25 ms │     no change │
│ QQuery 12    │  50.52 ms │                  53.18 ms │  1.05x slower │
│ QQuery 13    │  44.75 ms │                  45.09 ms │     no change │
│ QQuery 14    │  12.90 ms │                  13.60 ms │  1.05x slower │
│ QQuery 15    │  23.72 ms │                  23.76 ms │     no change │
│ QQuery 16    │  23.52 ms │                  23.59 ms │     no change │
│ QQuery 17    │ 144.02 ms │                 146.16 ms │     no change │
│ QQuery 18    │ 315.19 ms │                 274.40 ms │ +1.15x faster │
│ QQuery 19    │  36.32 ms │                  38.11 ms │     no change │
│ QQuery 20    │  47.08 ms │                  49.90 ms │  1.06x slower │
│ QQuery 21    │ 217.78 ms │                 207.04 ms │     no change │
│ QQuery 22    │  19.33 ms │                  19.57 ms │     no change │
└──────────────┴───────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 1658.75ms │
│ Total Time (test_optimize_performance)   │ 1612.59ms │
│ Average Time (HEAD)                      │   75.40ms │
│ Average Time (test_optimize_performance) │   73.30ms │
│ Queries Faster                           │         2 │
│ Queries Slower                           │         6 │
│ Queries with No Change                   │        14 │
│ Queries with Failure                     │         0 │
└──────────────────────────────────────────┴───────────┘
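
One way to read the clickbench_partitioned summary above, where the total time is higher on the branch even though 16 queries are faster, is to subtract a single query from both totals. The sketch below does that arithmetic; `delta_without` is a hypothetical helper name, and the figures in the usage note are copied from the table (totals, plus the Q23 row).

```rust
/// Returns (total delta, delta with one query excluded), both as branch minus HEAD in ms.
/// A negative value means the branch is faster.
fn delta_without(total_head: f64, total_branch: f64, q_head: f64, q_branch: f64) -> (f64, f64) {
    let total_delta = total_branch - total_head;
    // Remove the chosen query's time from each total before comparing.
    let rest_delta = (total_branch - q_branch) - (total_head - q_head);
    (total_delta, rest_delta)
}
```

Calling `delta_without(99359.23, 102438.21, 14761.76, 19908.31)` gives roughly `(3078.98, -2067.57)`: the branch is about 3.1 s slower overall, but about 2.1 s faster across the other 42 queries once Q23 is set aside. Whether the Q23 result is a real regression or noise is not settled by this arithmetic.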

@zhuqi-lucas
Contributor Author

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0     │  2685.21 ms │                2664.70 ms │ no change │
│ QQuery 1     │  1348.85 ms │                1290.72 ms │ no change │
│ QQuery 2     │  2477.94 ms │                2473.30 ms │ no change │
│ QQuery 3     │  1170.40 ms │                1195.86 ms │ no change │
│ QQuery 4     │  2242.24 ms │                2301.25 ms │ no change │
│ QQuery 5     │ 27334.81 ms │               27544.94 ms │ no change │
│ QQuery 6     │  4301.92 ms │                4136.17 ms │ no change │
│ QQuery 7     │  3723.66 ms │                3594.94 ms │ no change │
└──────────────┴─────────────┴───────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 45285.04ms │
│ Total Time (test_optimize_performance)   │ 45201.88ms │
│ Average Time (HEAD)                      │  5660.63ms │
│ Average Time (test_optimize_performance) │  5650.24ms │
│ Queries Faster                           │          0 │
│ Queries Slower                           │          0 │
│ Queries with No Change                   │          8 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.08 ms │                   2.13 ms │     no change │
│ QQuery 1     │    48.56 ms │                  48.97 ms │     no change │
│ QQuery 2     │   137.52 ms │                 134.41 ms │     no change │
│ QQuery 3     │   167.57 ms │                 155.62 ms │ +1.08x faster │
│ QQuery 4     │  1042.74 ms │                1055.35 ms │     no change │
│ QQuery 5     │  1572.99 ms │                1507.22 ms │     no change │
│ QQuery 6     │     2.12 ms │                   2.10 ms │     no change │
│ QQuery 7     │    57.13 ms │                  55.29 ms │     no change │
│ QQuery 8     │  1474.59 ms │                1426.73 ms │     no change │
│ QQuery 9     │  1860.90 ms │                1777.15 ms │     no change │
│ QQuery 10    │   397.56 ms │                 362.40 ms │ +1.10x faster │
│ QQuery 11    │   450.28 ms │                 411.08 ms │ +1.10x faster │
│ QQuery 12    │  1412.17 ms │                1297.97 ms │ +1.09x faster │
│ QQuery 13    │  2168.02 ms │                2072.30 ms │     no change │
│ QQuery 14    │  1292.44 ms │                1229.62 ms │     no change │
│ QQuery 15    │  1199.12 ms │                1225.69 ms │     no change │
│ QQuery 16    │  2724.09 ms │                2661.54 ms │     no change │
│ QQuery 17    │  2715.72 ms │                2649.10 ms │     no change │
│ QQuery 18    │  5400.64 ms │                4958.93 ms │ +1.09x faster │
│ QQuery 19    │   129.37 ms │                 125.23 ms │     no change │
│ QQuery 20    │  2092.01 ms │                1916.74 ms │ +1.09x faster │
│ QQuery 21    │  2416.15 ms │                2245.82 ms │ +1.08x faster │
│ QQuery 22    │  4138.71 ms │                3914.74 ms │ +1.06x faster │
│ QQuery 23    │ 14761.76 ms │               19908.31 ms │  1.35x slower │
│ QQuery 24    │   753.68 ms │                 696.32 ms │ +1.08x faster │
│ QQuery 25    │   534.43 ms │                 464.47 ms │ +1.15x faster │
│ QQuery 26    │   750.69 ms │                 681.99 ms │ +1.10x faster │
│ QQuery 27    │  2972.04 ms │                2792.18 ms │ +1.06x faster │
│ QQuery 28    │ 23351.69 ms │               23764.67 ms │     no change │
│ QQuery 29    │   990.11 ms │                 986.79 ms │     no change │
│ QQuery 30    │  1369.87 ms │                1309.31 ms │     no change │
│ QQuery 31    │  1343.90 ms │                1372.71 ms │     no change │
│ QQuery 32    │  4746.87 ms │                5037.21 ms │  1.06x slower │
│ QQuery 33    │  5944.18 ms │                5809.44 ms │     no change │
│ QQuery 34    │  6205.65 ms │                5776.77 ms │ +1.07x faster │
│ QQuery 35    │  2109.77 ms │                2004.97 ms │     no change │
│ QQuery 36    │   122.97 ms │                 116.79 ms │ +1.05x faster │
│ QQuery 37    │    55.51 ms │                  51.52 ms │ +1.08x faster │
│ QQuery 38    │   123.02 ms │                 120.74 ms │     no change │
│ QQuery 39    │   200.24 ms │                 191.99 ms │     no change │
│ QQuery 40    │    46.58 ms │                  43.01 ms │ +1.08x faster │
│ QQuery 41    │    40.79 ms │                  39.86 ms │     no change │
│ QQuery 42    │    33.04 ms │                  33.02 ms │     no change │
└──────────────┴─────────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │  99359.23ms │
│ Total Time (test_optimize_performance)   │ 102438.21ms │
│ Average Time (HEAD)                      │   2310.68ms │
│ Average Time (test_optimize_performance) │   2382.28ms │
│ Queries Faster                           │          16 │
│ Queries Slower                           │           2 │
│ Queries with No Change                   │          25 │
│ Queries with Failure                     │           0 │
└──────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 173.89 ms │                 141.20 ms │ +1.23x faster │
│ QQuery 2     │  26.92 ms │                  28.28 ms │  1.05x slower │
│ QQuery 3     │  44.98 ms │                  45.73 ms │     no change │
│ QQuery 4     │  26.92 ms │                  28.25 ms │     no change │
│ QQuery 5     │  72.93 ms │                  83.34 ms │  1.14x slower │
│ QQuery 6     │  19.37 ms │                  19.34 ms │     no change │
│ QQuery 7     │ 148.24 ms │                 153.04 ms │     no change │
│ QQuery 8     │  30.91 ms │                  31.91 ms │     no change │
│ QQuery 9     │  80.96 ms │                  85.43 ms │  1.06x slower │
│ QQuery 10    │  58.14 ms │                  60.40 ms │     no change │
│ QQuery 11    │  40.39 ms │                  41.25 ms │     no change │
│ QQuery 12    │  50.52 ms │                  53.18 ms │  1.05x slower │
│ QQuery 13    │  44.75 ms │                  45.09 ms │     no change │
│ QQuery 14    │  12.90 ms │                  13.60 ms │  1.05x slower │
│ QQuery 15    │  23.72 ms │                  23.76 ms │     no change │
│ QQuery 16    │  23.52 ms │                  23.59 ms │     no change │
│ QQuery 17    │ 144.02 ms │                 146.16 ms │     no change │
│ QQuery 18    │ 315.19 ms │                 274.40 ms │ +1.15x faster │
│ QQuery 19    │  36.32 ms │                  38.11 ms │     no change │
│ QQuery 20    │  47.08 ms │                  49.90 ms │  1.06x slower │
│ QQuery 21    │ 217.78 ms │                 207.04 ms │     no change │
│ QQuery 22    │  19.33 ms │                  19.57 ms │     no change │
└──────────────┴───────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 1658.75ms │
│ Total Time (test_optimize_performance)   │ 1612.59ms │
│ Average Time (HEAD)                      │   75.40ms │
│ Average Time (test_optimize_performance) │   73.30ms │
│ Queries Faster                           │         2 │
│ Queries Slower                           │         6 │
│ Queries with No Change                   │        14 │
│ Queries with Failure                     │         0 │
└──────────────────────────────────────────┴───────────┘

Thank you @alamb, the result is amazing: 16 queries are faster for clickbench!

But one query has a large regression:

│ QQuery 23    │ 14761.76 ms │               19908.31 ms │  1.35x slower │

I need to investigate this one. 🤔
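
For reference, the "Change" column in these tables looks like it is derived from the ratio of the branch time to the HEAD time, with small differences reported as "no change". The sketch below reproduces that labelling in Rust. The 5% noise band is an assumption inferred from the rows in this thread, and `change_label` is a hypothetical helper name, not part of the benchmark tooling.

```rust
/// Hypothetical helper: label a benchmark row the way the "Change" column does.
/// `noise` is an assumed threshold; 0.05 is consistent with the rows in this thread.
fn change_label(head_ms: f64, branch_ms: f64, noise: f64) -> String {
    // Ratio > 1.0 means the branch took longer than HEAD.
    let ratio = branch_ms / head_ms;
    if ratio >= 1.0 - noise && ratio <= 1.0 + noise {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Report speedups as the inverse ratio, prefixed with '+'.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        format!("{:.2}x slower", ratio)
    }
}

fn main() {
    // Rows copied from the clickbench_partitioned table (query, HEAD ms, branch ms).
    for (query, head, branch) in [
        (23, 14761.76, 19908.31),
        (25, 534.43, 464.47),
        (35, 2109.77, 2004.97),
    ] {
        println!("Query {query}: {}", change_label(head, branch, 0.05));
    }
}
```

Under that assumption, Query 35 (about 5% faster by the old/new ratio) still lands inside the band because the check is made on branch/HEAD, which is roughly 0.9503.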

@zhuqi-lucas
Contributor Author

I merged upstream/main into this branch before investigating.

@alamb
Contributor

alamb commented Aug 23, 2025

I merged upstream/main into this branch before investigating.

I also made some changes to my benchmark machine that will hopefully result in less noise. I'll rerun the benchmarks for this one.

@alamb
Contributor

alamb commented Aug 23, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing test_optimize_performance (772b590) to f363e38 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Aug 23, 2025

🤖: Benchmark completed

Details

Comparing HEAD and test_optimize_performance
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  2614.41 ms │                2725.73 ms │    no change │
│ QQuery 1     │  1260.75 ms │                1452.31 ms │ 1.15x slower │
│ QQuery 2     │  2295.26 ms │                2649.75 ms │ 1.15x slower │
│ QQuery 3     │  1184.59 ms │                1152.49 ms │    no change │
│ QQuery 4     │  2269.45 ms │                2242.90 ms │    no change │
│ QQuery 5     │ 27541.43 ms │               27751.51 ms │    no change │
│ QQuery 6     │  4130.19 ms │                4220.93 ms │    no change │
│ QQuery 7     │  3572.17 ms │                3513.42 ms │    no change │
└──────────────┴─────────────┴───────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 44868.24ms │
│ Total Time (test_optimize_performance)   │ 45709.03ms │
│ Average Time (HEAD)                      │  5608.53ms │
│ Average Time (test_optimize_performance) │  5713.63ms │
│ Queries Faster                           │          0 │
│ Queries Slower                           │          2 │
│ Queries with No Change                   │          6 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ test_optimize_performance ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.20 ms │                   2.18 ms │    no change │
│ QQuery 1     │    50.53 ms │                  48.87 ms │    no change │
│ QQuery 2     │   133.87 ms │                 140.28 ms │    no change │
│ QQuery 3     │   163.92 ms │                 161.85 ms │    no change │
│ QQuery 4     │  1021.77 ms │                1033.69 ms │    no change │
│ QQuery 5     │  1487.53 ms │                1517.93 ms │    no change │
│ QQuery 6     │     2.22 ms │                   2.16 ms │    no change │
│ QQuery 7     │    57.79 ms │                  55.07 ms │    no change │
│ QQuery 8     │  1441.62 ms │                1431.13 ms │    no change │
│ QQuery 9     │  1792.87 ms │                1836.50 ms │    no change │
│ QQuery 10    │   374.92 ms │                 363.98 ms │    no change │
│ QQuery 11    │   425.92 ms │                 421.32 ms │    no change │
│ QQuery 12    │  1326.52 ms │                1328.99 ms │    no change │
│ QQuery 13    │  2057.02 ms │                2120.92 ms │    no change │
│ QQuery 14    │  1249.08 ms │                1229.40 ms │    no change │
│ QQuery 15    │  1168.91 ms │                1198.23 ms │    no change │
│ QQuery 16    │  2566.31 ms │                2644.95 ms │    no change │
│ QQuery 17    │  2564.70 ms │                2631.72 ms │    no change │
│ QQuery 18    │  4723.51 ms │                4912.03 ms │    no change │
│ QQuery 19    │   127.87 ms │                 128.76 ms │    no change │
│ QQuery 20    │  1940.82 ms │                2009.35 ms │    no change │
│ QQuery 21    │  2251.53 ms │                2352.73 ms │    no change │
│ QQuery 22    │  3884.33 ms │                4004.28 ms │    no change │
│ QQuery 23    │ 13639.56 ms │               14193.82 ms │    no change │
│ QQuery 24    │   250.30 ms │                 243.78 ms │    no change │
│ QQuery 25    │   499.68 ms │                 486.95 ms │    no change │
│ QQuery 26    │   262.39 ms │                 260.13 ms │    no change │
│ QQuery 27    │  2868.81 ms │                2814.77 ms │    no change │
│ QQuery 28    │ 22939.29 ms │               25294.41 ms │ 1.10x slower │
│ QQuery 29    │   958.36 ms │                 985.74 ms │    no change │
│ QQuery 30    │  1297.41 ms │                1341.45 ms │    no change │
│ QQuery 31    │  1326.48 ms │                1372.63 ms │    no change │
│ QQuery 32    │  4328.02 ms │                5180.80 ms │ 1.20x slower │
│ QQuery 33    │  5453.75 ms │                6000.43 ms │ 1.10x slower │
│ QQuery 34    │  5608.67 ms │                5863.77 ms │    no change │
│ QQuery 35    │  1996.75 ms │                1994.46 ms │    no change │
│ QQuery 36    │   120.31 ms │                 124.06 ms │    no change │
│ QQuery 37    │    52.60 ms │                  52.76 ms │    no change │
│ QQuery 38    │   120.66 ms │                 124.79 ms │    no change │
│ QQuery 39    │   197.13 ms │                 195.29 ms │    no change │
│ QQuery 40    │    43.18 ms │                  43.54 ms │    no change │
│ QQuery 41    │    41.33 ms │                  41.13 ms │    no change │
│ QQuery 42    │    33.71 ms │                  33.58 ms │    no change │
└──────────────┴─────────────┴───────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 92854.15ms │
│ Total Time (test_optimize_performance)   │ 98224.61ms │
│ Average Time (HEAD)                      │  2159.40ms │
│ Average Time (test_optimize_performance) │  2284.29ms │
│ Queries Faster                           │          0 │
│ Queries Slower                           │          3 │
│ Queries with No Change                   │         40 │
│ Queries with Failure                     │          0 │
└──────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ test_optimize_performance ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 173.79 ms │                 134.35 ms │ +1.29x faster │
│ QQuery 2     │  27.18 ms │                  27.18 ms │     no change │
│ QQuery 3     │  43.77 ms │                  45.16 ms │     no change │
│ QQuery 4     │  26.44 ms │                  27.45 ms │     no change │
│ QQuery 5     │  73.91 ms │                  83.45 ms │  1.13x slower │
│ QQuery 6     │  20.08 ms │                  18.92 ms │ +1.06x faster │
│ QQuery 7     │ 144.03 ms │                 152.34 ms │  1.06x slower │
│ QQuery 8     │  30.80 ms │                  32.89 ms │  1.07x slower │
│ QQuery 9     │  82.79 ms │                  84.74 ms │     no change │
│ QQuery 10    │  57.72 ms │                  61.64 ms │  1.07x slower │
│ QQuery 11    │  40.67 ms │                  42.59 ms │     no change │
│ QQuery 12    │  49.27 ms │                  52.10 ms │  1.06x slower │
│ QQuery 13    │  46.11 ms │                  47.02 ms │     no change │
│ QQuery 14    │  13.64 ms │                  13.76 ms │     no change │
│ QQuery 15    │  23.71 ms │                  24.16 ms │     no change │
│ QQuery 16    │  23.30 ms │                  24.44 ms │     no change │
│ QQuery 17    │ 145.84 ms │                 147.27 ms │     no change │
│ QQuery 18    │ 310.92 ms │                 264.90 ms │ +1.17x faster │
│ QQuery 19    │  36.55 ms │                  39.47 ms │  1.08x slower │
│ QQuery 20    │  48.07 ms │                  48.88 ms │     no change │
│ QQuery 21    │ 213.20 ms │                 205.80 ms │     no change │
│ QQuery 22    │  20.34 ms │                  19.41 ms │     no change │
└──────────────┴───────────┴───────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                        ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                        │ 1652.13ms │
│ Total Time (test_optimize_performance)   │ 1597.93ms │
│ Average Time (HEAD)                      │   75.10ms │
│ Average Time (test_optimize_performance) │   72.63ms │
│ Queries Faster                           │         3 │
│ Queries Slower                           │         6 │
│ Queries with No Change                   │        13 │
│ Queries with Failure                     │         0 │
└──────────────────────────────────────────┴───────────┘

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Aug 23, 2025

Thank you @alamb, it seems this PR causes a regression for clickbench. 🤔

I have updated the branch again, since some PRs were merged into the main branch.
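
One way to judge whether a regression reproduces is to compare the branch/HEAD ratio for the same query across the two benchmark runs in this thread. The following is a minimal sketch of that comparison; the `Row` type and `slowdown` function are hypothetical names introduced for illustration, and the timings are copied from the clickbench_partitioned tables of the two runs.

```rust
/// One benchmark row: a query id plus HEAD and branch timings in milliseconds.
struct Row {
    query: u32,
    head_ms: f64,
    branch_ms: f64,
}

/// Slowdown factor of the branch relative to HEAD (> 1.0 means slower).
fn slowdown(row: &Row) -> f64 {
    row.branch_ms / row.head_ms
}

fn main() {
    // Earlier clickbench_partitioned run.
    let first_run = [
        Row { query: 23, head_ms: 14761.76, branch_ms: 19908.31 },
        Row { query: 28, head_ms: 23351.69, branch_ms: 23764.67 },
        Row { query: 32, head_ms: 4746.87, branch_ms: 5037.21 },
    ];
    // Rerun after merging upstream/main and the benchmark machine changes.
    let second_run = [
        Row { query: 23, head_ms: 13639.56, branch_ms: 14193.82 },
        Row { query: 28, head_ms: 22939.29, branch_ms: 25294.41 },
        Row { query: 32, head_ms: 4328.02, branch_ms: 5180.80 },
    ];

    // Pair up the same query from both runs and print both slowdown factors.
    for (a, b) in first_run.iter().zip(second_run.iter()) {
        assert_eq!(a.query, b.query);
        println!(
            "Query {}: {:.2}x in the first run, {:.2}x in the second",
            a.query,
            slowdown(a),
            slowdown(b)
        );
    }
}
```

With these numbers, Query 23 drops from roughly 1.35x to 1.04x while Query 32 rises from roughly 1.06x to 1.20x, so the slow queries are not stable between runs and may need repeated measurements before drawing conclusions.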

Labels
physical-plan Changes to the physical-plan crate