Conversation


@Raimo33 Raimo33 commented Sep 2, 2025

Goal

This PR refactors the benchmarking functions as per #1701, making benchmarks more deterministic and less influenced by the environment.

This is achieved by replacing the wall-clock timer with a per-process CPU timer where possible.
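For illustration, a minimal sketch of what such a timer helper could look like (the helper name is hypothetical; the actual code in the PR may differ). CLOCK_PROCESS_CPUTIME_ID counts only the CPU time consumed by the benchmarking process, so time spent descheduled by the OS does not inflate the measurement the way a wall-clock source such as gettimeofday() does.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: elapsed CPU time of this process in nanoseconds.
 * Time during which the process is descheduled (interrupts, other
 * processes, I/O waits) is not counted, which makes repeated runs of
 * the same benchmark more deterministic than wall-clock timing. */
static int64_t bench_cpu_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}
```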

@Raimo33 Raimo33 marked this pull request as draft September 2, 2025 14:58
@Raimo33 Raimo33 mentioned this pull request Sep 2, 2025
@real-or-random
Contributor

real-or-random commented Sep 2, 2025

Just some quick comments:

[x] remove the number of runs (count) in favor of a simpler, cleaner approach with just the number of iterations (iter).

I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

[x] remove min and max statistics in favor of a simpler approach with just the avg.

I think min and max are useful. For constant-time code, you can also compare min. And max gives you an idea if there were fluctuations or not.

[x] remove the needless fixed-point conversion in favor of simpler floating-point divisions.

Well, okay, that has a history; see #689. It's debatable whether it makes sense to avoid floating-point math, but as long as it doesn't get in your way here, it's a cool thing to keep. :D

@real-or-random
Contributor

It would be useful to split your changes into meaningful, separate commits; see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches.

@Raimo33
Author

Raimo33 commented Sep 2, 2025

I think min and max just complicate things. Let me explain:
First of all, as it stands they don't even measure the true min and max of individual iterations; they measure the min/max of the per-run averages, not the absolute extremes. Furthermore, to compute them at all, every iteration has to be run 10 times over. Benchmarks are already slow, and this min/max tracking makes them 10x slower. IMHO it's completely unnecessary. @real-or-random
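For context, a rough sketch of the scheme being described (names are illustrative, and floating point is used here only for readability; the actual bench code uses fixed-point arithmetic, as discussed below). The benchmark body runs count times with iter iterations each, so min/max are taken over the count per-run averages and the total work is count * iter iterations.

```c
#include <stdint.h>

/* Illustrative sketch only, not the actual bench.h code.
 * bench_cpu_time_ns() is the hypothetical timer helper from above;
 * benchmark() stands for the wrapper that calls the function under test. */
static void run_bench(void (*benchmark)(void *, int), void *data,
                      int count, int iter,
                      double *min_out, double *avg_out, double *max_out) {
    double sum = 0.0, best = 1e300, worst = 0.0;
    int run;
    for (run = 0; run < count; run++) {          /* count == 10 today */
        int64_t begin = bench_cpu_time_ns();
        benchmark(data, iter);                   /* iter iterations of the function */
        {
            double run_avg = (double)(bench_cpu_time_ns() - begin) / iter;
            if (run_avg < best)  best  = run_avg;   /* "min" = best run average  */
            if (run_avg > worst) worst = run_avg;   /* "max" = worst run average */
            sum += run_avg;
        }
    }
    /* min/max are over the count run averages, not over single iterations. */
    *min_out = best;
    *avg_out = sum / count;
    *max_out = worst;
}
```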

@sipa
Contributor

sipa commented Sep 2, 2025

If we're going to rework this, I'd suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:

  • StQ1: the average of all samples between 1st and 3rd octile
  • StQ2: the average of all samples between 3rd and 5th octile
  • StQ3: the average of all samples between 5th and 7th octile
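To illustrate the suggestion above (a hedged sketch only; see the paper for the precise definition): sort the per-iteration samples, then average the ones lying between the stated octiles.

```c
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Average of the sorted samples with index in [n*lo/8, n*hi/8).
 * Assumes n >= 8 and that samples[] is already sorted ascending. */
static double octile_mean(const double *samples, size_t n, int lo, int hi) {
    size_t from = n * (size_t)lo / 8, to = n * (size_t)hi / 8, i;
    double sum = 0.0;
    for (i = from; i < to; i++) sum += samples[i];
    return sum / (double)(to - from);
}

/* Usage sketch:
 *   qsort(samples, n, sizeof(double), cmp_double);
 *   stq1 = octile_mean(samples, n, 1, 3);   StQ1: 1st..3rd octile
 *   stq2 = octile_mean(samples, n, 3, 5);   StQ2: 3rd..5th octile
 *   stq3 = octile_mean(samples, n, 5, 7);   StQ3: 5th..7th octile  */
```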

@Raimo33
Author

Raimo33 commented Sep 2, 2025

I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

Right now all benchmarks are run with count=10 and a fixed number of iters (apart from ecmult_multi, which adjusts the number of iters, not count).

Therefore count is only useful to extrapolate min and max.

@Raimo33
Author

Raimo33 commented Sep 2, 2025

Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating point math, but as long as it doesn't get in your way here, it's a cool thing to keep it. :D

I disagree with #689. It overcomplicates things for the sake of avoiding floating-point math. Those divisions aren't even in the hot path; they're outside the benchmarks.

@sipa
Contributor

sipa commented Sep 2, 2025

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

@Raimo33
Author

Raimo33 commented Sep 2, 2025

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

What is the usefulness of measuring min/max when we are taking OS interference and thermal throttling out of the equation? Min/max will be extremely close to the avg no matter how badly the benchmarked function behaves.

@Raimo33 Raimo33 force-pushed the benchmark-precise branch 3 times, most recently from 97e5264 to 254a014 Compare September 2, 2025 16:08
@Raimo33 Raimo33 changed the title [WIP] Refactor benchmark [WIP] refactor: remove system interference from benchmarks Sep 2, 2025
@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from 1d9d6d0 to 4c9a074 Compare September 2, 2025 16:26
@Raimo33
Author

Raimo33 commented Sep 2, 2025

By the way, gettimeofday() has been officially discouraged since 2008 in favor of clock_gettime(). The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.
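For reference, the two calls side by side (a sketch, not the PR's code); clock_gettime() also reports nanoseconds rather than microseconds:

```c
#include <sys/time.h>   /* gettimeofday, struct timeval  */
#include <time.h>       /* clock_gettime, struct timespec */

void timer_example(void) {
    /* Obsolescent since POSIX.1-2008: microsecond-resolution wall clock. */
    struct timeval tv;
    gettimeofday(&tv, NULL);

    /* Preferred replacement: nanosecond resolution and a selectable clock,
     * e.g. CLOCK_MONOTONIC for wall-clock-style timing or
     * CLOCK_PROCESS_CPUTIME_ID for per-process CPU time. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    (void)tv; (void)ts;
}
```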

@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from ddeaede to 71dff3f Compare September 2, 2025 17:45
@Raimo33
Author

Raimo33 commented Sep 2, 2025

Even though the manual says that CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile with CLOCK_PROCESS_CPUTIME_ID even though we have no guarantee that the user has pinned the benchmarking process to a core. The worst-case scenario is an unreliable benchmark, which the current repo has anyway.

I added a line to README.md with best practices for running the benchmarks.

I also tried adding a function to pin the process to a core directly in C, but there's no standard POSIX-compliant way to do so. There is pthread_setaffinity_np() on Linux, where 'np' stands for 'not portable'.
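For illustration, a Linux-only sketch using sched_setaffinity(), the process-level sibling of the pthread_setaffinity_np() mentioned above (not portable, which is why leaving pinning to the user as a documented best practice is the pragmatic route):

```c
#define _GNU_SOURCE
#include <sched.h>

/* Linux-specific: pin the calling process to CPU 0 so the per-process
 * CPU clock always ticks on the same core. There is no portable POSIX
 * equivalent, so pinning externally (e.g. `taskset -c 0 <bench binary>`)
 * remains the user's responsibility. */
static int pin_to_cpu0(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    return sched_setaffinity(0, sizeof(set), &set); /* pid 0 = this process */
}
```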

@Raimo33 Raimo33 force-pushed the benchmark-precise branch 4 times, most recently from 3e43c75 to ef9e40e Compare September 2, 2025 20:31
@real-or-random
Contributor

Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

The point is exactly to have a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we'd better have a way of spotting if something is off.

I like the stabilized quartiles idea.

@Raimo33
Author

Raimo33 commented Sep 2, 2025

I like the stabilized quartiles idea.

TBH it scares me a bit; I'll see what I can do. Maybe in a future PR.

@Raimo33 Raimo33 marked this pull request as ready for review September 2, 2025 23:56
@real-or-random
Contributor

The point is exactly having a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

Fine, but at least let's make it optional. I don't like my benchmarks being 10 times slower just because min/max need to be computed. If I say 20'000 iterations, I want it to be 20'000 iterations.

Not sure. I suggest making this PR focused on a single (uncontroversial) change, which is switching to per-process clocks.

@Raimo33 Raimo33 changed the title [WIP] refactor: remove system interference from benchmarks bench: replace wall-clock timer with per-process CPU timer Sep 8, 2025
@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from 5540c89 to f9d2dde Compare September 8, 2025 12:02
@Raimo33 Raimo33 force-pushed the benchmark-precise branch 2 times, most recently from a5eb2db to bbc8bb7 Compare September 8, 2025 16:04
@Raimo33
Author

Raimo33 commented Sep 9, 2025

I've applied the recommended changes and split the PR into multiple commits for simpler diffs.

@l0rinc

l0rinc commented Sep 9, 2025

I haven't investigated this in detail but couldn't find any related issue: were there any efforts to use https://nanobench.ankerl.com like we do in Core instead? That would likely need some adjustments to the benchmarks themselves, but at least we wouldn't be rolling our own framework.

@Raimo33
Author

Raimo33 commented Sep 9, 2025

I'm not familiar with nanobench, but I can see it uses std::chrono, which uses CLOCK_MONOTONIC (our fallback) under the hood, not the TSC.

Also, the user would need a C++ compiler. Not a big deal, but I think this repo needs to strictly adhere to C89.

@Raimo33
Author

Raimo33 commented Sep 9, 2025

I realized the Windows version I implemented was using an imprecise clock. Drafting; will fix.

@Raimo33 Raimo33 marked this pull request as draft September 9, 2025 18:02
This commit improves the reliability of benchmarks by removing some of the influence of other processes running in the background. This is achieved by using CPU-bound clocks that aren't affected by interrupts, sleeps, blocked I/O, etc.
@Raimo33
Author

Raimo33 commented Sep 9, 2025

I changed the Windows clock to QueryPerformanceCounter(), which has microsecond precision.
I also removed the if checks around clock_gettime(), as it can only fail due to programming errors.
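For reference, a minimal sketch of what the Windows path could look like (illustrative; the actual code in the PR may differ):

```c
#ifdef _WIN32
#include <stdint.h>
#include <windows.h>

/* QueryPerformanceCounter() returns a high-resolution tick count;
 * dividing by QueryPerformanceFrequency() converts it to time.
 * Whole seconds and the remainder are converted separately to avoid
 * overflowing 64 bits for large tick counts. Note that, unlike the
 * POSIX path, this is still a wall clock, not per-process CPU time. */
static int64_t bench_time_ns(void) {
    LARGE_INTEGER freq, ticks;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&ticks);
    return (int64_t)(ticks.QuadPart / freq.QuadPart) * 1000000000LL
         + (int64_t)(ticks.QuadPart % freq.QuadPart) * 1000000000LL / freq.QuadPart;
}
#endif
```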

@real-or-random
Contributor

I haven't investigated this in detail but couldn't find any related issue: were there any efforts to use nanobench.ankerl.com like we do in Core instead?

nanobench is a framework for benchmarking C++, but this is C. Of course, we could still call our C from C++, but adding C++ to the code base seems a bit overkill to me if it's "just" for the purpose of benchmarks. I agree that using an existing framework may be a good idea, but I'm not aware of any in pure C.

@Raimo33 Raimo33 marked this pull request as ready for review September 10, 2025 09:51
@l0rinc

l0rinc commented Sep 10, 2025

but adding C++ to the code base seems a bit overkill to me if if's "just" for the purpose of benchmarks

The purpose would be to standardize the benchmarking and avoid reinventing solutions to problems that aren't strictly related to the purpose of this library. It would also help with trusting the results more, since we already know how that benchmarking library behaves.
The main client of this library is C++ code; it doesn't seem far-fetched to me to test it that way.

@real-or-random
Contributor

but adding C++ to the code base seems a bit overkill to me if if's "just" for the purpose of benchmarks

The purpose would be to standardize the benchmarking to avoid reinventing solutions to problems that aren't strictly related to the purpose of this library. It would also help with trusting the result more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, it doesn't seem far-fetched to me to test that.

Sorry, I didn't want to be discouraging. I don't think at all that nanobench is a far-fetched idea. And I truly agree that reinventing the wheel is not great.

In the end, what counts is the maintenance burden. My main concern with C++ is that I'm not sure how fluent our regular contributors are in C++. (I'm not, but luckily I'm not the only one here.) If the bit of C++ is harder to touch (for us) than this code, then nanobench is not a good idea. If, on the other hand, we have people who can maintain C++, or if we get support from the Core contributors who are familiar with nanobench, then this can be a win.

I'd be happy to see a demo/WIP PR but if you feel that this is a lot of work, then it might be a good idea to see what other contributors and maintainers think.
