@AlexGuteniev

Towards #625, specifically #625 (comment) items 5, 6, 7

Follow up to #5769

⚙️ Optimization

Similarly to transform, we can map a bool functor to the underlying integer vector functor, although the algorithm itself then has to be applied to the integer holding the bits.

identity looks like the ideal functor to use: the elements are already bool, so we only need to apply the algorithm. We can also support logical_not, which we have already mapped, although it is obscure to use.

Granted, identity is C++20, but it's fine to optimize only for the newer standard modes.

As the algorithm uses a single range, we can handle any offset without introducing too much complexity, though some complexity is still introduced.

As the resulting flow is not very simple but is the same for all three algorithms, the traits approach from the vector algorithms seems suitable.
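To illustrate the idea, here is a minimal sketch of applying the three algorithms to whole words of bits, with a mask for the partial first and last words to handle an arbitrary offset. It is not the code in this PR: `word`, `mask_range`, `meow_of_bits`, and the 32-bit LSB-first layout are assumptions made up for the example, and the `alg` enum merely borrows the benchmark's naming. The `Negate` parameter stands in for mapping logical_not to a bitwise complement, while identity leaves the word unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vector<bool> storage: bits packed into 32-bit
// words, least significant bit first. Not the STL's actual internals.
using word = std::uint32_t;
constexpr std::size_t word_bits = 32;

enum class alg { any_, all_, none_ };

// Mask selecting bits [first, last) of one word; requires first < last <= word_bits.
constexpr word mask_range(std::size_t first, std::size_t last) {
    const word high = last == word_bits ? ~word{0} : (word{1} << last) - 1;
    const word low  = (word{1} << first) - 1;
    return high & ~low;
}

// Applies all_of / any_of / none_of to the bit range [first, last).
// Negate == false models identity, Negate == true models logical_not.
template <alg Alg, bool Negate>
bool meow_of_bits(const std::vector<word>& words, std::size_t first, std::size_t last) {
    std::size_t pos = first;
    while (pos < last) {
        const std::size_t lo        = pos % word_bits;
        const std::size_t remaining = last - pos;
        // Either the range ends inside this word, or we take the word to its end.
        const std::size_t hi = remaining < word_bits - lo ? lo + remaining : word_bits;
        const word mask      = mask_range(lo, hi);

        word value = words[pos / word_bits];
        if constexpr (Negate) {
            value = ~value; // the predicate becomes a bitwise operation on the word
        }

        if constexpr (Alg == alg::all_) {
            if ((value & mask) != mask) {
                return false; // some bit in range fails the predicate
            }
        } else {
            if ((value & mask) != 0) {
                return Alg == alg::any_; // any_of: true, none_of: false
            }
        }
        pos += hi - lo;
    }
    return Alg != alg::any_; // no early return: all_of and none_of hold, any_of does not
}
```

The only per-algorithm differences are the early-return test and the two return values, which is the kind of variation a traits class can capture while the loop stays shared.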

Vectorization

Auto-vectorization does not work here. @Alcaro suggested that, from the abstract machine's point of view, memory loads would be invented if they happened past the early return point.

We can vectorize manually. That would be inefficient if the early return happens really early, but may be a good idea overall. Not this time, though.

⏱️ Benchmark results

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| meow_of<alg::any_, content::zeros_then_ones>/64 | 26.8 ns | 1.42 ns | 18.9 |
| meow_of<alg::any_, content::zeros_then_ones>/4096 | 1767 ns | 26.3 ns | 67.2 |
| meow_of<alg::any_, content::zeros_then_ones>/65536 | 25751 ns | 253 ns | 102 |
| meow_of<alg::any_, content::ones_then_zeros, not_>/64 | 27.0 ns | 1.40 ns | 19.3 |
| meow_of<alg::any_, content::ones_then_zeros, not_>/4096 | 1859 ns | 32.7 ns | 56.9 |
| meow_of<alg::any_, content::ones_then_zeros, not_>/65536 | 27850 ns | 257 ns | 108 |
| meow_of<alg::all_, content::ones_then_zeros>/64 | 33.4 ns | 1.39 ns | 24 |
| meow_of<alg::all_, content::ones_then_zeros>/4096 | 1919 ns | 27.0 ns | 71 |
| meow_of<alg::all_, content::ones_then_zeros>/65536 | 30449 ns | 251 ns | 121 |
| meow_of<alg::none_, content::zeros_then_ones>/64 | 28.7 ns | 1.45 ns | 19.8 |
| meow_of<alg::none_, content::zeros_then_ones>/4096 | 1916 ns | 25.9 ns | 74 |
| meow_of<alg::none_, content::zeros_then_ones>/65536 | 29903 ns | 248 ns | 121 |

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner October 24, 2025 17:27
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Oct 24, 2025
@AlexGuteniev AlexGuteniev changed the title Optimize all_of / any_of / none_of for vector<bool> with some predicates Optimize all_of / any_of / none_of for vector<bool> with some predicates Oct 24, 2025
@StephanTLavavej StephanTLavavej added the performance Must go faster label Oct 24, 2025
@StephanTLavavej StephanTLavavej self-assigned this Oct 24, 2025
template <alg Alg, content Content, class Pred = identity>
void meow_of(benchmark::State& state) {
const auto size = static_cast<size_t>(state.range(0));
vector<bool, not_highly_aligned_allocator<bool>> source(size);

This is not currently vectorized, so not_highly_aligned_allocator serves no purpose yet (avoiding over-alignment only matters for vectorized code), but:

  • It can be vectorized manually in the future
  • GCC with -O3 would vectorize this, so the target compilers might someday as well
