[WIP] Add intel simd #1703
To precompute SIMD constants at the start, the best solution I found was doing something like this:
#ifdef __SSE2__
static __m128i _128_vec_ones;
#endif
CONSTRUCTOR void simd_init(void)
{
#ifdef __SSE2__
_128_vec_ones = _mm_set1_epi8('1');
#endif
} where |
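A self-contained sketch of this constructor pattern, for reference. It assumes `CONSTRUCTOR` expands to the GCC/Clang `__attribute__((constructor))`, which the snippet above does not show; the names `vec_ones_128`, `simd_init_ran` and `constant_is_ready` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Assumption: CONSTRUCTOR maps to the GCC/Clang constructor attribute,
 * so the function runs once before main(). */
#if defined(__GNUC__) || defined(__clang__)
#define CONSTRUCTOR __attribute__((constructor))
#else
#error "this sketch assumes a compiler with __attribute__((constructor))"
#endif

#ifdef __SSE2__
/* Precomputed constant: 16 copies of the byte '1'. */
static __m128i vec_ones_128;
#endif

/* Hypothetical flag recording that the initializer has run. */
static int simd_init_ran = 0;

CONSTRUCTOR static void simd_init(void)
{
#ifdef __SSE2__
    vec_ones_128 = _mm_set1_epi8('1');
#endif
    simd_init_ran = 1;
}

/* Returns 1 if the initializer ran and, when SSE2 is available,
 * every byte of the constant equals '1'. */
static int constant_is_ready(void)
{
#ifdef __SSE2__
    unsigned char bytes[16];
    /* Cast through void * so that -Wcast-align stays quiet. */
    _mm_storeu_si128((__m128i *)(void *)bytes, vec_ones_128);
    for (int i = 0; i < 16; i++) {
        if (bytes[i] != '1') {
            return 0;
        }
    }
#endif
    return simd_init_ran;
}
```

Calling `constant_is_ready()` from `main` should return 1, because the constructor has already run by then. On targets without SSE2 only the flag is checked.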
I'm constantly getting these warnings. Apparently they're harmless since I always use
warning: cast increases required alignment of target type [-Wcast-align]
653 | _mm256_storeu_si256((__m256i *)r->v, out);
The only fixes I found are:
I built this and ran the benchmarks on my machine ( Signing became faster, but verification became slower. After looking at the bench_internal results, I figured that the culprit is
This currently saves ~0.5% in Some more comments:
Beware that benchmarks are completely unreliable as of right now; see #1701.
This adds avx and avx2 intrinsics support to the library in general, as discussed in #1700, wherever it yields an improvement as per the benchmarks.
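A minimal sketch of how such an intrinsics path can be gated at compile time. It is an illustration, not code from this PR: `xor_bytes` and `manual_avx2_compiled` are hypothetical helpers. The AVX2 branch is only compiled when the compiler defines `__AVX2__`, which is also why undefining that macro (as in the baseline described under Test & Benchmark) leaves only the scalar loop for the compiler to auto-vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: XOR n bytes of src into dst. */
static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
#ifdef __AVX2__
    /* Manual vectorization: 32 bytes per iteration, unaligned access. */
    for (; i + 32 <= n; i += 32) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(const void *)(dst + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(const void *)(src + i));
        _mm256_storeu_si256((__m256i *)(void *)(dst + i), _mm256_xor_si256(a, b));
    }
#endif
    /* Scalar tail, and the whole loop when __AVX2__ is not defined. */
    for (; i < n; i++) {
        dst[i] ^= src[i];
    }
}

/* Reports whether the manual AVX2 branch was compiled in. */
static int manual_avx2_compiled(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}
```

Built with `-O3 -mavx2`, the intrinsics branch is included; built with `-O3 -mavx2 -U__AVX2__`, it is skipped while the compiler remains free to vectorize the scalar loop itself. Both builds should produce identical results.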
Why not sse and avx512?
ARM has a different SIMD instruction set; it would be nice to have a separate PR implementing that as well. Maybe after this one is merged...
Tasks
-mavx, -mavx2, -mno-avx, -mno-avx2 when building for amd64
TODO: precompute
Commits
I've split this PR into multiple commits with the following criteria:
Test & Benchmark
To reproduce the following results, I temporarily added three scripts for building, testing, and benchmarking, as well as a Jupyter notebook to visualize the results. You can verify them yourself by running:
./simd-build.sh && ./simd-test.sh && ./simd-bench.sh
and executing the notebook as is.
The baseline is compiled with
"-O3 -mavx -mavx2 -U__AVX__ -U__AVX2__"
so that spontaneous gcc vectorization is allowed, but my manual vectorization is not compiled.
Results