
@bremoran commented Sep 10, 2025

On 32-bit architectures, each call to mld_keccakf1600_xor_bytes incurs an overhead for every lane it touches. For example, on Armv7-M and Armv8-M, using the optimised bit-interleaving code from XKCP, XORing a lane into the state incurs an overhead of 37 instructions. Any time an incomplete lane is XORed into the state, this penalty is paid twice, because the rest of that lane has to be XORed in by a later call. This PR ensures that only full lanes are XORed into the state.
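To show where the per-lane overhead comes from, here is a minimal C sketch of bit interleaving, the representation commonly used for Keccak on 32-bit cores: each 64-bit lane is stored as two 32-bit words holding its even and odd bits. The names `interleave_lane`, `xor_bytes_bi` and `interleave_count` are hypothetical, and the shift-and-mask sequence is a generic one rather than XKCP's exact code, so it does not reproduce the 37-instruction figure. Each call interleaves every lane it touches, so a lane split across two calls is interleaved twice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter: how many lanes went through the interleave step. */
static unsigned interleave_count = 0;

/* Gather the even-indexed bits of x into the low 16 bits of the result. */
static uint32_t even_bits(uint32_t x)
{
  x &= 0x55555555u;
  x = (x | (x >> 1)) & 0x33333333u;
  x = (x | (x >> 2)) & 0x0F0F0F0Fu;
  x = (x | (x >> 4)) & 0x00FF00FFu;
  x = (x | (x >> 8)) & 0x0000FFFFu;
  return x;
}

/* Split a 64-bit lane into its even bits (*e) and its odd bits (*o). */
static void interleave_lane(uint64_t lane, uint32_t *e, uint32_t *o)
{
  uint32_t lo = (uint32_t)lane, hi = (uint32_t)(lane >> 32);
  *e = even_bits(lo) | (even_bits(hi) << 16);
  *o = even_bits(lo >> 1) | (even_bits(hi >> 1) << 16);
  interleave_count++;
}

/* Hypothetical xor_bytes: XOR len bytes of data into an interleaved state
 * (two words per lane), starting at byte offset off. Every lane touched,
 * full or partial, is interleaved once per call. */
static void xor_bytes_bi(uint32_t state[50], const uint8_t *data, size_t off,
                         size_t len)
{
  while (len > 0)
  {
    size_t lane = off / 8, pos = off % 8;
    size_t n = 8 - pos < len ? 8 - pos : len;
    uint64_t v = 0;
    uint32_t e, o;
    for (size_t i = 0; i < n; i++)
    {
      /* Little-endian: byte pos+i occupies bits 8*(pos+i) .. 8*(pos+i)+7. */
      v |= (uint64_t)data[i] << (8 * (pos + i));
    }
    /* Bytes outside [pos, pos+n) are zero, so they leave the lane unchanged. */
    interleave_lane(v, &e, &o);
    state[2 * lane] ^= e;
    state[2 * lane + 1] ^= o;
    data += n;
    off += n;
    len -= n;
  }
}
```

Interleaving is a fixed bit permutation, so it commutes with XOR: XORing a zero-padded partial lane is correct, but the interleave cost is paid again when the rest of that lane arrives.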

Fixes #445
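One way to guarantee that only full lanes reach the state is to stage any trailing partial lane in a small per-context buffer until a later call completes it. The sketch below assumes that design; it is not necessarily how this PR implements it, and `absorb_ctx`, `absorb`, `absorb_finish` and `xor_lane` are hypothetical. It uses a plain 64-bit lane state, omits Keccak-f[1600] and SHAKE padding, and counts lane XORs so the effect is visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical absorb context: 25-lane state plus a one-lane staging buffer. */
typedef struct
{
  uint64_t s[25];
  uint8_t buf[8];      /* bytes of the lane currently being filled */
  size_t buflen;       /* 0..7 bytes waiting in buf */
  size_t lane;         /* next lane index within the rate */
  size_t rate_lanes;   /* e.g. 17 for SHAKE256, 21 for SHAKE128 */
  unsigned xor_calls;  /* number of lanes XORed into s */
} absorb_ctx;

/* XOR one full 8-byte lane into the state. */
static void xor_lane(absorb_ctx *c, const uint8_t b[8])
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
  {
    v |= (uint64_t)b[i] << (8 * i); /* little-endian lane load */
  }
  c->s[c->lane] ^= v;
  c->xor_calls++;
  if (++c->lane == c->rate_lanes)
  {
    /* Keccak-f[1600] would run here; omitted in this sketch. */
    c->lane = 0;
  }
}

static void absorb(absorb_ctx *c, const uint8_t *in, size_t len)
{
  if (c->buflen > 0)
  {
    /* Top up the pending partial lane before touching the state. */
    size_t n = 8 - c->buflen < len ? 8 - c->buflen : len;
    memcpy(c->buf + c->buflen, in, n);
    c->buflen += n;
    in += n;
    len -= n;
    if (c->buflen < 8)
    {
      return;
    }
    xor_lane(c, c->buf);
    c->buflen = 0;
  }
  /* Whole lanes go straight from the input into the state. */
  for (; len >= 8; in += 8, len -= 8)
  {
    xor_lane(c, in);
  }
  /* Keep the tail (< 8 bytes) for the next call instead of XORing it now. */
  memcpy(c->buf, in, len);
  c->buflen = len;
}

/* Flush the final partial lane, zero-padded. Real SHAKE padding is omitted. */
static void absorb_finish(absorb_ctx *c)
{
  if (c->buflen > 0)
  {
    memset(c->buf + c->buflen, 0, 8 - c->buflen);
    xor_lane(c, c->buf);
    c->buflen = 0;
  }
}
```

Absorbing 4 + 4 + 3 bytes this way costs two lane XORs (one full lane, plus the final partial lane at finish), whereas XORing each call's bytes immediately would cost three.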

@bremoran requested a review from a team as a code owner on September 10, 2025 13:36
@rod-chapman (Contributor) commented

Please provide a description for this PR. What is the point of this refactoring, and what benefit does it bring? Please also provide a CBMC proof harness and Makefile for any new functions.

Development: successfully merging this pull request may close the issue "Low performance in mld_H".