Zeroing the stack is a huge part of the total cost of cross-compartment
calls. Running on Sail (reporting retired instructions), the
compartment-switcher benchmark (from #37) reports (stack size,
call+return, call, return):
stack size  call+return  call  return
0x100               213   127      86
0x200               405   223     182
0x400               789   415     374
0x800              1557   799     758
0x1000             3093  1567    1526
If we skip stack zeroing, the numbers are 95, 67, 28 for all of them.
This means that, even with tiny stacks, zeroing is more than half of the
cost of a compartment switch, and that cost grows linearly with the
stack size.
We currently require three instructions per store in the zeroing loop.
This commit unrolls the loop so that it zeroes 32 bytes at a time for as
long as it can, then one final 16 bytes if needed. The ABI mandates 16-byte alignment for the
stack, so we now force the top and bottom of the callee's stack chunk to
be 16-byte aligned and assume that we're under attack if this is not the
case. It turned out that the loader was not guaranteeing this (though
it happened to be true for everything in the repo except the allocator
benchmark, which is a special snowflake). The loader now rounds all
[trusted] stack allocations up to a multiple of 16 bytes.
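
The unrolling and the alignment check described above can be sketched in
portable C. This is only an illustration under stated assumptions: the
real switcher is not written in C, and the names zero_stack_chunk and
round_up_to_16, the 8-byte store width, and the boolean "refuse"
result are all hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loader helper: round a stack size up to a multiple of 16. */
size_t round_up_to_16(size_t bytes)
{
    return (bytes + 15u) & ~(size_t)15u;
}

/*
 * Hypothetical sketch of the unrolled zeroing loop for [base, top).
 * Returns false (treat as an attack) if either bound is not 16-byte
 * aligned or the bounds are inverted. 8-byte stores are an assumption.
 */
bool zero_stack_chunk(uint64_t *base, uint64_t *top)
{
    if (top < base || ((((uintptr_t)base) | ((uintptr_t)top)) & 15u) != 0)
    {
        return false;
    }
    uint64_t *p = base;
    /* Main loop: four 8-byte stores, i.e. 32 bytes per iteration. */
    while ((size_t)(top - p) >= 4)
    {
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    }
    /* Tail: alignment guarantees the remainder is either 0 or 16 bytes. */
    if ((size_t)(top - p) >= 2)
    {
        p[0] = 0;
        p[1] = 0;
    }
    return true;
}
```

Because both ends are 16-byte aligned, no byte-granularity tail loop is
needed, which is what keeps the loop overhead per store low.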
The results are now:
stack size  call+return  call  return
0x100               166   106      60
0x200               262   154     108
0x400               454   250     204
0x800               838   442     396
0x1000             1606   826     780
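
As a rough cross-check of the two result sets, the marginal cost per byte
of stack can be computed from any two rows. The helpers below are a
hypothetical sketch for that arithmetic, not part of the benchmark; the
function names are made up for illustration.

```c
/* Extra retired instructions per extra byte of stack, from two rows. */
double marginal_cost_per_byte(unsigned size_a, unsigned cost_a,
                              unsigned size_b, unsigned cost_b)
{
    return (double)(cost_b - cost_a) / (double)(size_b - size_a);
}

/* Percentage by which `after` is lower than `before`. */
double percent_reduction(unsigned before, unsigned after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

For the call path, the 0x100 and 0x200 rows give (223 - 127) / 256 =
0.375 instructions per byte before the change and (154 - 106) / 256 =
0.1875 after, so the per-byte zeroing cost on that path is halved. The
"before" figure would be consistent with three instructions per 8-byte
store, though the store width is an inference and not stated here.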
512 bytes (0x200) is probably the smallest stack size that makes sense
for a thread, and here call+return drops from 405 to 262 retired
instructions, about 35% fewer. This has a fairly noticeable impact on the
test suite too, which finishes in about 13% fewer cycles:
Before:
Test runner: All tests finished in 2385962 cycles
After:
Test runner: All tests finished in 2082193 cycles