-
-
Notifications
You must be signed in to change notification settings - Fork 1k
perf: soft revert of the performance regressions introduced in #1469 #2885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{ | ||
// MUST BE A METHOD AND AGGRESSIVELY INLINED, OTHERWISE THE JIT WILL NOT ELIMINATE THE BRANCH | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
private static bool IsUnalignedSafe() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it is a static readonly
field? Theoretically, doesn't the JIT evaluate those only once and uses them like a const
later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would think, but nope, here's disasm:
; Method C:CopyWithAlignmentFallback2(byref,byref,uint):float (FullOpts)
G_M000_IG01: ;; offset=0x0000
sub rsp, 40
G_M000_IG02: ;; offset=0x0004
test byte ptr [(reloc 0x7ffb0f9b41d8)], 1
je SHORT G_M000_IG08
G_M000_IG03: ;; offset=0x000D
cmp byte ptr [(reloc 0x7ffb0f6fb090)], 0
jne SHORT G_M000_IG06
G_M000_IG04: ;; offset=0x0016
vmovss xmm0, dword ptr [reloc @RWD00]
G_M000_IG05: ;; offset=0x001E
add rsp, 40
ret
G_M000_IG06: ;; offset=0x0023
vxorps xmm0, xmm0, xmm0
G_M000_IG07: ;; offset=0x0027
add rsp, 40
ret
G_M000_IG08: ;; offset=0x002C
mov rcx, 0x7FFB0F9B4168
call CORINFO_HELP_GET_NONGCSTATIC_BASE
jmp SHORT G_M000_IG03
RWD00 dd 3F800000h ; 1
While this one is:
; Method C:CopyWithAlignmentFallback2(byref,byref,uint):float (FullOpts)
G_M000_IG01: ;; offset=0x0000
G_M000_IG02: ;; offset=0x0000
vxorps xmm0, xmm0, xmm0
G_M000_IG03: ;; offset=0x0004
ret
; Total bytes of code: 5
Godbolt shows similar overhead, so this is not a config issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one of those things that it does when doing PGO and tiering?
Anyway, not that important.
fixed (void* src = &source, dst = &destination) | ||
Buffer.MemoryCopy(src, dst, byteCount, byteCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for cases where we are sure the memory is aligned (like those where we control the type and ensure proper layouts), you also have the option of Unsafe.CopyBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose Buffer.MemoryCopy because it's more straightforward than creating two spans and using Span.CopyTo
between them. But as you said, they map to the same method, so if span does not provide anything else compared to buffer I think that this is kind of bikeshedd-y ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While Unsafe.CopyBlock
is an intrinsic, it is not optimized for the wide array of ranges one might have when making copies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these utility methods, it's enough to use Buffer.MemoryCopy
, yes.
My point is at those call sites where we could be using a Span
. Maybe we have more of those in the future, as more of the API is converted to use/accept spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
PR Details
Calls to
Utilities.CopyMemory
mapping toSystem.Buffer.MemoryCopy
were swapped forUnsafe.CopyBlockUnaligned
in #1469 without clarification.Those two methods are not equivalent,
CopyBlockUnaligned
is safer but significantly slower compared toMemoryCopy
as it does not have any fast path based on the allocation sizes.As for the safety of the unaligned version, under x86/64 and recent ARM, reading or writing data outside of their natural alignment do not cause any issues besides slowdown. I've ensured that other platforms, like ARM7 and connected devices' memory, keep using the unaligned version.
The vast majority of our codebase ensures that fields and allocations are aligned, so using operations that expect to operate on aligned memory provides a significant performance improvement.
Scenes with a significant amount of draw calls rely on memcopy to fill in graphics commands and shader parameters.
In one of the heavier sections I have at my disposal, frames can take up to 23 millisecond to prepare on the CPU side, introducing this change cuts it down to 14 millisecond.
Here's the Profiler sessions.zip
More information in
dotnet/runtime#1650 (comment)
dotnet/runtime#80068 (comment)
Related Issue
N/A
Types of changes
Checklist