Skip to content

Conversation

Eideren
Copy link
Collaborator

@Eideren Eideren commented Sep 11, 2025

PR Details

Calls to Utilities.CopyMemory mapping to System.Buffer.MemoryCopy were swapped for Unsafe.CopyBlockUnaligned in #1469 without clarification.
Those two methods are not equivalent, CopyBlockUnaligned is safer but significantly slower compared to MemoryCopy as it does not have any fast path based on the allocation sizes.

As for the safety of the unaligned version, under x86/64 and recent ARM, reading or writing data outside of their natural alignment do not cause any issues besides slowdown. I've ensured that other platforms, like ARM7 and connected devices' memory, keep using the unaligned version.

The vast majority of our codebase ensures that fields and allocations are aligned, so using operations that expect to operate on aligned memory provides a significant performance improvement.

Scenes with a significant amount of draw calls rely on memcopy to fill in graphics commands and shader parameters.
In one of the heavier sections I have at my disposal, frames can take up to 23 millisecond to prepare on the CPU side, introducing this change cuts it down to 14 millisecond.
Here's the Profiler sessions.zip

More information in
dotnet/runtime#1650 (comment)
dotnet/runtime#80068 (comment)

Related Issue

N/A

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My change requires a change to the documentation.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have built and run the editor to try this change out.

@Eideren Eideren merged commit ce72d51 into stride3d:master Sep 29, 2025
9 checks passed
@Eideren Eideren deleted the revert_unaligned branch September 29, 2025 12:51
{
// MUST BE A METHOD AND AGGRESSIVELY INLINED, OTHERWISE THE JIT WILL NOT ELIMINATE THE BRANCH
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsUnalignedSafe() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it is a static readonly field? Theoretically, doesn't the JIT evaluate those only once and uses them like a const later on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would think, but nope, here's disasm:

; Method C:CopyWithAlignmentFallback2(byref,byref,uint):float (FullOpts)
G_M000_IG01:                ;; offset=0x0000
       sub      rsp, 40

G_M000_IG02:                ;; offset=0x0004
       test     byte  ptr [(reloc 0x7ffb0f9b41d8)], 1
       je       SHORT G_M000_IG08

G_M000_IG03:                ;; offset=0x000D
       cmp      byte  ptr [(reloc 0x7ffb0f6fb090)], 0
       jne      SHORT G_M000_IG06

G_M000_IG04:                ;; offset=0x0016
       vmovss   xmm0, dword ptr [reloc @RWD00]

G_M000_IG05:                ;; offset=0x001E
       add      rsp, 40
       ret      

G_M000_IG06:                ;; offset=0x0023
       vxorps   xmm0, xmm0, xmm0

G_M000_IG07:                ;; offset=0x0027
       add      rsp, 40
       ret      

G_M000_IG08:                ;; offset=0x002C
       mov      rcx, 0x7FFB0F9B4168
       call     CORINFO_HELP_GET_NONGCSTATIC_BASE
       jmp      SHORT G_M000_IG03
RWD00  	dd	3F800000h		;         1

While this one is:

; Method C:CopyWithAlignmentFallback2(byref,byref,uint):float (FullOpts)
G_M000_IG01:                ;; offset=0x0000

G_M000_IG02:                ;; offset=0x0000
       vxorps   xmm0, xmm0, xmm0

G_M000_IG03:                ;; offset=0x0004
       ret      
; Total bytes of code: 5

Godbolt shows similar overhead, so this is not a config issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one of those things that it does when doing PGO and tiering?
Anyway, not that important.

Comment on lines +48 to +49
fixed (void* src = &source, dst = &destination)
Buffer.MemoryCopy(src, dst, byteCount, byteCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Span<T>.CopyTo() in these cases. It seems it does a Buffer.Memmove internally, the same as Buffer.MemoryCopy (link).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for cases where we are sure the memory is aligned (like those where we control the type and ensure proper layouts), you also have the option of Unsafe.CopyBlock

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose Buffer.MemoryCopy because it's more straightforward than creating two spans and using Span.CopyTo between them. But as you said, they map to the same method, so if span does not provide anything else compared to buffer I think that this is kind of bikeshedd-y ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While Unsafe.CopyBlock is an intrinsic, it is not optimized for the wide array of ranges one might have when making copies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these utility methods, it's enough to use Buffer.MemoryCopy, yes.
My point is at those call sites where we could be using a Span. Maybe we have more of those in the future, as more of the API is converted to use/accept spans.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants