-
Notifications
You must be signed in to change notification settings - Fork 63
Automatically mark objc2-metal
as safe
#792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I'm trying this out on our codebase, and stuck in a hard place between using this or #773 as a basis for a change pending other changes in the upcoming release. A few function thoughts:
EDIT: I pushed the fixes at metal-safe...Traverse-Research:objc2:metal-safe |
That's because there's now two checks: Both objc2/crates/header-translator/src/name_translation.rs Lines 449 to 462 in 9bd895f
Thanks, I pulled that in. |
Ah! If you extend that function, the half-assed exceptions from my commit can go again. It's surely not complete. |
I've looked through the documentation on Metal resources, and I think the correct choice is to mark (basically) all resource ( I have implemented this in the latest commit, please let me know what you think? |
# SAFETY: Modifying residency is safe, it's effectively the same as | ||
# controlling what's in the L1/L2/L3 cache on the CPU. | ||
# protocol.MTLResidencySet.methods.requestResidency.unsafe = false | ||
# protocol.MTLResidencySet.methods.endResidency.unsafe = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarijnS95 could you verify this?
As I understand it, residency sets are only an optimization knob? If a resource is needed, but isn't resident, it's still automatically made resident, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madsmtm sure, there are two stages to it: the methods quoted here are an optimization knob to make things resident ahead of time. Everything in a residency set attached to a command buffer (or queue before a command buffer was committed to it) will be made/kept resident when the command buffer is supposedly going to be running...
Caution
Or at least so I thought, requestResidency
is quite clear that it performs some work upfront to make commit:
faster but endResidency
directly says it makes things no longer resident which -to me- seem to defeat the advantage of attaching it to command buffers. I don't know what will happen if that's called on a MTLResidencySet
that's attached to a command buffer that's Enqueued
but not Completed
.
What also counts is what resources are in the set at any given point in time. As far as I know the state after commit:
is not buffered, so if anyone removes a resource and commits it again while a MTLCommandBuffer
is still in flight that uses said resource, things could go kaboom as well when something in the system wants to reclaim memory (by calling endResidency
?).
In the end I don't exactly understand the significance of these functions. Maybe we're asking too complicated questions, the documentation is quite terse.
I'm not confident what residency means on Unified Memory architectures (unless page files exist and are used in times of high memory pressure); perhaps buffers that "exist" in memory may not be in the MMU for the GPU (perhaps per-process)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested things a bit by modifying this example to create a MTLResidencySet
and add _terrainTextures
to that instead of using useResource:usage:
, roughly:
MTLResidencySetDescriptor *setDescriptor = [[MTLResidencySetDescriptor alloc] init];
NSError* error;
id<MTLResidencySet> residencySet = [device newResidencySetWithDescriptor:setDescriptor error:&error];
if (!residencySet) { NSLog(@"Failed to create residence set, error %@", error); }
for (int i = 0; i < _terrainTextures.size(); i++) {
[residencySet addAllocation: _terrainTextures[i].diffSpecTextureArray];
[residencySet addAllocation: _terrainTextures[i].normalTextureArray];
}
[residencySet commit];
[commandQueue addResidencySet: residencySet];
Things didn't work without doing either one of these, using both didn't change the output. Re-reading, I guess that's what is stated in the documentation, so that probably makes sense. It also matches what the Metal debugger showed.
I tried calling both endResidency
on each frame, it didn't change the result, but the CPU usage grew from ~30% to ~50%, so that seems to only be an optimization.
In any case, pretty sure we can consider residency and useResource:usage:
as safe, we can move the unsafety of not having a resource bound to setTexture:atIndex:
and similar command encoder functions?
# These affect lifetime safety, and can cause use-after-free if used incorrectly. | ||
class.MTLCommandBufferDescriptor.methods."setRetainedReferences:".unsafe = true | ||
protocol.MTLCommandQueue.methods.commandBufferWithUnretainedReferences.unsafe = true | ||
protocol.MTLIOCommandQueue.methods.commandBufferWithUnretainedReferences.unsafe = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarijnS95 pointed out in #773 (comment):
- Lifetime safety: This used to be tracked by Metal if not opted out but that already wasn't possible when manually uploading
gpuReourceID()
values, and is removed now in Metal 4.
Which is just.. Ugh! Not yet sure how we'll resolve that. Maybe all command buffer and command encoder methods need to be unsafe? Maybe it'll be fine since most of them already are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, all dispatches on the GPU (shader dispatches, draws, copy commands, ...) should likely be unsafe because of resource lifetimes 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading the docs for commandBufferWithUnretainedReferences
, it says:
Returns a command buffer from the command queue that doesn’t maintain strong references to resources.
So it sounds like the things that aren't retained only applies to MTLResource
and subprotocols, and that it'd still keep a reference to any MTLFence
, MTLCommandBufferHandler
, MTLSamplerState
etc. that you pass to the buffer / its encoders?
As such, I think we can again get around this by marking MTLResource
as unsafe (and probably add another safety requirement to the docs)?
I think this PR is ready now. Overall, I believe that marking I've gotten a bit lost between all our threads on this, but I think I've answered all your concerns @MarijnS95? WDYT about this PR in it's current state (if you look past all the bounds-safety stuff)? (I strongly suspect there's still more to be done here, but I'm honestly reaching my limit of the amount of unsafe code I can review, I find myself glazing over things by now - so while this PR is probably unsound in a few places, I think it's enough of an improvement that it'd be worth it to merge it now anyhow). |
This has a lot of false positives, but it's probably better than nothing.
To not consider MTLIndexType and similar as bounds-affecting.
These need to be synchronized.
And mark `commandBufferWithUnretainedReferences` as safe.
#783 relaxed our safety policy to allow us to mark things as automatically safe. This would be very useful to do in
objc2-metal
as well, which is what this PR does.There are a lot of wrinkles to that though, since Metal is fairly low-level, and there's a lot of seemingly innocuous things that can cause memory safety issues down the line, it effectively allows doing arbitrary things on another processor (the GPU). One example of this is
MTLFunction
, which amounts to anunsafe extern "C" fn()
that the GPU will gladly call with whatever arguments you specify, and thus we have to mark functions taking it as unsafe.In a similar vein, all
MTLResource
s are marked as unsafe, as accesses to these are unsynchronized, can be unretained, and need to be bound before use.MTLBuffer
is also untyped. See also the more broad questions raised in #773 (comment).For now, I'd like to postpone the discussion of bounds safety, I've used
unsafe-default-safety.not-bounds-affecting = true
to do that. It still has a lot of false positives, I have fixed some of those in this PR, but we'll have to deal with the rest later. I have opened #793 to track this.Replaces #773, #774 and #790. Closes #685. This is important for gfx-rs/wgpu#5641 and for making Metal 4 nicer to use.