-
Notifications
You must be signed in to change notification settings - Fork 494
UCT/CUDA/CUDA_COPY: Fixed sending memory allocated on user context. #10956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a3d203a to
0186c56
Compare
4a03d17 to
5c81a20
Compare
5c81a20 to
ed0779e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR fixes a critical bug in the cuda_copy transport where sending memory allocated on user-created CUDA contexts (via cuCtxCreate) would fail when no active primary device context existed. The fix introduces a two-tier context activation strategy: the transport first attempts to activate the device's primary context, but if that context is inactive (returning UCS_ERR_NO_DEVICE), it falls back to querying and pushing the context associated with the specific memory pointer using cuPointerGetAttribute. This approach preserves backward compatibility for memory on primary contexts while extending support to user-created contexts. The changes span the core cuda_copy_ep.c implementation (adding the fallback mechanism and a new context state struct) and two test files that ensure proper CUDA context lifecycle management during cleanup.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| src/uct/cuda/cuda_copy/cuda_copy_ep.c | 4/5 | Implements fallback mechanism to push user-created contexts when primary context is inactive, adds uct_cuda_copy_ep_ctx_t struct to bundle context state, and updates cleanup logic to handle both context types |
| test/gtest/uct/cuda/test_switch_cuda_device.cc | 4/5 | Refactors test to explicitly clear all primary and current contexts before creating user context, moves context destruction to cleanup override for proper resource lifecycle |
| test/gtest/uct/test_device.cc | 5/5 | Reorders cleanup to call uct_test::cleanup() before popping CUDA context, ensuring UCT resources are destroyed while context is still active |
Confidence score: 4/5
- This PR addresses a well-defined bug with a reasonable fallback approach, but introduces moderate complexity in context management logic that requires careful review
- Score reflects the added complexity in
cuda_copy_ep.cwhere the dual-path context activation (primary vs. user context) increases cognitive load and potential for edge cases, plus the test changes that aggressively clear all contexts could potentially impact other tests running in the same process - Pay close attention to
src/uct/cuda/cuda_copy/cuda_copy_ep.clines 241-252 (fallback path), lines 191-216 (memory-based context push), andtest/gtest/uct/cuda/test_switch_cuda_device.cclines 213-233 (context clearing loops) to ensure the context lifecycle is correctly managed in all scenarios
Sequence Diagram
sequenceDiagram
participant User
participant EP as cuda_copy_ep
participant Ctx as uct_cuda_copy_ctx_rsc_get
participant CUDA as CUDA Driver API
participant Primary as Primary Context
participant Memory as Memory Context
User->>EP: put_zcopy/get_zcopy (with user ctx memory)
EP->>EP: uct_cuda_copy_post_cuda_async_copy
EP->>EP: uct_cuda_copy_ep_get_ctx
EP->>EP: uct_cuda_copy_get_mem_types
EP->>Ctx: uct_cuda_copy_ctx_rsc_get(sys_dev, cuda_deviceptr)
alt sys_dev != UNKNOWN
Ctx->>CUDA: uct_cuda_base_get_cuda_device(sys_dev)
CUDA-->>Ctx: cuda_device
Ctx->>Primary: uct_cuda_copy_push_ctx(cuda_device)
alt Primary context inactive (ERR_NO_DEVICE)
Note over Ctx: Fallback for user-created context
Ctx->>Memory: uct_cuda_copy_ep_push_memory_ctx(cuda_deviceptr)
Memory->>CUDA: cuPointerGetAttribute(CONTEXT)
CUDA-->>Memory: cuda_context
Memory->>CUDA: cuCtxPushCurrent(cuda_context)
Memory-->>Ctx: cuda_context, CU_DEVICE_INVALID
else Primary context active
Primary-->>Ctx: cuda_device, NULL
end
else sys_dev == UNKNOWN
Note over Ctx: No specific device requested
Ctx->>CUDA: uct_cuda_base_ctx_get_id
alt No current context
Ctx->>Primary: uct_cuda_primary_ctx_push_first_active
Primary->>CUDA: cuDeviceGetCount & iterate devices
Primary->>CUDA: cuDevicePrimaryCtxRetain
Primary->>CUDA: cuCtxPushCurrent
Primary-->>Ctx: cuda_device
end
end
Ctx->>CUDA: uct_cuda_base_ctx_get_id
CUDA-->>Ctx: ctx_id
Ctx->>Ctx: uct_cuda_base_ctx_rsc_get(ctx_id)
Ctx-->>EP: cuda_device, cuda_context, ctx_rsc
EP->>CUDA: cuMemcpyAsync (on stream)
EP->>CUDA: cuEventRecord
EP->>EP: uct_cuda_primary_ctx_pop_and_release
EP-->>User: UCS_INPROGRESS
3 files reviewed, no comments
What?
Fixed sending memory allocated on user context.
Why?
cuda_copytransport fails to send memory allocated on the context created usingcuCtxCreate, if there is no active primary device context.