Skip to content

Conversation

@rakhmets
Copy link
Contributor

What?

Fixed sending memory allocated on user context.

Why?

cuda_copy transport fails to send memory allocated on the context created using cuCtxCreate, if there is no active primary device context.

@rakhmets rakhmets force-pushed the topic/cuda-copy-ctx-create-fix branch from a3d203a to 0186c56 Compare October 16, 2025 19:02
@rakhmets rakhmets marked this pull request as ready for review October 16, 2025 19:03
@rakhmets rakhmets added the WIP-DNM Work in progress / Do not review label Oct 17, 2025
@rakhmets rakhmets force-pushed the topic/cuda-copy-ctx-create-fix branch 7 times, most recently from 4a03d17 to 5c81a20 Compare October 17, 2025 15:49
@rakhmets rakhmets force-pushed the topic/cuda-copy-ctx-create-fix branch from 5c81a20 to ed0779e Compare October 17, 2025 16:00
@rakhmets rakhmets removed the WIP-DNM Work in progress / Do not review label Oct 20, 2025
@rakhmets rakhmets requested a review from brminich October 22, 2025 09:25
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR fixes a critical bug in the cuda_copy transport where sending memory allocated on user-created CUDA contexts (via cuCtxCreate) would fail when no active primary device context existed. The fix introduces a two-tier context activation strategy: the transport first attempts to activate the device's primary context, but if that context is inactive (returning UCS_ERR_NO_DEVICE), it falls back to querying and pushing the context associated with the specific memory pointer using cuPointerGetAttribute. This approach preserves backward compatibility for memory on primary contexts while extending support to user-created contexts. The changes span the core cuda_copy_ep.c implementation (adding the fallback mechanism and a new context state struct) and two test files that ensure proper CUDA context lifecycle management during cleanup.

Important Files Changed

Filename Score Overview
src/uct/cuda/cuda_copy/cuda_copy_ep.c 4/5 Implements fallback mechanism to push user-created contexts when primary context is inactive, adds uct_cuda_copy_ep_ctx_t struct to bundle context state, and updates cleanup logic to handle both context types
test/gtest/uct/cuda/test_switch_cuda_device.cc 4/5 Refactors test to explicitly clear all primary and current contexts before creating user context, moves context destruction to cleanup override for proper resource lifecycle
test/gtest/uct/test_device.cc 5/5 Reorders cleanup to call uct_test::cleanup() before popping CUDA context, ensuring UCT resources are destroyed while context is still active

Confidence score: 4/5

  • This PR addresses a well-defined bug with a reasonable fallback approach, but introduces moderate complexity in context management logic that requires careful review
  • Score reflects the added complexity in cuda_copy_ep.c where the dual-path context activation (primary vs. user context) increases cognitive load and potential for edge cases, plus the test changes that aggressively clear all contexts could potentially impact other tests running in the same process
  • Pay close attention to src/uct/cuda/cuda_copy/cuda_copy_ep.c lines 241-252 (fallback path), lines 191-216 (memory-based context push), and test/gtest/uct/cuda/test_switch_cuda_device.cc lines 213-233 (context clearing loops) to ensure the context lifecycle is correctly managed in all scenarios

Sequence Diagram

sequenceDiagram
    participant User
    participant EP as cuda_copy_ep
    participant Ctx as uct_cuda_copy_ctx_rsc_get
    participant CUDA as CUDA Driver API
    participant Primary as Primary Context
    participant Memory as Memory Context

    User->>EP: put_zcopy/get_zcopy (with user ctx memory)
    EP->>EP: uct_cuda_copy_post_cuda_async_copy
    EP->>EP: uct_cuda_copy_ep_get_ctx
    EP->>EP: uct_cuda_copy_get_mem_types
    EP->>Ctx: uct_cuda_copy_ctx_rsc_get(sys_dev, cuda_deviceptr)
    
    alt sys_dev != UNKNOWN
        Ctx->>CUDA: uct_cuda_base_get_cuda_device(sys_dev)
        CUDA-->>Ctx: cuda_device
        Ctx->>Primary: uct_cuda_copy_push_ctx(cuda_device)
        
        alt Primary context inactive (ERR_NO_DEVICE)
            Note over Ctx: Fallback for user-created context
            Ctx->>Memory: uct_cuda_copy_ep_push_memory_ctx(cuda_deviceptr)
            Memory->>CUDA: cuPointerGetAttribute(CONTEXT)
            CUDA-->>Memory: cuda_context
            Memory->>CUDA: cuCtxPushCurrent(cuda_context)
            Memory-->>Ctx: cuda_context, CU_DEVICE_INVALID
        else Primary context active
            Primary-->>Ctx: cuda_device, NULL
        end
    else sys_dev == UNKNOWN
        Note over Ctx: No specific device requested
        Ctx->>CUDA: uct_cuda_base_ctx_get_id
        alt No current context
            Ctx->>Primary: uct_cuda_primary_ctx_push_first_active
            Primary->>CUDA: cuDeviceGetCount & iterate devices
            Primary->>CUDA: cuDevicePrimaryCtxRetain
            Primary->>CUDA: cuCtxPushCurrent
            Primary-->>Ctx: cuda_device
        end
    end
    
    Ctx->>CUDA: uct_cuda_base_ctx_get_id
    CUDA-->>Ctx: ctx_id
    Ctx->>Ctx: uct_cuda_base_ctx_rsc_get(ctx_id)
    Ctx-->>EP: cuda_device, cuda_context, ctx_rsc
    
    EP->>CUDA: cuMemcpyAsync (on stream)
    EP->>CUDA: cuEventRecord
    EP->>EP: uct_cuda_primary_ctx_pop_and_release
    EP-->>User: UCS_INPROGRESS
Loading

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@rakhmets rakhmets requested a review from brminich November 6, 2025 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants