
@Alexey-Rivkin (Contributor) commented Aug 7, 2025

What?

Add CUDA 13 support across build/CI/tests.
Skip CUDA gtests when CUDA memory type is unsupported (patch by @iyastreb).

Why?

Issue #10787

How?

  • Bump to CUDA 13 in CI configs.
  • Build new release images.
  • Verified in PR, Performance, and Release pipelines (no actual release pushed).

@Alexey-Rivkin (Contributor, Author)

/azp run perf

Azure Pipelines successfully started running 1 pipeline(s).

@Alexey-Rivkin (Contributor, Author)

UCX test failures with CUDA 13:

  • Tests / AddressSanitizer: "test_switch_cuda_device.cc:584: Failure"
  • Go: "no active cuda primary context for memory allocation"
  • Performance test: "Segmentation fault: invalid permissions for mapped object"

@Alexey-Rivkin (Contributor, Author)

/azp run perf

Azure Pipelines successfully started running 1 pipeline(s).

@Alexey-Rivkin marked this pull request as draft on August 13, 2025 11:11
@iyastreb previously approved these changes on Aug 13, 2025
@dpressle

/azp run perf

Commenter does not have sufficient privileges for PR 10788 in repo openucx/ucx

@yosefe (Contributor) commented Aug 18, 2025

/azp run perf

Azure Pipelines successfully started running 1 pipeline(s).

@MrBr-github

/azp run UCX PR

Azure Pipelines successfully started running 1 pipeline(s).

@MrBr-github

/azp help

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@MrBr-github

/azp run perf

Azure Pipelines successfully started running 1 pipeline(s).

@MrBr-github

/azp run perf

Azure Pipelines successfully started running 1 pipeline(s).

@MrBr-github

/azp run UCX PR

Azure Pipelines successfully started running 1 pipeline(s).

@Alexey-Rivkin force-pushed the cuda_13 branch 3 times, most recently from 652f7d0 to 7edf8a1, on September 2, 2025 09:27
@Alexey-Rivkin (Contributor, Author)

/azp run perf

Azure Pipelines successfully started running 1 pipeline(s).

Why:
try_load_cuda_env now sets have_cuda=/usr/local/cuda when a local CUDA installation is found.
load_cuda_env previously required have_cuda=="yes", so it failed falsely in that case.
It now fails only when have_cuda=="no", preserving the existing module-load behavior and fixing the test with local CUDA.

Signed-off-by: Alexey Rivkin <[email protected]>
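The "Why:" note above describes a change to a shell check in the CI helpers. The thread does not include the scripts themselves, so the following is only a minimal sketch under assumptions: the names try_load_cuda_env, load_cuda_env and have_cuda and the values "yes", "no" and /usr/local/cuda come from the commit message, while the function bodies and the cuda_root override variable are hypothetical and added purely for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of the have_cuda check described in the commit message.
# The real UCX CI helpers may be structured differently.

# Assumed override, used here only so the sketch can be exercised.
cuda_root=${cuda_root:-/usr/local/cuda}

try_load_cuda_env() {
    have_cuda=no
    # Assumed: a module-load path (not shown) would set have_cuda=yes.
    if [ -d "$cuda_root" ]; then
        # Local CUDA found: have_cuda now holds a path, not "yes".
        have_cuda=$cuda_root
    fi
}

load_cuda_env() {
    try_load_cuda_env
    # Old check, [ "$have_cuda" != "yes" ], failed when have_cuda was a path.
    # New check: fail only when CUDA was not found at all.
    if [ "$have_cuda" = "no" ]; then
        echo "error: CUDA environment not found" >&2
        return 1
    fi
    return 0
}
```

Testing against "no" rather than "yes" means any non-"no" value (the module-load "yes" or a detected installation path) is accepted, which matches the intent stated in the commit message.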
@Alexey-Rivkin marked this pull request as ready for review on September 16, 2025 18:28
Alexey-Rivkin added a commit to Alexey-Rivkin/ucx that referenced this pull request Sep 16, 2025
(ported from PR openucx#10788)

Signed-off-by: Alexey Rivkin <[email protected]>
Signed-off-by: Alexey Rivkin <[email protected]>
Alexey-Rivkin added a commit to Alexey-Rivkin/ucx that referenced this pull request Sep 17, 2025
(ported from PR openucx#10788)

Signed-off-by: Alexey Rivkin <[email protected]>
@yosefe enabled auto-merge (squash) on September 17, 2025 12:34
Alexey-Rivkin added a commit to Alexey-Rivkin/ucx that referenced this pull request Sep 17, 2025
(ported from PR openucx#10788)

Signed-off-by: Alexey Rivkin <[email protected]>
@yosefe merged commit 29831d3 into openucx:master on Sep 17, 2025
191 checks passed
6 participants