@akashveramd
Collaborator

This PR addresses the following issues:

  • Run the Torchtitan ROCm workflow only on a cron schedule and on pushes to the main branch. The CUDA workflow continues to run as is.
  • Refactor the Torchtitan test run to address an earlier review comment on "Enable ROCm CI support" #1786 (comment).

@akashveramd akashveramd self-assigned this Nov 11, 2025
@meta-cla meta-cla bot added the CLA Signed label Nov 11, 2025
Contributor

@tianyu-l tianyu-l left a comment

SGTM. I guess we can't verify ROCm continues to work because this PR removes the per-PR test. Let's merge & wait to see if things are OK.

Context: We are making the ROCm test run only (1) when commits are merged to main and (2) periodically, every 6 hours, and no longer (3) on PR submission or updates. The reasons are:

  1. ROCm CI resources have limited capacity: #1786 (comment)
  2. For permission reasons, PRs from outside the repo or from contributors without write access can't run ROCm tests (whereas they can run CUDA tests).
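
For readers unfamiliar with GitHub Actions triggers, the policy described above could be expressed roughly as in the sketch below. This is an illustration only, not the workflow file from this PR: the workflow name, job name, runner label, and test command are placeholders, and the exact cron expression used in the repository is an assumption (any schedule that fires every 6 hours would match the description).

```yaml
# Hypothetical sketch; names and the runner label are placeholders,
# not taken from the torchtitan repository.
name: ROCm integration test (sketch)

on:
  push:
    branches: [main]        # (1) run when commits land on main
  schedule:
    - cron: "0 */6 * * *"   # (2) run at minute 0 of every 6th hour (UTC)
  # (3) no pull_request trigger here, so opening or updating a PR
  #     does not start this workflow

jobs:
  rocm-test:
    runs-on: rocm-runner-placeholder   # assumed label for a ROCm runner
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the ROCm integration tests here"
```

Note that scheduled workflows run against the latest commit on the repository's default branch, so the periodic run effectively re-tests main.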

@tianyu-l tianyu-l merged commit 2f9b44d into main Nov 12, 2025
6 of 7 checks passed
@tianyu-l tianyu-l deleted the av_rocm_change_cron_main branch November 12, 2025 01:10
@akashveramd
Collaborator Author

akashveramd commented Nov 12, 2025

@tianyu-l: There is still ongoing work on .github/workflows/integration_test_8gpu_features.yaml to run the ROCm workflow conditionally. The workflow currently uses 'matrix' in the wrong location, which is an error, and because of it the workflow didn't run for either CUDA or ROCm. Once I fix that, the workflow will run conditionally. We'll have to undo the merge for now.
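
As background on the 'matrix' placement issue: in GitHub Actions, a matrix is declared under `jobs.<job_id>.strategy.matrix`, and the `matrix` context is not available in a job-level `if`. One common way to make a matrix conditional is to compute it in a small preliminary job and pass it on with `fromJSON`. The sketch below shows that pattern under stated assumptions; it is not the fix the author planned, and the job names, the `gpu` key, and the runner labels are hypothetical.

```yaml
# Hypothetical sketch of a conditional matrix; not the actual
# integration_test_8gpu_features.yaml.
jobs:
  set-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.pick.outputs.matrix }}
    steps:
      - id: pick
        run: |
          # PRs get CUDA only; pushes and scheduled runs get CUDA and ROCm.
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo 'matrix={"gpu":["cuda"]}' >> "$GITHUB_OUTPUT"
          else
            echo 'matrix={"gpu":["cuda","rocm"]}' >> "$GITHUB_OUTPUT"
          fi

  integration-test:
    needs: set-matrix
    runs-on: ubuntu-latest   # placeholder; real jobs would select GPU runners
    strategy:
      fail-fast: false
      # The matrix lives under strategy, built from the earlier job's output.
      matrix: ${{ fromJSON(needs.set-matrix.outputs.matrix) }}
    steps:
      - run: echo "testing on ${{ matrix.gpu }}"
```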

@tianyu-l
Contributor

@akashveramd hmm OK, I thought the cuda test passed, so I merged it. Could you help submit a revert PR so I can stamp? Thanks!

tianyu-l added a commit that referenced this pull request Nov 12, 2025
…& push to Main branch only" (#2017)

Reverts PR: #2016
Addressing following issues in this PR-
- Running Torchtitan ROCm workflow on cron schedule & only when push to
Main branch. CUDA workflow will run as is.
- Refactor Torchtitan test run to address older PR comment
#1786 (comment)

Co-authored-by: tianyu-l <[email protected]>

Labels

ciflow/rocm, CLA Signed, module: rocm
