@zxd1997066 zxd1997066 commented Aug 20, 2025

This PR adds some ported distributed test cases to the torch-xpu-ops CI.

  • Add ZE_AFFINITY_MASK to ensure that Xelink is used.
  • Add CCL_ROOT for Xelink. This workaround can be removed after oneCCL is upgraded to 2021.16.2.
  • Increase the distributed test time limit. With the ported cases added, the test step currently needs about 1 hour.
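
As a rough illustration of the three changes above, the sketch below sets the two environment variables for a child test process and enforces a wall-clock limit. It is not the actual CI workflow: the helper names `build_dist_test_env` and `run_with_limit`, the mask value, the oneCCL path, the test command, and the timeout are all hypothetical placeholders, since the real values are not stated in this description.

```python
import os
import subprocess
import sys


def build_dist_test_env(base_env, affinity_mask, ccl_root):
    """Return a copy of base_env with ZE_AFFINITY_MASK and CCL_ROOT set.

    Both values are supplied by the caller; nothing here reflects the
    values actually used in the torch-xpu-ops CI.
    """
    env = dict(base_env)
    env["ZE_AFFINITY_MASK"] = affinity_mask  # which Level Zero devices are visible
    env["CCL_ROOT"] = ccl_root               # oneCCL install root (workaround)
    return env


def run_with_limit(cmd, env, timeout_s):
    """Run cmd with the given environment and a wall-clock limit in seconds.

    Returns the child's exit code, or 124 if the limit was exceeded
    (124 is the exit code GNU timeout uses for the same situation).
    """
    try:
        return subprocess.run(cmd, env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return 124


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    env = build_dist_test_env(os.environ, "0,1", "/path/to/oneccl")
    # Hypothetical stand-in for the distributed test command.
    cmd = [sys.executable, "-c", "import os; print(os.environ['ZE_AFFINITY_MASK'])"]
    sys.exit(run_with_limit(cmd, env, timeout_s=2 * 60 * 60))
```

Copying the environment rather than mutating `os.environ` keeps the workaround scoped to the distributed test step, so dropping `CCL_ROOT` after the oneCCL upgrade would only touch one call site.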

disable_e2e
disable_ut

@zxd1997066 zxd1997066 force-pushed the xiangdong/dist_cases branch 6 times, most recently from a498bbd to 5f55483 Compare August 22, 2025 15:28
@zxd1997066 zxd1997066 force-pushed the xiangdong/dist_cases branch from 5f55483 to ef62eaa Compare August 26, 2025 09:45

@daisyden daisyden left a comment

LGTM

@zxd1997066 zxd1997066 force-pushed the xiangdong/dist_cases branch 2 times, most recently from 90423a8 to a6b85e2 Compare August 29, 2025 02:44
@chuanqi129

@zxd1997066 please rebase the PR against the latest code base.

@zxd1997066 zxd1997066 force-pushed the xiangdong/dist_cases branch 9 times, most recently from bf75fbe to bec9de4 Compare September 10, 2025 07:51
@zxd1997066 zxd1997066 force-pushed the xiangdong/dist_cases branch 8 times, most recently from b51ef6c to 809331e Compare September 15, 2025 01:16
@zxd1997066 zxd1997066 force-pushed the xiangdong/dist_cases branch 3 times, most recently from ed537ec to ede1a41 Compare September 16, 2025 03:37
@chuanqi129 chuanqi129 added this pull request to the merge queue Sep 16, 2025
Merged via the queue into main with commit df1d7ad Sep 16, 2025
13 checks passed
@chuanqi129 chuanqi129 deleted the xiangdong/dist_cases branch September 16, 2025 08:57
mengfei25 pushed a commit that referenced this pull request Sep 17, 2025
3 participants