As a follow-up to #1867, this PR adds tests for the FlightRecorder
on XCCL and moves some definitions from
ProcessGroupXCCL::Options to Backend::Options.
These tests are largely based on
`pytorch/test/distributed/test_c10d_nccl.py`, but omit the following
tests:
- `test_short_json`: JSON dumps are not supported in
ProcessGroupXCCL
- `test_trace_while_all_works_retired`: `_wait_for_pending_works` isn't
supported by XCCL
- `test_trace_while_active`: XCCL hangs when an op is called on only one
rank
- `test_trace_while_stuck`: XCCL hangs when an op is called on only one
rank
---------
Co-authored-by: Yu, Guangye <[email protected]>