Skip to content

Conversation

arun-chandran-edarath
Copy link
Contributor

What?

Fix for #10761

@yosefe
Copy link
Contributor

yosefe commented Aug 26, 2025

/azp run

Copy link

Azure Pipelines successfully started running 5 pipeline(s).

@yosefe
Copy link
Contributor

yosefe commented Aug 26, 2025

/azp run perf

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arun-chandran-edarath
Copy link
Contributor Author

@yosefe

seems ignoring the ARM bit and casting the difference to integer and then doing the comparison is performing better. New macro is given below

#define FIX2_UCT_MM_EP_IS_ABLE_TO_SEND(_head, _tail, _fifo_size)
ucs_likely(((int64_t)(((_head) & ~UCT_MM_IFACE_FIFO_HEAD_EVENT_ARMED) - (_tail))) < (int64_t)(_fifo_size))

The attached test.c.txt compares the performance of the macro as well as the correctness please check.

I will upload a new diff.

[
test.c.txt
](url)

@jakemoroni
Copy link

Want me to give this a try on my setup?

@arun-chandran-edarath
Copy link
Contributor Author

Want me to give this a try on my setup?
@jakemoroni
Yes, please! I would appreciate it.

@jakemoroni
Copy link

This seems to fix the issue for me 👍

It is still unclear to me how the tail can end up greater than the head without any wraparound occurring, but it does work. Thanks!

cc @hjelmn

@arun-chandran-edarath
Copy link
Contributor Author

This seems to fix the issue for me 👍

It is still unclear to me how the tail can end up greater than the head without any wraparound occurring, but it does work. Thanks!

cc @hjelmn

@jakemoroni Thank you so much for the verification.

I have explained in this reply #10761 (comment) about, how the sender's view of receiver_tail can overtake the sender's view of receiver_head (tail from receiver's view becomes bigger than head only in case of wraparound. head and tail are variables in the receiver and shared to every other rank(sender) via shared memory). Please have a look at the provided code comments.

@hjelmn
Copy link
Contributor

hjelmn commented Sep 22, 2025

Can you squish this down the minimal number of commits? I can take a look and review once that is complete.

I ask because this seems like it fixes another commit on this PR: "TEST/MM: Fix compile error"

@arun-chandran-edarath
Copy link
Contributor Author

I have squashed the commits, please take a look.

Copy link
Contributor

@hjelmn hjelmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Scanning through the code I see UCT_MM_IFACE_FIFO_HEAD_EVENT_ARMED is only ever used with the head so the original math was definitely wrong. We also tested the fix on one of our systems and it fixes the issue.

@hjelmn
Copy link
Contributor

hjelmn commented Sep 23, 2025

@yosefe Please merge.

@brminich brminich requested a review from tvegas1 September 23, 2025 18:43
* increments once every 1 ns. On this timescale, the signed63 midpoint (2^62)
* is ~4.61e18 ticks (~146 years). Over 5 years, head would advance by
* ~1.5768e17 ticks (~3.4% of that midpoint), which is far from any wraparound
* edge case.
Copy link
Contributor

@tvegas1 tvegas1 Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can specify in comment that this code cannot work in case of wrap around (as it originally was). Or maybe that only case where head < tail is when head is stale and in this case we intend to return 0?

@tvegas1
Copy link
Contributor

tvegas1 commented Sep 24, 2025

To summarize my understanding for later referral.

With two writers A and B, A is slow, B is very fast:

  1. A reads head from ctl area, checks if it is able to send.
  2. In the meantime, B performs same check and acquires head (CAS, A has now a stale head), writes.
  3. The reader even has time to consume and update tail in ctl area.
  4. Step 2. and 3. can repeat.
  5. Now A is still in the checks, updates cached_tail and compares it again with its stale head , and finds head < cached_tail.

So head can indeed be observed as being below cached_tail, and the checking code must return 0 in that case. After #10485, this was no longer the case.

check_case(c);
}

} No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: add newline at end of file

@yosefe yosefe merged commit 6b2e361 into openucx:master Sep 25, 2025
141 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants