
Conversation

faizana-nvidia
Contributor

@faizana-nvidia commented Jul 2, 2025

MCTP Bridge Endpoint poll

Reference : DSP0236 section 8.17.6 Reclaiming EIDs from hot-plug devices

The bus owner/bridge can detect a removed device or devices by validating the EIDs that are presently allocated to endpoints that are directly on the bus and identifying which EIDs are missing. It can do this by attempting to access each endpoint that the bridge has listed in its routing table as being a device that is directly on the particular bus. Attempting to access each endpoint can be accomplished by issuing the Get Endpoint ID command.

This PR aims to extend mctpd, as bus owner, to enhance the bridge support feature by incorporating downstream endpoint polling.
One way to detect the accessibility of its downstream devices is for the bus owner/bridge to probe each of the bridge's downstream endpoints using the Get Endpoint ID command.

Once it is established that the endpoint path is accessible, we stop polling that EID and publish its endpoint object.

This PR is based on the discussions in MCTP Bridge Support: #71
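
For illustration, a minimal sketch (in C) of this poll-until-responsive logic; the helper names here are hypothetical placeholders, not actual mctpd functions:

```c
/*
 * Minimal sketch of the poll-until-responsive pattern described above.
 * The probe/publish helpers are hypothetical placeholders standing in for
 * mctpd internals; they are declared here only to keep the sketch complete.
 */
#include <stdint.h>

typedef uint8_t mctp_eid_t;

/* hypothetical helpers (names are illustrative, not real mctpd symbols) */
extern int  probe_get_endpoint_id(int net, mctp_eid_t eid);    /* send Get Endpoint ID */
extern void publish_endpoint_object(int net, mctp_eid_t eid);  /* create the D-Bus endpoint object */
extern void stop_poll_timer(int net, mctp_eid_t eid);          /* cancel the periodic poll */

/* Called from a periodic timer for each EID in the bridge's allocated pool. */
int poll_bridged_eid(int net, mctp_eid_t eid)
{
	int rc = probe_get_endpoint_id(net, eid);
	if (rc < 0)
		return rc; /* no response yet; the timer fires again next interval */

	/* Endpoint answered Get Endpoint ID: publish it and stop polling this EID. */
	publish_endpoint_object(net, eid);
	stop_poll_timer(net, eid);
	return 0;
}
```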

@jk-ozlabs
Member

Bridges allocate EIDs for downstream endpoints but don't create peer structures until endpoints are actively discovered

Why not just store the allocated pool range on the bridge peer? This would be a considerably simpler change, and means the separate net->resv_eid_set doesn't get out-of-sync with the peer data.

@faizana-nvidia
Contributor Author

Bridges allocate EIDs for downstream endpoints but don't create peer structures until endpoints are actively discovered

Why not just store the allocated pool range on the bridge peer? This would be a considerably simpler change, and means the separate net->resv_eid_set doesn't get out-of-sync with the peer data.

For now we are storing the start of the pool and the pool size (in some sense it represents the allocated pool EID range); would this requirement still be needed?

@jk-ozlabs
Member

For now we are storing the start of the pool and the pool size (in some sense it represents the allocated pool EID range); would this requirement still be needed?

Not sure I understand the question - if the struct peer stores the allocated pool range, then we wouldn't need to store them elsewhere.

In your eccc13f change, the peer->pool_start and peer->pool_size already sort of represent this, but you don't have a clear indication of whether those represent an actual allocation, or a "proposed" pool. If you can clear that up, then you can just use those fields to represent the allocation data directly.

@faizana-nvidia
Contributor Author

For now we are storing the start of the pool and the pool size (in some sense it represents the allocated pool EID range); would this requirement still be needed?

Not sure I understand the question - if the struct peer stores the allocated pool range, then we wouldn't need to store them elsewhere.

In your eccc13f change, the peer->pool_start and peer->pool_size already sort of represent this, but you don't have a clear indication of whether those represent an actual allocation, or a "proposed" pool. If you can clear that up, then you can just use those fields to represent the allocation data directly.

I see, I think I understand what your ask is here: peer->pool_start and peer->pool_size should represent the "allocated" information (whatever the bridge's final say is in its response to the ALLOCATE_EID command). We initially store the "proposed" pool start, and the minimum of the "proposed" pool size and the actual pool_size (from the SET ENDPOINT ID response), in these two vars, and the request for ALLOCATION is sent based on that final proposed data; but once we get a response, whatever has been allocated by the bridge is kept as pool_start and pool_size inside the peer. Later this range is marked as reserved in the new per-network reserve_eid_set.

But I see your point now: our bridge could report any pool_size as the actual size in its response to SET_ENDPOINT_ID, while the proposed size could differ, and so could the finally allocated size (the response to ALLOCATE_ENDPOINT_ID).

Maybe we can introduce a new variable representing actual_pool_size (from the SET_ENDPOINT_ID response) and treat the two existing variables, peer->pool_start and peer->pool_size, as allocated data, while the network reserved_eid_set continues to interact with them as it has been doing.

@faizana-nvidia
Contributor Author

faizana-nvidia commented Jul 4, 2025

I also have added a TODO comment here:

In our current implementation of bus-owner pool and local EID assignment, we don't have a proper check to see whether a local EID which is being set via the mctp tool (mctp addr add <eid> dev <interface>) for any network has already been used by the bus owner when it assigned a dynamic/static EID to an MCTP device/endpoint.

Reference: I have two interfaces, mctpusb0 and mctpspi0_2. I try to add another local EID (20) to mctpspi0_2. Here EID 20 was already used as a bridge downstream EID under mctpusb0, hence the error message below. I still see the mctpd link map retaining EID 20 even though setting the local EID failed (the error is seen in mctpd, not in the mctp tool). Not sure if this was intentional?

mctpd[22871]: Adding local eid 20 net 1
mctpd[22871]: mctpd: Local eid 20 net 1 already exists?  <-----------
mctpd[22871]: mctpd: Error handling netlink update         <-----------
mctpd[22871]: 1 changes:
mctpd[22871]:   0 ADD_EID      ifindex   4 (mctpspi0_2          ) eid  20 old_net    0 old_up 5
mctpd[22871]: linkmap
mctpd[22871]:    1: lo, net 1 up local addrs []
mctpd[22871]:    4: mctpspi0_2, net 1 up local addrs [12, 20]
mctpd[22871]:    7: mctpusb0, net 1 up local addrs [8]

$mctp addr show
eid 12 net 1 dev mctpspi0_2
eid 20 net 1 dev mctpspi0_2
eid 8 net 1 dev mctpusb0

Since we are getting into the reservation of EIDs, a new local EID could very well fall within the reserved EID space. So should we introduce a restriction of this sort in the mctp tool/mctpd?

@faizana-nvidia mentioned this pull request Jul 4, 2025
@jk-ozlabs
Member

Maybe we can introduce a new variable representing actual_pool_size (from the SET_ENDPOINT_ID response) and treat the two existing variables, peer->pool_start and peer->pool_size, as allocated data, while the network reserved_eid_set continues to interact with them as it has been doing.

Yes, that's along the lines of what I was thinking. But maybe with the names adjusted: pool_start & pool_size would represent actually-allocated values, req_pool_size and req_pool_start for the (temporarily-valid) request details.
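
For reference, a rough sketch of what that bookkeeping could look like; the structure and field layout here are illustrative only, not the actual mctpd struct peer:

```c
/*
 * Illustrative only: possible pool bookkeeping on the bridge's peer entry,
 * following the naming discussed above. The real struct peer in mctpd has
 * many more members and may differ.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t mctp_eid_t;

struct bridge_pool_state {
	/* transient: the requested pool, valid between the Set Endpoint ID
	 * response and the Allocate Endpoint IDs exchange */
	mctp_eid_t req_pool_start;
	uint8_t    req_pool_size;

	/* persistent: what the bridge actually accepted, taken from its
	 * Allocate Endpoint IDs response */
	mctp_eid_t pool_start;
	uint8_t    pool_size;
	bool       pool_allocated; /* distinguishes "allocated" from "proposed" */
};
```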

And then as an improvement: since the req_ values are transient, is there any better way to pass them from the Set Endpoint ID response handler to the Allocate Endpoint IDs command? (ie., rather than setting transient values on the struct peer)

I also have added a TODO comment here

There's not much we can do about that case; mctpd isn't (currently) managing that local EID, so there is not much we can do on conflicts - as you have noted.

So should we introduce a restriction of this sort in the mctp tool/mctpd?

No, not while mctpd is not responsible for assigning local EIDs.

I see the best way to prevent this case is a facility to explicitly configure mctpd with the range of dynamically-allocatable EIDs, and it's up to the platform integrator to ensure that that range does not conflict with the potential set of local EIDs.

(that's one of the reasons for the configuration file support, for exactly this use-case)

@faizana-nvidia
Contributor Author

And then as an improvement: since the req_ values are transient, is there any better way to pass them from the Set Endpoint ID response handler to the Allocate Endpoint IDs command? (ie., rather than setting transient values on the struct peer)

  1. To add to this, we don't know how many bridges, on which bindings, will turn up on the network. Predicting their number would be tedious, and so would passing each pool size to its respective Allocate Endpoint IDs call. So in the worst-case scenario, assuming 255 bridges per network (similar to peer numbers), we can have a separate array structure representing bridge data/pool_size which would be filled right after SET_ENDPOINT_ID.

  2. Another way (though not tested) is to somehow find the capacity of the bridge just before issuing ALLOCATE_ENDPOINT_ID. Unfortunately we need to know this because we want to avoid requesting more endpoints than the bridge's dynamic pool size.

Table 23 – Allocate Endpoint IDs message

An error completion code (ERROR_INVALID_DATA should be returned) shall be returned if the number of EIDs being allocated (Number of Endpoint IDs) exceeds the Dynamic Endpoint ID Pool size. (This error condition does not apply to when the number of endpoint IDs passed in the request is 0x00).

  3. We can also treat the pool_size in the SET_ENDPOINT_ID response (transient) as the capacity of the bridge, while the allocated pool size and start represent the allocated values.

I also have added a TODO comment here

There's not much we can do about that case; mctpd isn't (currently) managing that local EID, so there is not much we can do on conflicts - as you have noted.

So should we introduce a restriction of this sort in the mctp tool/mctpd?

No, not while mctpd is not responsible for assigning local EIDs.

I see the best way to prevent this case is a facility to explicitly configure mctpd with the range of dynamically-allocatable EIDs, and it's up to the platform integrator to ensure that that range does not conflict with the potential set of local EIDs.

(that's one of the reasons for the configuration file support, for exactly this use-case)

Acknowledged.

@jk-ozlabs
Member

we can have a separate array structure representing bridge data/pool_size which would be filled right after SET_ENDPOINT_ID.

If there is no way to better handle the transient requested peer size, I would prefer to just store it on the struct peer.

Another way (though not tested) is to somehow find the capacity of the bridge just before issuing ALLOCATE_ENDPOINT_ID

We have the capacity of the bridge, from the Set Endpoint ID response, no? As you say:

We can also treat the pool_size in the SET_ENDPOINT_ID response (transient) as the capacity of the bridge, while the allocated pool size and start represent the allocated values.

I would suggest:

  1. record the requested pool size from the Set Endpoint ID response
  2. attempt to allocate that requested pool size (pending any configured limits), by querying the struct peer array
  3. record the allocated pool range on that peer
  4. perform Allocate Endpoint IDs from that allocated pool

If the allocation is not possible in (2), just fail for now. We can look at fallbacks in a follow-up.

I think this is fine to do without a new dbus call, but I am open to ideas about why we may need one.
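
A rough sketch of those four steps, again with hypothetical placeholder helpers rather than actual mctpd functions:

```c
/*
 * Sketch of the four-step flow suggested above. All helpers are hypothetical
 * placeholders for mctpd internals, declared only to keep the sketch complete.
 */
#include <stdint.h>

typedef uint8_t mctp_eid_t;

struct net;   /* opaque stand-ins for mctpd's own types */
struct peer;

extern uint8_t requested_pool_size(struct peer *bridge);             /* 1. from the Set Endpoint ID response */
extern int     find_free_eid_range(struct net *n, uint8_t size,
				   mctp_eid_t *start);                /* 2. scan the peer array, honouring limits */
extern void    record_allocated_pool(struct peer *bridge,
				     mctp_eid_t start, uint8_t size); /* 3. store the range on the bridge peer */
extern int     allocate_endpoint_ids(struct peer *bridge,
				     mctp_eid_t start, uint8_t size); /* 4. Allocate Endpoint IDs command */

int setup_bridge_pool(struct net *n, struct peer *bridge)
{
	uint8_t size = requested_pool_size(bridge);
	mctp_eid_t start;
	int rc;

	rc = find_free_eid_range(n, size, &start);
	if (rc < 0)
		return rc; /* no contiguous range available: fail for now */

	record_allocated_pool(bridge, start, size);
	return allocate_endpoint_ids(bridge, start, size);
}
```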

@faizana-nvidia
Contributor Author

we can have a separate array structure representing bridge data/pool_size which would be filled right after SET_ENDPOINT_ID.

If there is no way to better handle the transient requested peer size, I would prefer to just store it on the struct peer.

Another way (though not tested) is to somehow find the capacity of the bridge just before issuing ALLOCATE_ENDPOINT_ID

We have the capacity of the bridge, from the Set Endpoint ID response, no? As you say:

We can also treat the pool_size in the SET_ENDPOINT_ID response (transient) as the capacity of the bridge, while the allocated pool size and start represent the allocated values.

I would suggest:

  1. record the requested pool size from the Set Endpoint ID response
  2. attempt to allocate that requested pool size (pending any configured limits), by querying the struct peer array
  3. record the allocated pool range on that peer
  4. perform Allocate Endpoint IDs from that allocated pool

If the allocation is not possible in (2), just fail for now. We can look at fallbacks in a follow-up.

I think this is fine to do without a new dbus call, but I am open to ideas about why we may need one.

Please let me know if I have understood the steps correctly:

  1. Fetch the transient pool size (via SET_ENDPOINT_ID).
  2. Try to allocate (find a pool-sized chunk of the peer structure that is available to be used later by downstream EIDs, via get_pool_start()).
  3. If found, update the peer structure with req_pool_size = capacity and req_pool_start = get_pool_start().
  4. Also mark all such EIDs as reserved.
  5. From the dbus API AssignBridgeStatic, get pool_size and try to allocate that size at req_pool_start; then, with whatever comes back in the ALLOCATE_EID response, update the peer structure with alloc_pool_size = response(pool_size) and alloc_pool_start = response(pool_start).

This introduces req_pool_size and req_pool_start, and renames pool_size -> alloc_pool_size and pool_start -> alloc_pool_start.

If we are going with this, do we still need the pool_start argument from the dbus method AssignBridgeStatic if we have already decided what the pool start will be via req_pool_start?

Or do you mean not to use any new dbus method AssignBridgeStatic at all?

I think this is fine to do without a new dbus call, but I am open to ideas about why we may need one.

@jk-ozlabs
Member

Since the allocation mechanism is now in #71, I'll shift this discussion over there.

@faizana-nvidia force-pushed the mctp-bridge-poll branch 2 times, most recently from 2746a86 to a472892 on July 22, 2025 19:59
@faizana-nvidia changed the title from "MCTP Bridge Polling and Downstream EID reservation" to "MCTP Bridge Polling" on Jul 23, 2025
@faizana-nvidia
Contributor Author

Hello Jeremy,
I've rebased the PR and updated the logic based on the comments on the other PR (#71). Please help me with the review.

@jk-ozlabs
Member

I'm going to hold off on reviewing this one until we have the core bridge support done, given that's a hard dependency for this.

@faizana-nvidia
Contributor Author

I'm going to hold off on reviewing this one until we have the core bridge support done, given that's a hard dependency for this.

understood, sounds good

@faizana-nvidia
Contributor Author

faizana-nvidia commented Aug 15, 2025

Hello Jeremy,
I have some questions on the discussed approach to the polling (discussion) mechanism for MCTP bridge downstream endpoint management.
To give you a better picture of the MCTP network: we have an MCU acting as an MCTP bridge (over USB), managing GPUs downstream.

graph LR
    A["HMC/BMC"] --> B["MCTP Bridge"]
    B --> C["GPU1"]
    B --> D["GPU2"]                                          

I'm aware that in the (discussed) polling approach, mctpd doesn't care about the state of devices behind the bridge once they have been discovered.
In our platform, many of our applications, such as Nvidia NSM (used for GPU telemetry), rely directly on an event-subscription model or something similar, which creates a requirement to explicitly re-subscribe to endpoints when they undergo a reset. Owing to this, among other similar reasons, this PR (MCTP Bridge Polling) proposes keeping the polling active throughout the existence of the MCTP bridge itself, and informing other applications that monitor the d-bus endpoint objects when an MCTP device has fallen off the network. This would greatly help those applications to initiate their discovery sequence as early as possible, without losing valuable telemetry data and critical events, since they would be made aware of such a reset via mctpd.

If this seems to you like a legitimate reason to consider keeping continuous polling, I have a further question on the implementation.

The particular use case which might not be solved even with polling is a fast reset of the downstream endpoints themselves (such as the GPUs), which happens within a window of only 100-200 ms. The polling window is not narrow enough to capture such a fast reset, and from the MCTP perspective the endpoint never left the MCTP network. In that case NSM would not even be notified about the GPU's fast reset.

I need some help figuring out how we expect the endpoint to notify, or the bus owner to become aware of, such a fast-reset case.

I'm open to any further guidance/opinions on the polling mechanism itself, or on whether MCTP should even care about this, etc. Or maybe help me better understand the philosophy of mctpd not caring about the state of a device downstream of a bridge, or of a direct peer itself.

@jk-ozlabs
Member

I'm open to any further guidance/opinions on the polling mechanism itself, or on whether MCTP should even care about this, etc. Or maybe help me better understand the philosophy of mctpd not caring about the state of a device downstream of a bridge, or of a direct peer itself.

The main design point here is that mctpd manages the MCTP transport itself, so is only concerned with the state of connectivity to managed MCTP devices. It does not manage the general state of those devices, like your scenario above.

For example, in the fast-reset case: nothing has changed in mctpd's view of the network. The device has been reset, but it sounds like it has restored some MCTP state, in that it is still using its assigned EID - so no connectivity has been changed.

(or does a reset clear the EID? in which case the reset would be visible to mctpd?)

From what you have mentioned though, the issue there seems to be that the device is retaining some state (the EID) but not other state (the event subscriptions).

We do not perform ongoing polling of the MCTP devices, as we cede that responsibility to upper-level applications once connectivity has been established. If those applications detect a loss of connectivity, they call Recover to restore mctpd's responsibility to check (and potentially re-establish) connectivity.

This means that we do not detect endpoints that may disappear but have no upper-level applications interacting with them. This is a deliberate decision - if nothing is communicating with them, do we care that we have connectivity?

For the reset-detection in general though, some upper-layer protocols have mechanisms to detect that, which would typically re-trigger some form of setup.
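
As an aside, the Recover call mentioned above is a D-Bus method on the endpoint object; a minimal sd-bus sketch of invoking it follows. The bus name, object path, and interface are assumptions based on recent mctpd documentation and should be checked against the mctpd version in use.

```c
/*
 * Sketch: ask mctpd to re-check connectivity for an endpoint via its Recover
 * D-Bus method. The bus name, object path and interface below are assumptions
 * taken from recent mctpd documentation; verify them against the mctpd
 * version in use.
 */
#include <stdio.h>
#include <string.h>
#include <systemd/sd-bus.h>

int main(void)
{
	sd_bus *bus = NULL;
	sd_bus_error err = SD_BUS_ERROR_NULL;
	int rc;

	rc = sd_bus_default_system(&bus);
	if (rc < 0) {
		fprintf(stderr, "cannot connect to system bus: %s\n", strerror(-rc));
		return 1;
	}

	/* example object path: network 1, EID 20 */
	rc = sd_bus_call_method(bus,
			"au.com.codeconstruct.MCTP1",
			"/au/com/codeconstruct/mctp1/networks/1/endpoints/20",
			"au.com.codeconstruct.MCTP.Endpoint1",
			"Recover", &err, NULL, "");
	if (rc < 0)
		fprintf(stderr, "Recover failed: %s\n",
			err.message ? err.message : strerror(-rc));

	sd_bus_error_free(&err);
	sd_bus_unref(bus);
	return rc < 0;
}
```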

@faizana-nvidia
Contributor Author

faizana-nvidia commented Aug 19, 2025

Thank you for the clarification; your words do make sense to me.

(or does a reset clear the EID? in which case the reset would be visible to mctpd?)

For our platform devices, yes it does, but it's given back the same EID by the bridge, meaning that once the device is out of reset it responds to the next MCTP command (GET_ENDPOINT_ID).

The reset would be visible to mctpd if any application triggers a Recover call, right?

For the reset-detection in general though, some upper-layer protocols have mechanisms to detect that, which would typically re-trigger some form of setup.

Well, this is the problem I'm currently facing: Nvidia NSM doesn't have a direct mechanism for detecting a reset, especially one that happens that quickly (~100 ms). Even if it did something like polling to keep track of the device state, that would not be frequent enough to catch such a quick reset.

Had this been a direct endpoint (non-bridge) to the BMC, would the MCTP Discovery Notify command be the right way for the endpoint to indicate its reset, given that in that case the EID would be cleared for the endpoint and it would be a freshly added endpoint on the MCTP network? If so, is there any plan to include this support in mctpd? From your previous comments and other issue discussions, I suppose your take is different on Discovery Notify and when it is to be used.

@jk-ozlabs
Member

For our platform devices, yes it does, but it's given back the same EID by the bridge, meaning that once the device is out of reset it responds to the next MCTP command (GET_ENDPOINT_ID).

Ah, so there's no direct connection, I see.

Had this been a direct endpoint (non-bridge) to the BMC, would the MCTP Discovery Notify command be the right way for the endpoint to indicate its reset, given that in that case the EID would be cleared for the endpoint and it would be a freshly added endpoint on the MCTP network?

No, the discovery notify command is only between the endpoint and its direct bus owner. From DSP 0236 § 12.15 "Discovery Notify":

This message should only be sent from endpoints to the bus owner for the bus that the endpoint is on

To me, the spec reads as if the intention for Discovery Notify is to provide a bus-hotplug mechanism when the physical bus does not already provide it.

Even without the direct connection, it sounds like you're attempting to use the MCTP control protocol to communicate upper-layer protocol state. I would think that that device state (ie, the event subscriptions and/or reset condition) really belong as part of that upper-layer protocol.

For example, NVMe-MI has facilities to indicate that a subsystem has been reset (see the "NVM Subsystem Reset Occurred" event), perhaps you could do something similar here?

[I am not familiar with NSM though]

@faizana-nvidia
Contributor Author

faizana-nvidia commented Aug 25, 2025

For our platform devices, yes it does, but it's given back the same EID by the bridge, meaning that once the device is out of reset it responds to the next MCTP command (GET_ENDPOINT_ID).

Ah, so there's no direct connection, I see.

Had this been a direct endpoint (non-bridge) to the BMC, would the MCTP Discovery Notify command be the right way for the endpoint to indicate its reset, given that in that case the EID would be cleared for the endpoint and it would be a freshly added endpoint on the MCTP network?

No, the discovery notify command is only between the endpoint and its direct bus owner. From DSP 0236 § 12.15 "Discovery Notify":

This message should only be sent from endpoints to the bus owner for the bus that the endpoint is on

To me, the spec reads as if the intention for Discovery Notify is to provide a bus-hotplug mechanism when the physical bus does not already provide it.

Even without the direct connection, it sounds like you're attempting to use the MCTP control protocol to communicate upper-layer protocol state. I would think that that device state (ie, the event subscriptions and/or reset condition) really belong as part of that upper-layer protocol.

For example, NVMe-MI has facilities to indicate that a subsystem has been reset (see the "NVM Subsystem Reset Occurred" event), perhaps you could do something similar here?

[I am not familiar with NSM though]

Thank you for the clarification. (I agree with you; maybe there is some other way to handle this use case. I'll go over the NVMe-MI implementation and the NVM Subsystem Reset Occurred event.)

I'll rebase and update the PR with some more modifications. Please hold off on the review till then.

@faizana-nvidia force-pushed the mctp-bridge-poll branch 2 times, most recently from fdaa069 to 79e5e46 on August 28, 2025 19:32
@faizana-nvidia
Contributor Author

faizana-nvidia commented Aug 28, 2025

Hello Jeremy,

Updated the PR per the last discussion on stopping polling. It's up for review now.

Sorry to keep coming back to this, but is there any plan to support Discovery Notify for a direct peer of the BMC (as bus owner), specifically for a binding where we don't have explicit methods like I2C bus detection to identify endpoint presence?

@faizana-nvidia force-pushed the mctp-bridge-poll branch 2 times, most recently from cf98296 to c892110 on August 28, 2025 20:01
@jk-ozlabs
Member

Thanks for the updates.

You now seem to have a bunch of fixes/changes to the bridging implementation, which don't belong in the polling feature addition.

Can you separate those into a different PR please?

@faizana-nvidia
Contributor Author

Thanks for the updates.

You now seem to have a bunch of fixes/changes to the bridging implementation, which don't belong in the polling feature addition.

Can you separate those into a different PR please?

Created a new PR for tracking any other fixes needed for bridge support:
#110

But I'm keeping these commits in this PR too, to address the test cases; I will rebase once the other PR is approved.

Referring to DSP0236 sec 8.17.6 Reclaiming EIDs from hot-plug devices

Requirement: The bus owner/bridge needs to periodically poll with the
GET_ENDPOINT_ID control command to check whether a downstream endpoint
is accessible. Once it's established that the endpoint is accessible,
polling needs to stop.

Introduce a new endpoint polling configuration to be used as the periodic
interval for sending the GET_ENDPOINT_ID poll command by the bus owner/bridge.

Signed-off-by: Faizan Ali <[email protected]>
Since downstream endpoints are behind the bridge, the available
query_get_endpoint_id() won't be able to direct the GET_ENDPOINT_ID
command packet behind the bridge: the physical address of a downstream
endpoint would be the same as the bridge's own.

But we can query the endpoint directly, since routes have already
been laid out when the bridge was allocated the EID pool space.

Signed-off-by: Faizan Ali <[email protected]>
Implement a periodic endpoint polling mechanism to validate bridged
endpoint accessibility. Begin polling as soon as gateway routes are
created. Stop polling once it's established that the endpoint path is
accessible.

Publish the peer path once the downstream endpoint responds to the
poll command sent.

Signed-off-by: Faizan Ali <[email protected]>
Remove all downstream endpoints as well if the bridge is being removed.
Also stop periodic endpoint polling.

Signed-off-by: Faizan Ali <[email protected]>
Add a new bus-owner configuration for periodic polling of
bridged endpoints.

Signed-off-by: Faizan Ali <[email protected]>
Currently find_endpoint incorrectly tries to find the physical address
for the given EID. This fails when the EID is a gatewayed endpoint,
because such endpoints don't have neighbors or a physical address.

For gateway routes, return the gateway's physical address instead.
The gateway endpoint will then forward the message to the correct
bridged endpoint internally by looking into its bridged-endpoint list.

Signed-off-by: Faizan Ali <[email protected]>
Test that bridge polling discovers downstream endpoints and
creates endpoint d-bus objects.

Test that polling stops once a downstream endpoint is discovered,
and continues until the endpoint responds to the GET_ENDPOINT_ID
poll command.

Test that once a bridge endpoint is removed, the downstream endpoints
associated with the bridge are also removed.

Signed-off-by: Faizan Ali <[email protected]>
@faizana-nvidia
Contributor Author

Rebased the PR.
