Skip to content

Conversation

ff137
Copy link
Contributor

@ff137 ff137 commented Apr 29, 2025

In summary:
💥 Breaking change: EventBus.notify() is now non-blocking, and all subscribed coroutines will run as background tasks.

Previously, subscribed handlers were run sequentially within the notify call, and all had to complete for notify to return.


This means that some of the more complex handlers, e.g. revocation-registry setup, which included chained notifications, can now complete earlier, while the remaining notifications queue off asynchronously. This makes it a major performance improvement to a core component. But comes with the "breaking change" caveat, because any flow that immediately expected to issue a credential using a revocable cred-def that was just created, will now need to wait a bit.

That's the only scenario that broke with this change: timing issues with POST /credential-definition returning earlier, while resources were created asynchronously. So, to cater for this "backward compatibility" issue:

  • ✨ Added a wait_for_active_revocation_registry utility method, which can be configured (skipped if desired) using a wait_for_revocation_setup boolean field in both POST /credential-definition endpoints.

  • 🔧 Added two new env vars:

    • REVOCATION_REGISTRY_CREATION_TIMEOUT, to configure revocation-setup timeouts (default 60s).
    • MAX_ACTIVE_EVENT_BUS_TASKS, to configure the queue size for the new event bus task queue (default 50).
  • 🐛 Exception handler added to AnonCredsRevocation._create_credential_helper, so that if "No active registry" error occurs, it will continue retrying a few times before raising. This was necessary for the Indy case, where waiting for an active registry record was not enough, as there was still some timing issue there.

  • 🎨 Also reorganizes some code (mainly constants), in order to help with cyclic import issues that came up, and fixes some "coroutine not awaited" warnings in tests, which tripped up PR test results.


❓ Question: The EventBus.notify method can now actually be made synchronous, because it just adds coroutines to a queue. That would be a secondary breaking change, because plugins or other libraries would no longer "await" the notify method. Maybe that change may as well be included now; maybe it can be done separately?


Original PR description:

This PR modifies the EventBus.notify method to now "fire and forget", instead of blocking the thread making the notify call.

This is achieved by moving event processors to background tasks, using the existing TaskQueue implementation.

This is especially helpful for event processors that involve ledger operations, which can take quite long. The function making the notify call never expects any result - it just wants to fire off the message and move on - and this PR will now allow that.


Side note: The refactored revocation registry management PR (#3831) introduces more granular events for making the state machine events recoverable. Without this change, the events are chained together. One .notify() continuously activates the next -- so if "notification complete" logs were added right after each .notify call, you will only see the very first notify completing once the inner-most nested notify call was complete. #3831 depends on events being completed as quickly as possible, so they can be monitored / recovered properly

@swcurran
Copy link
Contributor

Shoot — sorry, didn’t see that this was draft. Should have waited for the tests to complete before merging in main. Doh...

@ff137
Copy link
Contributor Author

ff137 commented Sep 9, 2025

Note: the PR tests in acapy_agent/protocols/present_proof/dif/tests/* are taking exceptionally long to complete.
(Not related to these changes.)

Full pytest run usually takes ~3 minutes, and it currently takes around 25 minutes, since about 2 hours ago.
Not sure why, but presumably there's some network load. And it's probably good at some point to mock the network calls for these unit tests, so the network is not a blocker for PRs

@ff137 ff137 marked this pull request as ready for review September 9, 2025 18:12
@ff137 ff137 requested review from a team, dbluhm and jamshale September 9, 2025 18:13
@ff137
Copy link
Contributor Author

ff137 commented Sep 9, 2025

The problem with scenario tests is that now the revocation registries are not guaranteed to be ready by the time the tests tries to use them.

I'll see what can be done about this. I presume that creating a revocable cred def would previously wait for events associated with rev-reg-def's to be complete before returning. So that behaviour can maybe be preserved somehow.

@ff137
Copy link
Contributor Author

ff137 commented Sep 9, 2025

EventBus.notify can now be made synchronous, but it would require updates to plugins that call it (would be breaking change if async is removed, since it shouldn't be awaited). So, I just added a todo

@ff137 ff137 marked this pull request as draft September 9, 2025 22:02
@ff137 ff137 force-pushed the improve-event-bus-notify branch from 6c682f6 to db9f4e4 Compare September 18, 2025 09:31
@ff137 ff137 marked this pull request as ready for review September 18, 2025 11:24
@ff137 ff137 force-pushed the improve-event-bus-notify branch 3 times, most recently from 1d3b8b6 to 849b88b Compare September 18, 2025 15:11
@ff137 ff137 marked this pull request as draft September 18, 2025 19:18
- Streamlined the process of notifying subscribers by using a list comprehension for creating partials.
- Added logging for cases where no subscribers are found for an event.
- Enhanced error handling by logging exceptions that occur during the scheduling of notification tasks.
- Introduced a new method to log exceptions in background notification tasks.
- Non-blocking event notifications.

This refactor aims to improve code performance and maintainability

Signed-off-by: ff137 <[email protected]>
Signed-off-by: ff137 <[email protected]>
Signed-off-by: ff137 <[email protected]>
CredDefPostRequestSchema can now accept a `wait_for_revocation_setup` boolean, to configure whether the endpoint should only return successfully if corresponding revocation resources are created within some timeout.

The default behaviour now maintains backwards compatibility, where previously the EventBus.notify method was blocking, and sequentially chained the completion of revocation resources, before the cred_def_post endpoint returned.

Signed-off-by: ff137 <[email protected]>
@ff137 ff137 force-pushed the improve-event-bus-notify branch from 849b88b to cbce1af Compare September 19, 2025 09:02
@ff137 ff137 marked this pull request as ready for review September 19, 2025 09:30
@ff137 ff137 force-pushed the improve-event-bus-notify branch from 429ed43 to 7a5b44d Compare September 19, 2025 14:33
@ff137 ff137 changed the title ♻️ Refactor EventBus notify method ♻️ 💥 Refactor EventBus notify method Sep 19, 2025
@ff137
Copy link
Contributor Author

ff137 commented Sep 19, 2025

Finally, took a long time to fix the tests, but now it's actually ready for review!

@ff137 ff137 requested review from a team September 19, 2025 19:26
jamshale
jamshale previously approved these changes Sep 22, 2025
Copy link
Contributor

@jamshale jamshale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice 👍

I also think we should change it to not be asynchronous in a separate task later. Maybe as a targeted release. Just to make the plugin updates more isolated and if there is any potential problems with the work around this it will be easier to find.

We also want to get the Kanon profile stuff completed. I'm not sure how many merge conflicts this will cause with it, but if it's ready now, it might be better to merge that first.

@ff137
Copy link
Contributor Author

ff137 commented Sep 22, 2025

I also think we should change it to not be asynchronous in a separate task later. Maybe as a targeted release. Just to make the plugin updates more isolated and if there is any potential problems with the work around this it will be easier to find.

We also want to get the Kanon profile stuff completed. I'm not sure how many merge conflicts this will cause with it, but if it's ready now, it might be better to merge that first.

Agreed on both 👍
Good to narrow the scope and do async def -> def change separately
And I think good to merge the Kanon stuff first. It'll be a smaller headache for me to fix merge conflicts here, than yet again making them fix merge conflicts there. So I'm happy to wait for that merge first.

Copy link

Copy link
Member

@esune esune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a pass, changes look great - much cleaner code in many sections. Will look out for a rebase after the other PR is merged for final review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants