Commit 67cf2d6

dulinriley authored and facebook-github-bot committed
Add unhandled supervision error hook to crash the client (#1637)
Summary:
Part of #1209

Make two variants of the "actor_states_monitor" watchdog: one for Owned ActorMesh, which sends a message to the owner if it exists, and one for Ref ActorMesh, which does not. This way, Ref actor meshes generate liveness exceptions without propagation, and Owned actor meshes send a SupervisionFailureMessage to their owning actor. Since every Owned mesh does this, events always reach the client if they aren't handled.

Add a `monarch.actor.unhandled_fault_hook` function, which is called when an unhandled supervision error reaches the client. It takes one argument, a MeshFailure object, and is expected to halt the process. By default it calls `sys.exit(1)` after logging the error. Raising an exception is not sufficient, because the hook is called outside of a Python thread (by a tokio task).

Note that propagation will not happen if an ActorMesh and all of its endpoints are unreachable and garbage collected while the actors are still running something that generates an error. We'll want to fix this eventually.

Reviewed By: mariusae

Differential Revision: D85163744
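As a rough illustration of the hook contract described above (one failure argument, log the error, then halt the process), a custom hook might look like the sketch below. Everything here is an assumption made for illustration: `ExampleMeshFailure` is a hypothetical stand-in for the real MeshFailure object, whose fields are not described in this commit message, and the mechanism for installing a replacement for `monarch.actor.unhandled_fault_hook` is not shown because the message does not specify it.

```python
import logging
import sys
from dataclasses import dataclass

logger = logging.getLogger("unhandled_fault_example")


# Hypothetical stand-in for MeshFailure; the real object's fields are not
# described in the commit message, so these two are invented for the sketch.
@dataclass
class ExampleMeshFailure:
    mesh_name: str
    reason: str


def describe_failure(failure: ExampleMeshFailure) -> str:
    """Build the log line reported before the process is halted."""
    return (
        f"unhandled supervision error in mesh {failure.mesh_name!r}: "
        f"{failure.reason}"
    )


def example_unhandled_fault_hook(failure: ExampleMeshFailure) -> None:
    """Log the failure, then halt, mirroring the described default behavior."""
    logger.error(describe_failure(failure))
    # Per the commit message, raising an ordinary exception is not enough
    # because the hook is invoked from a tokio task, so exit explicitly.
    sys.exit(1)
```

The sketch keeps message formatting separate from the exit call so that the reporting logic can be checked without terminating the interpreter.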
1 parent f1b4e8d commit 67cf2d6

6 files changed, +389 -97 lines changed
