Skip to content

Conversation

bizwark
Copy link
Contributor

@bizwark bizwark commented May 7, 2025

There are different threads running in the rag background workers. Only the main thread should receive the SIGTERM signal because the signal handler for it has an assert that checks for this condition, and that signal handler is part of the BackgroundWorker module. So disable signals before creating/running any new threads, and re-enable them only when we poll wait_latch. Signals received while other threads are active (and have signals blocked) will set the signals to pending, and they will be delivered once we poll. The async that polls wait_latch is tied to the thread where it was created, and that is safe to run the SIGTERM signal handler from pgrx v0.14.1.

@bizwark bizwark requested review from MMeent and tristan957 May 7, 2025 17:48
Copy link

@MMeent MMeent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is anywhere close to safe, but don't have enough of an understanding of the underlying primitives to make a solid argument.

Presumably, a better approach would be to have a separate thread exclusively used for this blocking wait_latch operation.

@bizwark
Copy link
Contributor Author

bizwark commented May 8, 2025

Having a separate thread for wait_latch might not fix the problem, because even if we left signals enabled for that thread it would mean that the signal handler executed in a non-main thread, which is what is causing the panic. Right now I think that async is tied to the main thread, which is why it's ok for the signal to fire in that context.

Would it be any better if I disabled only SIGTERM and not all signals?

Copy link

@tristan957 tristan957 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs SAFETY comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants