WIP: Fix that EINTR should be ignored on zmq_recv or propagated to corresponding caller
#135
base: master
|
This PR is related to #134 |
…sg_recv' + Readded debug-printing
…'t raised in 'event_loop' but in 'recv_all' - so added retry in this spot too
|
Haha okay... After running my set of applications that get |
|
Update after testing more: I caught the I don't know if this kind of malformed message can happen in other cases using ZMQ, but I think that this happening is strictly better than needing to restart the socket and context. The question is if the kind of malformed message is something the |
|
Sorry for the delay in replying. I like the approach, but there is a problem with the solution. I'm inclined to just catch and ignore EINTR in the blocking parts of zmq when calling a non-blocking operation (send ~block:false / recv ~block:false / getsocketopt / setsocketopt), but that may cause other challenges. However, I'm unsure why the EINTR even occurs. I mean, who is sending signals to the application? Is there some background monitoring check that's trying to terminate the application, or someone pressing Ctrl-C? |
You mean to catch Is this the reason why my call to I haven't observed that the socket became unusable in my tests
I haven't observed this behaviour of ZMQ before now. The specific setup that triggers Edit: What kind of "background monitoring check" are you thinking about? |
Yes, but not the first receive. If the first part of a message is received, it's important to receive all remaining parts of the message, or the next call to
That seems very likely. I think the solution is to catch EINTR:
let recv_all_wrapper (f : ?block:bool -> _ t -> _) =
  (* Wrapper function to catch EINTR and just try again *)
  let rec retry_on_eintr f =
    try f () with Unix.Unix_error (Unix.EINTR, _, _) -> retry_on_eintr f
  in
  (* Once the first message part is received all remaining message parts can
     be received without blocking. *)
  let rec loop socket accu =
    if retry_on_eintr (fun () -> has_more socket) then
      let msg = retry_on_eintr (fun () -> f socket) in
      loop socket (msg :: accu)
    else
      accu
  in
  fun ?block socket ->
    let first = f ?block socket in
    List.rev (loop socket [first])
With this change, no read data will be thrown away, so it will be safe to catch EINTR at an upper level and retry the read.
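To illustrate why retrying each remaining part keeps a multipart message intact, here is a minimal, self-contained simulation. It does not use ocaml-zmq: `fake_socket`, `recv`, `has_more` and the `Interrupted` exception are hypothetical stand-ins (the exception plays the role of an EINTR error), so treat it as a sketch of the idea rather than the library's behaviour.

```ocaml
(* Hypothetical stand-in for an EINTR error raised by a receive call. *)
exception Interrupted

(* A fake socket: the parts still to be delivered, plus a script that says,
   for each successive recv call, whether that call gets interrupted. *)
type fake_socket = {
  mutable parts : string list;
  mutable script : bool list;
}

(* An interrupted call raises before consuming anything. *)
let recv socket =
  let interrupted =
    match socket.script with
    | [] -> false
    | flag :: rest -> socket.script <- rest; flag
  in
  if interrupted then raise Interrupted;
  match socket.parts with
  | [] -> failwith "recv: no message part available"
  | part :: rest -> socket.parts <- rest; part

let has_more socket = socket.parts <> []

let rec retry_on_interrupt f =
  try f () with Interrupted -> retry_on_interrupt f

(* The first receive is not retried, so the caller may see Interrupted
   there. Every later part is retried until it arrives. *)
let recv_all socket =
  let rec loop accu =
    if has_more socket then
      let part = retry_on_interrupt (fun () -> recv socket) in
      loop (part :: accu)
    else
      List.rev accu
  in
  let first = recv socket in
  loop [first]
```

In this model an interruption on the first part surfaces to the caller with nothing consumed, so the caller can simply call `recv_all` again, while interruptions on later parts never leave a half-read message behind.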
I think some tests are needed to understand the "correct" way of handling EINTR, to avoid the process becoming stuck because signals are ignored in the wrong way.
Maybe some sort of process monitor that runs to ensure that your process is still running, e.g. by sending a SIGCONT or just signal |
|
And to add: when sending a multipart message, EAGAIN handling has the same problem as EINTR: it may leave half-sent messages. So that will also need to be fixed somehow, and I'm starting to think that the use of a handler (kind of like a continuation) would be the best solution, but it will require some larger changes in the async code paths as well to handle that in a reliable way. |
|
The handler would look something like:
let send_all =
  (* Sends the pending parts in order. A part is removed from [msgs] only
     after it has been sent, so a failed call can be retried safely. *)
  let rec cont ?block socket msgs () =
    match !msgs with
    | [] -> ()
    | [msg] ->
      send ?block ~more:false socket msg;
      msgs := []
    | msg :: rest ->
      send ?block ~more:true socket msg;
      msgs := rest;
      cont ?block socket msgs ()
  in
  fun ?block socket msgs ->
    cont ?block socket (ref msgs)
This will return a continuation that can be called repeatedly. I don't know if I like the mutable state in there, but it's a suggestion. To use this do:
...
(* My user-defined handler of EAGAIN or EINTR *)
let rec retry f =
  try f () with
  | Unix.Unix_error ((Unix.EAGAIN | Unix.EINTR), _, _) -> retry f
in
retry (send_all socket [list; of; messages]) |
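The continuation idea can be tried out without a real socket. The sketch below is an assumption-laden model, not ocaml-zmq code: `fake_socket`, `send` and the `Again` exception (standing in for an EAGAIN error) are hypothetical, and `retry` is bounded so that a persistent failure eventually reaches the caller instead of spinning forever.

```ocaml
(* Hypothetical stand-in for an EAGAIN error raised by a send call. *)
exception Again

(* A fake socket: what has been sent so far (newest first, with the "more"
   flag), plus a script saying which successive send calls fail. *)
type fake_socket = {
  mutable sent : (string * bool) list;
  mutable script : bool list;
}

(* A failing call raises before anything is recorded as sent. *)
let send ~more socket part =
  let fail =
    match socket.script with
    | [] -> false
    | flag :: rest -> socket.script <- rest; flag
  in
  if fail then raise Again;
  socket.sent <- (part, more) :: socket.sent

(* Returns a continuation. Each call resumes at the first unsent part,
   because a part is dropped from [pending] only after send succeeded. *)
let send_all socket msgs =
  let pending = ref msgs in
  let rec cont () =
    match !pending with
    | [] -> ()
    | [part] ->
      send ~more:false socket part;
      pending := []
    | part :: rest ->
      send ~more:true socket part;
      pending := rest;
      cont ()
  in
  cont

(* Bounded retry: once attempts are used up, Again propagates. *)
let rec retry ~attempts f =
  try f () with
  | Again when attempts > 1 -> retry ~attempts:(attempts - 1) f
```

Usage would look like `retry ~attempts:5 (send_all socket ["header"; "body"])`. Since the pending list lives inside the continuation, a retried call never resends a part that already went out.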
|
Here is a snippet of when logging the signals on the process reading from the ipc pub-socket with .. these signals didn't specifically cause ZMQ to raise |
|
Ahh, yes. Good old SIGALRM. Maybe your application has an alarm set? Any signal delivered to the application will make ZMQ raise EINTR. Also see the zmq test here used to test raising of EINTR |
|
I'm still inclined to just eat EINTR and always try again. However, that does not solve the problems with EAGAIN. Stay tuned. |
|
Ok. Since this organization is inactive, I've created a detached fork on https://github.com/andersfugmann/ocaml-zmq. I suggest moving the discussion there. |
This works for when getting EINTR on zmq_recv.
I guess that we would want to only ignore this error on this specific call, but that information is not present in the code at that point. A solution could be to somehow propagate the EINTR to the user thread that led to the error. The user then knows that the last call was e.g. a recv and can ignore the error.
From reading the code in zmq-deferred/src/socket.ml - I'm currently unsure of how this could be done in the best way. I'm thinking that the corresponding callback in t.receivers should be woken up with the EINTR exception - so either communicate this exception via a new field in t or change the interface of all receivers to take an exn option as argument. To make the semantics consistent - all senders should also get this error propagated.
But how to know which receiver/sender caused EINTR...
Update after testing
As retrying recv on EINTR works as implemented now, without giving the error to the user - the question is if we ever want to propagate EINTR instead of auto-retrying. This would demand a more complex solution, as discussed earlier.
Personally I currently think that it doesn't make much sense to propagate EINTR. I haven't tried a case where e.g. Ctrl-C didn't work on the process using ZMQ, so that signal is propagated fine. I don't know how we would even know what signal led to EINTR - which is pretty essential for the user if they need to act upon it...
Also, currently ocaml-zmq already retries reading from socket in certain cases.
Therefore I think this simple fix seems like the right one.
As an aside: Maybe some users would be interested in knowing if EINTR / retry has happened (I for one would be interested) - so maybe a user-settable 'logger' could be added to Deferred.T.
Todo before merge
EINTR error propagated to the recv caller
recv_all after retrying following EINTR
EINTR happening on 'zmq_msg_recv'
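The user-settable logger mentioned in the aside could take roughly the following shape. This is only a sketch under assumptions: `on_retry`, `retry_on_interrupt` and the `Interrupted` exception (standing in for an EINTR error) are hypothetical names, and nothing like this is claimed to exist in Deferred.T.

```ocaml
(* Hypothetical stand-in for an EINTR error. *)
exception Interrupted

(* Hypothetical user-settable hook. It receives the name of the operation
   and the number of retries so far. The default does nothing. *)
let on_retry : (string -> int -> unit) ref = ref (fun _ _ -> ())

(* Retries [f] for as long as it is interrupted, reporting every retry
   through the hook before trying again. *)
let retry_on_interrupt ~name f =
  let rec go n =
    try f () with
    | Interrupted ->
      !on_retry name (n + 1);
      go (n + 1)
  in
  go 0
```

A user who wants to know about retries would assign their own function to `on_retry`, for example one that prints the operation name and retry count to stderr. Everyone else keeps the silent default.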