-
Notifications
You must be signed in to change notification settings - Fork 715
Allow interrupting blocking instructions #1930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow interrupting blocking instructions #1930
Conversation
ad7582f
to
3c2f060
Compare
I tested different combinations of blocking from main thread or from spawn thread, interpreter or aot, with or without hw bound check enabled. |
@eloparco Thanks a lot, the code is much better now, I will look into it soon. |
dea3508
to
f6ef4e7
Compare
f6ef4e7
to
185cd4d
Compare
core/iwasm/aot/aot_runtime.c
Outdated
#ifdef OS_ENABLE_BLOCK_INSN_INTERRUPT | ||
} | ||
else { | ||
ret = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who frees eg. "argv1" in wasm_runtime_invoke_native on a termination?
while OS_ENABLE_HW_BOUND_CHECK seems to have the same problem,
it seems much more complex to fix for OS_ENABLE_BLOCK_INSN_INTERRUPT because it's an async event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a potential issue, maybe there are two ways to resolve it: (1) avoid allocating memory in wasm_runtime_invoke_native, just define a large enough buffer for argv, (2) add a list in exec_env, add the allocating the memory to the list, and free it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(2) needs to block the signal for OS_ENABLE_BLOCK_INSN_INTERRUPT.
maybe we should invent a convenient api for common cases like memory allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the call at L1396? Isn't it freed at L1458 as before my changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a try with valgrind using interpreter and AOT, with or without HW bound check and no memory leakage is detected. Unfortunately, in case of AOT I see many warnings about invalid memory accesses that may explain the weird behavior that I reported in another comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge it to main branch as soon as possible when (2) is done and well tested.
At this point, we can probably just have the feature disabled by default until it's well tested.
the only simple solution i can think of right now involves block/unblock of the signal.
The tricky part is that it's not always possible to just block signals before the malloc and unblock after the free, because the blocking instruction can be in the middle. Apart from argv1
that we could avoid allocating dynamically, the problem is in the native function calls.
Not sure what's the best solution there, maybe the approach of keeping a list in exec_env
is not that bad. The memory that is not freed stays in the list and gets released only at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, suggest to add a list of allocated memories after setjmp in the exec_env.
@eloparco I have created branch dev/interrupt_block_insn
, could you upload another patch to it? Or do you mind I refine some code and upload the PR by myself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the source branch and dest branch are invalid, closed it and submitted #1948 instead
…instruction interrupt" wrapper
static void | ||
interrupt_block_insn_sig_handler() | ||
{ | ||
WASMJmpBuf *jmpbuf_node = exec_env_tls->jmpbuf_stack_top; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jmpbuf_stack_top here can be for another handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if HW bound check is enabled, it will get a "HW bound" jmpbuf
#1930 (comment), but that should not be a problem unless I'm missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can even get a signal between wasm_exec_env_push_jmpbuf and os_setjmp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can even get a signal between wasm_exec_env_push_jmpbuf and os_setjmp.
In that case, the jmpbuf
would be invalid, as it is in general before being set up by os_setjmp
. Isn't that the same for the HW bound check before my PR?
I wonder if we have to mask/un-mask signals to avoid the signal handler being called in the middle. We want to avoid signal handlers being called between wasm_exec_env_push_jmpbuf
and os_setjmp
as they would use a non-initialized jmpbuf
iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it isn't a problem for hw bound check because the signal for it is basically synchronous to the thread execution. ie. the thread itself can control where it can happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe invert the order of wasm_exec_env_push_jmpbuf
and os_setjmp
? So that the jmpbuf
is only pushed after being initialized by os_setjmp
.
But there are probably other cases that are not covered, like an exception before os_setjmp
. [EDIT] These cases shouldn't be a problem since they're already handled by the normal exception spreading mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maybe invert the order of wasm_exec_env_push_jmpbuf and os_setjmp? So that the jmpbuf is only pushed after being initialized by os_setjmp.
I tried that in the latest commit, but maybe it's safer to use something like static volatile sig_atomic_t canjump;
(but with tls) as explained here https://notes.shichao.io/apue/ch10/?
you can even get a signal between wasm_exec_env_push_jmpbuf and os_setjmp.
Apart from this, do you think there are other cases that are not covered? If we get the signal before os_setjmp
, the signal handler (for instruction interruption) will return and the thread will stop when starting to execute instructions, using the normal exception propagation mechanism.
core/iwasm/aot/aot_runtime.c
Outdated
#ifdef OS_ENABLE_BLOCK_INSN_INTERRUPT | ||
} | ||
else { | ||
ret = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the call at L1396? Isn't it freed at L1458 as before my changes?
i'm not sure which file and version your line numbers are for.
i meant eg.
if (!(argv1 = runtime_malloc((uint32)size, exec_env->module_inst, NULL, |
wasm_runtime_free(argv1); |
when it escapes with a long jump, it might not be executed.
Move "blocking instruction interruption" implementation from #1930
Implementing #1910