Skip to content

Conversation

eloparco
Copy link
Contributor

@eloparco eloparco commented Feb 2, 2023

Implementing #1910

@eloparco eloparco force-pushed the eloparco/allow-interrupting-blocking-instructions branch from ad7582f to 3c2f060 Compare February 2, 2023 01:21
@eloparco
Copy link
Contributor Author

eloparco commented Feb 2, 2023

I tested different combinations of blocking from main thread or from spawn thread, interpreter or aot, with or without hw bound check enabled.
So far the combination that doesn't seem to work is aot + hw bound check disabled, where I see that the exit code is not the one expected. I'll keep investigating.
@wenyongh

@wenyongh
Copy link
Contributor

wenyongh commented Feb 2, 2023

I tested different combinations of blocking from main thread or from spawn thread, interpreter or aot, with or without hw bound check enabled. So far the combination that doesn't seem to work is aot + hw bound check disabled, where I see that the exit code is not the one expected. I'll keep investigating. @wenyongh

@eloparco Thanks a lot, the code is much better now, I will look into it soon.

@eloparco eloparco force-pushed the eloparco/allow-interrupting-blocking-instructions branch 3 times, most recently from dea3508 to f6ef4e7 Compare February 3, 2023 01:39
@eloparco eloparco force-pushed the eloparco/allow-interrupting-blocking-instructions branch from f6ef4e7 to 185cd4d Compare February 3, 2023 02:25
#ifdef OS_ENABLE_BLOCK_INSN_INTERRUPT
}
else {
ret = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who frees eg. "argv1" in wasm_runtime_invoke_native on a termination?

while OS_ENABLE_HW_BOUND_CHECK seems to have the same problem,
it seems much more complex to fix for OS_ENABLE_BLOCK_INSN_INTERRUPT because it's an async event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a potential issue, maybe there are two ways to resolve it: (1) avoid allocating memory in wasm_runtime_invoke_native, just define a large enough buffer for argv, (2) add a list in exec_env, add the allocating the memory to the list, and free it later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) needs to block the signal for OS_ENABLE_BLOCK_INSN_INTERRUPT.
maybe we should invent a convenient api for common cases like memory allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the call at L1396? Isn't it freed at L1458 as before my changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a try with valgrind using interpreter and AOT, with or without HW bound check and no memory leakage is detected. Unfortunately, in case of AOT I see many warnings about invalid memory accesses that may explain the weird behavior that I reported in another comment.

Copy link
Contributor Author

@eloparco eloparco Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge it to main branch as soon as possible when (2) is done and well tested.

At this point, we can probably just have the feature disabled by default until it's well tested.

the only simple solution i can think of right now involves block/unblock of the signal.

The tricky part is that it's not always possible to just block signals before the malloc and unblock after the free, because the blocking instruction can be in the middle. Apart from argv1 that we could avoid allocating dynamically, the problem is in the native function calls.

Not sure what's the best solution there, maybe the approach of keeping a list in exec_env is not that bad. The memory that is not freed stays in the list and gets released only at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, suggest to add a list of allocated memories after setjmp in the exec_env.
@eloparco I have created branch dev/interrupt_block_insn, could you upload another patch to it? Or do you mind I refine some code and upload the PR by myself?

Copy link
Contributor Author

@eloparco eloparco Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wenyongh, I pushed the changes I made so far in #1946. Not removing this one so that I can keep trying the changes on the wasi thread examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eloparco I submitted PR #1947 to refactor the feature, please help review, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the source branch and dest branch are invalid, closed it and submitted #1948 instead

static void
interrupt_block_insn_sig_handler()
{
WASMJmpBuf *jmpbuf_node = exec_env_tls->jmpbuf_stack_top;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jmpbuf_stack_top here can be for another handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if HW bound check is enabled, it will get a "HW bound" jmpbuf #1930 (comment), but that should not be a problem unless I'm missing something

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can even get a signal between wasm_exec_env_push_jmpbuf and os_setjmp.

Copy link
Contributor Author

@eloparco eloparco Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can even get a signal between wasm_exec_env_push_jmpbuf and os_setjmp.

In that case, the jmpbuf would be invalid, as it is in general before being set up by os_setjmp. Isn't that the same for the HW bound check before my PR?

I wonder if we have to mask/un-mask signals to avoid the signal handler being called in the middle. We want to avoid signal handlers being called between wasm_exec_env_push_jmpbuf and os_setjmp as they would use a non-initialized jmpbuf iiuc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't a problem for hw bound check because the signal for it is basically synchronous to the thread execution. ie. the thread itself can control where it can happen.

Copy link
Contributor Author

@eloparco eloparco Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe invert the order of wasm_exec_env_push_jmpbuf and os_setjmp? So that the jmpbuf is only pushed after being initialized by os_setjmp.

But there are probably other cases that are not covered, like an exception before os_setjmp. [EDIT] These cases shouldn't be a problem since they're already handled by the normal exception spreading mechanism.

Copy link
Contributor Author

@eloparco eloparco Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe invert the order of wasm_exec_env_push_jmpbuf and os_setjmp? So that the jmpbuf is only pushed after being initialized by os_setjmp.

I tried that in the latest commit, but maybe it's safer to use something like static volatile sig_atomic_t canjump; (but with tls) as explained here https://notes.shichao.io/apue/ch10/?

you can even get a signal between wasm_exec_env_push_jmpbuf and os_setjmp.

Apart from this, do you think there are other cases that are not covered? If we get the signal before os_setjmp, the signal handler (for instruction interruption) will return and the thread will stop when starting to execute instructions, using the normal exception propagation mechanism.

#ifdef OS_ENABLE_BLOCK_INSN_INTERRUPT
}
else {
ret = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the call at L1396? Isn't it freed at L1458 as before my changes?

i'm not sure which file and version your line numbers are for.

i meant eg.

if (!(argv1 = runtime_malloc((uint32)size, exec_env->module_inst, NULL,

when it escapes with a long jump, it might not be executed.

wenyongh pushed a commit that referenced this pull request Feb 9, 2023
Move "blocking instruction interruption" implementation from #1930
@wenyongh
Copy link
Contributor

Test this feature with branch dev/interrupt_block_insn, and PR #1946, #1948 were merged.

@wenyongh wenyongh closed this Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants