bug: fix mptest hang, ProxyClient<Thread> deadlock in disconnect handler #201
Conversation
I captured two stack traces of the mptest hang on master (1b8d4a6) to be more confident this PR fixes the bug. I might also be able to update the test to make it trigger this bug more reliably, but I would need to think more about how to do that. Both stack traces were identical and showed a deadlock between the following threads:
thread 1
#0 0x00007f480509438e in __futex_abstimed_wait_common () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#1 0x00007f4805097010 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#2 0x000055df0ff8bdbb in std::condition_variable::wait<mp::EventLoop::post(kj::Function<void ()>)::$_2>(std::unique_lock<std::mutex>&, mp::EventLoop::post(kj::Function<void ()>)::$_2) (this=0x7f4804ffed08, __lock=..., __p=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:105
#3 mp::EventLoop::post (this=0x7f4804ffec90, fn=...) at mp/src/mp/proxy.cpp:273
269 Unlock(lock, [&] {
270 char buffer = 0;
271 KJ_SYSCALL(write(post_fd, &buffer, 1));
272 });
> 273 m_cv.wait(lock.m_lock, [this, &fn]() MP_REQUIRES(m_mutex) { return m_post_fn != &fn; });
274 }
#4 0x000055df0feefd7d in mp::EventLoop::sync<mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}>(mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}&&) (this=0x7f4804ffec90, callable=...) at mp/include/mp/proxy-io.h:182
179 template <typename Callable>
180 void sync(Callable&& callable)
181 {
> 182 post(std::forward<Callable>(callable));
183 }
#5 0x000055df0feefc52 in mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const (this=<optimized out>)
at mp/include/mp/proxy-io.h:434
431 Sub::destroy(*this);
432
433 // FIXME: Could just invoke removed addCleanup fn here instead of duplicating code
> 434 m_context.loop->sync([&]() {
435 // Remove disconnect callback on cleanup so it doesn't run and try
436 // to access this object after it's destroyed. This call needs to
437 // run inside loop->sync() on the event loop thread because
438 // otherwise, if there were an ill-timed disconnect, the
439 // onDisconnect handler could fire and delete the Connection object
440 // before the removeSyncCleanup call.
441 if (m_context.connection) m_context.connection->removeSyncCleanup(disconnect_cb);
#6 std::__invoke_impl<void, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&>(std::__invoke_other, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
#7 std::__invoke_r<void, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&>(mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&) (__fn=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
#8 std::_Function_handler<void (), mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
#9 0x000055df0ff52d26 in std::function<void()>::operator() (this=0x7fffb7c1b700) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
#10 mp::CleanupRun (fns=empty std::__cxx11::list) at mp/include/mp/proxy.h:43
39 inline void CleanupRun(CleanupList& fns) {
40 while (!fns.empty()) {
41 auto fn = std::move(fns.front());
42 fns.pop_front();
> 43 fn();
44 }
45 }
#11 0x000055df0ff872e7 in mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::~ProxyClientBase (this=0x7f4800027cf0) at mp/include/mp/proxy-io.h:460
457 template <typename Interface, typename Impl>
458 ProxyClientBase<Interface, Impl>::~ProxyClientBase() noexcept
459 {
> 460 CleanupRun(m_context.cleanup_fns);
461 }
#12 0x000055df0feebfc4 in std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> >::operator() (this=0x7fffb7c1b830, __ptr=0x7f4800027cf0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:93
#13 std::__uniq_ptr_impl<mp::ProxyClient<mp::test::messages::FooInterface>, std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> > >::reset (this=0x7fffb7c1b830, __p=0x0)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:205
#14 std::unique_ptr<mp::ProxyClient<mp::test::messages::FooInterface>, std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> > >::reset (this=0x7fffb7c1b830, __p=0x0)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:504
#15 mp::test::TestSetup::~TestSetup (this=this@entry=0x7fffb7c1b7d8) at mp/test/mp/test/test.cpp:103
98 ~TestSetup()
99 {
100 // Test that client cleanup_fns are executed.
101 bool destroyed = false;
102 client->m_context.cleanup_fns.emplace_front([&destroyed] { destroyed = true; });
> 103 client.reset();
104 KJ_EXPECT(destroyed);
105
106 thread.join();
107 }
#16 0x000055df0fee997f in mp::test::TestCase251::run (this=<optimized out>) at mp/test/mp/test/test.cpp:298
292 // Now that the disconnect has been detected, set signal allowing the
293 // callFnAsync() IPC call to return. Since signalling may not wake up the
294 // thread right away, it is important for the signal variable to be declared
295 // *before* the TestSetup variable so is not destroyed while
296 // signal.get_future().get() is called.
297 signal.set_value();
> 298 }
#17 0x00007f4805a34291 in kj::Maybe<kj::Exception> kj::runCatchingExceptions<kj::TestRunner::run()::{lambda()#1}>(kj::TestRunner::run()::{lambda()#1}&&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
#18 0x00007f4805a33a08 in kj::TestRunner::run() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
#19 0x00007f4805a337bd in kj::Function<kj::MainBuilder::Validity ()>::Impl<kj::_::BoundMethod<kj::TestRunner&, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#7}, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#8}> >::operator()() ()
from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
#20 0x00007f480572f93d in kj::MainBuilder::MainImpl::operator()(kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
#21 0x00007f480572df56 in kj::Maybe<kj::Exception> kj::runCatchingExceptions<kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**)::$_0>(kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**)::$_0&&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
#22 0x00007f480572dc7f in kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
#23 0x00007f4805a329b9 in main () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
#24 0x00007f480502a47e in __libc_start_call_main () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#25 0x00007f480502a539 in __libc_start_main_impl () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#26 0x000055df0fee6eb5 in _start ()
thread 2
#0 0x00007f480509438e in __futex_abstimed_wait_common () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#1 0x00007f4805097010 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#2 0x000055df0ff8bd0b in std::condition_variable::wait<mp::EventLoop::post(kj::Function<void ()>)::$_0>(std::unique_lock<std::mutex>&, mp::EventLoop::post(kj::Function<void ()>)::$_0) (this=0x7f4804ffed08, __lock=..., __p=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:105
#3 mp::EventLoop::post (this=this@entry=0x7f4804ffec90, fn=...) at mp/src/mp/proxy.cpp:266
258 void EventLoop::post(kj::Function<void()> fn)
259 {
260 if (std::this_thread::get_id() == m_thread_id) {
261 fn();
262 return;
263 }
264 Lock lock(m_mutex);
265 EventLoopRef ref(*this, &lock);
> 266 m_cv.wait(lock.m_lock, [this]() MP_REQUIRES(m_mutex) { return m_post_fn == nullptr; });
267 m_post_fn = &fn;
268 int post_fd{m_post_fd};
269 Unlock(lock, [&] {
270 char buffer = 0;
271 KJ_SYSCALL(write(post_fd, &buffer, 1));
272 });
273 m_cv.wait(lock.m_lock, [this, &fn]() MP_REQUIRES(m_mutex) { return m_post_fn != &fn; });
274 }
#4 0x000055df0ff8f22d in mp::EventLoop::sync<mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}>(mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}&&) (this=0x7f4804ffec90, callable=...) at mp/include/mp/proxy-io.h:182
179 template <typename Callable>
180 void sync(Callable&& callable)
181 {
> 182 post(std::forward<Callable>(callable));
183 }
#5 0x000055df0ff8f102 in mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const (this=<optimized out>) at mp/include/mp/proxy-io.h:434
431 Sub::destroy(*this);
432
433 // FIXME: Could just invoke removed addCleanup fn here instead of duplicating code
> 434 m_context.loop->sync([&]() {
435 // Remove disconnect callback on cleanup so it doesn't run and try
436 // to access this object after it's destroyed. This call needs to
437 // run inside loop->sync() on the event loop thread because
438 // otherwise, if there were an ill-timed disconnect, the
439 // onDisconnect handler could fire and delete the Connection object
440 // before the removeSyncCleanup call.
441 if (m_context.connection) m_context.connection->removeSyncCleanup(disconnect_cb);
#6 std::__invoke_impl<void, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&>(std::__invoke_other, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
#7 std::__invoke_r<void, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&>(mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&) (__fn=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
#8 std::_Function_handler<void (), mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
#9 0x000055df0ff52d26 in std::function<void()>::operator() (this=0x7f47ffffe870) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
#10 mp::CleanupRun (fns=empty std::__cxx11::list) at mp/include/mp/proxy.h:43
39 inline void CleanupRun(CleanupList& fns) {
40 while (!fns.empty()) {
41 auto fn = std::move(fns.front());
42 fns.pop_front();
> 43 fn();
44 }
45 }
#11 0x000055df0ff8e627 in mp::ProxyClientBase<mp::Thread, capnp::Void>::~ProxyClientBase (this=0x7f47f8000e58) at mp/include/mp/proxy-io.h:460
457 template <typename Interface, typename Impl>
458 ProxyClientBase<Interface, Impl>::~ProxyClientBase() noexcept
459 {
> 460 CleanupRun(m_context.cleanup_fns);
461 }
#12 0x000055df0ff53129 in std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >::~pair (this=0x7f47f8000e50) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_iterator.h:3013
#13 std::destroy_at<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > (__location=0x7f47f8000e50) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_construct.h:88
#14 std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > > >::destroy<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > (__p=0x7f47f8000e50, __a=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/alloc_traits.h:599
#15 std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_destroy_node (this=0x7f47fffff680, __p=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:621
#16 std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_drop_node (this=0x7f47fffff680, __p=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:629
#17 std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_erase (this=this@entry=0x7f47fffff680, __x=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:1934
#18 0x000055df0ff530c9 in std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::clear (this=0x7f47fffff680) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:1251
#19 std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_erase_aux (this=0x7f47fffff680, __first={...}, __last={...}) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:2505
#20 0x000055df0ff70fa6 in std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::erase (this=0x7f4804ffed30, __x=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:2519
#21 std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::erase (this=0x7f4804ffed30, __x=<optimized out>)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_map.h:1118
#22 operator() (this=this@entry=0x7f47ffffeaf8) at mp/include/mp/type-context.h:103
102 const bool erase_thread{inserted};
103 KJ_DEFER(if (erase_thread) {
104 std::unique_lock<std::mutex> lock(thread_context.waiter->m_mutex);
105 // Call erase here with a Connection* argument instead
106 // of an iterator argument, because the `request_thread`
107 // iterator may be invalid if the connection is closed
108 // during this function call. More specifically, the
109 // iterator may be invalid because SetThread adds a
110 // cleanup callback to the Connection destructor that
111 // erases the thread from the map, and also because the
112 // ProxyServer<Thread> destructor calls
113 // request_threads.clear().
> 114 request_threads.erase(server.m_context.connection);
115 });
116 fn.invoke(server_context, args...);
117 }
#23 0x000055df0ff709c7 in run (this=<optimized out>) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/common.h:2007
#24 ~Deferred (this=<optimized out>) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/common.h:1996
#25 operator() (this=0x7f4800034628) at mp/include/mp/type-context.h:117
> 117 }
#26 0x000055df0ff04920 in kj::Function<void()>::operator() (this=0x7f47ffffeda0) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/function.h:119
#27 mp::Unlock<std::unique_lock<std::mutex>, kj::Function<void()>&> (lock=..., callback=...) at mp/include/mp/util.h:198
194 template <typename Lock, typename Callback>
195 void Unlock(Lock& lock, Callback&& callback)
196 {
197 const UnlockGuard<Lock> unlock(lock);
> 198 callback();
199 }
#28 0x000055df0ff8d979 in mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}::operator()() const (this=<optimized out>) at mp/include/mp/proxy-io.h:296
284 template <class Predicate>
285 void wait(std::unique_lock<std::mutex>& lock, Predicate pred)
286 {
287 m_cv.wait(lock, [&] {
288 // Important for this to be "while (m_fn)", not "if (m_fn)" to avoid
289 // a lost-wakeup bug. A new m_fn and m_cv notification might be sent
290 // after the fn() call and before the lock.lock() call in this loop
291 // in the case where a capnp response is sent and a brand new
292 // request is immediately received.
293 while (m_fn) {
294 auto fn = std::move(*m_fn);
295 m_fn.reset();
> 296 Unlock(lock, fn);
297 }
298 const bool done = pred();
299 return done;
300 });
301 }
#29 std::condition_variable::wait<mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}) (this=0x7f47f8000dd8, __lock=..., __p=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:104
#30 mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}) (this=0x7f47f8000db0, lock=..., pred=...) at mp/include/mp/proxy-io.h:287
> 287 m_cv.wait(lock, [&] {
#31 mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const (this=<optimized out>) at mp/src/mp/proxy.cpp:404
393 kj::Promise<void> ProxyServer<ThreadMap>::makeThread(MakeThreadContext context)
394 {
395 const std::string from = context.getParams().getName();
396 std::promise<ThreadContext*> thread_context;
397 std::thread thread([&thread_context, from, this]() {
398 g_thread_context.thread_name = ThreadName(m_connection.m_loop->m_exe_name) + " (from " + from + ")";
399 g_thread_context.waiter = std::make_unique<Waiter>();
400 thread_context.set_value(&g_thread_context);
401 std::unique_lock<std::mutex> lock(g_thread_context.waiter->m_mutex);
402 // Wait for shutdown signal from ProxyServer<Thread> destructor (signal
403 // is just waiter getting set to null.)
> 404 g_thread_context.waiter->wait(lock, [] { return !g_thread_context.waiter; });
405 });
406 auto thread_server = kj::heap<ProxyServer<Thread>>(*thread_context.get_future().get(), std::move(thread));
407 auto thread_client = m_connection.m_threads.add(kj::mv(thread_server));
408 context.getResults().setResult(kj::mv(thread_client));
409 return kj::READY_NOW;
410 }
#32 std::__invoke_impl<void, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0>(std::__invoke_other, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0&&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
#33 std::__invoke<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0>(mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0&&) (__fn=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:96
#34 std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> >::_M_invoke<0ul> (this=<optimized out>)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:301
#35 std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> >::operator() (this=<optimized out>)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:308
#36 std::thread::_State_impl<std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> > >::_M_run (this=<optimized out>)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:253
#37 0x00007f48054ed064 in execute_native_thread_routine () from /nix/store/7c0v0kbrrdc2cqgisi78jdqxn73n3401-gcc-14.2.1.20250322-lib/lib/libstdc++.so.6
#38 0x00007f4805097e63 in start_thread () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#39 0x00007f480511bdbc in __clone3 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
thread 3
#0 0x00007f480509463f in __lll_lock_wait () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#1 0x00007f480509b392 in pthread_mutex_lock@@GLIBC_2.2.5 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#2 0x000055df0ff8d324 in __gthread_mutex_lock (__mutex=0x7f47f8000db0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/x86_64-unknown-linux-gnu/bits/gthr-default.h:762
#3 std::mutex::lock (this=0x7f47f8000db0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_mutex.h:113
#4 std::unique_lock<std::mutex>::lock (this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_lock.h:147
#5 std::unique_lock<std::mutex>::unique_lock (__m=..., this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_lock.h:73
#6 mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0::operator()() const (
this=0x7f47f8001170) at mp/src/mp/proxy.cpp:326
308 std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, std::mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
309 {
310 const std::unique_lock<std::mutex> lock(mutex);
311 auto thread = threads.find(connection);
312 if (thread != threads.end()) return {thread, false};
313 thread = threads.emplace(
314 std::piecewise_construct, std::forward_as_tuple(connection),
315 std::forward_as_tuple(make_thread(), connection, /* destroy_connection= */ false)).first;
316 thread->second.setDisconnectCallback([&threads, &mutex, thread] {
317 // Note: it is safe to use the `thread` iterator in this cleanup
318 // function, because the iterator would only be invalid if the map entry
319 // was removed, and if the map entry is removed the ProxyClient<Thread>
320 // destructor unregisters the cleanup.
321
322 // Connection is being destroyed before thread client is, so reset
323 // thread client m_disconnect_cb member so thread client destructor does not
324 // try to unregister this callback after connection is destroyed.
325 // Remove connection pointer about to be destroyed from the map
> 326 const std::unique_lock<std::mutex> lock(mutex);
327 thread->second.m_disconnect_cb.reset();
328 threads.erase(thread);
329 });
330 return {thread, true};
331 }
#7 std::__invoke_impl<void, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&>(std::__invoke_other, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&)
(__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
#8 std::__invoke_r<void, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&>(mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&) (__fn=...)
at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
#9 std::_Function_handler<void(), mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client()> const&)::$_0>::_M_invoke (__functor=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
#10 0x000055df0ff8de15 in std::function<void()>::operator() (this=0x7f47f8000fb0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
#11 mp::Unlock<mp::Lock, std::function<void()>&> (lock=..., callback=...) at mp/include/mp/util.h:198
194 template <typename Lock, typename Callback>
195 void Unlock(Lock& lock, Callback&& callback)
196 {
197 const UnlockGuard<Lock> unlock(lock);
> 198 callback();
199 }
#12 0x000055df0ff8a70f in mp::Connection::~Connection (this=0x7f480002a810) at mp/src/mp/proxy.cpp:139
135 Lock lock{m_loop->m_mutex};
136 while (!m_sync_cleanup_fns.empty()) {
137 CleanupList fn;
138 fn.splice(fn.begin(), m_sync_cleanup_fns, m_sync_cleanup_fns.begin());
> 139 Unlock(lock, fn.front());
140 }
141 }
#13 0x000055df0feeedd3 in std::default_delete<mp::Connection>::operator() (__ptr=0x7f480002a810, this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:93
#14 std::__uniq_ptr_impl<mp::Connection, std::default_delete<mp::Connection> >::reset (this=0xfffffffffffffe00, __p=0x0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:205
#15 std::unique_ptr<mp::Connection, std::default_delete<mp::Connection> >::reset (this=0xfffffffffffffe00, __p=0x0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:504
#16 mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}::operator()() const (this=<optimized out>) at mp/test/mp/test/test.cpp:79
76 server_disconnect = [&] { loop.sync([&] { server_connection.reset(); }); };
77 // Set handler to destroy the server when the client disconnects. This
78 // is ignored if server_disconnect() is called instead.
> 79 server_connection->onDisconnect([&] { server_connection.reset(); });
80
81 auto client_connection = std::make_unique<Connection>(loop, kj::mv(pipe.ends[1]));
#17 kj::_::MaybeVoidCaller<kj::_::Void, kj::_::Void>::apply<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}>(mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}&, kj::_::Void&&) (func=..., in=...)
at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-prelude.h:195
#18 kj::_::TransformPromiseNode<kj::_::Void, kj::_::Void, mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}, kj::_::PropagateException>::getImpl(kj::_::ExceptionOrValue&) (this=<optimized out>, output=...)
at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-inl.h:739
#19 0x00007f48057c3a3e in kj::_::TransformPromiseNodeBase::get(kj::_::ExceptionOrValue&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
#20 0x00007f48057d102a in kj::TaskSet::Task::fire() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
#21 0x00007f48057d143d in non-virtual thunk to kj::TaskSet::Task::fire() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
#22 0x00007f48057c8cb2 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation)::$_2::operator()() const () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
#23 0x00007f48057c1d68 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
#24 0x000055df0ff8e44c in kj::Promise<unsigned long>::wait (this=<optimized out>, waitScope=..., location=...) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-inl.h:1357
#25 0x000055df0ff8b722 in mp::EventLoop::loop (this=0x7f4804ffec90) at mp/src/mp/proxy.cpp:231
> 231 const size_t read_bytes = wait_stream->read(&buffer, 0, 1).wait(m_io_context.waitScope);
#26 0x000055df0feed376 in mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const (this=0x7f480002d1e8) at mp/test/mp/test/test.cpp:92
91 client_promise.set_value(std::move(client_proxy));
> 92 loop.loop();
#27 0x00007f48054ed064 in execute_native_thread_routine () from /nix/store/7c0v0kbrrdc2cqgisi78jdqxn73n3401-gcc-14.2.1.20250322-lib/lib/libstdc++.so.6
#28 0x00007f4805097e63 in start_thread () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
#29 0x00007f480511bdbc in __clone3 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
Updated fe1cbed -> 1d6d854.
Tested ACK 1d6d854
As I understand it, there are two problems that are solved by this PR:
- a kind of use-after-free bug where the disconnect_cb is deleted in one thread while it is being executed in another
- a deadlock caused by two threads competing for the same waiter mutex
I left comments with more details under this review.
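To make the second bullet concrete, here is a rough, self-contained sketch of the lock-ordering problem visible in the stack traces above. Everything in it (mutex names, thread bodies, timeouts) is illustrative rather than the project's actual code; plain lock() calls would simply hang forever here, so try_lock_for() is used only so the demo terminates and reports the inversion.

```cpp
// Approximation of the deadlock shape: one thread holds the waiter mutex and
// then needs the event loop, while the event loop's disconnect cleanup holds
// its own mutex and then needs the same waiter mutex.
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::timed_mutex waiter_mutex; // stands in for Waiter::m_mutex
std::timed_mutex loop_mutex;   // stands in for EventLoop::m_mutex

int main()
{
    using namespace std::chrono_literals;

    // IPC server thread: holds the waiter mutex while finishing a call, then
    // needs the event loop mutex to tear down its ProxyClient<Thread>.
    std::thread ipc_server([] {
        const std::lock_guard<std::timed_mutex> waiter{waiter_mutex};
        std::this_thread::sleep_for(100ms); // let the event loop take its own mutex first
        if (!loop_mutex.try_lock_for(500ms)) {
            std::cout << "deadlock: IPC thread holds waiter_mutex, cannot get loop_mutex\n";
            return;
        }
        loop_mutex.unlock();
    });

    // Event loop thread: the disconnect handler runs under the loop mutex and
    // then tries to take the same waiter mutex to clean up the thread object.
    std::thread event_loop([] {
        std::this_thread::sleep_for(50ms); // let the IPC thread take waiter_mutex first
        const std::lock_guard<std::timed_mutex> loop{loop_mutex};
        if (!waiter_mutex.try_lock_for(2s)) {
            std::cout << "deadlock: event loop holds loop_mutex, cannot get waiter_mutex\n";
            return;
        }
        waiter_mutex.unlock();
    });

    ipc_server.join();
    event_loop.join();
}
```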
Thanks for the review! Updated with suggested changes.

re: #201 (review)

> As I understand it, there are two problems that are solved by this PR:

Yes, basically there was the deadlock bug, which happened most commonly, and various race bugs that could happen because there was previously no real synchronization between the disconnect handler and the type-context.h cleanup code that runs after asynchronous IPC calls.
Updated 1d6d854 -> 564e353 (pr/context.2 -> pr/context.3, compare) updating comments and adding an assert.

Updated 564e353 -> f219f1b (pr/context.3 -> pr/context.4, compare) fixing LLM linter issues.
Can confirm that this fixes the observed test hang.
utACK f219f1b. I left some suggestions to make the intermediate commits clearer, which might benefit other reviewers.
Use -Wthread-safety not -Wthread-safety-analysis in llvm and sanitize jobs. -Wthread-safety is a more general flag which adds additional checks outside the core thread safety analysis. Credit to hodlinator for noticing the bitcoin core build being stricter about thread safety checks than the libmultiprocess build here: bitcoin-core#201 (comment)
This is just a refactoring; no behavior is changing. It only adds annotations and switches from unannotated to annotated types. In SetThread, the first two parameters are combined into a single GuardedRef parameter to avoid -Wthread-safety-reference warnings at call sites like "warning: passing variable 'callback_threads' by reference requires holding mutex 'thread_context.waiter->m_mutex'".
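For readers unfamiliar with the annotations, the following is a minimal sketch of the general idea behind a guarded-reference parameter. The type and macro names here (SKETCH_*, AnnotatedMutex, GuardedRefSketch) are made up for illustration and are not the repository's actual GuardedRef or MP_* definitions.

```cpp
// Sketch: bundle a reference with the mutex that guards it so call sites pass
// one parameter and Clang's -Wthread-safety can still check accesses.
#include <mutex>

#if defined(__clang__)
#define SKETCH_CAPABILITY(x) __attribute__((capability(x)))
#define SKETCH_GUARDED_BY(x) __attribute__((guarded_by(x)))
#define SKETCH_ACQUIRE() __attribute__((acquire_capability()))
#define SKETCH_RELEASE() __attribute__((release_capability()))
#define SKETCH_REQUIRES(x) __attribute__((requires_capability(x)))
#else
#define SKETCH_CAPABILITY(x)
#define SKETCH_GUARDED_BY(x)
#define SKETCH_ACQUIRE()
#define SKETCH_RELEASE()
#define SKETCH_REQUIRES(x)
#endif

// Minimal mutex wrapper annotated as a capability so the analysis recognizes it.
class SKETCH_CAPABILITY("mutex") AnnotatedMutex
{
public:
    void lock() SKETCH_ACQUIRE() { m_mutex.lock(); }
    void unlock() SKETCH_RELEASE() { m_mutex.unlock(); }

private:
    std::mutex m_mutex;
};

// The guarded-reference bundle: accessing `ref` requires holding `mutex`, so
// passing the pair around does not trigger -Wthread-safety-reference warnings
// about handing out references to guarded data.
template <typename T>
struct GuardedRefSketch
{
    AnnotatedMutex& mutex;
    T& ref SKETCH_GUARDED_BY(mutex);
};

// Example callee: the annotation requires the caller to hold guarded.mutex.
int read_value(GuardedRefSketch<int> guarded) SKETCH_REQUIRES(guarded.mutex)
{
    return guarded.ref;
}
```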
This is just a refactoring changing the ConnThreads data type. The optional value is not actually left in an unset state until the next commit.
This commit is easiest to review ignoring whitespace (git diff -w). It is a minor change in behavior, but the only change is shortening the duration that threads.mutex is locked while inserting a new entry into the threads.ref map. The lock is now only held while the entry is created, and it is released while the ProxyClient<Thread> object is initialized. This change doesn't really fix any problems by itself, but it simplifies the next commit, which deals with race conditions and deadlocks in this code, so it has been split out.
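Roughly, the change described above boils down to the difference between the two shapes in this simplified sketch, which uses a plain std::map and a throwaway Entry type instead of the real threads.ref map and ProxyClient<Thread>:

```cpp
// Sketch of narrowing a lock's scope around a map insertion.
#include <map>
#include <mutex>
#include <string>

struct Entry { std::string name; }; // stands in for an entry that is slow to initialize

std::mutex g_mutex;
std::map<int, Entry> g_map;

// Before: the lock is held for the whole insertion *and* initialization.
Entry& insert_holding_lock(int key)
{
    const std::lock_guard<std::mutex> lock(g_mutex);
    Entry& e = g_map[key];
    e.name = "initialized while holding the lock"; // long-running init under lock
    return e;
}

// After: the lock is only held while the map entry is created; initialization
// happens after the lock is released. std::map references stay valid across
// other insertions, so this is safe as long as no other thread erases or
// mutates the same entry concurrently.
Entry& insert_then_init(int key)
{
    Entry* e;
    {
        const std::lock_guard<std::mutex> lock(g_mutex);
        e = &g_map[key];
    }
    e->name = "initialized after releasing the lock";
    return *e;
}
```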
…returning

This bug is currently causing the mptest "disconnecting and blocking" test to occasionally hang, as reported by maflcko in bitcoin/bitcoin#33244. The bug was actually first reported by Sjors in Sjors/bitcoin#90 (comment) and there are more details about it in bitcoin-core#189.

The bug is caused by the "disconnecting and blocking" test triggering a disconnect right before a server IPC call returns. This results in a race between the IPC server thread and the onDisconnect handler in the event loop thread, both trying to destroy the server's request_threads ProxyClient<Thread> object when the IPC call is done. There was a lack of synchronization in this case, fixed here by adding loop->sync() in a few places. Specifically, the fixes were to:

- Always call SetThread on the event loop thread using the loop->sync() method, to prevent a race between the ProxyClient<Thread> creation code and the connection shutdown code if there was an ill-timed disconnect.
- Similarly, in ~ProxyClient<Thread> and type-context.h PassField(), use loop->sync() when destroying the thread object, in case a disconnect happens at that time.

A few other changes were made in this commit to make the resulting code safer and simpler, even though they are not technically necessary for the fix:

- In type-context.h PassField(), Waiter::m_mutex is now unlocked while destroying ProxyClient<Thread>, just to respect the EventLoop::m_mutex and Waiter::m_mutex lock order and never lock the Waiter first. This is just for consistency; there is no actual possibility of a deadlock here due to the new sync() call.
- Asserts were added to make sure functions expected to run on the event loop thread are only called on that thread.
- The ProxyClient<Thread>::setDisconnectCallback function was inlined, just because it was a short function only called in a single place.
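As background for reviewers, here is a small self-contained sketch of the general "funnel work onto one thread and block until it runs" pattern that loop->sync() provides. It is only an approximation using std::packaged_task and std::future; the real mp::EventLoop uses a pipe plus condition variable, and its sync() also runs the callable directly when already called from the event loop thread.

```cpp
// Sketch: work posted from other threads is executed serially by a single
// "loop" thread, and the posting thread blocks until it completes, so posted
// cleanups cannot race the handlers that run on the loop thread.
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::mutex g_mutex;
std::condition_variable g_cv;
std::queue<std::packaged_task<void()>> g_queue;
bool g_stop = false;

void loop_thread() // plays the role of the event loop thread
{
    std::unique_lock<std::mutex> lock(g_mutex);
    while (true) {
        g_cv.wait(lock, [] { return g_stop || !g_queue.empty(); });
        if (g_stop && g_queue.empty()) return;
        auto task = std::move(g_queue.front());
        g_queue.pop();
        lock.unlock();
        task(); // cleanups and disconnect handlers run here, one at a time
        lock.lock();
    }
}

void sync(std::function<void()> fn) // plays the role of loop->sync() (must not be called from the loop thread in this sketch)
{
    std::packaged_task<void()> task(std::move(fn));
    auto done = task.get_future();
    {
        const std::lock_guard<std::mutex> lock(g_mutex);
        g_queue.push(std::move(task));
    }
    g_cv.notify_all();
    done.wait(); // caller blocks until the loop thread ran fn
}

int main()
{
    std::thread loop(loop_thread);
    sync([] { std::cout << "teardown ran on the loop thread\n"; });
    {
        const std::lock_guard<std::mutex> lock(g_mutex);
        g_stop = true;
    }
    g_cv.notify_all();
    loop.join();
}
```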
Thanks for the reviews! I took Sjors' commits to break the change up into smaller parts and addressed hodlinator's suggestions to add more thread safety annotations and make them stricter. I think both changes should make this PR easier to review.
Updated f219f1b -> b6321bb (pr/context.4 -> pr/context.5, compare) with suggested changes to split commits and improve thread safety annotations. The actual fix is not changed.

Updated b6321bb -> d86823e (pr/context.5 -> pr/context.6, compare) fixing typos.
d86823e looks good once CI is happy... Thanks for splitting the commits.

Additionally, maybe one of @DrahtBot's LLMs can look through the CI logs (when all jobs pass) for anything unusual compared to master.

The OpenBSD failure seems real:
https://github.com/bitcoin-core/libmultiprocess/actions/runs/17628430545/job/50090729622?pr=201

Copilot seems to think this is due to incomplete C++20 support, and we could pre-empt that by adding
Thanks, the OpenBSD failure is probably just happening because OpenBSD is using an old version of clang. It should be possible to reproduce this locally using the llvm job and the following change in shell.nix:

```diff
- llvm = crossPkgs.llvmPackages_20;
+ llvm = crossPkgs.llvmPackages_16;
```

But I'm having some kind of network issue and can't download the llvm 16 packages.

You are right that it would be nice to have a preset or something that makes it easier to see thread safety errors locally. FWIW I just have a configuration script that does:

```sh
MP_CXX_FLAGS="-Werror -ftemplate-backtrace-limit=0 -Wthread-safety"
CC=clang CXX=clang++ cmake -DCMAKE_INSTALL_PREFIX=$HOME/work/mp/build/prefix -DCapnProto_DEBUG=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="$MP_CXX_FLAGS" -DMULTIPROCESS_RUN_CLANG_TIDY=1 ..
```

and sometimes I'll add other things like

As far as DrahtBot/CI, the llvm job in

Updated d86823e -> 4a269b2.
OpenBSD / clang-16 fix seems to have worked!
ACK 4a269b2
I've made progress on implementing a more comprehensive disconnect test that can detect bugs like bitcoin/bitcoin#33244 and #189 more reliably, instead of only intermittently when the test is run thousands of times. If I can finish it soon, I'll add it to this PR; otherwise it can be a follow-up. The test is implemented in commit ea4976f (tag) and I think I'm maybe 70% finished with the implementation. The test passes on master because not all of the checks are enabled yet and many aren't implemented (these are noted in FIXME comments in the code). The commit combines the 4 existing disconnect test cases into a more general test and follows up on earlier ideas and efforts from:
@ryanofsky that commit also touches non-test code, so I think it would be better as a standalone PR. |
That's reasonable. But to be fair, the only changes to non-test code are adding 4 lines like:

```cpp
if (connection->m_loop->m_signal) connection->m_loop->m_signal(SignalCallSetup{});
```

so the test is able to disconnect at specific points.

```cpp
if (connection->m_loop->hasSignal<CallSetup>()) connection->m_loop->sendSignal<CallSetup>();
```
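Purely as a hypothetical illustration of that suggestion (neither hasSignal<T>() nor sendSignal<T>() exist in the library today, and SignalHub below is not a real type), a templated test-signal hook could look something like this sketch:

```cpp
// Sketch of a type-keyed signal hook: tests register a handler for a signal
// tag type, and library code can fire it at specific points if registered.
#include <any>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct CallSetup {}; // example signal tag a test might wait on

class SignalHub
{
public:
    template <typename T>
    bool hasSignal() const { return m_handlers.count(std::type_index(typeid(T))) > 0; }

    template <typename T>
    void sendSignal(T signal = {})
    {
        auto it = m_handlers.find(std::type_index(typeid(T)));
        if (it != m_handlers.end()) it->second(std::any(std::move(signal)));
    }

    template <typename T>
    void onSignal(std::function<void(const T&)> handler)
    {
        m_handlers[std::type_index(typeid(T))] = [h = std::move(handler)](const std::any& a) {
            h(std::any_cast<const T&>(a));
        };
    }

private:
    std::unordered_map<std::type_index, std::function<void(const std::any&)>> m_handlers;
};

int main()
{
    SignalHub hub;
    hub.onSignal<CallSetup>([](const CallSetup&) { /* a test could trigger a disconnect here */ });
    if (hub.hasSignal<CallSetup>()) hub.sendSignal<CallSetup>();
}
```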
This bug is currently causing the mptest "disconnecting and blocking" test to occasionally hang, as reported by maflcko in bitcoin/bitcoin#33244.

The bug was actually first reported by Sjors in Sjors/bitcoin#90 (comment) and there are more details about it in #189.

The bug is caused by the "disconnecting and blocking" test triggering a disconnect right before a server IPC call returns. This results in a race between the IPC server thread and the `onDisconnect` handler in the event loop thread, both trying to destroy the server's `request_threads` `ProxyClient<Thread>` object when the IPC call is done.

There was a lack of synchronization in this case, fixed here by adding `loop->sync()` calls in various places. There were also lock order issues where `Waiter::m_mutex` could be incorrectly locked before `EventLoop::m_mutex`, resulting in a deadlock.