Background:
For a research project, I am trying to periodically clear the contents of my Bitcoin node’s mempool. For this purpose I have created a new thread in init.cpp using the boost threadgroup in the Bitcoin Core software (I have used this approach successfully in a different project in the past, and I am sure it is not the cause of the problem). In the new thread, I periodically call a function func (defined below), whose sole purpose is to clear the contents of the node’s mempool.
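void func()
{
    // Take both locks for the duration of this scope.
    LOCK2(cs_main, mempool.cs);
    // Sanity checks that both locks are held by this thread.
    AssertLockHeld(cs_main);
    AssertLockHeld(mempool.cs);
    // Drop every transaction currently in the mempool.
    mempool.clear();
}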
I lock the critical sections that I believe must be held before I modify the mempool. The code above mimics what happens when a new transaction is added to the mempool: I looked at what happens in the NetMsgType::TX branch of ProcessMessage in net_processing.cpp and then in AcceptToMemoryPoolWorker in validation.cpp, and followed the same pattern.
Issue:
When I run Bitcoin Core after successfully compiling my code, the program crashes with a segfault, sometimes immediately (i.e., as soon as the GUI window pops up) and sometimes much later in the execution.
ubuntu@root:~$ bitcoin-qt -mpctimeout=1
Segmentation fault (core dumped)
In the run above, the function func is executed once every minute (set by mpctimeout=1).
Question:
What could be causing the segfault? Could it be a race condition when trying to get a lock on the mempool, or some interaction with the execution of the main thread? It is my understanding that LOCK2 blocks until both critical sections become available if they are already held by another thread. Since these locks only exist within the scope of the function func, they should be released when the function returns.
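To make the locking behaviour I am assuming concrete, here is a minimal standalone sketch using only the C++ standard library. It is not Bitcoin Core code: g_main_mutex, g_pool_mutex, g_pool_size, clear_pool and both_unlocked are hypothetical names standing in for cs_main, mempool.cs, the mempool contents and func, and std::lock with std::lock_guard is only my approximation of what LOCK2 does.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for cs_main and mempool.cs (not Bitcoin Core types).
std::mutex g_main_mutex;
std::mutex g_pool_mutex;
int g_pool_size = 3; // stand-in for the number of mempool entries

// Acquires both mutexes, clears the "pool", and releases both on return.
void clear_pool()
{
    // std::lock blocks until both mutexes have been acquired.
    std::lock(g_main_mutex, g_pool_mutex);
    // The guards adopt the already-held locks and release them at scope exit.
    std::lock_guard<std::mutex> main_guard(g_main_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> pool_guard(g_pool_mutex, std::adopt_lock);
    g_pool_size = 0;
} // both mutexes are released here

// Returns true if both mutexes can be acquired without blocking,
// i.e. no one is holding them at the moment of the call.
bool both_unlocked()
{
    if (!g_main_mutex.try_lock()) return false;
    const bool ok = g_pool_mutex.try_lock();
    if (ok) g_pool_mutex.unlock();
    g_main_mutex.unlock();
    return ok;
}
```

Calling clear_pool() and then both_unlocked() should report that nothing is held any more, which matches my observation that transactions are still accepted between runs of func.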
Edit # 1:
Some debugging shows that the segfault doesn’t occur while func is being executed but while the thread is asleep for the mpctimeout interval. This leaves me wondering whether clearing the mempool has some long-term effect. I have made sure that the lock is released by verifying that transactions are indeed accepted to the mempool while the new thread is asleep. The segfault still occurs even if the mpctimeout value is large (e.g., 1 hour).
FWIW, I use boost::this_thread::sleep_for to put the thread to sleep. I assume this works without fault and that the segfault is instead caused, in the long run, by clearing the mempool. Any tips?
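For reference, the run-then-sleep loop has roughly the following shape. This is a sketch with standard-library equivalents rather than my actual boost-based code: PeriodicTask is a hypothetical class, and it uses std::thread with a condition variable instead of the boost threadgroup and boost::this_thread::sleep_for so that the sleep can be interrupted on shutdown.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: runs `task`, sleeps for `interval`, and repeats
// until Stop() is called or the object is destroyed.
class PeriodicTask
{
public:
    PeriodicTask(std::chrono::milliseconds interval, std::function<void()> task)
        : m_interval(interval), m_task(std::move(task)), m_thread([this] { Run(); })
    {
    }
    ~PeriodicTask() { Stop(); }
    PeriodicTask(const PeriodicTask&) = delete;
    PeriodicTask& operator=(const PeriodicTask&) = delete;

    // Wakes the worker if it is sleeping and waits for it to exit.
    void Stop()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        if (m_thread.joinable()) m_thread.join();
    }

private:
    void Run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (!m_stop) {
            lock.unlock();
            m_task(); // the periodic work runs without holding m_mutex
            lock.lock();
            // Sleep for the interval, or return early if Stop() was called.
            m_cv.wait_for(lock, m_interval, [this] { return m_stop; });
        }
    }

    std::chrono::milliseconds m_interval;
    std::function<void()> m_task;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = false;
    std::thread m_thread; // declared last so it starts after the other members exist
};
```

In this shape the worker holds no application locks while it sleeps, which is also what I expect from my own thread between calls to func.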
Edit # 2:
Below is the output of debugging the binary through gdb:
(gdb) run
Starting program: /usr/local/bin/bitcoin-qt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe9900700 (LWP 25643)]
[New Thread 0x7fffdf37f700 (LWP 25644)]
[New Thread 0x7fffdeb7e700 (LWP 25645)]
[New Thread 0x7fffdd0e7700 (LWP 25646)]
[New Thread 0x7fffc7cd2700 (LWP 25648)]
[New Thread 0x7fffc74d1700 (LWP 25649)]
[New Thread 0x7fffc6cd0700 (LWP 25650)]
[New Thread 0x7fffc64cf700 (LWP 25651)]
[New Thread 0x7fffc5cce700 (LWP 25652)]
[New Thread 0x7fffc54cd700 (LWP 25653)]
[New Thread 0x7fffc4ccc700 (LWP 25654)]
[New Thread 0x7fffa7fff700 (LWP 25655)]
[New Thread 0x7fffa77fe700 (LWP 25656)]
[New Thread 0x7fffa6ffd700 (LWP 25657)]
[New Thread 0x7fffa67fc700 (LWP 25658)]
[New Thread 0x7fffa5ffb700 (LWP 25659)]
[New Thread 0x7fffa57fa700 (LWP 25660)]
[New Thread 0x7fffa4f63700 (LWP 25661)]
[New Thread 0x7fff8dffd700 (LWP 25662)]
[New Thread 0x7fff8d7fc700 (LWP 25663)]
[New Thread 0x7fff8cffb700 (LWP 25664)]
[New Thread 0x7fff7bfff700 (LWP 25665)]
[New Thread 0x7fff7b7fe700 (LWP 25666)]
[New Thread 0x7fff7affd700 (LWP 25667)]
[New Thread 0x7fff7a7fc700 (LWP 25668)]
[New Thread 0x7fff79ffb700 (LWP 25669)]
[New Thread 0x7fff797fa700 (LWP 25670)]
[New Thread 0x7fff78ff9700 (LWP 25671)]
[New Thread 0x7fff53fff700 (LWP 25672)]
[New Thread 0x7fff537fe700 (LWP 25673)]
[New Thread 0x7fff4a174700 (LWP 25677)]
[New Thread 0x7ffed1a7b700 (LWP 25678)]
[Thread 0x7ffed1a7b700 (LWP 25678) exited]
[New Thread 0x7ffed127a700 (LWP 25679)]
[New Thread 0x7ffed0a79700 (LWP 25680)]
[New Thread 0x7ffecbfff700 (LWP 25681)]
[New Thread 0x7ffecb7fe700 (LWP 25682)]
[New Thread 0x7ffecaffd700 (LWP 25683)]
[New Thread 0x7ffeca7fc700 (LWP 25684)]
[New Thread 0x7ffec9ffb700 (LWP 25685)]
[New Thread 0x7ffec97fa700 (LWP 25686)]
[New Thread 0x7ffec8ff9700 (LWP 25687)]
[Thread 0x7ffed127a700 (LWP 25679) exited]
[Thread 0x7ffecb7fe700 (LWP 25682) exited]
Thread 40 "bitcoin-msghand" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffec9ffb700 (LWP 25685)]
__gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x1a45fe36190ee438)
at /usr/include/c++/7/ext/atomicity.h:96
96 __atomic_add(__mem, __val);
(gdb)
I suppose that the segfault is caused when trying to update an atomic variable.
Edit # 3:
gdb backtrace:
#0 0x0000555555854178 in __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0xf) at /usr/include/c++/7/ext/atomicity.h:96
#1 0x0000555555854178 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() (this=0x7) at /usr/include/c++/7/bits/shared_ptr_base.h:138
#2 0x0000555555854178 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (__r=..., this=0x7ffef5ff71b8)
at /usr/include/c++/7/bits/shared_ptr_base.h:691
#3 0x0000555555854178 in std::__shared_ptr<CTransaction const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<CTransaction const, (__gnu_cxx::_Lock_policy)2> const&)
(this=0x7ffef5ff71b0) at /usr/include/c++/7/bits/shared_ptr_base.h:1121
#4 0x0000555555854178 in std::shared_ptr<CTransaction const>::shared_ptr(std::shared_ptr<CTransaction const> const&) (this=0x7ffef5ff71b0)
at /usr/include/c++/7/bits/shared_ptr.h:119
#5 0x0000555555854178 in CTxMemPoolEntry::GetSharedTx() const (this=0x7ffee12106a0) at ./txmempool.h:101
#6 0x0000555555854178 in PartiallyDownloadedBlock::InitData(CBlockHeaderAndShortTxIDs const&, std::vector<std::pair<uint256, std::shared_ptr<CTransaction const> >, std::allocator<std::pair<uint256, std::shared_ptr<CTransaction const> > > > const&) (this=this@entry=0x7ffef5ff9f30, cmpctblock=..., extra_txn=std::vector of length 100, capacity 100 = {...})
at blockencodings.cpp:114
#7 0x00005555556a2d4a in ProcessMessage(CNode*, std::__cxx11::string const&, CDataStream&, int64_t, CChainParams const&, CConnman*, std::atomic<bool> const&, bool)
(pfrom=pfrom@entry=0x7ffedc0010d0, strCommand="cmpctblock", vRecv=..., nTimeReceived=1593549860206352, chainparams=..., connman=0x7fffbc04ba20, interruptMsgProc=..., enable_bip61=false) at net_processing.cpp:3051
#8 0x00005555556a6f01 in PeerLogicValidation::ProcessMessages(CNode*, std::atomic<bool>&) (this=<optimized out>, pfrom=0x7ffedc0010d0, interruptMsgProc=...) at net_processing.cpp:3629
#9 0x000055555566b19e in CConnman::ThreadMessageHandler() (this=0x7fffbc04ba20) at net.cpp:1966
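Frames #5 and #6 show GetSharedTx() being called on a CTxMemPoolEntry from PartiallyDownloadedBlock::InitData, and the shared_ptr copy in frame #1 has this=0x7, so the entry being read looks invalid. One generic way to get a crash of this kind is a secondary index that stores iterators into a container which has since been cleared. I have not verified that this is what happens inside CTxMemPool; the sketch below is only an illustration of the pattern, and Pool, Tx, TxRef and all member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Tx {
    std::string id;
};
using TxRef = std::shared_ptr<const Tx>;

// Hypothetical pool with a primary container and a secondary index of
// iterators into it (not Bitcoin Core's CTxMemPool).
class Pool
{
public:
    void Add(const std::string& id)
    {
        auto res = m_entries.emplace(id, std::make_shared<const Tx>(Tx{id}));
        if (res.second) m_index.push_back(res.first);
    }

    // Clears BOTH structures. Clearing only m_entries would leave m_index
    // full of dangling iterators, and the next Snapshot() would dereference
    // freed memory (undefined behaviour, typically a segfault much later).
    void Clear()
    {
        m_index.clear();
        m_entries.clear();
    }

    // Copies every shared_ptr reachable through the secondary index.
    std::vector<TxRef> Snapshot() const
    {
        std::vector<TxRef> out;
        for (const auto& it : m_index) out.push_back(it->second);
        return out;
    }

    std::size_t IndexSize() const { return m_index.size(); }
    std::size_t EntryCount() const { return m_entries.size(); }

private:
    std::map<std::string, TxRef> m_entries;
    std::vector<std::map<std::string, TxRef>::iterator> m_index;
};
```

If something like this applies here, it would explain why the crash shows up long after func has returned: the invalid iterators are only dereferenced when another thread (here bitcoin-msghand handling a cmpctblock message) next walks the index.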