See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs
regarding how to terminate threads gracefully.
The goal of this work was to get valgrind working correctly, which in turn
was needed to investigate #141. It looks like I have succeeded:
```
==2715080== Warning: unimplemented fcntl command: 1036
==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135
==2715080==    at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2715080==    by 0x3B708E: allocate (new_allocator.h:121)
==2715080==    by 0x3B708E: allocate (allocator.h:173)
==2715080==    by 0x3B708E: allocate (alloc_traits.h:460)
==2715080==    by 0x3B708E: _M_allocate (stl_vector.h:346)
==2715080==    by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635)
==2715080==    by 0x42BF1C: resize (stl_vector.h:940)
==2715080==    by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921)
==2715080==    by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034)
==2715080==    by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in*, char*, unsigned long) (shard/Shard.cpp:347)
==2715080==    by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405)
==2715080==    by 0x40B1E8: run (core/Loop.cpp:67)
==2715080==    by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37)
==2715080==    by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so)
==2715080==    by 0x4D005E2: clone (in /usr/lib/libc-2.33.so)
==2715080==
==2715080== Exit program on first error (--exit-on-first-error=yes)
```
The idea is to drain the socket and do a single RocksDB WAL
write/fsync covering all the write requests we have found.
Read requests are executed immediately. The reasoning is that
write requests are currently _a lot_ slower than read requests,
because fsyncing takes ~500us on fsf1. This might change in the
future.
While we're at it, we also use batch UDP syscalls in the CDC.
Fixes #119.
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.
Right now the kernel module does not take advantage of this, but
that's OK: I tested the rest of the code reasonably well, and the
goal here is to perform the protocol/data changes.
...most notably we now produce fully static binaries in an alpine
image.
A few assorted thoughts:
* I really like static binaries, ideally I'd like to run EggsFS
deployments with just systemd scripts and a few binaries.
* Go already does this, which is great.
* C++ does not, which is less great.
* Linking statically against `glibc` works, but is unsupported.
  Not only does NSS (which `gethostbyname` requires) straight up
  not work unless you build `glibc` with unsupported and apparently
  currently broken flags (`--enable-static-nss`), but other things
  are subtly broken too (I can't remember exactly what, but see
  comments such as
  <https://github.com/haskell/haskell-language-server/issues/2431#issuecomment-985880838>).
* So we're left with alternative libcs -- the most popular being
musl.
* The simplest way to build a C++ application using musl is to just
build on a system where musl is already the default libc -- such
as alpine linux.
The backtrace support is in a bit of a bad state. Exception stacktraces
work on musl, but DWARF seems to be broken on the normal release build.
Moreover, libunwind doesn't play well with musl's signal handler:
<https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.
Keeping it working seems to be a bit of a chore, and I'm going to revisit
it later.
In the meantime, gdb stack traces do work fine.
Also, produce fully static binaries. This means that `gethostbyname`
does not work (it needs NSS, which doesn't work with static glibc
unless you build it with `--enable-static-nss`, and no distro builds
glibc with that).