ternfs-XTXMarkets

mirror of https://github.com/XTXMarkets/ternfs.git synced 2026-05-08 05:12:56 -05:00

Author	SHA1	Message	Date
Francesco Mazzoli	4096e73818	Kill all references to internal services	2025-09-03 10:35:40 +00:00
Miroslav Crnic	5043e6d09a	remove unused appNameSuffix option	2025-06-23 12:03:49 +00:00
Miroslav Crnic	48c3aa7d4a	logsdb: enable partial leader election	2024-10-11 09:52:18 +01:00
Miroslav Crnic	2dec9ec117	cdc: register location	2024-09-12 14:27:55 +01:00
Miroslav Crnic	c7b6a1cbeb	stats: stop producing them	2024-07-09 15:57:01 +01:00
Miroslav Crnic	4e574374ca	shard/cdc: cleanup logsdb options, hostmon name match service name	2024-05-22 10:21:41 +00:00
Miroslav Crnic	8a0ea10cde	core: UDPSocketPair and use IpPort AddrsInfo everywhere * Refactor UDPSocketPair a bit * ci: kmod always delete img before create * shuckle: fix scripts/json marshal --------- Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>	2024-05-03 11:32:07 +01:00
Miroslav Crnic	409b126e4b	cdc: use SharedRocksDB	2024-04-05 23:22:39 +01:00
Francesco Mazzoli	7a5fc9f8a9	Allow to disable shuckle stat inserting	2024-03-25 16:08:54 +00:00
Miroslav Crnic	b240de53b5	shard: distributed log implementation and shard can use it with a flag set	2024-03-12 11:02:04 +00:00
Francesco Mazzoli	8075e99bb6	Graceful shard teardown See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs regarding how to terminate threads gracefully. The goal of this work was for valgrind to work correctly, which in turn was to investigate #141. It looks like I have succeeded: ==2715080== Warning: unimplemented fcntl command: 1036 ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135 ==2715080== at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==2715080== by 0x3B708E: allocate (new_allocator.h:121) ==2715080== by 0x3B708E: allocate (allocator.h:173) ==2715080== by 0x3B708E: allocate (alloc_traits.h:460) ==2715080== by 0x3B708E: _M_allocate (stl_vector.h:346) ==2715080== by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635) ==2715080== by 0x42BF1C: resize (stl_vector.h:940) ==2715080== by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921) ==2715080== by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034) ==2715080== by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in, char, unsigned long) (shard/Shard.cpp:347) ==2715080== by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405) ==2715080== by 0x40B1E8: run (core/Loop.cpp:67) ==2715080== by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37) ==2715080== by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so) ==2715080== by 0x4D005E2: clone (in /usr/lib/libc-2.33.so) ==2715080== ==2715080== ==2715080== Exit program on first error (--exit-on-first-error=yes)	2024-01-08 15:41:22 +00:00
Francesco Mazzoli	53049d5779	Shard batch writes, use batch UDP syscalls The idea is to drain the socket and do a single RocksDB WAL write/fsync for all the write requests we have found. The read requests are immediately executed. The reasoning here is that currently write requests are _a lot_ slower than the read requests because fsyncing takes ~500us on fsf1. In the future this might change. Since we're at it, we also use batch UDP syscalls in the CDC. Fixes #119.	2023-12-07 14:29:07 +00:00
Francesco Mazzoli	476009381a	Remove maximum enqueued requests limit We already drop in-flight requests that we're already processing, so I don't think this matters very much currently.	2023-11-29 11:08:07 +00:00
Francesco Mazzoli	afc4e78a62	Reduce default CDC queue size	2023-11-05 22:38:57 +00:00
Francesco Mazzoli	77ac15af8d	Allow to choose xmon env in C++ apps	2023-09-18 11:56:44 +00:00
Francesco Mazzoli	5117ddd16e	Add shard/CDC metrics	2023-08-08 09:21:35 +01:00
Francesco Mazzoli	63e2db0889	Cap maximum number of CDC requests No point letting huge queues build -- especially now that we deduplicate client requests.	2023-08-01 21:17:23 +01:00
Francesco Mazzoli	ff9306f6e3	Add Xmon support to C++ code	2023-07-11 12:13:22 +00:00
Francesco Mazzoli	4e0e6fe8a8	Configurable CDC shard timeout Running in valgrind seems to just not be able to process a small FullReadDirReq in 100ms, which is a bit concerning, but I'll let it slide for now.	2023-07-04 08:05:42 +00:00
Francesco Mazzoli	b041d14860	Add second ip/addr for CDC/shards too This is one of the two data model/protocol changes I want to perform before going into production, the other being file atime. Right now the kernel module does not take advantage of this, but it's OK since I tested the rest of the code reasonably and the goal here is to perform the protocol/data changes.	2023-06-05 12:14:14 +00:00
Francesco Mazzoli	a12a938c40	syslogify logs	2023-05-29 09:52:01 +00:00
Francesco Mazzoli	51860fac3a	Various improvements, nothing substantial	2023-02-14 22:39:38 +00:00
Francesco Mazzoli	85889266b1	Various housekeeping while I get ready to deploy... ...most notably we now produce fully static binaries in an alpine image. A few assorted thoughts: * I really like static binaries, ideally I'd like to run EggsFS deployments with just systemd scripts and a few binaries. * Go already does this, which is great. * C++ does not, which is less great. * Linking statically against `glibc` works, but is unsupported. Not only stuff like NSS (which `gethostbyname` requires) straight up does not work, unless you build `glibc` with unsupported and currently apparently broken flags (`--enable-static-nss`), but also other stuff is subtly broken (I couldn't remember exactly what was broken, but see comments such as <https://github.com/haskell/haskell-language-server/issues/2431#issuecomment-985880838>). * So we're left with alternative libcs -- the most popular being musl. * The simplest way to build a C++ application using musl is to just build on a system where musl is already the default libc -- such as alpine linux. The backtrace support is in a bit of a bad state. Exception stacktraces work on musl, but DWARF seems to be broken on the normal release build. Moreover, libunwind doesn't play well with musl's signal handler: <https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>. Keeping it working seems to be a bit of a chore, and I'm going to revisit it later. In the meantime, gdb stack traces do work fine.	2023-01-29 21:41:40 +00:00
Francesco Mazzoli	9adca070ba	Convert build system to cmake Also, produce fully static binaries. This means that `gethostbyname` does not work (doesn't work with static glibc unless you build it with `--enable-static-nss`, which no distro builds glibc with).	2023-01-26 23:20:58 +00:00

24 Commits