ternfs-XTXMarkets

mirror of https://github.com/XTXMarkets/ternfs.git synced 2026-04-28 05:39:34 -05:00

Author	SHA1	Message	Date
Francesco Mazzoli	e5f133d826	Correct rota for "queue full" alert	2024-02-20 13:55:30 +00:00
Francesco Mazzoli	303421763a	Allow to specify rota per alert in C++	2024-02-20 12:59:42 +00:00
Saulius Grusnys	796e46f466	shuckle to track if blockservices have any files on them (currently there is an issue with transient files) (#177)	2024-02-20 08:10:51 +00:00
Joshua Leahy	37a205b71e	Docker networking seems to not work on new arch snaps, this is fine	2024-02-19 14:38:52 +00:00
Francesco Mazzoli	bfe8a449df	Some `eggsktools` additions/improvements	2024-02-12 11:50:18 +00:00
Miroslav Crnic	83d0469c7f	SharedRocksDB: correctly export metrics	2024-02-08 19:39:00 +00:00
Miroslav Crnic	37ba9bc457	shard: support for sharing rocksdb and init LogsDB CFs	2024-02-08 17:44:03 +00:00
Miroslav Crnic	38707535e3	shuckle: support metadata replication	2024-02-07 13:57:00 +00:00
Francesco Mazzoli	9c477ffa40	Make RocksDB patching idempotent	2024-01-30 11:37:52 +00:00
Francesco Mazzoli	25676f1096	Handle concurrent block swapping better	2024-01-30 11:22:45 +00:00
Miroslav Crnic	1d6ac9f648	cmake: add patch -N back	2024-01-29 17:25:07 +00:00
Miroslav Crnic	1dedd7d181	core: SPSC return 0 on timeout in pull	2024-01-29 17:16:05 +00:00
Miroslav Crnic	2ec1304981	core: ppoll, futex don't like negative timeouts	2024-01-29 17:00:14 +00:00
Francesco Mazzoli	9d1a31b482	Fix another signedness mismatch	2024-01-29 16:46:05 +00:00
Miroslav Crnic	e543665f8f	core: SPSC support timeout in pull	2024-01-29 16:06:31 +00:00
Francesco Mazzoli	2a326f7c5f	Fix usual signedness shenanigans 🥱	2024-01-29 16:05:19 +00:00
Francesco Mazzoli	9cf2931bc7	We do want the default patch behavior, not the -N one	2024-01-29 16:02:26 +00:00
Francesco Mazzoli	0a6a0c8f24	Process CDC timeouts in a timely manner	2024-01-29 15:08:06 +00:00
Francesco Mazzoli	1145ea10a3	Put `patch` in alpine docker build image	2024-01-29 14:43:36 +00:00
Francesco Mazzoli	2a6feb6df5	Patch RocksDB to make it compile with clang 15.	2024-01-29 14:15:29 +00:00
Miroslav Crnic	7ce185c219	cdc: remove unnecessary zeroing in shared	2024-01-24 14:24:06 +00:00
Francesco Mazzoli	8c0c246348	More robust detection of file vs. device errors Just check if we're also unable to count the blocks for the disk, and if yes, assume it's a single file error. Of course there will be a time period where we will not have detected the bad disk when counting the blocks (a few minutes at most), but that's OK -- the scrubber will scrub blocks for that period, and then stop. Once <internal-repo/issues/65#issuecomment-24747> is done, we should use whatever error detection we use for migration to also distinguish between these errors.	2024-01-22 13:18:53 +00:00
Francesco Mazzoli	f979a67b04	Always set non-zero transient deadline, fixes #145.	2024-01-18 19:04:36 +00:00
Francesco Mazzoli	cd23deaf19	Accept `DIRECTORY_NOT_FOUND` in `SOFT_UNLINK_DIRECTORY` Nothing prevents a nonexistent inode from being sent in that request.	2024-01-18 12:00:43 +00:00
Francesco Mazzoli	2a95b345d2	Many changes to make CI work on new runner Most notably, we now run the non-kmod integration tests in docker. The kmod tests are already in their own environment (qemu).	2024-01-18 11:57:17 +00:00
Francesco Mazzoli	f8b432eb18	Add metric and alert for CDC update size	2024-01-16 23:22:39 +00:00
Francesco Mazzoli	694e17cbc2	Add alerts for full shard queues	2024-01-16 23:11:41 +00:00
Francesco Mazzoli	b6cf2b67a6	Distribute block services from shuckle This is in preparation for #44, but more immediately, to better stop writing to full block services. The previous strategy of setting a flag was flawed since once the flag was set it stayed set -- i.e. we would not remove it once files would be deleted. This consideration should just be integrated in distributing the block services.	2024-01-16 16:17:27 +00:00
Francesco Mazzoli	d569bdb494	Re-introduce thread names (they got lost in a refactor)	2024-01-11 17:32:52 +00:00
Francesco Mazzoli	c80c6269d9	Remove spurious `MsgsGen.hpp` includes	2024-01-11 16:05:34 +00:00
Francesco Mazzoli	8d0b97171e	Remove dead code	2024-01-11 13:03:26 +00:00
Francesco Mazzoli	c27ba8398a	Tear down all threads at once I had copied the LIFO pattern from ETD codebase, but it's not needed here given that the loop terminates gracefully and so we can coordinate explicitly if needed.	2024-01-09 16:53:23 +00:00
Francesco Mazzoli	c9bf49d387	Fix silly SPSC bug	2024-01-09 11:14:18 +00:00
Francesco Mazzoli	3097752a30	Minor tweak	2024-01-08 16:03:07 +00:00
Francesco Mazzoli	ee9e0ad0af	Remove `pthread_attr_setsigmask_np`, musl does not have it	2024-01-08 15:58:31 +00:00
Francesco Mazzoli	002b2854ec	Fix leak in `FetchedSpan`, and hopefully fix #141.	2024-01-08 15:58:31 +00:00
Francesco Mazzoli	8075e99bb6	Graceful shard teardown See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs regarding how to terminate threads gracefully. The goal of this work was for valgrind to work correctly, which in turn was to investigate #141. It looks like I have succeeded: ==2715080== Warning: unimplemented fcntl command: 1036 ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135 ==2715080== at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==2715080== by 0x3B708E: allocate (new_allocator.h:121) ==2715080== by 0x3B708E: allocate (allocator.h:173) ==2715080== by 0x3B708E: allocate (alloc_traits.h:460) ==2715080== by 0x3B708E: _M_allocate (stl_vector.h:346) ==2715080== by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635) ==2715080== by 0x42BF1C: resize (stl_vector.h:940) ==2715080== by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921) ==2715080== by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034) ==2715080== by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in, char, unsigned long) (shard/Shard.cpp:347) ==2715080== by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405) ==2715080== by 0x40B1E8: run (core/Loop.cpp:67) ==2715080== by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37) ==2715080== by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so) ==2715080== by 0x4D005E2: clone (in /usr/lib/libc-2.33.so) ==2715080== ==2715080== ==2715080== Exit program on first error (--exit-on-first-error=yes)	2024-01-08 15:41:22 +00:00
Francesco Mazzoli	1963714c0f	Remove avoidable stat in collect directories	2023-12-15 21:20:05 +00:00
Francesco Mazzoli	01af461477	Factor out function	2023-12-15 18:30:12 +00:00
Francesco Mazzoli	73200f24b6	Use DWARF 4, Ubuntu 20.04 does not understand DWARF 5.	2023-12-11 16:23:55 +00:00
Francesco Mazzoli	898b85ad9c	Tweak GC parameters We're almost in a steady state, no need to overwhelm the shards.	2023-12-11 15:04:41 +00:00
Francesco Mazzoli	8c172fd2e8	Tiny C++ xmon fix	2023-12-10 11:14:19 +00:00
Francesco Mazzoli	27bd28ead0	Remove outdated comment	2023-12-10 08:39:17 +00:00
Francesco Mazzoli	788b5eed57	Fill in current block services before applying the log It makes a lot more sense to pick outside, given that it involves randomness. Also, this is in preparation for shuckle picking them in a smarter way.	2023-12-09 15:20:24 +00:00
Francesco Mazzoli	3394328000	Do not try to close xmon fd if we don't have one Also, ignore errors if we can't close it. Fixes #134.	2023-12-09 14:50:51 +00:00
Francesco Mazzoli	ab1df9137d	Fix error logging when inserting stats	2023-12-08 15:57:02 +00:00
Francesco Mazzoli	128078988d	Get rid of -parallel in GC With separate workers it's not really needed anymore.	2023-12-08 11:51:21 +00:00
Francesco Mazzoli	5f4467d0c6	Synchronize access to in-memory block service data This was already an issue before, but it never surfaced so far. Today the quants actually hit it.	2023-12-07 16:43:11 +00:00
Francesco Mazzoli	53049d5779	Shard batch writes, use batch UDP syscalls The idea is to drain the socket and do a single RocksDB WAL write/fsync for all the write requests we have found. The read requests are immediately executed. The reasoning here is that currently write requests are _a lot_ slower than the read requests because fsyncing takes ~500us on fsf1. In the future this might change. Since we're at it, we also use batch UDP syscalls in the CDC. Fixes #119.	2023-12-07 14:29:07 +00:00
Francesco Mazzoli	3eae5bbf9b	Use an EMA for the in-flight CDC txns as well	2023-12-07 10:27:32 +00:00
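
The batching idea described in commit 53049d5779 (drain the socket, execute reads immediately, and cover all writes found with a single WAL write/fsync) can be illustrated roughly as follows. This is only a sketch under stated assumptions: `Request`, `BatchResult`, and `processBatch` are hypothetical names invented for illustration, the "sync" is a counter rather than a real RocksDB call, and none of this is taken from the repository's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical request type; the real shard protocol is not shown in this log.
struct Request {
    bool isWrite;
    std::string payload;
};

// Hypothetical summary of what one batch did.
struct BatchResult {
    std::size_t readsExecuted = 0;
    std::size_t writesBuffered = 0;
    std::size_t syncs = 0; // number of simulated WAL write/fsync calls
};

// Process everything drained from the socket in one pass:
// reads run immediately, writes are collected and flushed with one sync.
inline BatchResult processBatch(const std::vector<Request>& drained) {
    BatchResult result;
    std::vector<const Request*> pendingWrites;
    for (const Request& req : drained) {
        if (req.isWrite) {
            pendingWrites.push_back(&req); // defer until the end of the batch
        } else {
            ++result.readsExecuted; // stand-in for executing the read right away
        }
    }
    if (!pendingWrites.empty()) {
        result.writesBuffered = pendingWrites.size();
        ++result.syncs; // stand-in for a single WAL write + fsync for all writes
    }
    return result;
}
```

The point of the sketch is the cost model: if a sync costs roughly the same regardless of how many writes it covers, paying for it once per drained batch rather than once per write request amortizes the slow step.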


269 Commits