ternfs-XTXMarkets

mirror of https://github.com/XTXMarkets/ternfs.git synced 2026-02-16 15:29:11 -06:00

Author	SHA1	Message	Date
Miroslav Crnic	aebcce4017	logsdb: fix assert for last released going backwards	2024-03-25 10:31:58 +00:00
Miroslav Crnic	7df0a5da89	shard: cli options now match migration phases for LogsDB, and support manual failover	2024-03-20 15:34:55 +00:00
Saulius Grusnys	fd9079febf	Rate limited shuckle endpoint to decom blockservices	2024-03-20 15:16:00 +00:00
Francesco Mazzoli	1cf299bfac	Use atomics where appropriate	2024-03-20 13:21:18 +00:00
Francesco Mazzoli	f85714dbba	Use `pthread_self()` to get pthread thread id	2024-03-20 13:11:14 +00:00
Francesco Mazzoli	3a6e498664	Make some `Loop` methods static	2024-03-20 13:00:18 +00:00
Francesco Mazzoli	9bc7e209e4	Safer `ShuckleSock`	2024-03-20 11:33:39 +00:00
Francesco Mazzoli	66fe0a2621	Correct `pthread_timedjoin_np` handling	2024-03-20 11:13:26 +00:00
Francesco Mazzoli	8f1ba6361b	Resist interruptions when joining threads	2024-03-20 10:32:42 +00:00
Francesco Mazzoli	66ccba6124	Forward termination signal to main thread	2024-03-20 10:32:42 +00:00
Francesco Mazzoli	b12cdf7507	Add replicas info to shuckle web ui	2024-03-19 15:55:18 +00:00
Miroslav Crnic	938c845a30	eggsdbtool: cli for shard db comparison	2024-03-19 15:00:01 +00:00
Miroslav Crnic	a4c091c7b2	logsdb: log state at flush to have consistent view	2024-03-19 12:44:56 +00:00
Miroslav Crnic	096b9cbe6a	logsdb: fix for replication path	2024-03-18 17:29:49 +00:00
Miroslav Crnic	dfcabdba97	LogsDB: tweak catchup timeout	2024-03-18 12:00:27 +00:00
Miroslav Crnic	c8cda7e4db	logsdb: periodically log status	2024-03-18 09:44:47 +00:00
Miroslav Crnic	72c1acaea8	xmon: if too many alerts initialize appType to _parent	2024-03-15 19:39:41 +00:00
Miroslav Crnic	27faaa45ae	ci: add ability to run with LogsDB, shard: add handling of LogsDB messages	2024-03-15 16:49:39 +00:00
Miroslav Crnic	ebcdcb650a	shard: add support for resetting all data in LogsDB	2024-03-13 11:33:48 +00:00
Francesco Mazzoli	005121bcac	Spin block service cache out of `ShardDB` This started being a problem since the block service update log entry does not fit in a UDP packet (it's like 100KB). I think this approach makes more sense anyway. See comment for `getCache()` for gotchas.	2024-03-13 11:29:58 +00:00
Francesco Mazzoli	6968c25bc5	Allow `:` in metrics	2024-03-12 14:04:34 +00:00
Miroslav Crnic	13c5df0131	shard: fix name in xmon and add replica id to tag in metrics	2024-03-12 13:40:35 +00:00
Miroslav Crnic	b240de53b5	shard: distributed log implementation and shard can use it with a flag set	2024-03-12 11:02:04 +00:00
Francesco Mazzoli	0037e8d10b	Print some info about block service flags in shard	2024-03-08 09:18:54 +00:00
Miroslav Crnic	712ed8973e	core: simplify implementing custom stop for Loop	2024-02-23 13:52:34 +00:00
Francesco Mazzoli	531f989a06	Correct app type for quiet alert creation	2024-02-20 14:16:52 +00:00
Francesco Mazzoli	303421763a	Allow to specify rota per alert in C++	2024-02-20 12:59:42 +00:00
Saulius Grusnys	796e46f466	shuckle to track if blockservices have any files on them (currently there is issue with transient files) (#177)	2024-02-20 08:10:51 +00:00
Miroslav Crnic	83d0469c7f	SharedRocksdDB: correctly export metrics	2024-02-08 19:39:00 +00:00
Miroslav Crnic	37ba9bc457	shard: support for sharing rocksdb and init LogsDB CFs	2024-02-08 17:44:03 +00:00
Miroslav Crnic	38707535e3	shuckle: support metadata replication	2024-02-07 13:57:00 +00:00
Miroslav Crnic	1dedd7d181	core: SPSC return 0 on timeout in pull	2024-01-29 17:16:05 +00:00
Miroslav Crnic	2ec1304981	core: ppoll, futex don't like negative timeouts	2024-01-29 17:00:14 +00:00
Francesco Mazzoli	9d1a31b482	Fix another signedness mismatch	2024-01-29 16:46:05 +00:00
Miroslav Crnic	e543665f8f	core: SPSC support timeout in pull	2024-01-29 16:06:31 +00:00
Francesco Mazzoli	2a326f7c5f	Fix usual signedness shenanigans 🥱	2024-01-29 16:05:19 +00:00
Francesco Mazzoli	0a6a0c8f24	Process CDC timeouts in a timely manner	2024-01-29 15:08:06 +00:00
Francesco Mazzoli	2a6feb6df5	Patch RocksDB to make it compile with clang 15.	2024-01-29 14:15:29 +00:00
Francesco Mazzoli	8c0c246348	More robust detection of file vs. device errors Just check if we're also unable to count the blocks for the disk, and if yes, assume it's a single file error. Of course there will be a time period where we will not have detected the bad disk when counting the blocks (a few minutes at most), but that's OK -- the scrubber will scrub blocks for that period, and then stop. Once <internal-repo/issues/65#issuecomment-24747> is done, we should use whatever error detection we use for migration to also distinguish between these errors.	2024-01-22 13:18:53 +00:00
Francesco Mazzoli	b6cf2b67a6	Distribute block services from shuckle This is in preparation for #44, but more immediately, to better stop writing to full block services. The previous strategy of setting a flag was flawed since once the flag was set it stayed set -- i.e. we would not remove it once files would be deleted. This consideration should just be integrated in distributing the block services.	2024-01-16 16:17:27 +00:00
Francesco Mazzoli	d569bdb494	Re-introduce thread names (they got lost in a refactor)	2024-01-11 17:32:52 +00:00
Francesco Mazzoli	8d0b97171e	Remove dead code	2024-01-11 13:03:26 +00:00
Francesco Mazzoli	c27ba8398a	Tear down all threads at once I had copied the LIFO pattern from ETD codebase, but it's not needed here given that the loop terminates gracefully and so we can coordinate explicitly if needed.	2024-01-09 16:53:23 +00:00
Francesco Mazzoli	c9bf49d387	Fix silly SPSC bug	2024-01-09 11:14:18 +00:00
Francesco Mazzoli	3097752a30	Minor tweak	2024-01-08 16:03:07 +00:00
Francesco Mazzoli	ee9e0ad0af	Remove `pthread_attr_setsigmask_np`, musl does not have it	2024-01-08 15:58:31 +00:00
Francesco Mazzoli	002b2854ec	Fix leak in `FetchedSpan`, and hopefully fix #141.	2024-01-08 15:58:31 +00:00
Francesco Mazzoli	8075e99bb6	Graceful shard teardown See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs regarding how to terminate threads gracefully. The goal of this work was for valgrind to work correctly, which in turn was to investigate #141. It looks like I have succeeded: ==2715080== Warning: unimplemented fcntl command: 1036 ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135 ==2715080== at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==2715080== by 0x3B708E: allocate (new_allocator.h:121) ==2715080== by 0x3B708E: allocate (allocator.h:173) ==2715080== by 0x3B708E: allocate (alloc_traits.h:460) ==2715080== by 0x3B708E: _M_allocate (stl_vector.h:346) ==2715080== by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635) ==2715080== by 0x42BF1C: resize (stl_vector.h:940) ==2715080== by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921) ==2715080== by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034) ==2715080== by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in, char, unsigned long) (shard/Shard.cpp:347) ==2715080== by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405) ==2715080== by 0x40B1E8: run (core/Loop.cpp:67) ==2715080== by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37) ==2715080== by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so) ==2715080== by 0x4D005E2: clone (in /usr/lib/libc-2.33.so) ==2715080== ==2715080== ==2715080== Exit program on first error (--exit-on-first-error=yes)	2024-01-08 15:41:22 +00:00
Francesco Mazzoli	1963714c0f	Remove avoidable stat in collect directories	2023-12-15 21:20:05 +00:00
Francesco Mazzoli	898b85ad9c	Tweak GC parameters We're almost in a steady state, no need to overwhelm the shards.	2023-12-11 15:04:41 +00:00


156 Commits