165 Commits

Author SHA1 Message Date
Miroslav Crnic a579b41dfc shuckle: support for MoveLeaderReq 2024-04-15 14:24:15 +01:00
Francesco Mazzoli 20e7635d75 Clear data when request fails in Shuckle.cpp 2024-04-10 10:39:30 +00:00
Francesco Mazzoli e42c548777 Make SwapSpans idempotent 2024-04-09 07:53:10 +01:00
Francesco Mazzoli 4dd929a798 Implement swap spans 2024-04-09 07:53:10 +01:00
Miroslav Crnic 409b126e4b cdc: use SharedRocksDB 2024-04-05 23:22:39 +01:00
Miroslav Crnic 0a6e4be683 shard: disable double flush and improve kmod vm 2024-04-05 17:34:42 +01:00
Miroslav Crnic de17eee24f core: fix incorrect return in connectHost 2024-04-03 15:08:48 +01:00
Miroslav Crnic 30ee029f7e shuckle: make requests interruptible and pass timeout to all operations
This means that they'll be interrupted at shutdown, rather than holding everything up when shuckle is overloaded.
We also detect idle connections and slow data transmission.
2024-04-02 18:15:29 +01:00
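The behaviour described in the commit above (one timeout covering the whole operation, with interruption surfaced to the caller) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `readFull` and `ReadResult` are hypothetical names, and only the standard POSIX `poll` and `read` calls are used. A single deadline for the whole read catches both an idle peer and one that sends too slowly, and `EINTR` is returned instead of retried so that a shutdown signal can stop the request.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cerrno>
#include <chrono>
#include <cstddef>

// Hypothetical result type for this sketch.
enum class ReadResult { Ok, TimedOut, Interrupted, Closed, Error };

// Reads exactly `len` bytes from `fd`, giving up once `timeout` has elapsed
// for the operation as a whole (not per read call).
ReadResult readFull(int fd, char* buf, size_t len, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    size_t done = 0;
    while (done < len) {
        const auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - std::chrono::steady_clock::now()).count();
        if (left <= 0) return ReadResult::TimedOut;  // never pass a non-positive timeout to poll
        pollfd pfd{fd, POLLIN, 0};
        const int ready = poll(&pfd, 1, static_cast<int>(left));
        if (ready < 0) return errno == EINTR ? ReadResult::Interrupted : ReadResult::Error;
        if (ready == 0) return ReadResult::TimedOut;  // idle or too slow
        const ssize_t n = read(fd, buf + done, len - done);
        if (n < 0) return errno == EINTR ? ReadResult::Interrupted : ReadResult::Error;
        if (n == 0) return ReadResult::Closed;  // peer closed before sending everything
        done += static_cast<size_t>(n);
    }
    return ReadResult::Ok;
}
```

A caller would typically treat `Interrupted` as a cue to check whether shutdown was requested before deciding to retry.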
Francesco Mazzoli 68c4c03750 Add command to run some checks directly in RocksDB database 2024-03-27 18:45:14 +00:00
Miroslav Crnic aebcce4017 logsdb: fix assert for last released going backwards 2024-03-25 10:31:58 +00:00
Miroslav Crnic 7df0a5da89 shard: cli options now match migration phases for LogsDB, and support manual failover 2024-03-20 15:34:55 +00:00
Saulius Grusnys fd9079febf Rate limited shuckle endpoint to decom blockservices 2024-03-20 15:16:00 +00:00
Francesco Mazzoli 1cf299bfac Use atomics where appropriate 2024-03-20 13:21:18 +00:00
Francesco Mazzoli f85714dbba Use pthread_self() to get pthread thread id 2024-03-20 13:11:14 +00:00
Francesco Mazzoli 3a6e498664 Make some Loop methods static 2024-03-20 13:00:18 +00:00
Francesco Mazzoli 9bc7e209e4 Safer ShuckleSock 2024-03-20 11:33:39 +00:00
Francesco Mazzoli 66fe0a2621 Correct pthread_timedjoin_np handling 2024-03-20 11:13:26 +00:00
Francesco Mazzoli 8f1ba6361b Resist interruptions when joining threads 2024-03-20 10:32:42 +00:00
Francesco Mazzoli 66ccba6124 Forward termination signal to main thread 2024-03-20 10:32:42 +00:00
Francesco Mazzoli b12cdf7507 Add replicas info to shuckle web ui 2024-03-19 15:55:18 +00:00
Miroslav Crnic 938c845a30 eggsdbtool: cli for shard db comparison 2024-03-19 15:00:01 +00:00
Miroslav Crnic a4c091c7b2 logsdb: log state at flush to have consistent view 2024-03-19 12:44:56 +00:00
Miroslav Crnic 096b9cbe6a logsdb: fix for replication path 2024-03-18 17:29:49 +00:00
Miroslav Crnic dfcabdba97 LogsDB: tweak catchup timeout 2024-03-18 12:00:27 +00:00
Miroslav Crnic c8cda7e4db logsdb: periodically log status 2024-03-18 09:44:47 +00:00
Miroslav Crnic 72c1acaea8 xmon: if too many alerts initialize appType to _parent 2024-03-15 19:39:41 +00:00
Miroslav Crnic 27faaa45ae ci: add ability to run with LogsDB, shard: add handling of LogsDB messages 2024-03-15 16:49:39 +00:00
Miroslav Crnic ebcdcb650a shard: add support for resetting all data in LogsDB 2024-03-13 11:33:48 +00:00
Francesco Mazzoli 005121bcac Spin block service cache out of ShardDB
This started being a problem since the block service update log
entry does not fit in a UDP packet (it's like 100KB). I think this
approach makes more sense anyway. See comment for `getCache()` for
gotchas.
2024-03-13 11:29:58 +00:00
Francesco Mazzoli 6968c25bc5 Allow : in metrics 2024-03-12 14:04:34 +00:00
Miroslav Crnic 13c5df0131 shard: fix name in xmon and add replica id to tag in metrics 2024-03-12 13:40:35 +00:00
Miroslav Crnic b240de53b5 shard: distributed log implementation and shard can use it with a flag set 2024-03-12 11:02:04 +00:00
Francesco Mazzoli 0037e8d10b Print some info about block service flags in shard 2024-03-08 09:18:54 +00:00
Miroslav Crnic 712ed8973e core: simplify implementing custom stop for Loop 2024-02-23 13:52:34 +00:00
Francesco Mazzoli 531f989a06 Correct app type for quiet alert creation 2024-02-20 14:16:52 +00:00
Francesco Mazzoli 303421763a Allow to specify rota per alert in C++ 2024-02-20 12:59:42 +00:00
Saulius Grusnys 796e46f466 shuckle to track if blockservices have any files on them (currently t… (#177)
* shuckle to track if blockservices have any files on them (currently there is an issue with transient files)
2024-02-20 08:10:51 +00:00
Miroslav Crnic 83d0469c7f SharedRocksDB: correctly export metrics 2024-02-08 19:39:00 +00:00
Miroslav Crnic 37ba9bc457 shard: support for sharing rocksdb and init LogsDB CFs 2024-02-08 17:44:03 +00:00
Miroslav Crnic 38707535e3 shuckle: support metadata replication 2024-02-07 13:57:00 +00:00
Miroslav Crnic 1dedd7d181 core: SPSC return 0 on timeout in pull 2024-01-29 17:16:05 +00:00
Miroslav Crnic 2ec1304981 core: ppoll, futex don't like negative timeouts 2024-01-29 17:00:14 +00:00
Francesco Mazzoli 9d1a31b482 Fix another signedness mismatch 2024-01-29 16:46:05 +00:00
Miroslav Crnic e543665f8f core: SPSC support timeout in pull 2024-01-29 16:06:31 +00:00
Francesco Mazzoli 2a326f7c5f Fix usual signedness shenanigans 🥱 2024-01-29 16:05:19 +00:00
Francesco Mazzoli 0a6a0c8f24 Process CDC timeouts in a timely manner 2024-01-29 15:08:06 +00:00
Francesco Mazzoli 2a6feb6df5 Patch RocksDB to make it compile with clang 15. 2024-01-29 14:15:29 +00:00
Francesco Mazzoli 8c0c246348 More robust detection of file vs. device errors
Just check whether we're also unable to count the blocks for the disk:
if so, treat it as a device error; otherwise assume it's a single-file error.

Of course there will be a time period where we will not have detected
the bad disk when counting the blocks (a few minutes at most), but
that's OK -- the scrubber will scrub blocks for that period, and then
stop.

Once <internal-repo/issues/65#issuecomment-24747>
is done, we should use whatever error detection we use for migration
to also distinguish between these errors.
2024-01-22 13:18:53 +00:00
Francesco Mazzoli b6cf2b67a6 Distribute block services from shuckle
This is in preparation for #44, but more immediately, to better
stop writing to full block services.

The previous strategy of setting a flag was flawed since once
the flag was set it stayed set, i.e. we would not clear it once
files were deleted. This consideration should just be integrated
into distributing the block services.
2024-01-16 16:17:27 +00:00
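The difference between a sticky "full" flag and recomputing eligibility at distribution time can be sketched as below. This is an illustrative sketch only, not the code from the commit: `BlockService`, `availableBytes`, `pickWritable`, and `minFreeBytes` are all hypothetical names, and the real selection criteria may differ. The point is that eligibility is derived from current statistics on every round, so a block service that was full becomes writable again once files are deleted.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for this sketch; field names are not from the codebase.
struct BlockService {
    std::string id;
    uint64_t availableBytes;  // current free space, refreshed before each round
};

// Returns the ids of block services that may receive new writes.
// Nothing is remembered between calls: there is no flag to clear.
std::vector<std::string> pickWritable(const std::vector<BlockService>& all,
                                      uint64_t minFreeBytes) {
    std::vector<std::string> writable;
    for (const auto& bs : all) {
        if (bs.availableBytes >= minFreeBytes) {
            writable.push_back(bs.id);
        }
    }
    return writable;
}
```

Calling `pickWritable` again after a service's `availableBytes` rises is enough for it to re-enter the writable set.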
Francesco Mazzoli d569bdb494 Re-introduce thread names (they got lost in a refactor) 2024-01-11 17:32:52 +00:00