Commit Graph

184 Commits

Author SHA1 Message Date
Miroslav Crnic 9e13d6b56e shard: support checkpointed responses 2024-06-13 15:39:37 +01:00
Miroslav Crnic 2cd15fc0be core: various protocol changes 2024-06-13 09:13:11 +01:00
Miroslav Crnic 6eaed4ff0e core: remove Stopped (unused) 2024-06-12 15:10:58 +00:00
Miroslav Crnic 170f2fbc61 logsdb: add stats and expose in shard/cdc 2024-06-10 16:24:49 +01:00
Miroslav Crnic 1f145c030e shard/cdc: support snapshotting 2024-05-23 10:17:59 +01:00
Miroslav Crnic 1446e4d0d2 cdc: force log cleanup after crash
The transactional DB that CDC uses has a slightly
annoying property: it flushes the WAL on transaction
start. As a result, the release point can get moved and
log records persisted even if we crash.
We want to remove them automatically for now.
2024-05-22 16:58:18 +00:00
Miroslav Crnic f11b675807 shuckle: add cdc replicas to page 2024-05-22 11:57:34 +00:00
Miroslav Crnic b524748210 cdc: use DEFAULT_UDP_MTU for serialization of entries 2024-05-21 14:23:27 +00:00
Francesco Mazzoli 6faa917c18 Add endpoint and cli util to resurrect files
Only works in the same shard, for now.
2024-05-20 12:06:15 +00:00
Miroslav Crnic 0b3348b458 SharedRocksDB: (ref) paths in constructor 2024-05-17 14:23:25 +00:00
Miroslav Crnic ab337068ad httplib: use poll instead of select 2024-05-17 14:23:01 +00:00
Miroslav Crnic f5e17dace5 cdc: add LogsDB
* cdc: pack req/resp into log entries and apply

* shard: drop support for unused incoming packet drop

* cdc: add logsdb
2024-05-14 12:50:17 +01:00
Miroslav Crnic 91d462ab0e UDPSocketPair: (fix) don't look at unused fd-s 2024-05-14 08:51:29 +00:00
Miroslav Crnic 8a0ea10cde core: UDPSocketPair and use IpPort AddrsInfo everywhere
* core: UDPSocketPair and use IpPort AddrsInfo everywhere

* Refactor UDPSocketPair a bit

* ci: kmod always delete img before create

* shuckle: fix scripts/json marshal

---------

Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2024-05-03 11:32:07 +01:00
Francesco Mazzoli cd8e52f8f7 Remove assertions in ShardDB
We got a crash because of it (presumably can happen if defrag
conflicts with migrate or something like that)
2024-05-01 08:13:19 +00:00
Francesco Mazzoli d3be7bf53a Remove old-style register block service request 2024-04-22 19:20:04 +00:00
Francesco Mazzoli f109e3542b Have eggsblocks refresh decommissioned block services
So that we can reliably ignore stale block services in GC (done in
a future commit). To enable this and future-proof this kind of
mechanism (e.g. having `eggsblocks` mark something as D itself)
I added a new way to register the block service that lets you mask
which flags you're checking. I'll remove the old way once we've
rolled out everywhere.
2024-04-22 18:47:54 +00:00
Miroslav Crnic 6007192a31 bincode: generate STATIC_SIZE for req/resp 2024-04-22 13:49:49 +01:00
Miroslav Crnic 43f69b1f7e shuckle: support ClearShardInfoReq/Resp 2024-04-16 10:25:24 +01:00
Miroslav Crnic a579b41dfc shuckle: support for MoveLeaderReq 2024-04-15 14:24:15 +01:00
Francesco Mazzoli 20e7635d75 Clear data when request fails in Shuckle.cpp 2024-04-10 10:39:30 +00:00
Francesco Mazzoli e42c548777 Make SwapSpans idempotent 2024-04-09 07:53:10 +01:00
Francesco Mazzoli 4dd929a798 Implement swap spans 2024-04-09 07:53:10 +01:00
Miroslav Crnic 409b126e4b cdc: use SharedRocksDB 2024-04-05 23:22:39 +01:00
Miroslav Crnic 0a6e4be683 shard: disable double flush and improve kmod vm 2024-04-05 17:34:42 +01:00
Miroslav Crnic de17eee24f core: fix incorrect return in connectHost 2024-04-03 15:08:48 +01:00
Miroslav Crnic 30ee029f7e shuckle: make requests interruptible and pass timeout to all operations
This means that they'll be interrupted at shutdown, rather than holding everything up when shuckle is overloaded.
We also detect idle connections and slow data transmission.
2024-04-02 18:15:29 +01:00
Francesco Mazzoli 68c4c03750 Add command to run some checks directly in RocksDB database 2024-03-27 18:45:14 +00:00
Miroslav Crnic aebcce4017 logsdb: fix assert for last released going backwards 2024-03-25 10:31:58 +00:00
Miroslav Crnic 7df0a5da89 shard: cli options now match migration phases for LogsDB, and support manual failover 2024-03-20 15:34:55 +00:00
Saulius Grusnys fd9079febf Rate limited shuckle endpoint to decom blockservices 2024-03-20 15:16:00 +00:00
Francesco Mazzoli 1cf299bfac Use atomics where appropriate 2024-03-20 13:21:18 +00:00
Francesco Mazzoli f85714dbba Use pthread_self() to get pthread thread id 2024-03-20 13:11:14 +00:00
Francesco Mazzoli 3a6e498664 Make some Loop methods static 2024-03-20 13:00:18 +00:00
Francesco Mazzoli 9bc7e209e4 Safer ShuckleSock 2024-03-20 11:33:39 +00:00
Francesco Mazzoli 66fe0a2621 Correct pthread_timedjoin_np handling 2024-03-20 11:13:26 +00:00
Francesco Mazzoli 8f1ba6361b Resist interruptions when joining threads 2024-03-20 10:32:42 +00:00
Francesco Mazzoli 66ccba6124 Forward termination signal to main thread 2024-03-20 10:32:42 +00:00
Francesco Mazzoli b12cdf7507 Add replicas info to shuckle web ui 2024-03-19 15:55:18 +00:00
Miroslav Crnic 938c845a30 eggsdbtool: cli for shard db comparison 2024-03-19 15:00:01 +00:00
Miroslav Crnic a4c091c7b2 logsdb: log state at flush to have consistent view 2024-03-19 12:44:56 +00:00
Miroslav Crnic 096b9cbe6a logsdb: fix for replication path 2024-03-18 17:29:49 +00:00
Miroslav Crnic dfcabdba97 LogsDB: tweak catchup timeout 2024-03-18 12:00:27 +00:00
Miroslav Crnic c8cda7e4db logsdb: periodically log status 2024-03-18 09:44:47 +00:00
Miroslav Crnic 72c1acaea8 xmon: if too many alerts initialize appType to _parent 2024-03-15 19:39:41 +00:00
Miroslav Crnic 27faaa45ae ci: add ability to run with LogsDB, shard: add handling of LogsDB messages 2024-03-15 16:49:39 +00:00
Miroslav Crnic ebcdcb650a shard: add support for resetting all data in LogsDB 2024-03-13 11:33:48 +00:00
Francesco Mazzoli 005121bcac Spin block service cache out of ShardDB
This started being a problem since the block service update log
entry does not fit in a UDP packet (it's like 100KB). I think this
approach makes more sense anyway. See comment for `getCache()` for
gotchas.
2024-03-13 11:29:58 +00:00
Francesco Mazzoli 6968c25bc5 Allow : in metrics 2024-03-12 14:04:34 +00:00
Miroslav Crnic 13c5df0131 shard: fix name in xmon and add replica id to tag in metrics 2024-03-12 13:40:35 +00:00