85 Commits

Author SHA1 Message Date
Miroslav Crnic
c53af171e5 make scratch file gc-able on release
* shard: support ScrapTransientFile

* scratch: scrap file on release
2025-03-18 12:49:44 +00:00
Miroslav Crnic
25b2cd965e shard: transient file deadline part of entry 2025-03-18 10:03:08 +00:00
Miroslav Crnic
6948f36bc7 shard: support multiple locations in operations 2024-12-02 09:47:48 +00:00
Miroslav Crnic
f931e3c0d5 msgs: remove ConverBlockReq/Resp 2024-12-02 08:16:44 +00:00
Miroslav Crnic
5726a2e308 shuckle: assign writable services per location + messages cleanup 2024-11-28 15:42:44 +00:00
Miroslav Crnic
637543f0a0 shard: enforce no duplicate failure domains 2024-11-25 17:57:57 +00:00
Miroslav Crnic
1a47089b3d shard: proxy read/write 2024-11-17 16:38:43 +00:00
Miroslav Crnic
5f24b43184 shuckle: support locations 2024-11-14 09:26:44 +00:00
Miroslav Crnic
75dfd723c0 shuckle: fix ClearCdcInfoReq name 2024-09-17 10:05:46 +00:00
Miroslav Crnic
b2ea95091a shuckle: support cdc replica moving across hosts 2024-09-16 17:31:47 +01:00
Miroslav Crnic
59fc480e85 shuckle: remove unused requests 2024-09-16 15:21:06 +01:00
Miroslav Crnic
8ac93a4c54 shuckle: add location for all services 2024-09-11 16:59:19 +01:00
Miroslav Crnic
9cd425d7f3 eggsblocks/kmod: add file_id to FetchBlockWithCrcReq 2024-08-22 14:11:01 +01:00
Miroslav Crnic
49bd2e6a2a eggsblocks: conversion as a separate request 2024-08-21 15:39:11 +01:00
Miroslav Crnic
73622ce637 eggsblocks: write/read from new block format with crc after page 2024-08-20 14:55:45 +01:00
Miroslav Crnic
cf40e318ec shuckle: support BlockServicesWithFlagChangeReq 2024-07-24 10:08:01 +01:00
Miroslav Crnic
a41a4b7482 shuckle: drop BlockServiceInfoWithoutFlagsLastChanged 2024-07-23 15:40:44 +01:00
Miroslav Crnic
49723653f8 shuckle: BlockServiceInfo backward compatibility
* shuckle: rename BlockServiceInfo to BlockServiceInfoWithoutFlagsLastChanged

* shuckle: handle AllBlockServices
2024-07-23 13:10:57 +01:00
Miroslav Crnic
e2bfb15c5f blockservice: add BlockFetchWithCrc 2024-07-12 14:24:37 +01:00
Miroslav Crnic
3195d39d9d stats: fully remove everywhere 2024-07-09 15:22:10 +00:00
Miroslav Crnic
f3b7ef4d94 eggsgc: destroy decommissioned blocks through shuckle 2024-07-02 09:52:20 +00:00
Miroslav Crnic
2cd15fc0be core: various protocol changes 2024-06-13 09:13:11 +01:00
Miroslav Crnic
1f145c030e shard/cdc: support snapshotting 2024-05-23 10:17:59 +01:00
Miroslav Crnic
f11b675807 shuckle: add cdc replicas to page 2024-05-22 11:57:34 +00:00
Francesco Mazzoli
6faa917c18 Add endpoint and cli util to resurrect files
Only works in the same shard, for now.
2024-05-20 12:06:15 +00:00
Miroslav Crnic
8a0ea10cde core: UDPSocketPair and use IpPort AddrsInfo everywhere
* core: UDPSocketPair and use IpPort AddrsInfo everywhere

* Refactor UDPSocketPair a bit

* ci: kmod always delete img before create

* shuckle: fix scripts/json marshal

---------

Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2024-05-03 11:32:07 +01:00
Francesco Mazzoli
cd8e52f8f7 Remove assertions in ShardDB
We got a crash because of it (presumably can happen if defrag
conflicts with migrate or something like that)
2024-05-01 08:13:19 +00:00
Francesco Mazzoli
d3be7bf53a Remove old-style register block service request 2024-04-22 19:20:04 +00:00
Francesco Mazzoli
f109e3542b Have eggsblocks refresh decommissioned block services
So that we can reliably ignore stale block services in GC (done in
a future commit). To enable this and future-proof this kind of
mechanism (e.g. having `eggsblocks` mark something as D itself)
I added a new way to register the block service that lets you mask
which flags you're checking. I'll remove the old way once we've
rolled out everywhere.
2024-04-22 18:47:54 +00:00
Miroslav Crnic
43f69b1f7e shuckle: support ClearShardInfoReq/Resp 2024-04-16 10:25:24 +01:00
Miroslav Crnic
a579b41dfc shuckle: support for MoveLeaderReq 2024-04-15 14:24:15 +01:00
Francesco Mazzoli
e42c548777 Make SwapSpans idempotent 2024-04-09 07:53:10 +01:00
Francesco Mazzoli
4dd929a798 Implement swap spans 2024-04-09 07:53:10 +01:00
Saulius Grusnys
fd9079febf Rate limited shuckle endpoint to decom blockservices 2024-03-20 15:16:00 +00:00
Francesco Mazzoli
b12cdf7507 Add replicas info to shuckle web ui 2024-03-19 15:55:18 +00:00
Francesco Mazzoli
005121bcac Spin block service cache out of ShardDB
This started being a problem since the block service update log
entry does not fit in a UDP packet (it's like 100KB). I think this
approach makes more sense anyway. See comment for `getCache()` for
gotchas.
2024-03-13 11:29:58 +00:00
Miroslav Crnic
b240de53b5 shard: distributed log implementation and shard can use it with a flag set 2024-03-12 11:02:04 +00:00
Saulius Grusnys
796e46f466 shuckle to track if blockservices have any files on them (currently t… (#177)
* shuckle to track if blockservices have any files on them (currently there is an issue with transient files)
2024-02-20 08:10:51 +00:00
Miroslav Crnic
38707535e3 shuckle: support metadata replication 2024-02-07 13:57:00 +00:00
Francesco Mazzoli
8c0c246348 More robust detection of file vs. device errors
Just check if we're also unable to count the blocks for the disk:
if we are, assume a device error, otherwise assume it's a single file error.

Of course there will be a time period where we will not have detected
the bad disk when counting the blocks (a few minutes at most), but
that's OK -- the scrubber will scrub blocks for that period, and then
stop.

Once <internal-repo/issues/65#issuecomment-24747>
is done, we should use whatever error detection we use for migration
to also distinguish between these errors.
2024-01-22 13:18:53 +00:00
Francesco Mazzoli
b6cf2b67a6 Distribute block services from shuckle
This is in preparation for #44, but more immediately, to better
stop writing to full block services.

The previous strategy of setting a flag was flawed since once
the flag was set it stayed set, i.e. we would not remove it once
files were deleted. This consideration should just be integrated
in distributing the block services.
2024-01-16 16:17:27 +00:00
Francesco Mazzoli
788b5eed57 Fill in current block services before applying the log
It makes a lot more sense to pick outside, given that it involves
randomness. Also, this is in preparation for shuckle picking them
in a smarter way.
2023-12-09 15:20:24 +00:00
Francesco Mazzoli
53049d5779 Shard batch writes, use batch UDP syscalls
The idea is to drain the socket and do a single RocksDB WAL
write/fsync for all the write requests we have found.

The read requests are immediately executed. The reasoning here is
that currently write requests are _a lot_ slower than the read
requests because fsyncing takes ~500us on fsf1. In the future this
might change.

Since we're at it, we also use batch UDP syscalls in the CDC.

Fixes #119.
2023-12-07 14:29:07 +00:00
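
The batching idea in the commit above (drain the socket, answer reads immediately, commit all writes with one WAL write/fsync) can be illustrated with a minimal sketch. This is not the shard's actual code: `Request`, `FakeStore`, and `process_drained` are hypothetical names, and the in-memory store only counts how many "fsync-like" commits happen instead of touching RocksDB or a UDP socket.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    """Hypothetical request shape: kind is "read" or "write"."""
    kind: str
    key: str
    value: Optional[str] = None


@dataclass
class FakeStore:
    """Stand-in for the database; sync_count counts simulated WAL fsyncs."""
    data: dict = field(default_factory=dict)
    sync_count: int = 0

    def read(self, key):
        return self.data.get(key)

    def write_batch(self, writes):
        # Apply every write, then pay the (simulated) fsync cost once.
        for key, value in writes:
            self.data[key] = value
        self.sync_count += 1


def process_drained(store, requests):
    """Handle all requests drained from the socket in one pass.

    Reads are answered immediately; writes are collected and committed
    together, so the whole batch costs a single sync.
    """
    responses = [None] * len(requests)
    pending = []
    for i, req in enumerate(requests):
        if req.kind == "read":
            responses[i] = store.read(req.key)
        else:
            pending.append((i, req))
    if pending:
        store.write_batch([(req.key, req.value) for _, req in pending])
        for i, _ in pending:
            responses[i] = "ok"
    return responses
```

One consequence of this sketch's ordering is that a read in the same drained batch does not observe writes from that batch, since the writes are only applied at the end; whether the real shard behaves the same way is not stated in the commit message.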
Francesco Mazzoli
91db9566e1 Remove option to not write out atime which is too recent
This was pretty nasty to begin with, we now do it in the client.
2023-11-23 13:28:23 +00:00
Francesco Mazzoli
b964d0632a Add option to not write out atime which is too recent
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.

Thanks to @sgrusny for the good idea.
2023-11-16 14:45:58 +00:00
Saulius Grusnys
2ce5586eb9 Periodically refresh metadata info in kmod, use two IPs for shuckle
Fixes #112.

Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2023-11-14 13:49:36 +00:00
Francesco Mazzoli
3bc17301d6 Switch from tuple to variant for req/resp containers
The `tuple` was for when I thought it'd be useful to leave slots
for each request, but we don't need this anymore, and now leading
up to #66 I want to be able to keep vectors of reqs/resps.
2023-11-09 19:03:37 +00:00
Francesco Mazzoli
d0126d0656 Distinguish IO errors in eggsblocks
See #115 for background.
2023-11-06 19:35:05 +00:00
Francesco Mazzoli
1ec63f9710 Implement scrubbing functionality
Fixes #32. This also involves some reworking of the block request machinery
to make it more robust and faster. The scrubbing is done assuming that
the overwhelming majority of block checking will go through.
2023-11-05 18:33:00 +00:00
Francesco Mazzoli
c529d96c88 Garbage collect zero block service files mappings.
See #91.
2023-10-21 11:41:33 +00:00