ternfs-XTXMarkets

mirror of https://github.com/XTXMarkets/ternfs.git synced 2026-01-23 19:39:05 -06:00

Author	SHA1	Message	Date
Miroslav Crnic	6948f36bc7	shard: support multiple locations in operations	2024-12-02 09:47:48 +00:00
Miroslav Crnic	f931e3c0d5	msgs: remove ConverBlockReq/Resp	2024-12-02 08:16:44 +00:00
Miroslav Crnic	5726a2e308	shuckle: assign writable services per location + messages cleanup	2024-11-28 15:42:44 +00:00
Miroslav Crnic	637543f0a0	shard: enforce no duplicate failure domains	2024-11-25 17:57:57 +00:00
Miroslav Crnic	1a47089b3d	shard: proxy read/write	2024-11-17 16:38:43 +00:00
Miroslav Crnic	5f24b43184	shuckle: support locations	2024-11-14 09:26:44 +00:00
Miroslav Crnic	75dfd723c0	shuckle: fix ClearCdcInfoReq name	2024-09-17 10:05:46 +00:00
Miroslav Crnic	b2ea95091a	shuckle: support cdc replica moving across hosts	2024-09-16 17:31:47 +01:00
Miroslav Crnic	59fc480e85	shuckle: remove unused requests	2024-09-16 15:21:06 +01:00
Miroslav Crnic	8ac93a4c54	shuckle: add location for all services	2024-09-11 16:59:19 +01:00
Miroslav Crnic	9cd425d7f3	eggsblocks/kmod: add file_id to FetchBlockWithCrcReq	2024-08-22 14:11:01 +01:00
Miroslav Crnic	49bd2e6a2a	eggsblocks: conversion as a separate request	2024-08-21 15:39:11 +01:00
Miroslav Crnic	73622ce637	eggsblocks: write/read from new block format with crc after page	2024-08-20 14:55:45 +01:00
Miroslav Crnic	cf40e318ec	shuckle: support BlockServicesWithFlagChangeReq	2024-07-24 10:08:01 +01:00
Miroslav Crnic	a41a4b7482	shuckle: drop BlockServiceInfoWithoutFlagsLastChanged	2024-07-23 15:40:44 +01:00
Miroslav Crnic	49723653f8	shuckle: BlockServiceInfo backward compatibility * shuckle: rename BlockServiceInfo to BlockServiceInfoWithoutFlagsLastChanged * shuckle: handle AllBlockServices	2024-07-23 13:10:57 +01:00
Miroslav Crnic	e2bfb15c5f	blockservice: add BlockFetchWithCrc	2024-07-12 14:24:37 +01:00
Miroslav Crnic	3195d39d9d	stats: fully remove everywhere	2024-07-09 15:22:10 +00:00
Miroslav Crnic	f3b7ef4d94	eggsgc: destroy decommissioned blocks through shuckle	2024-07-02 09:52:20 +00:00
Miroslav Crnic	2cd15fc0be	core: various protocol changes	2024-06-13 09:13:11 +01:00
Miroslav Crnic	1f145c030e	shard/cdc: support snapshoting	2024-05-23 10:17:59 +01:00
Miroslav Crnic	f11b675807	shuckle: add cdc replicas to page	2024-05-22 11:57:34 +00:00
Francesco Mazzoli	6faa917c18	Add endpoint and cli util to resurrect files Only works in the same shard, for now.	2024-05-20 12:06:15 +00:00
Miroslav Crnic	8a0ea10cde	core: UDPSocketPair and use IpPort AddrsInfo everywhere * Refactor UDPSocketPair a bit * ci: kmod always delete img before create * shuckle: fix scripts/json marshal Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>	2024-05-03 11:32:07 +01:00
Francesco Mazzoli	cd8e52f8f7	Remove assertions in ShardDB We got a crash because of it (presumably can happen if defrag conflicts with migrate or something like that)	2024-05-01 08:13:19 +00:00
Francesco Mazzoli	d3be7bf53a	Remove old-style register block service request	2024-04-22 19:20:04 +00:00
Francesco Mazzoli	f109e3542b	Have `eggsblocks` to refresh decommissioned block services So that we can reliably ignore stale block services in GC (done in a future commit). To enable this and future-proof this kind of mechanism (e.g. having `eggsblocks` to mark something as D itself) I added a new way to register the block service that lets you mask which flags you're checking. I'll remove the old way once we've rolled out everywhere.	2024-04-22 18:47:54 +00:00
Miroslav Crnic	43f69b1f7e	shuckle: support ClearShardInfoReq/Resp	2024-04-16 10:25:24 +01:00
Miroslav Crnic	a579b41dfc	shuckle: support for MoveLeaderReq	2024-04-15 14:24:15 +01:00
Francesco Mazzoli	e42c548777	Make SwapSpans idempotent	2024-04-09 07:53:10 +01:00
Francesco Mazzoli	4dd929a798	Implement swap spans	2024-04-09 07:53:10 +01:00
Saulius Grusnys	fd9079febf	Rate limited shuckle endpoint to decom blockservices	2024-03-20 15:16:00 +00:00
Francesco Mazzoli	b12cdf7507	Add replicas info to shuckle web ui	2024-03-19 15:55:18 +00:00
Francesco Mazzoli	005121bcac	Spin block service cache out of `ShardDB` This started being a problem since the block service update log entry does not fit in a UDP packet (it's like 100KB). I think this approach makes more sense anyway. See comment for `getCache()` for gotchas.	2024-03-13 11:29:58 +00:00
Miroslav Crnic	b240de53b5	shard: distributed log implementation and shard can use it with a flag set	2024-03-12 11:02:04 +00:00
Saulius Grusnys	796e46f466	shuckle to track if blockservices have any files on them (currently there is issue with transient files) (#177)	2024-02-20 08:10:51 +00:00
Miroslav Crnic	38707535e3	shuckle: support metadata replication	2024-02-07 13:57:00 +00:00
Francesco Mazzoli	8c0c246348	More robust detection of file vs. device errors Just check if we're also unable to count the blocks for the disk, and if yes, assume it's a single file error. Of course there will be a time period where we will not have detected the bad disk when counting the blocks (a few minutes at most), but that's OK -- the scrubber will scrub blocks for that period, and then stop. Once <internal-repo/issues/65#issuecomment-24747> is done, we should use whatever error detection we use for migration to also distinguish between these errors.	2024-01-22 13:18:53 +00:00
Francesco Mazzoli	b6cf2b67a6	Distribute block services from shuckle This is in preparation for #44, but more immediately, to better stop writing to full block services. The previous strategy of setting a flag was flawed since once the flag was set it stayed set -- i.e. we would not remove it once files would be deleted. This consideration should just be integrated in distributing the block services.	2024-01-16 16:17:27 +00:00
Francesco Mazzoli	788b5eed57	Fill in current block services before applying the log It makes a lot more sense to pick outside, given that it involves randomness. Also, this is in preparation for shuckle picking them in a smarter way.	2023-12-09 15:20:24 +00:00
Francesco Mazzoli	53049d5779	Shard batch writes, use batch UDP syscalls The idea is to drain the socket and do a single RocksDB WAL write/fsync for all the write requests we have found. The read requests are immediately executed. The reasoning here is that currently write requests are _a lot_ slower than the read requests because fsyncing takes ~500us on fsf1. In the future this might change. Since we're at it, we also use batch UDP syscalls in the CDC. Fixes #119.	2023-12-07 14:29:07 +00:00
Francesco Mazzoli	91db9566e1	Remove option to not write out atime which is too recent This was pretty nasty to begin with, we now do it in the client.	2023-11-23 13:28:23 +00:00
Francesco Mazzoli	b964d0632a	Add option to not write out atime which is too recent This is to save on a ton of writes as jobs stat tons of files. It would maybe be a bit cleaner to do it in the kmod, but this is much quicker. Thanks to @sgrusny for the good idea.	2023-11-16 14:45:58 +00:00
Saulius Grusnys	2ce5586eb9	Periodically refresh metadata info in kmod, use two IPs for shuckle Fixes #112. Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>	2023-11-14 13:49:36 +00:00
Francesco Mazzoli	3bc17301d6	Switch from `tuple` to `variant` for req/resp containers The `tuple` was for when I thought it'd be useful to leave slots for each request, but we don't need this anymore, and now leading up to #66 I want to be able to keep vectors of reqs/resps.	2023-11-09 19:03:37 +00:00
Francesco Mazzoli	d0126d0656	Distinguish IO errors in `eggsblocks` See #115 for background.	2023-11-06 19:35:05 +00:00
Francesco Mazzoli	1ec63f9710	Implement scrubbing functionality Fixes #32. This also involves some reworking of the block request machinery to make it more robust and faster. The scrubbing is done assuming that the overwhelming majority of block checking will go through.	2023-11-05 18:33:00 +00:00
Francesco Mazzoli	c529d96c88	Garbage collect zero block service files mappings. See #91.	2023-10-21 11:41:33 +00:00
Francesco Mazzoli	b87a43a297	Continue running GC if servers are down This was triggered by a server failing hard (fsr13), without any short term resolution (we've already replaced the mobo, we'll probably replace the HBA). In this case GC should still run rather than get stuck.	2023-08-29 12:47:24 +00:00
Francesco Mazzoli	40f229b6f5	Add endpoint to specify which file to get the "reference" block services from See comments for more details.	2023-08-16 08:40:47 +01:00

83 Commits