ternfs-XTXMarkets

mirror of https://github.com/XTXMarkets/ternfs.git synced 2026-05-07 21:01:48 -05:00

Author	SHA1	Message	Date
Miroslav Crnic	6948f36bc7	shard: support multiple locations in operations	2024-12-02 09:47:48 +00:00
Miroslav Crnic	5726a2e308	shuckle: assign writable services per location + messages cleanup	2024-11-28 15:42:44 +00:00
Miroslav Crnic	637543f0a0	shard: enforce no duplicate failure domains	2024-11-25 17:57:57 +00:00
Miroslav Crnic	5f24b43184	shuckle: support locations	2024-11-14 09:26:44 +00:00
Miroslav Crnic	9cd425d7f3	eggsblocks/kmod: add file_id to FetchBlockWithCrcReq	2024-08-22 14:11:01 +01:00
Miroslav Crnic	73622ce637	eggsblocks: write/read from new block format with crc after page	2024-08-20 14:55:45 +01:00
Miroslav Crnic	cf40e318ec	shuckle: support BlockServicesWithFlagChangeReq	2024-07-24 10:08:01 +01:00
Miroslav Crnic	e2bfb15c5f	blockservice: add BlockFetchWithCrc	2024-07-12 14:24:37 +01:00
Miroslav Crnic	1f145c030e	shard/cdc: support snapshoting	2024-05-23 10:17:59 +01:00
Francesco Mazzoli	6faa917c18	Add endpoint and cli util to resurrect files. Only works in the same shard, for now.	2024-05-20 12:06:15 +00:00
Miroslav Crnic	8a0ea10cde	core: UDPSocketPair and use IpPort AddrsInfo everywhere * Refactor UDPSocketPair a bit * ci: kmod always delete img before create * shuckle: fix scripts/json marshal. Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>	2024-05-03 11:32:07 +01:00
Francesco Mazzoli	cd8e52f8f7	Remove assertions in ShardDB. We got a crash because of it (presumably can happen if defrag conflicts with migrate or something like that).	2024-05-01 08:13:19 +00:00
Francesco Mazzoli	f109e3542b	Have `eggsblocks` to refresh decommissioned block services. So that we can reliably ignore stale block services in GC (done in a future commit). To enable this and future-proof this kind of mechanism (e.g. having `eggsblocks` to mark something as D itself) I added a new way to register the block service that lets you mask which flags you're checking. I'll remove the old way once we've rolled out everywhere.	2024-04-22 18:47:54 +00:00
Saulius Grusnys	fd9079febf	Rate limited shuckle endpoint to decom blockservices	2024-03-20 15:16:00 +00:00
Miroslav Crnic	b240de53b5	shard: distributed log implementation and shard can use it with a flag set	2024-03-12 11:02:04 +00:00
Miroslav Crnic	38707535e3	shuckle: support metadata replication	2024-02-07 13:57:00 +00:00
Francesco Mazzoli	8c0c246348	More robust detection of file vs. device errors. Just check if we're also unable to count the blocks for the disk, and if yes, assume it's a single file error. Of course there will be a time period where we will not have detected the bad disk when counting the blocks (a few minutes at most), but that's OK -- the scrubber will scrub blocks for that period, and then stop. Once <internal-repo/issues/65#issuecomment-24747> is done, we should use whatever error detection we use for migration to also distinguish between these errors.	2024-01-22 13:18:53 +00:00
Francesco Mazzoli	91db9566e1	Remove option to not write out atime which is too recent. This was pretty nasty to begin with, we now do it in the client.	2023-11-23 13:28:23 +00:00
Francesco Mazzoli	b964d0632a	Add option to not write out atime which is too recent. This is to save on a ton of writes as jobs stat tons of files. It would maybe be a bit cleaner to do it in the kmod, but this is much quicker. Thanks to @sgrusny for the good idea.	2023-11-16 14:45:58 +00:00
Saulius Grusnys	2ce5586eb9	Periodically refresh metadata info in kmod, use two IPs for shuckle. Fixes #112. Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>	2023-11-14 13:49:36 +00:00
Francesco Mazzoli	d0126d0656	Distinguish IO errors in `eggsblocks`. See #115 for background.	2023-11-06 19:35:05 +00:00
Saulius Grusnys	82992b7c7d	Add request counters for shards and cdc, expose via debugfs. See #71.	2023-10-24 22:11:40 +01:00
Francesco Mazzoli	b87a43a297	Continue running GC if servers are down. This was triggered by a server failing hard (fsr13), without any short term resolution (we've already replaced the mobo, we'll probably replace the HBA). In this case GC should still run rather than get stuck.	2023-08-29 12:47:24 +00:00
Francesco Mazzoli	9405b64a76	Remove `ExpireTransientFile`, make future cutoff tunable. Fixes #48. Also, reorganize error handling in `eggsblocks` requests, especially around write requests, which might help with #45.	2023-08-15 12:43:49 +01:00
Francesco Mazzoli	c2bd882cdc	Allow erasing blocks for decommissioned block services. Otherwise GC cannot run after disposing of a broken disk. This commit also adds various safety checks regarding decommissioned block services.	2023-07-24 19:03:16 +01:00
Francesco Mazzoli	37ce3be74c	Implement `utime`-like functions. Also, update atime when opening a file.	2023-07-21 06:28:48 +00:00
Francesco Mazzoli	53598c2fe9	Allow to re-open files as writing if we're already writing them. This makes `cp` work.	2023-07-12 12:22:40 +01:00
Francesco Mazzoli	65174341a0	Drop MM after flushing out a transient file	2023-07-12 12:22:40 +01:00
Francesco Mazzoli	1a4301a499	Simplify go span read/write code, make it work with broken block services. And some other assorted changes.	2023-07-04 08:05:42 +00:00
Francesco Mazzoli	87d0e69f85	Port kmod to new FullReadDir request	2023-07-04 08:05:42 +00:00
Francesco Mazzoli	e2dcd43fea	Fix bug in CreateLockedCurrentEdge logic. See comment in `msgs.go`. This would normally have required entirely new transactions, but since we're not in production yet I'm going to just change the schema and wipe the current FS. This also adds in an unrelated change regarding more flexible blacklisting, which will be required for some additional testing I'm preparing.	2023-07-04 08:05:42 +00:00
Francesco Mazzoli	e26eeaede1	Add "mtu" field to requests that benefit from it. Not used right now, but this way we can easily start stuffing more data in responses. I also split off some arguments in `NewClient`, unrelated change (I wanted to pair the MTU with a single client, but I then realized that it's enough to have it as some global property for now).	2023-06-15 11:57:05 +00:00
Francesco Mazzoli	d1e02e261b	Various QOL improvements. Also, try to avoid thundering herds on shuckle from CDC/shards too.	2023-06-08 11:59:09 +00:00
Francesco Mazzoli	d076941ce8	Simplify block write/fetch. And hopefully reduce the likelihood of bugs. On the write end, given that we do things less asynchronously, things might be a bit slower, but I think the simplification is worth it for now. Also, fix/improve a bunch of other stuff.	2023-06-08 11:59:09 +00:00
Francesco Mazzoli	c1949a5950	Remove spaces before newlines. I was getting annoyed at the red in `git diff`.	2023-06-05 12:27:49 +00:00
Francesco Mazzoli	90e8500722	Add atime field to file. Right now it's always the same as mtime, but we'll add an endpoint to modify it.	2023-06-05 12:19:09 +00:00
Francesco Mazzoli	b041d14860	Add second ip/addr for CDC/shards too. This is one of the two data model/protocol changes I want to perform before going into production, the other being file atime. Right now the kernel module does not take advantage of this, but it's OK since I tested the rest of the code reasonably and the goal here is to perform the protocol/data changes.	2023-06-05 12:14:14 +00:00
Francesco Mazzoli	55074b16b4	Implement fs stat: `10.97.12.10:10001 29P 208T 29P 1% /home/restechprod/eggs/mnt`	2023-05-29 18:49:50 +00:00
Francesco Mazzoli	7e25c1fd95	Do not crash if we can't find blocks	2023-05-29 09:52:01 +00:00
Francesco Mazzoli	cc165539c5	Use __print_symbolic, custom string functions don't work in traces	2023-05-27 23:13:10 +00:00
Francesco Mazzoli	951628ecfd	Add str functions for kmod kinds	2023-05-27 20:30:16 +00:00
Francesco Mazzoli	6addbdee6a	First version of kernel module. Initial version really by Pawel, but many changes in between. Big outstanding issues: span cache reclamation (unbounded memory otherwise...); bad block service detection and workarounds; corrupted blocks detection and workaround. Co-authored-by: Paweł Dziepak <pawel.dziepak@xtxmarkets.com>	2023-05-18 15:29:41 +00:00

42 Commits