Miroslav Crnic
88909c1f7c
kmod: kernel 5.12/5.14 support
2025-07-02 14:55:39 +01:00
Miroslav Crnic
b0013ba759
kmod: compatibility for kernel >5.10
2025-04-03 10:12:41 +00:00
Miroslav Crnic
08e22661e1
kmod: fix use after free
2025-03-24 09:34:22 +00:00
Miroslav Crnic
b409b39240
kmod: set metadata socket rcv buff to 1MB
2025-03-24 08:53:25 +00:00
Miroslav Crnic
169f712176
kmod: fix deadlock and leak race in request handling
2025-03-21 17:22:53 +00:00
Miroslav Crnic
0065108033
kmod: minor cleanup
2025-01-21 13:08:44 +00:00
Francesco Mazzoli
15040974f3
Remove commented out code
2024-06-03 10:41:17 +00:00
Saulius Grusnys
0f127e220d
switch atime to synchronous metadata call (#268)
...
We rely on atime being updated so that files are not deleted during cleanup, so we have to ensure the update succeeds.
2024-05-20 12:48:38 +01:00
Saulius Grusnys
6487d2cab1
retry or ignore EPERM error in metadata udp calls
2024-05-09 09:27:12 +01:00
Francesco Mazzoli
25925b4024
Fix fixup when we get ENETUNREACH
2024-04-19 08:42:11 +00:00
Francesco Mazzoli
ede20a9c6c
Handle ERESTARTSYS in a couple of places
...
Sadly it is still a possibility, due to `kernel_sendmsg` returning
it, which we cannot avoid.
2024-04-04 15:41:52 +01:00
Francesco Mazzoli
4b0dd25bdc
Correctly record attempts in eggsfs_metadata_request
...
This got lost in the `net.c` refactor, and it caused the recovery
mechanism on repeated requests to fail.
2024-03-21 14:19:54 +00:00
Francesco Mazzoli
3f4988bb32
Some more metadata debug logging
2024-03-21 14:08:59 +00:00
Francesco Mazzoli
488f096eb9
Stat files/directories speculatively on readdir
...
Also, split the timeouts for dentries and for stats. We generally
don't care if stats are out of date, but dentries should be up
to date.
The code leaves several things to be desired:
* No attempt is made to only send stats when needed -- it is always
done. It might be a good idea to instead wait for the first two
stats to come back.
* There's quite a bit of code duplication.
* It's pretty wasteful to have so many different packets for the
stats. It'd be much better to pack multiple requests and multiple
responses in single packets.
This could be done simply by allowing many requests to come
in the same packet (just one after the other would be fine),
and same for the responses. We can still use the protocol and
request id to keep track of things anyway.
2024-03-19 20:29:23 +00:00
Saulius Grusnys
682f5f5a3a
Remove interruptible functions
...
See #187.
2024-03-11 15:33:57 +00:00
Francesco Mazzoli
752c53ced5
Do not crash when there are no shards in eggsfs info
...
This should almost never happen, but I think it did because of
an upgrade by @mcrnic which temporarily zeroed the state.
2024-02-08 20:02:48 +00:00
Francesco Mazzoli
bdbddc8ac7
Better kmod logging
2023-12-05 11:34:55 +00:00
Francesco Mazzoli
f17ddece8d
Warn when timeout is very long
...
We suspect something is off with `net.c` timeouts in #127; this
might help debug it.
2023-11-29 15:33:05 +00:00
Francesco Mazzoli
8eae3f332b
Restore sk_data_ready before release
...
Should not matter at all, but it's good hygiene.
2023-11-29 15:29:07 +00:00
Francesco Mazzoli
2278095d13
Complete metadata req even if timeout and sock completion race
...
Also, tune the maximum amount of time we're willing to wait for
inodes to complete based on the max metadata timeout.
2023-11-29 14:15:01 +00:00
Francesco Mazzoli
0080cfcbf7
Fix unit confusion in net.c
...
We called `wait_for_request` passing in jiffies, but
`wait_for_request` thought they were ms, blowing up all timeouts
by 10x.
2023-11-29 13:25:05 +00:00
Francesco Mazzoli
fa1506bcf2
kmod logs
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
b56a6aa53b
Avoid zero addresses in kmod
2023-11-22 13:25:01 +00:00
Saulius Grusnys
1070c96571
kmod retry after ENETUNREACH, relatime, eggstest fix (#122)
...
* kmod retry after ENETUNREACH, relatime, eggstest fix
* include missed file
* fix retry, correct time counting in atime
2023-11-21 16:00:18 +00:00
Saulius Grusnys
2ce5586eb9
Periodically refresh metadata info in kmod, use two IPs for shuckle
...
Fixes #112.
Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2023-11-14 13:49:36 +00:00
Francesco Mazzoli
ef1885a4b2
Print out more info when failing because of bad proofs
2023-11-08 11:57:32 +00:00
Francesco Mazzoli
baf6240225
Increase CDC default timeouts
...
Informed by GC stats in production.
2023-11-05 22:39:07 +00:00
Saulius Grusnys
871a44a731
add request latency histogram data for shards and cdc (#104)
...
* add request timing histogram data for shards and cdc
* address review comments
* address more review comments
2023-10-31 08:39:35 +00:00
Saulius Grusnys
82992b7c7d
Add request counters for shards and cdc, expose via debugfs
...
See #71.
2023-10-24 22:11:40 +01:00
Francesco Mazzoli
2d3ac1c2c3
Increase default CDC overall timeout to two minutes
...
This is tragic, but we seem to trip over this often. After the
queueing changes we at least should never have a queue clogged up
by repeated requests, so in theory we should be able to process
~40k mkdir/sec, if /stats is correct anyway.
2023-08-02 14:09:25 +00:00
Francesco Mazzoli
52351cd69f
Reduce "severity" of late requests dmesg
...
These are happening often, and we'll still get warnings when we
give up entirely.
2023-08-02 13:08:22 +00:00
Francesco Mazzoli
632d3492bf
Support nonlinear skb in metadata reqs
2023-07-19 21:44:17 +00:00
Francesco Mazzoli
0fef0a34ed
Cache and timeout fetch block sockets
2023-07-09 19:50:37 +00:00
Francesco Mazzoli
ca5debd8e7
Configurable timeouts
2023-07-06 19:39:12 +01:00
Francesco Mazzoli
7954d01b41
Configurable metadata reqs timeouts
2023-07-06 19:39:12 +01:00
Francesco Mazzoli
c1c535dc9a
Automatically retry when failing to fetch blocks
...
The system is essentially oblivious to how exactly fetching failed,
and only retries once. Given that we have 4x redundancy this should
be fairly robust already though.
2023-06-14 12:26:19 +00:00
Francesco Mazzoli
583b53a111
Continue tightening various ownership structures
...
Also, start renaming static stuff to take `eggsfs` out; I get tired of
typing it. Various other tweaks, too.
2023-06-13 14:52:45 +00:00
Francesco Mazzoli
d1e02e261b
Various QOL improvements
...
Also, try to avoid thundering herds on shuckle from CDC/shards too.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
6df7f4b530
Improve logging when metadata reqs go wrong.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
d076941ce8
Simplify block write/fetch
...
And hopefully reduce the likelihood of bugs. On the write end, given
that we do things less asynchronously, things might be a bit slower,
but I think the simplification is worth it for now.
Also, fix/improve a bunch of other stuff.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
b041d14860
Add second ip/addr for CDC/shards too
...
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.
Right now the kernel module does not take advantage of this, but
it's OK since I tested the rest of the code reasonably and the goal
here is to perform the protocol/data changes.
2023-06-05 12:14:14 +00:00
Francesco Mazzoli
845e86e952
Trace eggsfs errors
2023-05-27 20:31:39 +00:00
Francesco Mazzoli
a4bc32a18f
Span drop improvements
...
We could get into situations where async droppings were scheduled
at every read.
2023-05-26 17:22:43 +00:00
Francesco Mazzoli
f95d177c34
WIP commit...
...
...which I mistakenly left in and I'm too lazy to fix.
2023-05-26 17:22:30 +00:00
Francesco Mazzoli
8f32ecc1b6
Cached spans reclamation
...
Right now this is very crude (global spinlock), but reasonably simple.
We can improve if needed.
2023-05-23 16:57:29 +00:00
Francesco Mazzoli
972bb55356
A few tweaks
...
Notably, make heavy debug output a sysctl flag.
2023-05-19 10:57:26 +00:00
Francesco Mazzoli
6addbdee6a
First version of kernel module
...
Initial version really by Pawel, but many changes in between.
Big outstanding issues:
* span cache reclamation (unbounded memory otherwise...)
* bad block service detection and workarounds
* corrupted blocks detection and workaround
Co-authored-by: Paweł Dziepak <pawel.dziepak@xtxmarkets.com>
2023-05-18 15:29:41 +00:00