So that we can reliably ignore stale block services in GC (done in
a future commit). To enable this and future-proof this kind of
mechanism (e.g. having `eggsblocks` to mark something as D itself)
I added a new way to register the block service that lets you mask
which flags you're checking. I'll remove the old way once we've
rolled out everywhere.
Just check if we're also unable to count the blocks for the disk,
and if yes, assume it's a single file error.
Of course there will be a time period where we will not have detected
the bad disk when counting the blocks (a few minutes at most), but
that's OK -- the scrubber will scrub blocks for that period, and then
stop.
Once <internal-repo/issues/65#issuecomment-24747>
is done, we should use whatever error detection we use for migration
to also distinguish between these errors.
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.
Thanks to @sgrusny for the good idea.
This was triggered by a server failing hard (fsr13), without any
short term resolution (we've already replaced the mobo, we'll probably
replace the HBA). In this case GC should still run rather than
get stuck.
See comment in `msgs.go`. This would normally have required
entirely new transactions, but since we're not in production yet
I'm going to just change the schema and wipe the current FS.
This also adds in an unrelated change regarding more flexible
blacklisting, which will be required for some additional testing
I'm preparing.
Not used right now, but this way we can easily start stuffing more
data in responses.
I also split off some arguments in `NewClient`, unrelated change
(I wanted to pair the MTU with a single client, but I then realized
that it's enough to have it as some global property for now).
And hopefully reduce the likelihood of bugs. On the write end, given
that we do things less asynchronously, things might be a bit slower,
but I think the simplification is worth it for now.
Also, fix/improve a bunch of other stuff.
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.
Right now the kernel module does not take advantage of this, but
it's OK since I tested the rest of the code reasonably and the goal
here is to perform the protocol/data changes.
Initial version really by Pawel, but many changes in between.
Big outstanding issues:
* span cache reclamation (unbounded memory otherwise...)
* bad block service detection and workarounds
* corrupted blocks detection and workaround
Co-authored-by: Paweł Dziepak <pawel.dziepak@xtxmarkets.com>