Commit Graph

37 Commits

Author SHA1 Message Date
Francesco Mazzoli 1a4301a499 Simplify Go span read/write code, make it work with broken block services
And some other assorted changes.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli 87d0e69f85 Port kmod to new FullReadDir request 2023-07-04 08:05:42 +00:00
Francesco Mazzoli f0add4d926 Remove C++ varint code, we don't use varints anymore 2023-07-04 08:05:42 +00:00
Francesco Mazzoli e2dcd43fea Fix bug in CreateLockedCurrentEdge logic
See comment in `msgs.go`. This would normally have required
entirely new transactions, but since we're not in production yet
I'm going to just change the schema and wipe the current FS.

This also adds in an unrelated change regarding more flexible
blacklisting, which will be required for some additional testing
I'm preparing.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli 0f114623f3 Just use unix nanos for eggs times
This was bugging me for a while, but the final straw was that if
one wants to use the max time (for example to look backwards when
traversing edges), you cannot trivially convert between eggs time
and Unix time, since you'd overflow. So you can't (for instance)
trivially convert from eggs time to `time.Time` in Go.

The main disadvantage is that we lose ~50 of the ~600 years
representable with nanoseconds. But I think that's fine.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli e26eeaede1 Add "mtu" field to requests that benefit from it
Not used right now, but this way we can easily start stuffing more
data in responses.

I also split off some arguments in `NewClient`, unrelated change
(I wanted to pair the MTU with a single client, but I then realized
that it's enough to have it as some global property for now).
2023-06-15 11:57:05 +00:00
Francesco Mazzoli d4715ea11d Add flags to block services in shards 2023-06-14 14:10:16 +00:00
Francesco Mazzoli d1e02e261b Various QOL improvements
Also, try to avoid thundering herds on shuckle from CDC/shards too.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli d076941ce8 Simplify block write/fetch
And hopefully reduce the likelihood of bugs. On the write end, given
that we do things less asynchronously, things might be a bit slower,
but I think the simplification is worth it for now.

Also, fix/improve a bunch of other stuff.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli 90e8500722 Add atime field to file
Right now it's always the same as mtime, but we'll add an endpoint
to modify it.
2023-06-05 12:19:09 +00:00
Francesco Mazzoli b041d14860 Add second ip/addr for CDC/shards too
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.

Right now the kernel module does not take advantage of this, but
it's OK since I tested the rest of the code reasonably and the goal
here is to perform the protocol/data changes.
2023-06-05 12:14:14 +00:00
Saulius Grusnys 3b503861e9 Switch shuckle to store data in sqlite db, add block services flags
Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2023-06-04 16:10:14 +00:00
Francesco Mazzoli 273fa40dda Fix USR2 logging teardown 2023-06-01 10:10:25 +00:00
Francesco Mazzoli 55074b16b4 Implement fs stat
10.97.12.10:10001       29P  208T   29P   1% /home/restechprod/eggs/mnt
2023-05-29 18:49:50 +00:00
Francesco Mazzoli 7e25c1fd95 Do not crash if we can't find blocks 2023-05-29 09:52:01 +00:00
Francesco Mazzoli a12a938c40 syslogify logs 2023-05-29 09:52:01 +00:00
Francesco Mazzoli fc0ae851c8 Typo 2023-05-26 10:14:32 +00:00
Francesco Mazzoli 1458759534 Allow enabling shard/cdc debugging at runtime using USR2 2023-05-26 10:03:59 +00:00
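
As a rough illustration of switching debug logging at runtime with `SIGUSR2`, here is a Go sketch. The shard and CDC are not written in Go, so this only shows the general pattern; the names `debugFlag` and `watchUSR2` are hypothetical, and treating each signal as a toggle (rather than enable-only) is an assumption. It relies on `syscall.SIGUSR2`, which exists on Unix-like systems only.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// debugFlag is a hypothetical process-wide switch read by logging code.
type debugFlag struct{ v int32 }

// Toggle flips the flag atomically and returns the new state.
func (d *debugFlag) Toggle() bool {
	for {
		old := atomic.LoadInt32(&d.v)
		if atomic.CompareAndSwapInt32(&d.v, old, 1-old) {
			return old == 0
		}
	}
}

// Enabled reports whether debug logging is currently on.
func (d *debugFlag) Enabled() bool { return atomic.LoadInt32(&d.v) == 1 }

// watchUSR2 flips the flag on every SIGUSR2 until stop is called.
func watchUSR2(d *debugFlag) (stop func()) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR2)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-ch:
				fmt.Fprintf(os.Stderr, "debug logging enabled=%v\n", d.Toggle())
			case <-done:
				return
			}
		}
	}()
	return func() { signal.Stop(ch); close(done) }
}

func main() {
	var d debugFlag
	stop := watchUSR2(&d)
	defer stop()
	fmt.Println("debug enabled at start:", d.Enabled())
}
```

In a long-running process one would then send the signal with `kill -USR2 <pid>`.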
Francesco Mazzoli 1b83a50419 Ensure files actually are on different failure domains...
...in a very lazy way which will probably do for now.
2023-05-25 15:14:02 +00:00
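
The commit does not describe its "lazy" approach. A simple way to get the same guarantee, sketched below in Go under that caveat, is to walk the candidate block services in order and skip any whose failure domain has already been used. The `blockService` type, its fields, and `pickDistinctDomains` are hypothetical names for illustration.

```go
package main

import "fmt"

// blockService is a hypothetical, simplified view of a block service.
type blockService struct {
	ID            uint64
	FailureDomain string
}

// pickDistinctDomains returns the first n candidates that all live in
// different failure domains, or an error if there are not enough domains.
func pickDistinctDomains(candidates []blockService, n int) ([]blockService, error) {
	if n <= 0 {
		return nil, nil
	}
	seen := make(map[string]bool)
	picked := make([]blockService, 0, n)
	for _, bs := range candidates {
		if len(picked) == n {
			break
		}
		if seen[bs.FailureDomain] {
			continue // this domain already holds one of the blocks
		}
		seen[bs.FailureDomain] = true
		picked = append(picked, bs)
	}
	if len(picked) < n {
		return nil, fmt.Errorf("need %d failure domains, only found %d", n, len(picked))
	}
	return picked, nil
}

func main() {
	candidates := []blockService{{1, "rack-a"}, {2, "rack-a"}, {3, "rack-b"}, {4, "rack-c"}}
	picked, err := pickDistinctDomains(candidates, 3)
	fmt.Println(picked, err)
}
```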
Francesco Mazzoli a61fce55f8 Simple write test for block service
We can only write 3.2 Gbit/s right now, so that's definitely something
to improve.
2023-05-22 13:08:22 +00:00
Francesco Mazzoli ac25c0aa3d Limit to 64 CPP expansions
Seems a more reasonable number than 256 for some reason.
2023-05-22 08:13:22 +00:00
Francesco Mazzoli 3cccb451d6 Remove unused define 2023-05-22 08:11:40 +00:00
Francesco Mazzoli 1eab8ee6cf Add versions to some RocksDB values
Only the ones where it is needed -- in some cases we can just
modify the keys (e.g. metadata stuff).

Also, come up with a sort of horrifying but more robust way
to specify the RocksDB values with the C preprocessor.
2023-05-22 08:03:01 +00:00
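
The commit implements versioned values with the C preprocessor in C++. The Go sketch below shows only the general idea of a version-tagged value and is not the repository's layout: the one-byte version prefix, the `record` type, and both field layouts are assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// record is a hypothetical value; Flags exists only from version 1 onward.
type record struct {
	Size  uint64
	Flags uint8
}

// encodeV1 writes the current layout: version byte, size, flags.
func encodeV1(r record) []byte {
	buf := make([]byte, 10)
	buf[0] = 1
	binary.LittleEndian.PutUint64(buf[1:9], r.Size)
	buf[9] = r.Flags
	return buf
}

// decode dispatches on the leading version byte, so values written
// with the older layout can still be read after the schema grows.
func decode(buf []byte) (record, error) {
	if len(buf) < 1 {
		return record{}, errors.New("empty value")
	}
	switch buf[0] {
	case 0:
		if len(buf) != 9 {
			return record{}, errors.New("bad length for version 0")
		}
		return record{Size: binary.LittleEndian.Uint64(buf[1:9])}, nil
	case 1:
		if len(buf) != 10 {
			return record{}, errors.New("bad length for version 1")
		}
		return record{Size: binary.LittleEndian.Uint64(buf[1:9]), Flags: buf[9]}, nil
	default:
		return record{}, fmt.Errorf("unknown value version %d", buf[0])
	}
}

func main() {
	r, err := decode(encodeV1(record{Size: 42, Flags: 3}))
	fmt.Println(r, err)
}
```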
Francesco Mazzoli 6addbdee6a First version of kernel module
Initial version really by Pawel, but many changes in between.

Big outstanding issues:

* span cache reclamation (unbounded memory otherwise...)
* bad block service detection and workarounds
* corrupted blocks detection and workaround

Co-authored-by: Paweł Dziepak <pawel.dziepak@xtxmarkets.com>
2023-05-18 15:29:41 +00:00
Francesco Mazzoli 4ef819f4e5 Do not duplicate block services when migrating...
...also add checks so that that never happens in ShardDB
2023-03-10 16:23:30 +00:00
Francesco Mazzoli 5bff9b8fae Many, many changes -- tests pass, but FUSE is currently not present
The main thing that's added is full RS support, but a lot of things
were rejigged along the way. The tests are still a bit lacking,
and will be augmented in future commits.
2023-03-03 16:42:22 +00:00
Francesco Mazzoli e1b8de02dc More assorted improvements 2023-02-15 14:03:53 +00:00
Francesco Mazzoli 51860fac3a Various improvements, nothing substantial 2023-02-14 22:39:38 +00:00
Francesco Mazzoli 4288189766 Reorganize logs, add req/resp to CLI, add last seen to UI 2023-02-14 12:21:48 +00:00
Francesco Mazzoli 5bafbf03f6 Extend protocol to allow for double route for block services 2023-02-02 12:45:13 +00:00
Francesco Mazzoli 63b5b8d3f9 Start adding web UI, for now integrated with shuckle 2023-02-01 14:47:23 +00:00
Francesco Mazzoli f42ff2b219 Get rid of backtrace machinery
It is currently very fragile, due to:

* Differing compiler and DWARF versions result in a variety
    of breakages in our code which analyzes the DWARF info;

* With musl, libunwind seems to be currently unable to traverse
    beyond signal handlers, due to the DWARF information not
    being present in the signal frame.
    See <https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.
    Note that I have not verified that the problem in the blog
    post above is indeed what we're hitting, but it seems plausible.
2023-02-01 12:00:47 +00:00
Francesco Mazzoli 4243ff71e2 Assorted fixes 2023-02-01 12:00:44 +00:00
Francesco Mazzoli e90c1cadb8 Prompt termination in shard/cdc 2023-01-30 18:27:46 +00:00
Francesco Mazzoli df9efa481d Keep pinging shuckle 2023-01-30 12:18:46 +00:00
Francesco Mazzoli 85889266b1 Various housekeeping while I get ready to deploy...
...most notably we now produce fully static binaries in an alpine
image.

A few assorted thoughts:

* I really like static binaries; ideally I'd like to run EggsFS
    deployments with just systemd scripts and a few binaries.

* Go already does this, which is great.

* C++ does not, which is less great.

* Linking statically against `glibc` works, but is unsupported.
    NSS (which `gethostbyname` requires) straight up does not work
    unless you build `glibc` with the unsupported and currently
    apparently broken `--enable-static-nss` flag, and other stuff
    is subtly broken too (I can't remember exactly what, but see
    comments such as
    <https://github.com/haskell/haskell-language-server/issues/2431#issuecomment-985880838>).

* So we're left with alternative libcs -- the most popular being
    musl.

* The simplest way to build a C++ application using musl is to just
    build on a system where musl is already the default libc -- such
    as alpine linux.

The backtrace support is in a bit of a bad state. Exception stacktraces
work on musl, but DWARF seems to be broken on the normal release build.

Moreover, libunwind doesn't play well with musl's signal handler:
<https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.

Keeping it working seems to be a bit of a chore, and I'm going to revisit
it later.

In the meantime, gdb stack traces do work fine.
2023-01-29 21:41:40 +00:00
Francesco Mazzoli 9adca070ba Convert build system to cmake
Also, produce fully static binaries. This means that `gethostbyname`
does not work (it doesn't work with static glibc unless you build it
with `--enable-static-nss`, which no distro builds glibc with).
2023-01-26 23:20:58 +00:00