Commit Graph

41 Commits

Author SHA1 Message Date
Francesco Mazzoli
93b212c665 Alert while initializing shard DB 2023-08-07 10:16:00 +00:00
Francesco Mazzoli
b370118e90 Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli
9ef3162882 Add error count to inspect how things failed 2023-08-03 06:53:35 +00:00
Francesco Mazzoli
acf4f129f6 Fix CDC txn status tracking
This might be a resolution to #38, although I'm not sure yet.
2023-08-02 13:49:38 +00:00
Francesco Mazzoli
63e2db0889 Cap maximum number of CDC requests
No point letting huge queues build -- especially now that we
deduplicate client requests.
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
fe2ce7aa17 See comments 2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262 Do not alert/log on innocuous shard error in CDC 2023-08-01 13:41:18 +00:00
Francesco Mazzoli
e851457c52 Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
7dceb5fda5 More alerts shenanigans 2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d More alert-related fixes 2023-07-27 13:54:51 +00:00
Francesco Mazzoli
6a52a961eb Split CDC timings to distinguish queue time from exec time 2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f Do not bump req ids when retrying requests in the CDC
Fixes #29.

The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli
f797663d8c Transient alerts for EPERM errors on sendto 2023-07-27 07:31:34 +00:00
Francesco Mazzoli
bf447408a6 Actually wait for things to finish terminating before reaping next one
Fixes #27. This is all kind of clunky right now, it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
dd39466daa Insert CDC stats on shutdown 2023-07-26 20:41:35 +00:00
Francesco Mazzoli
999d2df52b Do not alert for missing CDC request
This is totally normal if the CDC is restarted with queued
transactions.
2023-07-26 19:38:17 +00:00
Francesco Mazzoli
45b2618296 Temporarily put a stop to alert spam 2023-07-26 19:33:00 +00:00
Francesco Mazzoli
b0ff28dc44 Do not alert on error which can happen naturally in GC 2023-07-26 19:28:44 +00:00
Francesco Mazzoli
0fc80dfe0f Remove additional CDC status fields
`status()` was racy anyway (the txn might have been gone between
first and second lookup) and these are better solved by the stats
db/graphana anyway.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
60554ec58d Have bigger histograms, remove other metrics entirely
The `uint16_t` -> `size_t` in `packedSize` is because now
insert stats requests are bigger than `uint16_t`.
2023-07-26 10:01:27 +00:00
Francesco Mazzoli
2b1b1a1c15 Insert stats when shutting down 2023-07-17 12:27:07 +00:00
Francesco Mazzoli
3cc7310a6e Add histograms for all components in /stats 2023-07-17 08:56:09 +00:00
Francesco Mazzoli
ff9306f6e3 Add Xmon support to C++ code 2023-07-11 12:13:22 +00:00
Francesco Mazzoli
4e0e6fe8a8 Configurable CDC shard timeout
Running in valgrind seems to just not be able to process a small
FullReadDirReq in 100ms, which is a bit concerning, but I'll let
it slide for now.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
dd78912c0c More stuff as debug 2023-06-18 12:50:05 +00:00
Francesco Mazzoli
e26eeaede1 Add "mtu" field to requests that benefit from it
Not used right now, but this way we can easily start stuffing more
data in responses.

I also split off some arguments in `NewClient`, unrelated change
(I wanted to pair the MTU with a single client, but I then realized
that it's enough to have it as some global property for now).
2023-06-15 11:57:05 +00:00
Francesco Mazzoli
d1e02e261b Various QOL improvements
Also, try to avoid thundering herds on shuckle from CDC/shards too.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
d076941ce8 Simplify block write/fetch
And hopefully reduce the likelihood of bugs. On the write end, given
that we do things less asynchronously, things might be a bit slower,
but I think the simplification is worth it for now.

Also, fix/improve a bunch of other stuff.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
b041d14860 Add second ip/addr for CDC/shards too
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.

Right now the kernel module does not take advantage of this, but
it's OK since I tested the rest of the code reasonably and the goal
here is to perform the protocol/data changes.
2023-06-05 12:14:14 +00:00
Francesco Mazzoli
a12a938c40 syslogify logs 2023-05-29 09:52:01 +00:00
Francesco Mazzoli
1458759534 Allow to enable shard/cdc debugging at runtime using USR2 2023-05-26 10:03:59 +00:00
Francesco Mazzoli
5bff9b8fae Many, many changes -- tests pass, but FUSE is currently not present
The main thing that's added is full RS support, but a lot of things
were rejigged along the way. The tests are still a bit lacking,
and will be augmented in future commits.
2023-03-03 16:42:22 +00:00
Francesco Mazzoli
e1b8de02dc More assorted improvements 2023-02-15 14:03:53 +00:00
Francesco Mazzoli
51860fac3a Various improveents, nothing substantial 2023-02-14 22:39:38 +00:00
Francesco Mazzoli
e580cd5fe9 Select right source address in CDC/Shard 2023-02-14 12:20:21 +00:00
Francesco Mazzoli
f42ff2b219 Get rid of backtrace machinery
It is currently very fragile, due to:

* Differing versions of compilers/DWARF version result in a variety
    of breakages in the our code which analyzes the DWARF info;

* With musl, libunwind seems to be currently unable to traverse
    beyond signal handlers, due to the DWARF information not
    being present in the signal frame.
    See <https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.
    Note that I have not verified that the problem in the blog
    post above is indeed what we're hitting, but it seems plausible.
2023-02-01 12:00:47 +00:00
Francesco Mazzoli
4243ff71e2 Assorted fixes 2023-02-01 12:00:44 +00:00
Francesco Mazzoli
e90c1cadb8 Prompt termination in shard/cdc 2023-01-30 18:27:46 +00:00
Francesco Mazzoli
df9efa481d Keep pinging shuckle 2023-01-30 12:18:46 +00:00
Francesco Mazzoli
85889266b1 Various housekeeping while I get ready to deploy...
...most notably we now produce fully static binaries in an alpine
image.

A few assorted thoughts:

* I really like static binaries, ideally I'd like to run EggsFS
    deployments with just systemd scripts and a few binaries.

* Go already does this, which is great.

* C++ does not, which is less great.

* Linking statically against `glibc` works, but is unsupported.
    Not only stuff like NSS (which `gethostbyname` requires)
    straight up does not work, unless you build `glibc` with
    unsupported and currently apparently broken flags
    (`--enable-static-nss`), but also other stuff is subtly
    broken (I couldn't remember exactly what was broken,
    but see comments such as
    <https://github.com/haskell/haskell-language-server/issues/2431#issuecomment-985880838>).

* So we're left with alternative libcs -- the most popular being
    musl.

* The simplest way to build a C++ application using musl is to just
    build on a system where musl is already the default libc -- such
    as alpine linux.

The backtrace support is in a bit of a bad state. Exception stacktraces
work on musl, but DWARF seems to be broken on the normal release build.

Moreover, libunwind doesn't play well with musl's signal handler:
<https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.

Keeping it working seems to be a bit of a chore, and I'm going to revisit
it later.

In the meantime, gdb stack traces do work fine.
2023-01-29 21:41:40 +00:00
Francesco Mazzoli
9adca070ba Convert build system to cmake
Also, produce fully static binaries. This means that `gethostname`
does not work (doesn't work with static glibc unless you build it
with `--enable-static-nss`, which no distro builds glibc with).
2023-01-26 23:20:58 +00:00