Commit Graph

25 Commits

Author SHA1 Message Date
Miroslav Crnic
8a0ea10cde core: UDPSocketPair and use IpPort AddrsInfo everywhere
* core: UDPSocketPair and use IpPort AddrsInfo everywhere

* Refactor UDPSocketPair a bit

* ci: kmod always delete img before create

* shuckle: fix scripts/json marshal

---------

Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2024-05-03 11:32:07 +01:00
Francesco Mazzoli
f109e3542b Have eggsblocks to refresh decommissioned block services
So that we can reliably ignore stale block services in GC (done in
a future commit). To enable this and future-proof this kind of
mechanism (e.g. having `eggsblocks` to mark something as D itself)
I added a new way to register the block service that lets you mask
which flags you're checking. I'll remove the old way once we've
rolled out everywhere.
2024-04-22 18:47:54 +00:00
Francesco Mazzoli
20e7635d75 Clear data when request fails in Shuckle.cpp 2024-04-10 10:39:30 +00:00
Miroslav Crnic
30ee029f7e shuckle: make requests interruptable and pass timeout to all operations
This means that they'll be interrupted at shutdown, rather than holding everything up when shuckle is overloaded.
We also detect idle connection or slow transmitting data.
2024-04-02 18:15:29 +01:00
Francesco Mazzoli
9bc7e209e4 Safer ShuckleSock 2024-03-20 11:33:39 +00:00
Francesco Mazzoli
005121bcac Spin block service cache out of ShardDB
This started being a problem since the block service update log
entry does not fit in a UDP packet (it's like 100KB). I think this
approach makes more sense anyway. See comment for `getCache()` for
gotchas.
2024-03-13 11:29:58 +00:00
Miroslav Crnic
b240de53b5 shard: distributed log implementation and shard can use it with a flag set 2024-03-12 11:02:04 +00:00
Saulius Grusnys
796e46f466 shuckle to track if blockservices have any files on them (currently t… (#177)
* shuckle to track if blockservices have any files on them (currently there is issue with transient files)
2024-02-20 08:10:51 +00:00
Francesco Mazzoli
b6cf2b67a6 Distribute block services from shuckle
This is in preparation for #44, but more immediately, to better
stop writing to full block services.

The previous strategy of setting a flag was flawed since once
the flag was set it stayed set -- i.e. we would not remove it once
files would be deleted.  This consideration should just be integrated
in distributing the block services.
2024-01-16 16:17:27 +00:00
Francesco Mazzoli
7dceb5fda5 More alerts shenanigans 2023-07-27 15:51:15 +00:00
Francesco Mazzoli
0fc80dfe0f Remove additional CDC status fields
`status()` was racy anyway (the txn might have been gone between
first and second lookup) and these are better solved by the stats
db/graphana anyway.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
d918df0fcc Correctly return errors when failing to connect
Triggered by investigating

    xmon: could not read message type: unexpected EOF, will reconnect
    xmon: connected to xmon REDACTED
    Undertaker: hard abort - running abort handlers
    Uncaught exception thrown: SyscallException(Xmon.cpp@186, 9/EBADF=Bad file descriptor in void Xmon::run()): setsockopt

which caused crashes in shards/CDC.
2023-07-26 12:59:40 +00:00
Francesco Mazzoli
441eebb514 Do not crash on bad shuckle response 2023-07-20 12:46:38 +00:00
Francesco Mazzoli
3cc7310a6e Add histograms for all components in /stats 2023-07-17 08:56:09 +00:00
Francesco Mazzoli
ff9306f6e3 Add Xmon support to C++ code 2023-07-11 12:13:22 +00:00
Francesco Mazzoli
d076941ce8 Simplify block write/fetch
And hopefully reduce the likelihood of bugs. On the write end, given
that we do things less asynchronously, things might be a bit slower,
but I think the simplification is worth it for now.

Also, fix/improve a bunch of other stuff.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
b041d14860 Add second ip/addr for CDC/shards too
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.

Right now the kernel module does not take advantage of this, but
it's OK since I tested the rest of the code reasonably and the goal
here is to perform the protocol/data changes.
2023-06-05 12:14:14 +00:00
Francesco Mazzoli
1b83a50419 Ensure files actually are on different failure domains...
...in a very lazy way which will probably do for now.
2023-05-25 15:14:02 +00:00
Francesco Mazzoli
e1b8de02dc More assorted improvements 2023-02-15 14:03:53 +00:00
Francesco Mazzoli
63b5b8d3f9 Start adding web UI, for now integrated with shuckle 2023-02-01 14:47:23 +00:00
Francesco Mazzoli
4243ff71e2 Assorted fixes 2023-02-01 12:00:44 +00:00
Francesco Mazzoli
e90c1cadb8 Prompt termination in shard/cdc 2023-01-30 18:27:46 +00:00
Francesco Mazzoli
df9efa481d Keep pinging shuckle 2023-01-30 12:18:46 +00:00
Francesco Mazzoli
85889266b1 Various housekeeping while I get ready to deploy...
...most notably we now produce fully static binaries in an alpine
image.

A few assorted thoughts:

* I really like static binaries, ideally I'd like to run EggsFS
    deployments with just systemd scripts and a few binaries.

* Go already does this, which is great.

* C++ does not, which is less great.

* Linking statically against `glibc` works, but is unsupported.
    Not only stuff like NSS (which `gethostbyname` requires)
    straight up does not work, unless you build `glibc` with
    unsupported and currently apparently broken flags
    (`--enable-static-nss`), but also other stuff is subtly
    broken (I couldn't remember exactly what was broken,
    but see comments such as
    <https://github.com/haskell/haskell-language-server/issues/2431#issuecomment-985880838>).

* So we're left with alternative libcs -- the most popular being
    musl.

* The simplest way to build a C++ application using musl is to just
    build on a system where musl is already the default libc -- such
    as alpine linux.

The backtrace support is in a bit of a bad state. Exception stacktraces
work on musl, but DWARF seems to be broken on the normal release build.

Moreover, libunwind doesn't play well with musl's signal handler:
<https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.

Keeping it working seems to be a bit of a chore, and I'm going to revisit
it later.

In the meantime, gdb stack traces do work fine.
2023-01-29 21:41:40 +00:00
Francesco Mazzoli
9adca070ba Convert build system to cmake
Also, produce fully static binaries. This means that `gethostname`
does not work (doesn't work with static glibc unless you build it
with `--enable-static-nss`, which no distro builds glibc with).
2023-01-26 23:20:58 +00:00