Commit Graph

46 Commits

Author SHA1 Message Date
Copilot
045e9adb8a cdc: Fix various RenameDirectory issues
The RenameDirectory state machine was not handling the target-not-found case correctly.
This would have triggered asserts (which result in crashes in production builds).
There was also a bug in the rollback logic which would have left a lingering
lock on the source link. While it broke assumptions, this bug was benign, as
any subsequent operation on that directory would try to acquire this lock again
and succeed, since lock requests are idempotent.
2025-11-13 15:09:34 +00:00
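A minimal sketch of why the lingering lock was benign, assuming a hypothetical model in which each directory edge carries nothing but a `locked` flag (`Directory`, `lockEdge`, `unlockEdge` and `renameLike` are illustrative names, not TernFS's shard API). Because locking an already-locked edge succeeds, the next operation simply re-acquires the stale lock and then releases it:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model: each edge is a name with a "locked" flag.
// This is not TernFS's actual shard schema or request API.
class Directory {
public:
    void addEdge(const std::string& name) { locked_[name] = false; }

    // Idempotent lock: succeeds whether or not the edge is already locked.
    // Fails only if the edge does not exist.
    bool lockEdge(const std::string& name) {
        auto it = locked_.find(name);
        if (it == locked_.end()) return false;
        it->second = true;
        return true;
    }

    bool unlockEdge(const std::string& name) {
        auto it = locked_.find(name);
        if (it == locked_.end()) return false;
        it->second = false;
        return true;
    }

    bool isLocked(const std::string& name) const {
        auto it = locked_.find(name);
        return it != locked_.end() && it->second;
    }

private:
    std::map<std::string, bool> locked_;
};

// A later operation on the directory: lock, do the work, unlock.
bool renameLike(Directory& d, const std::string& name) {
    if (!d.lockEdge(name)) return false;
    // ... the operation's real work would happen here ...
    return d.unlockEdge(name);
}
```

In this model the leaked lock disappears after the first subsequent lock/unlock cycle. A non-idempotent lock, which failed whenever the edge was already locked, would instead leave the edge stuck.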
Miroslav Crnic
b110a7cb38 cdc: ignore unknown tags 2025-11-12 13:18:53 +00:00
Joshua Leahy
7a4e466ac6 Make TernFS open source 2025-09-17 18:20:23 +01:00
Francesco Mazzoli
110705db8d EggsFS -> TernFS rename
Things not done because they are probably disruptive:

* kmod filesystem string
* sysctl/debugfs/trace
* metrics names
* xmon instance names

Some of these might be renamed too, but starting with a relatively
safe set.
2025-09-03 09:29:53 +01:00
Francesco Mazzoli
0c25fbb497 Retry on make directory if mtime is too recent 2025-08-13 10:28:22 +00:00
Miroslav Crnic
5b924fb272 cdc: log soft unlink edge lock error 2025-06-06 09:40:26 +00:00
Miroslav Crnic
25b2cd965e shard: transient file deadline part of entry 2025-03-18 10:03:08 +00:00
Miroslav Crnic
1f70e33119 cdc: version init fixes 2024-12-06 19:16:09 +00:00
Miroslav Crnic
8d3c593022 cdc: acquire target lock for soft unlink dir to avoid race with gc 2024-08-19 13:24:52 +00:00
Francesco Mazzoli
d92265d1ce Better assertion still 2024-08-15 13:55:28 +00:00
Francesco Mazzoli
2d79d7156f More informative assertion 2024-08-15 13:44:37 +00:00
Miroslav Crnic
78baed62a5 cdc: request checkpoints from shard and push through log 2024-06-13 16:24:22 +01:00
Miroslav Crnic
9d06deeedc cdc: error part of shard response 2024-06-13 13:00:43 +01:00
Miroslav Crnic
2cd15fc0be core: various protocol changes 2024-06-13 09:13:11 +01:00
Miroslav Crnic
f5e17dace5 cdc: add LogsDB
* cdc: pack req/resp into log entries and apply

* shard: drop support for unused incoming packet drop

* cdc: add logsdb
2024-05-14 12:50:17 +01:00
Francesco Mazzoli
eb766f2fb5 Do not attempt to cross-shard unlink file if the file is a directory 2024-04-09 11:43:03 +00:00
Miroslav Crnic
409b126e4b cdc: use SharedRocksDB 2024-04-05 23:22:39 +01:00
Francesco Mazzoli
cd23deaf19 Accept DIRECTORY_NOT_FOUND in SOFT_UNLINK_DIRECTORY
Nothing prevents a non-existent inode from being sent in that request.
2024-01-18 12:00:43 +00:00
Francesco Mazzoli
c80c6269d9 Remove spurious MsgsGen.hpp includes 2024-01-11 16:05:34 +00:00
Francesco Mazzoli
8075e99bb6 Graceful shard teardown
See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs
regarding how to terminate threads gracefully.

The goal of this work was for valgrind to work correctly, which in turn
was to investigate #141. It looks like I have succeeded:

    ==2715080== Warning: unimplemented fcntl command: 1036
    ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135
    ==2715080==    at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2715080==    by 0x3B708E: allocate (new_allocator.h:121)
    ==2715080==    by 0x3B708E: allocate (allocator.h:173)
    ==2715080==    by 0x3B708E: allocate (alloc_traits.h:460)
    ==2715080==    by 0x3B708E: _M_allocate (stl_vector.h:346)
    ==2715080==    by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635)
    ==2715080==    by 0x42BF1C: resize (stl_vector.h:940)
    ==2715080==    by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921)
    ==2715080==    by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034)
    ==2715080==    by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in*, char*, unsigned long) (shard/Shard.cpp:347)
    ==2715080==    by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405)
    ==2715080==    by 0x40B1E8: run (core/Loop.cpp:67)
    ==2715080==    by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37)
    ==2715080==    by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so)
    ==2715080==    by 0x4D005E2: clone (in /usr/lib/libc-2.33.so)
    ==2715080==
    ==2715080==
    ==2715080== Exit program on first error (--exit-on-first-error=yes)
2024-01-08 15:41:22 +00:00
Francesco Mazzoli
38f3d54ecd Wait forever, rather than having timeouts
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can set up a proper termination system in
the future; I first want to see if this makes a difference.

Also, change xmon to use pipes for communication, so that it can
wait without timers as well.

Also, call `write` directly for logging, so that we know the logs will
make it to the file once the logging call returns (since we no longer
get the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
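A minimal sketch of logging through `write(2)` directly, assuming a helper that emits one line per call (`writeAll` and `logLine` are hypothetical names, not the project's logger). Once `write` returns, the bytes sit in the kernel rather than in a user-space buffer, so they survive the process terminating without a flush; surviving a machine crash would additionally require `fsync`:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Write the whole buffer with POSIX write(2), retrying on short writes and EINTR.
// Returns true once every byte has been handed to the kernel.
bool writeAll(int fd, const char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real error (e.g. EBADF, ENOSPC)
        }
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Hypothetical log helper: one write per line and no user-space buffering, so
// the line is already in the kernel when this function returns.
bool logLine(int fd, const std::string& msg) {
    std::string line = msg + '\n';
    return writeAll(fd, line.data(), line.size());
}
```

The trade-off is one syscall per log line, which is acceptable when logging is infrequent compared to the work being logged.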
Francesco Mazzoli
a367858684 Drop entire CF at once, rather than one-by-one
A dry run of the production upgrade using a backup revealed that
dropping them one-by-one would take ages, since we previously kept every
single CDC request.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
fac014a864 Self-PR review, part 2 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
ba9424e224 Remove unordered_set
Almost certainly irrelevant, but it was bugging me
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c5562c7ca3 Parallelize CDC by directory
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
ad3c969772 Push full RocksDB stats to grafana 2023-11-09 16:48:51 +00:00
Francesco Mazzoli
f70c484883 Dump RocksDB full statistics to file 2023-11-09 14:12:54 +00:00
Francesco Mazzoli
057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli
c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli
9e21969637 Slightly tighter error checks 2023-10-11 13:40:46 +01:00
Francesco Mazzoli
440a78510e Add concrete quiet windows to C++ alerts
This, together with the previous commits, fixes #72.
2023-10-02 23:06:40 +00:00
Francesco Mazzoli
59237ed673 Limit number of open RocksDB files
We got to the point where we had ~4k open SST files per shard, which
meant that we ate up all the available FDs.
2023-09-30 11:08:35 +00:00
Francesco Mazzoli
2679ee7c80 Retry RocksDB transactions if appropriate 2023-09-30 10:44:40 +00:00
Francesco Mazzoli
1d4c4abafd Correctly check that RocksDB txn succeeded
This was caught anyway by the fact that we check that the log index
is what we expect. Would have been very nasty otherwise.

The right thing to do is to check for `Status::TryAgain()` and
retry. `Status::Busy()` should never happen because, so far, we never
run transactions concurrently.
2023-09-30 09:51:26 +00:00
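A sketch of the retry policy described above. To stay self-contained it uses a stand-in `TxnStatus` enum instead of `rocksdb::Status` (whose real class exposes `ok()`, `IsTryAgain()` and `IsBusy()`); `commitWithRetry` and its attempt cap are assumptions for illustration, not the project's code:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Stand-in for rocksdb::Status so the sketch compiles without RocksDB.
enum class TxnStatus { Ok, TryAgain, Busy, Error };

// Run `attempt` (which would build and commit a fresh transaction each time)
// until it commits or fails for a non-retryable reason.
// Returns the number of attempts taken.
int commitWithRetry(const std::function<TxnStatus()>& attempt, int maxAttempts) {
    for (int i = 1; i <= maxAttempts; ++i) {
        switch (attempt()) {
        case TxnStatus::Ok:
            return i;
        case TxnStatus::TryAgain:
            continue;  // retryable: redo the whole transaction
        case TxnStatus::Busy:
            // Per the commit message this should not happen without concurrent txns.
            throw std::logic_error("unexpected Busy: concurrent transaction?");
        case TxnStatus::Error:
            throw std::runtime_error("transaction failed");
        }
    }
    throw std::runtime_error("transaction still returning TryAgain after retries");
}
```

Treating `Busy` as a logic error rather than retrying it keeps the "no concurrent transactions" assumption loud: if it is ever violated, the failure surfaces immediately instead of being masked by retries.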
Francesco Mazzoli
acf4f129f6 Fix CDC txn status tracking
This might be a resolution to #38, although I'm not sure yet.
2023-08-02 13:49:38 +00:00
Francesco Mazzoli
6a52a961eb Split CDC timings to distinguish queue time from exec time 2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f Do not bump req ids when retrying requests in the CDC
Fixes #29.

The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli
0fc80dfe0f Remove additional CDC status fields
`status()` was racy anyway (the txn might have disappeared between the
first and second lookup), and these are better served by the stats
db/Grafana.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
60554ec58d Have bigger histograms, remove other metrics entirely
The `uint16_t` -> `size_t` change in `packedSize` is because insert stats
requests can now be larger than a `uint16_t` can represent.
2023-07-26 10:01:27 +00:00
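To illustrate why the field needed widening: an unsigned 16-bit size tops out at 65535 and wraps modulo 65536 beyond that. `packedSize16` and `packedSizeWide` below are hypothetical helpers, not the actual `packedSize` code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<uint16_t>::max() == 65535, "uint16_t range");

// Hypothetical: a size computed into a uint16_t silently wraps modulo 65536.
uint16_t packedSize16(size_t headerBytes, size_t payloadBytes) {
    return static_cast<uint16_t>(headerBytes + payloadBytes);
}

// Hypothetical: the same computation with a size_t result does not wrap
// for any realistic message size.
size_t packedSizeWide(size_t headerBytes, size_t payloadBytes) {
    return headerBytes + payloadBytes;
}
```

For example, an 8-byte header plus a 70000-byte payload reports 4472 bytes in the 16-bit version instead of 70008.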
Francesco Mazzoli
ff9306f6e3 Add Xmon support to C++ code 2023-07-11 12:13:22 +00:00
Francesco Mazzoli
e2dcd43fea Fix bug in CreateLockedCurrentEdge logic
See comment in `msgs.go`. This would normally have required
entirely new transactions, but since we're not in production yet
I'm going to just change the schema and wipe the current FS.

This also adds in an unrelated change regarding more flexible
blacklisting, which will be required for some additional testing
I'm preparing.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
6addbdee6a First version of kernel module
Initial version really by Pawel, but many changes in between.

Big outstanding issues:

* span cache reclamation (unbounded memory otherwise...)
* bad block service detection and workarounds
* corrupted blocks detection and workaround

Co-authored-by: Paweł Dziepak <pawel.dziepak@xtxmarkets.com>
2023-05-18 15:29:41 +00:00
Francesco Mazzoli
5bff9b8fae Many, many changes -- tests pass, but FUSE is currently not present
The main thing that's added is full RS support, but a lot of things
were rejigged along the way. The tests are still a bit lacking,
and will be augmented in future commits.
2023-03-03 16:42:22 +00:00
Francesco Mazzoli
e1b8de02dc More assorted improvements 2023-02-15 14:03:53 +00:00
Francesco Mazzoli
85889266b1 Various housekeeping while I get ready to deploy...
...most notably we now produce fully static binaries in an Alpine
image.

A few assorted thoughts:

* I really like static binaries, ideally I'd like to run EggsFS
    deployments with just systemd scripts and a few binaries.

* Go already does this, which is great.

* C++ does not, which is less great.

* Linking statically against `glibc` works, but is unsupported.
    Not only stuff like NSS (which `gethostbyname` requires)
    straight up does not work, unless you build `glibc` with
    unsupported and currently apparently broken flags
    (`--enable-static-nss`), but also other stuff is subtly
    broken (I couldn't remember exactly what was broken,
    but see comments such as
    <https://github.com/haskell/haskell-language-server/issues/2431#issuecomment-985880838>).

* So we're left with alternative libcs -- the most popular being
    musl.

* The simplest way to build a C++ application using musl is to just
    build on a system where musl is already the default libc -- such
    as Alpine Linux.

The backtrace support is in a bit of a bad state. Exception stacktraces
work on musl, but DWARF seems to be broken on the normal release build.

Moreover, libunwind doesn't play well with musl's signal handler:
<https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler>.

Keeping it working seems to be a bit of a chore, and I'm going to revisit
it later.

In the meantime, gdb stack traces do work fine.
2023-01-29 21:41:40 +00:00
Francesco Mazzoli
9adca070ba Convert build system to cmake
Also, produce fully static binaries. This means that `gethostbyname`
does not work (it doesn't work with static glibc unless you build glibc
with `--enable-static-nss`, which no distro does).
2023-01-26 23:20:58 +00:00