Commit Graph

109 Commits

Author SHA1 Message Date
Francesco Mazzoli 8075e99bb6 Graceful shard teardown
See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs
regarding how to terminate threads gracefully.

The goal of this work was to get valgrind working correctly, which in
turn was needed to investigate #141. It looks like I have succeeded:

    ==2715080== Warning: unimplemented fcntl command: 1036
    ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135
    ==2715080==    at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2715080==    by 0x3B708E: allocate (new_allocator.h:121)
    ==2715080==    by 0x3B708E: allocate (allocator.h:173)
    ==2715080==    by 0x3B708E: allocate (alloc_traits.h:460)
    ==2715080==    by 0x3B708E: _M_allocate (stl_vector.h:346)
    ==2715080==    by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635)
    ==2715080==    by 0x42BF1C: resize (stl_vector.h:940)
    ==2715080==    by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921)
    ==2715080==    by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034)
    ==2715080==    by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in*, char*, unsigned long) (shard/Shard.cpp:347)
    ==2715080==    by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405)
    ==2715080==    by 0x40B1E8: run (core/Loop.cpp:67)
    ==2715080==    by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37)
    ==2715080==    by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so)
    ==2715080==    by 0x4D005E2: clone (in /usr/lib/libc-2.33.so)
    ==2715080==
    ==2715080==
    ==2715080== Exit program on first error (--exit-on-first-error=yes)
2024-01-08 15:41:22 +00:00
Francesco Mazzoli 1963714c0f Remove avoidable stat in collect directories 2023-12-15 21:20:05 +00:00
Francesco Mazzoli 898b85ad9c Tweak GC parameters
We're almost in a steady state, no need to overwhelm the shards.
2023-12-11 15:04:41 +00:00
Francesco Mazzoli 8c172fd2e8 Tiny C++ xmon fix 2023-12-10 11:14:19 +00:00
Francesco Mazzoli 788b5eed57 Fill in current block services before applying the log
It makes a lot more sense to pick outside, given that it involves
randomness. Also, this is in preparation for shuckle picking them
in a smarter way.
2023-12-09 15:20:24 +00:00
Francesco Mazzoli 3394328000 Do not try to close xmon fd if we don't have one
Also, ignore errors if we can't close it. Fixes #134.
2023-12-09 14:50:51 +00:00
Francesco Mazzoli ab1df9137d Fix error logging when inserting stats 2023-12-08 15:57:02 +00:00
Francesco Mazzoli 53049d5779 Shard batch writes, use batch UDP syscalls
The idea is to drain the socket and do a single RocksDB WAL
write/fsync for all the write requests we have found.

The read requests are immediately executed. The reasoning here is
that currently write requests are _a lot_ slower than the read
requests because fsyncing takes ~500us on fsf1. In the future this
might change.

Since we're at it, we also use batch UDP syscalls in the CDC.

Fixes #119.
2023-12-07 14:29:07 +00:00
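A minimal sketch of the batching idea described in 53049d5779 above, assuming the socket has already been drained into a vector. `Request`, `processBatch`, and the two callbacks are hypothetical names for illustration, not the shard's actual types; the real code writes to the RocksDB WAL and uses batch UDP syscalls, neither of which is modelled here.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical request type; the real shard request containers are richer.
struct Request {
    bool isWrite;
    int payload;
};

// Handle one drained batch: reads are executed immediately, writes are
// collected and handed to a single commit call (standing in for one WAL
// write/fsync). Returns the number of commits performed (0 or 1).
inline std::size_t processBatch(
    const std::vector<Request>& drained,
    const std::function<void(const Request&)>& executeRead,
    const std::function<void(const std::vector<Request>&)>& commitWrites
) {
    std::vector<Request> writes;
    for (const Request& req : drained) {
        if (req.isWrite) {
            writes.push_back(req);   // defer: committed together below
        } else {
            executeRead(req);        // cheap: serve right away
        }
    }
    if (writes.empty()) {
        return 0;                    // nothing to sync
    }
    commitWrites(writes);            // one sync for the whole batch
    return 1;
}
```

The point of the split is that the fsync cost is paid once per drained batch rather than once per write request, while reads never wait behind it.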
Francesco Mazzoli 38f3d54ecd Wait forever, rather than having timeouts
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can set up a proper termination system in
the future, I first want to see if this makes a difference.

Also, change xmon to use pipes for communication, so that it can
wait without timers as well.

Also, `write` directly for logging, so that we know the logs will
make it to the file after the logging call returns (since we now
do not have the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
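A minimal sketch of logging through `write(2)` directly, as mentioned in 38f3d54ecd above. `writeAll` is a hypothetical helper, not the repository's logger. Once `write` returns successfully the bytes have been handed to the kernel, so they are not lost if the process terminates without flushing user-space buffers (this says nothing about durability on disk).

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: write the whole buffer to `fd`, retrying on partial
// writes and on EINTR. Returns true if every byte was accepted by the kernel.
inline bool writeAll(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = ::write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;            // interrupted by a signal, try again
            }
            return false;            // real error, give up
        }
        data += n;                   // skip what was written
        len -= static_cast<std::size_t>(n);
    }
    return true;
}
```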
Francesco Mazzoli 91db9566e1 Remove option to not write out atime which is too recent
This was pretty nasty to begin with, we now do it in the client.
2023-11-23 13:28:23 +00:00
Francesco Mazzoli bcf75d5308 Shut up sanitizer 2023-11-21 17:03:05 +00:00
Francesco Mazzoli 1fca8b84cd Fix type signature 2023-11-17 22:48:31 +00:00
Francesco Mazzoli b964d0632a Add option to not write out atime which is too recent
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.

Thanks to @sgrusny for the good idea.
2023-11-16 14:45:58 +00:00
Saulius Grusnys 2ce5586eb9 Periodically refresh metadata info in kmod, use two IPs for shuckle
Fixes #112.

Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2023-11-14 13:49:36 +00:00
Francesco Mazzoli 3bc17301d6 Switch from tuple to variant for req/resp containers
The `tuple` was for when I thought it'd be useful to leave slots
for each request, but we don't need this anymore, and now leading
up to #66 I want to be able to keep vectors of reqs/resps.
2023-11-09 19:03:37 +00:00
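A small illustration of why a `variant` suits 3bc17301d6 above: each container holds exactly one request kind, so a `std::vector` of them is a plain list of requests. The request types and `countLookups` are hypothetical stand-ins, not the generated shard types.

```cpp
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical request kinds, for illustration only.
struct LookupReq { std::uint64_t dirId; };
struct StatFileReq { std::uint64_t fileId; };

// One container holds exactly one of the request kinds.
using ReqContainer = std::variant<LookupReq, StatFileReq>;

// Count how many requests in a batch are lookups.
inline std::size_t countLookups(const std::vector<ReqContainer>& reqs) {
    std::size_t n = 0;
    for (const ReqContainer& req : reqs) {
        if (std::holds_alternative<LookupReq>(req)) {
            n++;
        }
    }
    return n;
}
```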
Francesco Mazzoli ad3c969772 Push full RocksDB stats to grafana 2023-11-09 16:48:51 +00:00
Francesco Mazzoli 057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli d0126d0656 Distinguish IO errors in eggsblocks
See #115 for background.
2023-11-06 19:35:05 +00:00
Francesco Mazzoli 1ec63f9710 Implement scrubbing functionality
Fixes #32. This also involves some reworking of the block request machinery
to make it more robust and faster. The scrubbing is done assuming that
the overwhelming majority of block checking will go through.
2023-11-05 18:33:00 +00:00
Francesco Mazzoli 71556ce933 Switch to restech EggsFS rota 2023-11-03 14:23:44 +00:00
Francesco Mazzoli 64d400fcfe Insert shard/cdc metrics at more regular intervals 2023-11-03 13:49:38 +00:00
Francesco Mazzoli 654c0d4db4 Report CDC queue size in grafana 2023-11-03 13:49:32 +00:00
Francesco Mazzoli c529d96c88 Garbage collect zero block service files mappings.
See #91.
2023-10-21 11:41:33 +00:00
Francesco Mazzoli 24d1588b21 Add quiet window for C++ alerts, too 2023-10-02 23:02:45 +00:00
Francesco Mazzoli 2679ee7c80 Retry RocksDB transactions if appropriate 2023-09-30 10:44:40 +00:00
Francesco Mazzoli 02838e228f Correct xmon app types 2023-09-28 11:53:12 +00:00
Francesco Mazzoli b87a43a297 Continue running GC if servers are down
This was triggered by a server failing hard (fsr13), without any
short term resolution (we've already replaced the mobo, we'll probably
replace the HBA). In this case GC should still run rather than
get stuck.
2023-08-29 12:47:24 +00:00
Francesco Mazzoli 40f229b6f5 Add endpoint to specify which file to get the "reference" block services from
See comments for more details.
2023-08-16 08:40:47 +01:00
Francesco Mazzoli 9405b64a76 Remove ExpireTransientFile, make future cutoff tunable
Fixes #48. Also, reorganize error handling in `eggsblocks` requests,
especially around write requests, which might help with #45.
2023-08-15 12:43:49 +01:00
Francesco Mazzoli a5dbe189e3 Add some block services metrics 2023-08-08 11:48:35 +00:00
Francesco Mazzoli 467fcffefb A few metrics fixes 2023-08-08 09:21:35 +01:00
Francesco Mazzoli e2246afc53 More tweaks to event loops 2023-08-08 09:21:35 +01:00
Francesco Mazzoli 5117ddd16e Add shard/CDC metrics 2023-08-08 09:21:35 +01:00
Francesco Mazzoli 1922cf3c30 Factor out common looping patterns 2023-08-08 09:21:35 +01:00
Francesco Mazzoli 63ed6a90fa Reconnect to xmon on expired heartbeat 2023-08-07 10:06:40 +00:00
Francesco Mazzoli b370118e90 Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli 02a2ca2a6f Wait for block services to come up before restarting the next one
This should already make #43 better.
2023-08-04 13:40:10 +00:00
Francesco Mazzoli 18b2397842 Some Timings.hpp functions 2023-08-03 23:41:11 +00:00
Francesco Mazzoli 698794ac44 Fix bad indexing in Timings.hpp 2023-08-03 21:06:49 +00:00
Francesco Mazzoli ca987ed205 Fix UB in bincode
I `memcpy` a zero-sized string into `NULL`. UBsan rightfully
complains.
2023-08-03 09:58:53 +00:00
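A minimal sketch of the kind of guard that avoids the problem described in ca987ed205 above. `copyBytes` is a hypothetical helper, not the actual bincode fix. Under the C and C++ standards in force at the time, passing a null pointer to `memcpy` is undefined behaviour even when the length is zero, which is what UBSan reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `len` bytes, skipping the memcpy call entirely
// when there is nothing to copy, so null pointers are never passed to it.
inline void copyBytes(void* dst, const void* src, std::size_t len) {
    if (len == 0) {
        return;  // empty strings may legitimately have null data pointers
    }
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, len);
}
```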
Francesco Mazzoli 9ef3162882 Add error count to inspect how things failed 2023-08-03 06:53:35 +00:00
Francesco Mazzoli 5146a80c2d Use homegrown Xmon
I got annoyed at the old lib dropping requests when the queue gets
full. I could probably fix it, but this is almost certainly quicker.
2023-07-30 11:16:35 +00:00
Francesco Mazzoli e851457c52 Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli 8e9f4f3d8b Never die because of bad Xmon
It will alert if we're disconnected anyway, and when restarting
everything this causes crashes.
2023-07-28 08:08:03 +00:00
Francesco Mazzoli 7dceb5fda5 More alerts shenanigans 2023-07-27 15:51:15 +00:00
Francesco Mazzoli 889c04766f Do not bump req ids when retrying requests in the CDC
Fixes #29.

The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli f797663d8c Transient alerts for EPERM errors on sendto 2023-07-27 07:31:34 +00:00
Francesco Mazzoli bf447408a6 Actually wait for things to finish terminating before reaping next one
Fixes #27. This is all kind of clunky right now; it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli 0fc80dfe0f Remove additional CDC status fields
`status()` was racy anyway (the txn might be gone between the
first and second lookup) and these are better solved by the stats
db/grafana.
2023-07-26 19:21:24 +00:00