Commit Graph

63 Commits

Author SHA1 Message Date
Francesco Mazzoli 8075e99bb6 Graceful shard teardown
See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs
regarding how to terminate threads gracefully.

The goal of this work was for valgrind to work correctly, which in turn
was to investigate #141. It looks like I have succeeded:

    ==2715080== Warning: unimplemented fcntl command: 1036
    ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135
    ==2715080==    at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2715080==    by 0x3B708E: allocate (new_allocator.h:121)
    ==2715080==    by 0x3B708E: allocate (allocator.h:173)
    ==2715080==    by 0x3B708E: allocate (alloc_traits.h:460)
    ==2715080==    by 0x3B708E: _M_allocate (stl_vector.h:346)
    ==2715080==    by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635)
    ==2715080==    by 0x42BF1C: resize (stl_vector.h:940)
    ==2715080==    by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921)
    ==2715080==    by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034)
    ==2715080==    by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in*, char*, unsigned long) (shard/Shard.cpp:347)
    ==2715080==    by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405)
    ==2715080==    by 0x40B1E8: run (core/Loop.cpp:67)
    ==2715080==    by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37)
    ==2715080==    by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so)
    ==2715080==    by 0x4D005E2: clone (in /usr/lib/libc-2.33.so)
    ==2715080==
    ==2715080==
    ==2715080== Exit program on first error (--exit-on-first-error=yes)
2024-01-08 15:41:22 +00:00
Francesco Mazzoli 788b5eed57 Fill in current block services before applying the log
It makes a lot more sense to pick outside, given that it involves
randomness. Also, this is in preparation for shuckle picking them
in a smarter way.
2023-12-09 15:20:24 +00:00
Francesco Mazzoli 128078988d Get rid of -parallel in GC
With separate workers it's not really needed anymore.
2023-12-08 11:51:21 +00:00
Francesco Mazzoli 5f4467d0c6 Synchronize access to in-memory block service data
This was alread an issue before, but it never surfaced so far.
Today the quants actually hit it.
2023-12-07 16:43:11 +00:00
Francesco Mazzoli 53049d5779 Shard batch writes, use batch UDP syscalls
The idea is to drain the socket and do a single RocksDB WAL
write/fsync for all the write requests we have found.

The read requests are immediately executed. The reasoning here is
that currently write requests are _a lot_ slower than the read
requests because fsyncing takes ~500us on fsf1. In the future this
might change.

Since we're at it, we also use batch UDP syscalls in the CDC.

Fixes #119.
2023-12-07 14:29:07 +00:00
Francesco Mazzoli 38f3d54ecd Wait forever, rather than having timeouts
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can setup a proper termination system in
the future, I first want to see if this makes a difference.

Also, change xmon to use pipes for communication, so that it can
wait without timers as well.

Also, `write` directly for logging, so that we know the logs will
make it to the file after the logging call returns (since we now
do not have the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
Francesco Mazzoli 91db9566e1 Remove option to not write out atime which is too recent
This was pretty nasty to begin with, we now do it in the client.
2023-11-23 13:28:23 +00:00
Francesco Mazzoli ae765b7581 Consistently check for iterator status 2023-11-16 17:12:38 +00:00
Francesco Mazzoli b964d0632a Add option to not write out atime which is too recent
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.

Thanks to @sgrusny for the good idea.
2023-11-16 14:45:58 +00:00
Francesco Mazzoli 248abb2681 Fix memory leak in shards 2023-11-15 12:20:16 +00:00
Francesco Mazzoli ad3c969772 Push full RocksDB stats to grafana 2023-11-09 16:48:51 +00:00
Francesco Mazzoli f70c484883 Dump RocksDB full statistics to file 2023-11-09 14:12:54 +00:00
Francesco Mazzoli 057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli ef1885a4b2 Print out more info when failing because of bad proofs 2023-11-08 11:57:32 +00:00
Francesco Mazzoli 1ec63f9710 Implement scrubbing functionality
Fixes #32. This also involves some reworking of the block request machinery
to make it more robust and faster. The scrubbing is done assuming that
the overwhelming majority of block checking will go through.
2023-11-05 18:33:00 +00:00
Francesco Mazzoli 674c9f22a8 Do not crash shards when swapping blocks fails
Fixes #101
2023-10-31 08:39:32 +00:00
Francesco Mazzoli dd052b1919 Add excel spreadsheet to quickly adjust RocksDB size estimates 2023-10-26 14:32:35 +00:00
Francesco Mazzoli c529d96c88 Garbage collect zero block service files mappings.
See #91.
2023-10-21 11:41:33 +00:00
Francesco Mazzoli 83f38080de Do not return FILE_NOT_FOUND when getting spans of empty transient file 2023-10-13 21:10:44 +00:00
Francesco Mazzoli 9e21969637 Slightly tighter error checks 2023-10-11 13:40:46 +01:00
Francesco Mazzoli 03ed4f951f Alert when block proof is bad (see #89) 2023-10-10 21:37:39 +00:00
Francesco Mazzoli c461872ace Implement dir seeking. Fixes #83. 2023-10-09 22:32:38 +01:00
Francesco Mazzoli 59237ed673 Limit number of open RocksDB files
We got to the point where we had ~4k open SST files per shard, which
meant that we eat up all the available FDs.
2023-09-30 11:08:35 +00:00
Francesco Mazzoli b87a43a297 Continue running GC if servers are down
This was triggered by a server failing hard (fsr13), without any
short term resolution (we've already replaced the mobo, we'll probably
replace the HBA). In this case GC should still run rather than
get stuck.
2023-08-29 12:47:24 +00:00
Francesco Mazzoli 1cab680110 Support arbitrary span/block/... policies in kmod...
...and also update them quickly, by indexing them by (inode, tag).

Currently they only get updated on local renames though, we should
also update them when things are moved around remotely.
2023-08-22 15:01:33 +01:00
Francesco Mazzoli 6fa520c582 Always update directory modification
This fixes a bona-fide bug -- we didn't update the mtime when an
edge was unlocked + moved. However we might as well blindly always
update the mtime, even if there is no POSIX-visible change, to
be on the safe side.
2023-08-21 13:33:30 +00:00
Francesco Mazzoli b25f893403 Update estimates in ShardDB.cpp 2023-08-16 08:41:13 +00:00
Francesco Mazzoli 40f229b6f5 Add endpoint to specify which file to get the "reference" block services from
See comments for more details.
2023-08-16 08:40:47 +01:00
Francesco Mazzoli 9405b64a76 Remove ExpireTransientFile, make future cutoff tunable
Fixes #48. Also, reorganize error handling in `eggsblocks` requests,
especially around write requests, which might help with #45.
2023-08-15 12:43:49 +01:00
Francesco Mazzoli 02a2ca2a6f Wait for block services to come up before restarting the next one
This should already make #43 better.
2023-08-04 13:40:10 +00:00
Francesco Mazzoli 2a1d8a497e Update some size estimates 2023-08-02 12:22:16 +00:00
Francesco Mazzoli 15e59b8e67 More logging when closing (see #27)
It seems that we get the SIGSEGV while closing the DB.
2023-07-26 21:09:29 +00:00
Francesco Mazzoli 60554ec58d Have bigger histograms, remove other metrics entirely
The `uint16_t` -> `size_t` in `packedSize` is because now
insert stats requests are bigger than `uint16_t`.
2023-07-26 10:01:27 +00:00
Francesco Mazzoli 37ce3be74c Implement utime-like functions
Also, update atime when opening a file.
2023-07-21 06:28:48 +00:00
Francesco Mazzoli d93df7ef42 Make tests pass for now 2023-07-12 12:22:40 +01:00
Francesco Mazzoli 53598c2fe9 Allow to re-open files as writing if we're already writing them
This makes `cp` work
2023-07-12 12:22:40 +01:00
Francesco Mazzoli 65174341a0 Drop MM after flushing out a transient file 2023-07-12 12:22:40 +01:00
Francesco Mazzoli ff9306f6e3 Add Xmon support to C++ code 2023-07-11 12:13:22 +00:00
Francesco Mazzoli d5fea6c08c Retry when block services are unavailable in kmod 2023-07-06 19:39:12 +01:00
Saulius Grusnys 0360ec85cf Switch cutoff time to blockservice to 1h and set the deadline in shard to 2 2023-07-06 13:28:12 +01:00
Francesco Mazzoli 1a4301a499 Simplify go span read/write code, make it work with broken block services
And some other assorted changes.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli e2dcd43fea Fix bug in CreateLockedCurrentEdge logic
See comment in `msgs.go`. This would normally have required
entirely new transactions, but since we're not in production yet
I'm going to just change the schema and wipe the current FS.

This also adds in an unrelated change regarding more flexible
blacklisting, which will be required for some additional testing
I'm preparing.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli c328cca75b Fix shard bug when returning from idempotent locked edge creation 2023-06-16 15:20:40 +00:00
Francesco Mazzoli 444ffba63f Propagate BS flags 2023-06-15 13:53:40 +00:00
Francesco Mazzoli e26eeaede1 Add "mtu" field to requests that benefit from it
Not used right now, but this way we can easily start stuffing more
data in responses.

I also split off some arguments in `NewClient`, unrelated change
(I wanted to pair the MTU with a single client, but I then realized
that it's enough to have it as some global property for now).
2023-06-15 11:57:05 +00:00
Francesco Mazzoli d4715ea11d Add flags to block services in shards 2023-06-14 14:10:16 +00:00
Francesco Mazzoli 90e8500722 Add atime field to file
Right now it's always the same as mtime, but we'll add an endpoint
to modify it.
2023-06-05 12:19:09 +00:00
Francesco Mazzoli cd86e632e2 Implement RS recovery, although it won't really be used now...
...since it only relies on block service flags, and we don't
set them right now.
2023-06-03 17:27:54 +00:00
Francesco Mazzoli f95d177c34 WIP commit...
...which I mistakely left in and I'm too lazy to fix.
2023-05-26 17:22:30 +00:00