Commit Graph

108 Commits

Author SHA1 Message Date
Francesco Mazzoli f979a67b04 Always set non-zero transient deadline, fixes #145. 2024-01-18 19:04:36 +00:00
Francesco Mazzoli 694e17cbc2 Add alerts for full shard queues 2024-01-16 23:11:41 +00:00
Francesco Mazzoli b6cf2b67a6 Distribute block services from shuckle
This is in preparation for #44, but more immediately, to better
stop writing to full block services.

The previous strategy of setting a flag was flawed since once
the flag was set it stayed set -- i.e. we would not remove it once
files would be deleted.  This consideration should just be integrated
in distributing the block services.
2024-01-16 16:17:27 +00:00
Francesco Mazzoli c80c6269d9 Remove spurious MsgsGen.hpp includes 2024-01-11 16:05:34 +00:00
Francesco Mazzoli c9bf49d387 Fix silly SPSC bug 2024-01-09 11:14:18 +00:00
Francesco Mazzoli 8075e99bb6 Graceful shard teardown
See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs
regarding how to terminate threads gracefully.

The goal of this work was for valgrind to work correctly, which in turn
was to investigate #141. It looks like I have succeeded:

    ==2715080== Warning: unimplemented fcntl command: 1036
    ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135
    ==2715080==    at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2715080==    by 0x3B708E: allocate (new_allocator.h:121)
    ==2715080==    by 0x3B708E: allocate (allocator.h:173)
    ==2715080==    by 0x3B708E: allocate (alloc_traits.h:460)
    ==2715080==    by 0x3B708E: _M_allocate (stl_vector.h:346)
    ==2715080==    by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635)
    ==2715080==    by 0x42BF1C: resize (stl_vector.h:940)
    ==2715080==    by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921)
    ==2715080==    by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034)
    ==2715080==    by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in*, char*, unsigned long) (shard/Shard.cpp:347)
    ==2715080==    by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405)
    ==2715080==    by 0x40B1E8: run (core/Loop.cpp:67)
    ==2715080==    by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37)
    ==2715080==    by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so)
    ==2715080==    by 0x4D005E2: clone (in /usr/lib/libc-2.33.so)
    ==2715080==
    ==2715080==
    ==2715080== Exit program on first error (--exit-on-first-error=yes)
2024-01-08 15:41:22 +00:00
Francesco Mazzoli 01af461477 Factor out function 2023-12-15 18:30:12 +00:00
Francesco Mazzoli 27bd28ead0 Remove outdated comment 2023-12-10 08:39:17 +00:00
Francesco Mazzoli 788b5eed57 Fill in current block services before applying the log
It makes a lot more sense to pick outside, given that it involves
randomness. Also, this is in preparation for shuckle picking them
in a smarter way.
2023-12-09 15:20:24 +00:00
Francesco Mazzoli 128078988d Get rid of -parallel in GC
With separate workers it's not really needed anymore.
2023-12-08 11:51:21 +00:00
Francesco Mazzoli 5f4467d0c6 Synchronize access to in-memory block service data
This was alread an issue before, but it never surfaced so far.
Today the quants actually hit it.
2023-12-07 16:43:11 +00:00
Francesco Mazzoli 53049d5779 Shard batch writes, use batch UDP syscalls
The idea is to drain the socket and do a single RocksDB WAL
write/fsync for all the write requests we have found.

The read requests are immediately executed. The reasoning here is
that currently write requests are _a lot_ slower than the read
requests because fsyncing takes ~500us on fsf1. In the future this
might change.

Since we're at it, we also use batch UDP syscalls in the CDC.

Fixes #119.
2023-12-07 14:29:07 +00:00
Francesco Mazzoli 38f3d54ecd Wait forever, rather than having timeouts
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can setup a proper termination system in
the future, I first want to see if this makes a difference.

Also, change xmon to use pipes for communication, so that it can
wait without timers as well.

Also, `write` directly for logging, so that we know the logs will
make it to the file after the logging call returns (since we now
do not have the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
Francesco Mazzoli 91db9566e1 Remove option to not write out atime which is too recent
This was pretty nasty to begin with, we now do it in the client.
2023-11-23 13:28:23 +00:00
Francesco Mazzoli 163d7b3a4d Do not return error on TIME_TOO_RECENT
I thought we only sent it using "dontwait" for atime, but for the
normal utime calls we wait.
2023-11-16 19:08:43 +00:00
Francesco Mazzoli ae765b7581 Consistently check for iterator status 2023-11-16 17:12:38 +00:00
Francesco Mazzoli b964d0632a Add option to not write out atime which is too recent
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.

Thanks to @sgrusny for the good idea.
2023-11-16 14:45:58 +00:00
Francesco Mazzoli 248abb2681 Fix memory leak in shards 2023-11-15 12:20:16 +00:00
Francesco Mazzoli 340e7f2f37 Harmonize addr-passing, add shuckle beacon and test it in kmod 2023-11-14 13:49:36 +00:00
Francesco Mazzoli 2ad278adaa Add ubuntu image to build, use jemalloc in release build
I want to use the introspection capabilities of jemalloc, and it
should also be much faster. Preserve alpine build for go build,
it's also really useful to test inside the kmod.
2023-11-13 15:44:55 +00:00
Francesco Mazzoli ad3c969772 Push full RocksDB stats to grafana 2023-11-09 16:48:51 +00:00
Francesco Mazzoli f70c484883 Dump RocksDB full statistics to file 2023-11-09 14:12:54 +00:00
Francesco Mazzoli 057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli 03e9510255 Align xmon's app instances and systemd services 2023-11-08 14:36:58 +00:00
Francesco Mazzoli ef1885a4b2 Print out more info when failing because of bad proofs 2023-11-08 11:57:32 +00:00
Francesco Mazzoli 4cc917a1c7 Expose shard socket buf size to grafana
As a proxy to how behind shards are.
2023-11-07 14:12:55 +00:00
Francesco Mazzoli 1ec63f9710 Implement scrubbing functionality
Fixes #32. This also involves some reworking of the block request machinery
to make it more robust and faster. The scrubbing is done assuming that
the overwhelming majority of block checking will go through.
2023-11-05 18:33:00 +00:00
Francesco Mazzoli 71556ce933 Switch to restech EggsFS rota 2023-11-03 14:23:44 +00:00
Francesco Mazzoli 64d400fcfe Insert shard/cdc metrics at more regular intervals 2023-11-03 13:49:38 +00:00
Francesco Mazzoli 674c9f22a8 Do not crash shards when swapping blocks fails
Fixes #101
2023-10-31 08:39:32 +00:00
Francesco Mazzoli dd052b1919 Add excel spreadsheet to quickly adjust RocksDB size estimates 2023-10-26 14:32:35 +00:00
Francesco Mazzoli c529d96c88 Garbage collect zero block service files mappings.
See #91.
2023-10-21 11:41:33 +00:00
Francesco Mazzoli 83f38080de Do not return FILE_NOT_FOUND when getting spans of empty transient file 2023-10-13 21:10:44 +00:00
Francesco Mazzoli 9e21969637 Slightly tighter error checks 2023-10-11 13:40:46 +01:00
Francesco Mazzoli 03ed4f951f Alert when block proof is bad (see #89) 2023-10-10 21:37:39 +00:00
Francesco Mazzoli c461872ace Implement dir seeking. Fixes #83. 2023-10-09 22:32:38 +01:00
Francesco Mazzoli 59237ed673 Limit number of open RocksDB files
We got to the point where we had ~4k open SST files per shard, which
meant that we eat up all the available FDs.
2023-09-30 11:08:35 +00:00
Francesco Mazzoli 02838e228f Correct xmon app types 2023-09-28 11:53:12 +00:00
Francesco Mazzoli 77ac15af8d Allow to choose xmon env in C++ apps 2023-09-18 11:56:44 +00:00
Francesco Mazzoli b87a43a297 Continue running GC if servers are down
This was triggered by a server failing hard (fsr13), without any
short term resolution (we've already replaced the mobo, we'll probably
replace the HBA). In this case GC should still run rather than
get stuck.
2023-08-29 12:47:24 +00:00
Francesco Mazzoli 1cab680110 Support arbitrary span/block/... policies in kmod...
...and also update them quickly, by indexing them by (inode, tag).

Currently they only get updated on local renames though, we should
also update them when things are moved around remotely.
2023-08-22 15:01:33 +01:00
Francesco Mazzoli 6fa520c582 Always update directory modification
This fixes a bona-fide bug -- we didn't update the mtime when an
edge was unlocked + moved. However we might as well blindly always
update the mtime, even if there is no POSIX-visible change, to
be on the safe side.
2023-08-21 13:33:30 +00:00
Francesco Mazzoli b25f893403 Update estimates in ShardDB.cpp 2023-08-16 08:41:13 +00:00
Francesco Mazzoli 40f229b6f5 Add endpoint to specify which file to get the "reference" block services from
See comments for more details.
2023-08-16 08:40:47 +01:00
Francesco Mazzoli 9405b64a76 Remove ExpireTransientFile, make future cutoff tunable
Fixes #48. Also, reorganize error handling in `eggsblocks` requests,
especially around write requests, which might help with #45.
2023-08-15 12:43:49 +01:00
Francesco Mazzoli e2246afc53 More tweaks to event loops 2023-08-08 09:21:35 +01:00
Francesco Mazzoli b2f28955a5 Log timings 2023-08-08 09:21:35 +01:00
Francesco Mazzoli e686222040 A bit more logging 2023-08-08 09:21:35 +01:00
Francesco Mazzoli 5117ddd16e Add shard/CDC metrics 2023-08-08 09:21:35 +01:00