Commit Graph

265 Commits

Author SHA1 Message Date
Francesco Mazzoli bd278ff6f6 Better metrics for shard responses in CDC 2023-11-29 13:52:44 +00:00
Francesco Mazzoli 4453083aa7 Correctly record request id when picking up transactions after restart 2023-11-29 11:08:07 +00:00
Francesco Mazzoli a367858684 Drop entire CF at once, rather than one-by-one
A dry run of the production upgrade using a backup revealed that
dropping them one-by-one would take ages, since we previously kept
every single CDC request.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli 7537bbc6cf Remove useless line 2023-11-29 11:08:07 +00:00
Francesco Mazzoli fac014a864 Self-PR review, part 2 2023-11-29 11:08:07 +00:00
Francesco Mazzoli ba9424e224 Remove unordered_set
Almost certainly irrelevant, but it was bugging me
2023-11-29 11:08:07 +00:00
Francesco Mazzoli 2eab012d76 Fix bug in poll check code 2023-11-29 11:08:07 +00:00
Francesco Mazzoli c94ece50cf Integer sanitizer stuff 2023-11-29 11:08:07 +00:00
Francesco Mazzoli 59abb24a8e Add ceiling on max update size
We don't want it to grow without bound, but we want to maximize
throughput (we'd like for fsync to not be a factor).
2023-11-29 11:08:07 +00:00
Francesco Mazzoli 476009381a Remove maximum enqueued requests limit
We already drop in-flight requests that we're already processing,
so I don't think this matters very much currently.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli c5562c7ca3 Parallelize CDC by directory
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli e48bb98f73 Remove outdated comment
We now include all dependencies which are needed beyond basic build tools.
2023-11-28 21:45:40 +00:00
Francesco Mazzoli 91db9566e1 Remove option to not write out atime which is too recent
This was pretty nasty to begin with, we now do it in the client.
2023-11-23 13:28:23 +00:00
Francesco Mazzoli bcf75d5308 Shut up sanitizer 2023-11-21 17:03:05 +00:00
Francesco Mazzoli 1fca8b84cd Fix type signature 2023-11-17 22:48:31 +00:00
Francesco Mazzoli 163d7b3a4d Do not return error on TIME_TOO_RECENT
I thought we only sent it using "dontwait" for atime, but for the
normal utime calls we wait.
2023-11-16 19:08:43 +00:00
Francesco Mazzoli ae765b7581 Consistently check for iterator status 2023-11-16 17:12:38 +00:00
Francesco Mazzoli b964d0632a Add option to not write out atime which is too recent
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.

Thanks to @sgrusny for the good idea.
2023-11-16 14:45:58 +00:00
Francesco Mazzoli 248abb2681 Fix memory leak in shards 2023-11-15 12:20:16 +00:00
Francesco Mazzoli 340e7f2f37 Harmonize addr-passing, add shuckle beacon and test it in kmod 2023-11-14 13:49:36 +00:00
Saulius Grusnys 2ce5586eb9 Periodically refresh metadata info in kmod, use two IPs for shuckle
Fixes #112.

Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2023-11-14 13:49:36 +00:00
Francesco Mazzoli 2ad278adaa Add ubuntu image to build, use jemalloc in release build
I want to use the introspection capabilities of jemalloc, and it
should also be much faster. Preserve alpine build for go build,
it's also really useful to test inside the kmod.
2023-11-13 15:44:55 +00:00
Francesco Mazzoli 3bc17301d6 Switch from tuple to variant for req/resp containers
The `tuple` was for when I thought it'd be useful to leave slots
for each request, but we don't need this anymore, and now leading
up to #66 I want to be able to keep vectors of reqs/resps.
2023-11-09 19:03:37 +00:00
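
The trade-off behind the `tuple` to `variant` switch in 3bc17301d6 can be illustrated with a minimal sketch. This is not code from the repository: the request types `MakeDirReq` and `RenameReq`, and the helpers `describe` and `exampleBatch`, are hypothetical names chosen only for illustration. The sketch shows why a variant fits "vectors of reqs/resps": each element holds exactly one request kind, whereas a tuple reserves a slot for every kind.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical request kinds, for illustration only.
struct MakeDirReq { std::string name; };
struct RenameReq { std::string from; std::string to; };

// A variant holds exactly one request kind at a time, so a plain vector of
// them works; a tuple would instead carry one slot per kind in every element.
using Req = std::variant<MakeDirReq, RenameReq>;

// Dispatch on whichever alternative the variant currently holds.
std::string describe(const Req& req) {
    return std::visit([](const auto& r) -> std::string {
        using T = std::decay_t<decltype(r)>;
        if constexpr (std::is_same_v<T, MakeDirReq>) {
            return "mkdir " + r.name;
        } else {
            return "rename " + r.from + " -> " + r.to;
        }
    }, req);
}

// Build a small mixed batch of requests.
std::vector<Req> exampleBatch() {
    return {MakeDirReq{"a"}, RenameReq{"a", "b"}};
}
```

Calling `describe` on each element of `exampleBatch()` visits the two requests in order, with no empty slots to skip.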
Francesco Mazzoli ad3c969772 Push full RocksDB stats to grafana 2023-11-09 16:48:51 +00:00
Francesco Mazzoli f70c484883 Dump RocksDB full statistics to file 2023-11-09 14:12:54 +00:00
Francesco Mazzoli 057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli 03e9510255 Align xmon's app instances and systemd services 2023-11-08 14:36:58 +00:00
Francesco Mazzoli ef1885a4b2 Print out more info when failing because of bad proofs 2023-11-08 11:57:32 +00:00
Francesco Mazzoli 4cc917a1c7 Expose shard socket buf size to grafana
As a proxy to how behind shards are.
2023-11-07 14:12:55 +00:00
Francesco Mazzoli d0126d0656 Distinguish IO errors in eggsblocks
See #115 for background.
2023-11-06 19:35:05 +00:00
Francesco Mazzoli afc4e78a62 Reduce default CDC queue size 2023-11-05 22:38:57 +00:00
Francesco Mazzoli 1ec63f9710 Implement scrubbing functionality
Fixes #32. This also involves some reworking of the block request machinery
to make it more robust and faster. The scrubbing is done assuming that
the overwhelming majority of block checking will go through.
2023-11-05 18:33:00 +00:00
Francesco Mazzoli 71556ce933 Switch to restech EggsFS rota 2023-11-03 14:23:44 +00:00
Francesco Mazzoli 64d400fcfe Insert shard/cdc metrics at more regular intervals 2023-11-03 13:49:38 +00:00
Francesco Mazzoli 654c0d4db4 Report CDC queue size in grafana 2023-11-03 13:49:32 +00:00
Francesco Mazzoli 674c9f22a8 Do not crash shards when swapping blocks fails
Fixes #101
2023-10-31 08:39:32 +00:00
Francesco Mazzoli dd052b1919 Add excel spreadsheet to quickly adjust RocksDB size estimates 2023-10-26 14:32:35 +00:00
Francesco Mazzoli c529d96c88 Garbage collect zero block service files mappings.
See #91.
2023-10-21 11:41:33 +00:00
Francesco Mazzoli 83f38080de Do not return FILE_NOT_FOUND when getting spans of empty transient file 2023-10-13 21:10:44 +00:00
Francesco Mazzoli 9e21969637 Slightly tighter error checks 2023-10-11 13:40:46 +01:00
Francesco Mazzoli 03ed4f951f Alert when block proof is bad (see #89) 2023-10-10 21:37:39 +00:00
Francesco Mazzoli c461872ace Implement dir seeking. Fixes #83. 2023-10-09 22:32:38 +01:00
Francesco Mazzoli 6726fff0fe Better "innocuous error" handling in CDC 2023-10-04 18:12:15 +01:00
Francesco Mazzoli 440a78510e Add concrete quiet windows to C++ alerts
This together with the previous commits fixes #72.
2023-10-02 23:06:40 +00:00
Francesco Mazzoli 24d1588b21 Add quiet window for C++ alerts, too 2023-10-02 23:02:45 +00:00
Francesco Mazzoli 59237ed673 Limit number of open RocksDB files
We got to the point where we had ~4k open SST files per shard, which
meant that we ate up all the available FDs.
2023-09-30 11:08:35 +00:00
Francesco Mazzoli 2679ee7c80 Retry RocksDB transactions if appropriate 2023-09-30 10:44:40 +00:00
Francesco Mazzoli 1d4c4abafd Correctly check that RocksDB txn succeeded
This was caught anyway by the fact that we check that the log index
is what we expect. Would have been very nasty otherwise.

The right thing to do is to check for `Status::TryAgain()` and
retry. `Status::Busy()` should never happen because we never
run transactions concurrently so far.
2023-09-30 09:51:26 +00:00
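
The retry policy described in 1d4c4abafd (retry on `Status::TryAgain()`, treat `Status::Busy()` as unexpected) might look roughly like the sketch below. It is an assumption-laden illustration, not the repository's code: `TxnStatus` is a hypothetical stand-in for the RocksDB status, and `commitWithRetry` and its `maxAttempts` cap are invented here to keep the example self-contained.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the outcome of a transaction commit. It mirrors
// the three cases the commit message distinguishes, not RocksDB's own type.
enum class TxnStatus { Ok, TryAgain, Busy };

// Runs `attempt` until it succeeds, retrying only on TryAgain.
// Returns the number of attempts made. `maxAttempts` is an assumed safety cap.
int commitWithRetry(const std::function<TxnStatus()>& attempt, int maxAttempts = 5) {
    for (int i = 1; i <= maxAttempts; i++) {
        switch (attempt()) {
        case TxnStatus::Ok:
            return i;
        case TxnStatus::TryAgain:
            continue; // transient failure: run the transaction again
        case TxnStatus::Busy:
            // Not expected as long as transactions never run concurrently.
            throw std::logic_error("unexpected Busy status");
        }
    }
    throw std::runtime_error("transaction did not succeed after retries");
}
```

The key point is that the outcome of every attempt is inspected explicitly, rather than relying on a later log-index check to catch a failed transaction.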
Francesco Mazzoli 02838e228f Correct xmon app types 2023-09-28 11:53:12 +00:00