Francesco Mazzoli
27bd28ead0
Remove outdated comment
2023-12-10 08:39:17 +00:00
Francesco Mazzoli
788b5eed57
Fill in current block services before applying the log
...
It makes a lot more sense to pick outside, given that it involves
randomness. Also, this is in preparation for shuckle picking them
in a smarter way.
2023-12-09 15:20:24 +00:00
Francesco Mazzoli
3394328000
Do not try to close xmon fd if we don't have one
...
Also, ignore errors if we can't close it. Fixes #134.
2023-12-09 14:50:51 +00:00
Francesco Mazzoli
ab1df9137d
Fix error logging when inserting stats
2023-12-08 15:57:02 +00:00
Francesco Mazzoli
128078988d
Get rid of -parallel in GC
...
With separate workers it's not really needed anymore.
2023-12-08 11:51:21 +00:00
Francesco Mazzoli
5f4467d0c6
Synchronize access to in-memory block service data
...
This was already an issue before, but it had never surfaced until now.
Today the quants actually hit it.
2023-12-07 16:43:11 +00:00
Francesco Mazzoli
53049d5779
Shard batch writes, use batch UDP syscalls
...
The idea is to drain the socket and do a single RocksDB WAL
write/fsync for all the write requests we have found.
The read requests are immediately executed. The reasoning here is
that currently write requests are _a lot_ slower than the read
requests because fsyncing takes ~500us on fsf1. In the future this
might change.
Since we're at it, we also use batch UDP syscalls in the CDC.
Fixes #119.
2023-12-07 14:29:07 +00:00
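The batching idea described in the "Shard batch writes" entry above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the name `drain_and_process`, the `(kind, payload)` request tuples, the `deque` standing in for the drained UDP socket, and a plain append-only file standing in for the RocksDB WAL. It is not the shard's actual code; it only shows reads being executed immediately while all writes found in one drain share a single fsync.

```python
import os
import tempfile
from collections import deque


def drain_and_process(pending, wal_fd, execute_read, max_batch=1024):
    """Drain queued requests: run reads right away, group writes into one WAL append.

    `pending` is a deque of (kind, payload) tuples (hypothetical request shape).
    `wal_fd` is a file descriptor standing in for the write-ahead log.
    Returns (read_results, number_of_writes, number_of_fsyncs).
    """
    writes = []
    read_results = []
    # Take everything currently queued, up to an illustrative batch ceiling.
    while pending and len(writes) + len(read_results) < max_batch:
        kind, payload = pending.popleft()
        if kind == "read":
            # Reads do not touch the log, so they are executed immediately.
            read_results.append(execute_read(payload))
        else:
            writes.append(payload)
    fsyncs = 0
    if writes:
        # One append and one fsync cover every write found in this drain.
        os.write(wal_fd, b"".join(w + b"\n" for w in writes))
        os.fsync(wal_fd)
        fsyncs = 1
    return read_results, len(writes), fsyncs
```

With an fsync cost of roughly 500us, as the commit message quotes, paying it once per drain rather than once per request is what makes the write path cheaper in this sketch.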
Francesco Mazzoli
3eae5bbf9b
Use an EMA for the in-flight CDC txns as well
2023-12-07 10:27:32 +00:00
Francesco Mazzoli
38f3d54ecd
Wait forever, rather than having timeouts
...
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can set up a proper termination system in
the future; I first want to see if this makes a difference.
Also, change xmon to use pipes for communication, so that it can
wait without timers as well.
Also, `write` directly for logging, so that we know the logs will
make it to the file after the logging call returns (since we now
do not have the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
Francesco Mazzoli
af46ab2173
Bump CDC shard response timeout
2023-11-29 15:00:08 +00:00
Francesco Mazzoli
a52efe217b
Tune CDC logging more
2023-11-29 14:40:33 +00:00
Francesco Mazzoli
e4c01e8728
Metrics + logging
2023-11-29 14:32:37 +00:00
Francesco Mazzoli
bd278ff6f6
Better metrics for shard responses in CDC
2023-11-29 13:52:44 +00:00
Francesco Mazzoli
4453083aa7
Correctly record request id when picking up transactions after restart
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
a367858684
Drop entire CF at once, rather than one-by-one
...
A dry run of the production upgrade using a backup revealed that
dropping them one-by-one would take ages, since before we kept every
single CDC request.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
7537bbc6cf
Remove useless line
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
fac014a864
Self-PR review, part 2
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
ba9424e224
Remove unordered_set
...
Almost certainly irrelevant, but it was bugging me
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
2eab012d76
Fix bug in poll check code
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c94ece50cf
Integer sanitizer stuff
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
59abb24a8e
Add ceiling on max update size
...
We don't want it to grow without bound, but we want to maximize
throughput (we'd like for fsync to not be a factor).
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
476009381a
Remove maximum enqueued requests limit
...
We already drop in-flight requests that we're already processing,
so I don't think this matters very much currently.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c5562c7ca3
Parallelize CDC by directory
...
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
e48bb98f73
Remove outdated comment
...
We now include all dependencies which are needed beyond basic build tools
2023-11-28 21:45:40 +00:00
Francesco Mazzoli
91db9566e1
Remove option to not write out atime which is too recent
...
This was pretty nasty to begin with; we now do it in the client.
2023-11-23 13:28:23 +00:00
Francesco Mazzoli
bcf75d5308
Shut up sanitizer
2023-11-21 17:03:05 +00:00
Francesco Mazzoli
1fca8b84cd
Fix type signature
2023-11-17 22:48:31 +00:00
Francesco Mazzoli
163d7b3a4d
Do not return error on TIME_TOO_RECENT
...
I thought we only sent it using "dontwait" for atime, but for the
normal utime calls we wait.
2023-11-16 19:08:43 +00:00
Francesco Mazzoli
ae765b7581
Consistently check for iterator status
2023-11-16 17:12:38 +00:00
Francesco Mazzoli
b964d0632a
Add option to not write out atime which is too recent
...
This is to save on a ton of writes as jobs stat tons of files.
It would maybe be a bit cleaner to do it in the kmod, but this is
much quicker.
Thanks to @sgrusny for the good idea.
2023-11-16 14:45:58 +00:00
Francesco Mazzoli
248abb2681
Fix memory leak in shards
2023-11-15 12:20:16 +00:00
Francesco Mazzoli
340e7f2f37
Harmonize addr-passing, add shuckle beacon and test it in kmod
2023-11-14 13:49:36 +00:00
Saulius Grusnys
2ce5586eb9
Periodically refresh metadata info in kmod, use two IPs for shuckle
...
Fixes #112.
Co-authored-by: Francesco Mazzoli <francesco.mazzoli@xtxmarkets.com>
2023-11-14 13:49:36 +00:00
Francesco Mazzoli
2ad278adaa
Add ubuntu image to build, use jemalloc in release build
...
I want to use the introspection capabilities of jemalloc, and it
should also be much faster. Preserve the alpine build for the go build;
it's also really useful to test inside the kmod.
2023-11-13 15:44:55 +00:00
Francesco Mazzoli
3bc17301d6
Switch from tuple to variant for req/resp containers
...
The `tuple` was for when I thought it'd be useful to leave slots
for each request, but we don't need this anymore, and now leading
up to #66 I want to be able to keep vectors of reqs/resps.
2023-11-09 19:03:37 +00:00
Francesco Mazzoli
ad3c969772
Push full RocksDB stats to grafana
2023-11-09 16:48:51 +00:00
Francesco Mazzoli
f70c484883
Dump RocksDB full statistics to file
2023-11-09 14:12:54 +00:00
Francesco Mazzoli
057be91613
rocksDBStats -> rocksDBMetrics
2023-11-09 13:38:32 +00:00
Francesco Mazzoli
c5979a9d90
Expose some RocksDB stats
2023-11-09 13:23:49 +00:00
Francesco Mazzoli
03e9510255
Align xmon's app instances and systemd services
2023-11-08 14:36:58 +00:00
Francesco Mazzoli
ef1885a4b2
Print out more info when failing because of bad proofs
2023-11-08 11:57:32 +00:00
Francesco Mazzoli
4cc917a1c7
Expose shard socket buf size to grafana
...
As a proxy to how behind shards are.
2023-11-07 14:12:55 +00:00
Francesco Mazzoli
d0126d0656
Distinguish IO errors in eggsblocks
...
See #115 for background.
2023-11-06 19:35:05 +00:00
Francesco Mazzoli
afc4e78a62
Reduce default CDC queue size
2023-11-05 22:38:57 +00:00
Francesco Mazzoli
1ec63f9710
Implement scrubbing functionality
...
Fixes #32. This also involves some reworking of the block request machinery
to make it more robust and faster. The scrubbing is done assuming that
the overwhelming majority of block checking will go through.
2023-11-05 18:33:00 +00:00
Francesco Mazzoli
71556ce933
Switch to restech EggsFS rota
2023-11-03 14:23:44 +00:00
Francesco Mazzoli
64d400fcfe
Insert shard/cdc metrics at more regular intervals
2023-11-03 13:49:38 +00:00
Francesco Mazzoli
654c0d4db4
Report CDC queue size in grafana
2023-11-03 13:49:32 +00:00
Francesco Mazzoli
674c9f22a8
Do not crash shards when swapping blocks fails
...
Fixes #101
2023-10-31 08:39:32 +00:00
Francesco Mazzoli
dd052b1919
Add Excel spreadsheet to quickly adjust RocksDB size estimates
2023-10-26 14:32:35 +00:00