71 Commits

Author SHA1 Message Date
Francesco Mazzoli
53049d5779 Shard batch writes, use batch UDP syscalls
The idea is to drain the socket and do a single RocksDB WAL
write/fsync for all the write requests we have found.

The read requests are immediately executed. The reasoning here is
that currently write requests are _a lot_ slower than the read
requests because fsyncing takes ~500us on fsf1. In the future this
might change.

Since we're at it, we also use batch UDP syscalls in the CDC.

Fixes #119.
2023-12-07 14:29:07 +00:00
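
The batching described in this commit can be sketched roughly as follows. This is a minimal illustration, not the shard's actual code: `Request`, `BatchResult`, `processDrained`, and the two callbacks are hypothetical names, and `commitBatch` merely stands in for a single RocksDB WAL write plus fsync. Draining the socket itself (presumably with something like Linux `recvmmsg`, though the log does not name the syscall) is left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical request type; the real shard protocol is not shown in this log.
struct Request {
    bool isWrite;
    std::string payload;
};

struct BatchResult {
    std::size_t readsExecuted = 0;
    std::size_t writesBatched = 0;
    std::size_t syncs = 0;
};

// Process everything drained from the socket in one pass: reads run
// immediately, writes are collected and committed with a single sync.
// `commitBatch` is an assumed stand-in for one WAL write + fsync.
BatchResult processDrained(
    const std::vector<Request>& drained,
    const std::function<void(const Request&)>& executeRead,
    const std::function<void(const std::vector<Request>&)>& commitBatch
) {
    BatchResult res;
    std::vector<Request> writes;
    for (const auto& req : drained) {
        if (req.isWrite) {
            writes.push_back(req);   // defer: pay for the sync once per batch
        } else {
            executeRead(req);        // reads are cheap, answer right away
            res.readsExecuted++;
        }
    }
    if (!writes.empty()) {
        commitBatch(writes);         // one sync for all the writes found
        res.writesBatched = writes.size();
        res.syncs = 1;
    }
    return res;
}
```

With a ~500us fsync, N writes drained together cost one sync rather than N, which is the point of the change.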
Francesco Mazzoli
3eae5bbf9b Use an EMA for the in-flight CDC txns as well 2023-12-07 10:27:32 +00:00
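
For reference, an exponential moving average can be kept with a few lines of state. This is a generic sketch; the class name `Ema`, the smoothing factor, and the choice to seed with the first sample are assumptions, not taken from the CDC code.

```cpp
// Exponential moving average: avg <- alpha * sample + (1 - alpha) * avg.
// The first sample seeds the average (an assumed policy).
class Ema {
public:
    explicit Ema(double alpha) : _alpha(alpha) {}

    void update(double sample) {
        if (!_seeded) {
            _value = sample;
            _seeded = true;
            return;
        }
        _value = _alpha * sample + (1.0 - _alpha) * _value;
    }

    double value() const { return _value; }

private:
    double _alpha;
    double _value = 0.0;
    bool _seeded = false;
};
```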
Francesco Mazzoli
38f3d54ecd Wait forever, rather than having timeouts
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can set up a proper termination system in
the future; I first want to see if this makes a difference.

Also, change xmon to use pipes for communication, so that it can
wait without timers as well.

Also, `write` directly for logging, so that we know the logs will
make it to the file after the logging call returns (since we now
do not have the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
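
The two ideas in this commit (blocking without a timeout, and logging with a direct `write`) might look something like the sketch below. The helper names `waitReadable` and `writeAll` are hypothetical; only `poll(2)` and `write(2)` are real POSIX calls, and the actual xmon and logging code may be structured differently.

```cpp
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Block until `fd` is readable, with no timeout (-1), so the thread is
// never woken up just to find there is nothing to do.
// Returns false on error or if the fd is not readable (e.g. hung up).
bool waitReadable(int fd) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    for (;;) {
        int ret = poll(&pfd, 1, -1);
        if (ret > 0) { return (pfd.revents & POLLIN) != 0; }
        if (ret < 0 && errno != EINTR) { return false; }
        // Interrupted by a signal: retry.
    }
}

// Write the whole buffer with write(2), retrying on short writes, so the
// bytes have been handed to the kernel by the time this returns.
bool writeAll(int fd, const char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) { continue; }
            return false;
        }
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}
```

A pipe makes a convenient wakeup channel here: the producer writes a byte, and the consumer sleeping in `poll` with an infinite timeout wakes only when there is work.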
Francesco Mazzoli
af46ab2173 Bump CDC shard response timeout 2023-11-29 15:00:08 +00:00
Francesco Mazzoli
a52efe217b Tune CDC logging more 2023-11-29 14:40:33 +00:00
Francesco Mazzoli
e4c01e8728 Metrics + logging 2023-11-29 14:32:37 +00:00
Francesco Mazzoli
bd278ff6f6 Better metrics for shard responses in CDC 2023-11-29 13:52:44 +00:00
Francesco Mazzoli
4453083aa7 Correctly record request id when picking up transactions after restart 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
7537bbc6cf Remove useless line 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
2eab012d76 Fix bug in poll check code 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c94ece50cf Integer sanitizer stuff 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
59abb24a8e Add ceiling on max update size
We don't want it to grow without bound, but we want to maximize
throughput (we'd like for fsync to not be a factor).
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
476009381a Remove maximum enqueued requests limit
We already drop in-flight requests that we're already processing,
so I don't think this matters very much currently.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c5562c7ca3 Parallelize CDC by directory
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli
c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli
03e9510255 Align xmon's app instances and systemd services 2023-11-08 14:36:58 +00:00
Francesco Mazzoli
71556ce933 Switch to restech EggsFS rota 2023-11-03 14:23:44 +00:00
Francesco Mazzoli
64d400fcfe Insert shard/cdc metrics at more regular intervals 2023-11-03 13:49:38 +00:00
Francesco Mazzoli
654c0d4db4 Report CDC queue size in grafana 2023-11-03 13:49:32 +00:00
Francesco Mazzoli
6726fff0fe Better "innocuous error" handling in CDC 2023-10-04 18:12:15 +01:00
Francesco Mazzoli
440a78510e Add concrete quiet windows to C++ alerts
This together with the previous commits fixes #72.
2023-10-02 23:06:40 +00:00
Francesco Mazzoli
02838e228f Correct xmon app types 2023-09-28 11:53:12 +00:00
Francesco Mazzoli
77ac15af8d Allow choosing xmon env in C++ apps 2023-09-18 11:56:44 +00:00
Ivan Korostelev
7ec477ca9f CDC.cpp: minor bugfix with using optional after reset()
Harmless in release builds, since the optional in question is POD and its destructor is a no-op.
2023-08-09 10:45:43 +00:00
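
The class of bug fixed here can be illustrated with a small sketch. `InFlightTxn` and `finishTxn` are hypothetical names, not taken from CDC.cpp; the point is only the ordering of the read and the `reset()`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical in-flight transaction record (trivially copyable, like the
// optional's payload described in the commit message).
struct InFlightTxn {
    uint64_t requestId;
};

// Finish the current transaction and return its request id. The value is
// copied out *before* reset(): dereferencing an empty optional is undefined
// behaviour, even if it appears to work for trivially destructible types.
uint64_t finishTxn(std::optional<InFlightTxn>& current) {
    uint64_t id = current->requestId; // read first
    current.reset();                  // then clear
    return id;
}
```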
Francesco Mazzoli
32e2a011ee More grafana fixes 2023-08-08 09:28:07 +00:00
Francesco Mazzoli
467fcffefb A few metrics fixes 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
e2246afc53 More tweaks to event loops 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
5117ddd16e Add shard/CDC metrics 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
1922cf3c30 Factor out common looping patterns 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
93b212c665 Alert while initializing shard DB 2023-08-07 10:16:00 +00:00
Francesco Mazzoli
b370118e90 Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli
9ef3162882 Add error count to inspect how things failed 2023-08-03 06:53:35 +00:00
Francesco Mazzoli
acf4f129f6 Fix CDC txn status tracking
This might be a resolution to #38, although I'm not sure yet.
2023-08-02 13:49:38 +00:00
Francesco Mazzoli
63e2db0889 Cap maximum number of CDC requests
No point letting huge queues build -- especially now that we
deduplicate client requests.
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
fe2ce7aa17 See comments 2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262 Do not alert/log on innocuous shard error in CDC 2023-08-01 13:41:18 +00:00
Francesco Mazzoli
e851457c52 Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
7dceb5fda5 More alerts shenanigans 2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d More alert-related fixes 2023-07-27 13:54:51 +00:00
Francesco Mazzoli
6a52a961eb Split CDC timings to distinguish queue time from exec time 2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f Do not bump req ids when retrying requests in the CDC
Fixes #29.

The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
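
One plausible reason for keeping the request id stable across retries is that a receiver can then recognise a retry and avoid applying it twice. The sketch below shows that idea only; `DedupingReceiver` is a hypothetical class, and the log does not say that the shard deduplicates in exactly this way.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical receiver that remembers the result of each request id it
// has already applied.
class DedupingReceiver {
public:
    // Applies `increment` once per request id. A retry carrying the same id
    // gets the cached result back instead of being applied a second time.
    uint64_t handle(uint64_t requestId, uint64_t increment) {
        auto it = _results.find(requestId);
        if (it != _results.end()) { return it->second; }
        _counter += increment;
        _results.emplace(requestId, _counter);
        return _counter;
    }

    uint64_t counter() const { return _counter; }

private:
    uint64_t _counter = 0;
    std::unordered_map<uint64_t, uint64_t> _results;
};
```

If the sender bumped the id on every retry, the receiver in this sketch would treat each retry as a new request and apply it again.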
Francesco Mazzoli
f797663d8c Transient alerts for EPERM errors on sendto 2023-07-27 07:31:34 +00:00
Francesco Mazzoli
bf447408a6 Actually wait for things to finish terminating before reaping next one
Fixes #27. This is all kind of clunky right now; it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
dd39466daa Insert CDC stats on shutdown 2023-07-26 20:41:35 +00:00
Francesco Mazzoli
999d2df52b Do not alert for missing CDC request
This is totally normal if the CDC is restarted with queued
transactions.
2023-07-26 19:38:17 +00:00
Francesco Mazzoli
45b2618296 Temporarily put a stop to alert spam 2023-07-26 19:33:00 +00:00
Francesco Mazzoli
b0ff28dc44 Do not alert on error which can happen naturally in GC 2023-07-26 19:28:44 +00:00
Francesco Mazzoli
0fc80dfe0f Remove additional CDC status fields
`status()` was racy anyway (the txn might have been gone between
the first and second lookup), and these are better served by the
stats db/grafana.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
60554ec58d Have bigger histograms, remove other metrics entirely
The `uint16_t` -> `size_t` in `packedSize` is because now
insert stats requests are bigger than `uint16_t`.
2023-07-26 10:01:27 +00:00
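
The reason for widening the return type can be shown with a small sketch. The two functions below are hypothetical stand-ins for `packedSize` (the real signature is not shown in this log); they only demonstrate how a size above 65535 wraps when returned as `uint16_t`.

```cpp
#include <cstddef>
#include <cstdint>

// Narrow return type: any size above 65535 silently wraps modulo 65536.
uint16_t packedSizeNarrow(size_t entries, size_t bytesPerEntry) {
    return static_cast<uint16_t>(entries * bytesPerEntry);
}

// Wide return type: the full size is preserved.
size_t packedSizeWide(size_t entries, size_t bytesPerEntry) {
    return entries * bytesPerEntry;
}
```

For example, 8750 entries of 8 bytes is 70000 bytes, which the narrow version reports as 70000 - 65536 = 4464.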