67 Commits

Author SHA1 Message Date
Francesco Mazzoli
a52efe217b Tune CDC logging more 2023-11-29 14:40:33 +00:00
Francesco Mazzoli
e4c01e8728 Metrics + logging 2023-11-29 14:32:37 +00:00
Francesco Mazzoli
bd278ff6f6 Better metrics for shard responses in CDC 2023-11-29 13:52:44 +00:00
Francesco Mazzoli
4453083aa7 Correctly record request id when picking up transactions after restart 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
7537bbc6cf Remove useless line 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
2eab012d76 Fix bug in poll check code 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c94ece50cf Integer sanitizer stuff 2023-11-29 11:08:07 +00:00
Francesco Mazzoli
59abb24a8e Add ceiling on max update size
We don't want it to grow without bound, but we want to maximize
throughput (we'd like for fsync to not be a factor).
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
476009381a Remove maximum enqueued requests limit
We already drop in-flight requests that we're already processing,
so I don't think this matters very much currently.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c5562c7ca3 Parallelize CDC by directory
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
057be91613 rocksDBStats -> rocksDBMetrics 2023-11-09 13:38:32 +00:00
Francesco Mazzoli
c5979a9d90 Expose some RocksDB stats 2023-11-09 13:23:49 +00:00
Francesco Mazzoli
03e9510255 Align xmon's app instances and systemd services 2023-11-08 14:36:58 +00:00
Francesco Mazzoli
71556ce933 Switch to restech EggsFS rota 2023-11-03 14:23:44 +00:00
Francesco Mazzoli
64d400fcfe Insert shard/cdc metrics at more regular intervals 2023-11-03 13:49:38 +00:00
Francesco Mazzoli
654c0d4db4 Report CDC queue size in grafana 2023-11-03 13:49:32 +00:00
Francesco Mazzoli
6726fff0fe Better "innocuous error" handling in CDC 2023-10-04 18:12:15 +01:00
Francesco Mazzoli
440a78510e Add concrete quiet windows to C++ alerts
This together with the previous commits fixes #72.
2023-10-02 23:06:40 +00:00
Francesco Mazzoli
02838e228f Correct xmon app types 2023-09-28 11:53:12 +00:00
Francesco Mazzoli
77ac15af8d Allow choosing the xmon env in C++ apps 2023-09-18 11:56:44 +00:00
Ivan Korostelev
7ec477ca9f CDC.cpp: minor bugfix with using optional after reset()
Harmless in release builds, since the optional in question is POD and its destructor is a no-op.
2023-08-09 10:45:43 +00:00
Francesco Mazzoli
32e2a011ee More grafana fixes 2023-08-08 09:28:07 +00:00
Francesco Mazzoli
467fcffefb A few metrics fixes 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
e2246afc53 More tweaks to event loops 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
5117ddd16e Add shard/CDC metrics 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
1922cf3c30 Factor out common looping patterns 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
93b212c665 Alert while initializing shard DB 2023-08-07 10:16:00 +00:00
Francesco Mazzoli
b370118e90 Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli
9ef3162882 Add error count to inspect how things failed 2023-08-03 06:53:35 +00:00
Francesco Mazzoli
acf4f129f6 Fix CDC txn status tracking
This might be a resolution to #38, although I'm not sure yet.
2023-08-02 13:49:38 +00:00
Francesco Mazzoli
63e2db0889 Cap maximum number of CDC requests
No point letting huge queues build -- especially now that we
deduplicate client requests.
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
fe2ce7aa17 See comments 2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262 Do not alert/log on innocuous shard error in CDC 2023-08-01 13:41:18 +00:00
Francesco Mazzoli
e851457c52 Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
7dceb5fda5 More alerts shenanigans 2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d More alert-related fixes 2023-07-27 13:54:51 +00:00
Francesco Mazzoli
6a52a961eb Split CDC timings to distinguish queue time from exec time 2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f Do not bump req ids when retrying requests in the CDC
Fixes #29.

The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli
f797663d8c Transient alerts for EPERM errors on sendto 2023-07-27 07:31:34 +00:00
Francesco Mazzoli
bf447408a6 Actually wait for things to finish terminating before reaping next one
Fixes #27. This is all kind of clunky right now, it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
dd39466daa Insert CDC stats on shutdown 2023-07-26 20:41:35 +00:00
Francesco Mazzoli
999d2df52b Do not alert for missing CDC request
This is totally normal if the CDC is restarted with queued
transactions.
2023-07-26 19:38:17 +00:00
Francesco Mazzoli
45b2618296 Temporarily put a stop to alert spam 2023-07-26 19:33:00 +00:00
Francesco Mazzoli
b0ff28dc44 Do not alert on error which can happen naturally in GC 2023-07-26 19:28:44 +00:00
Francesco Mazzoli
0fc80dfe0f Remove additional CDC status fields
`status()` was racy anyway (the txn might be gone between the
first and second lookup) and these are better solved by the stats
db/grafana anyway.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
60554ec58d Have bigger histograms, remove other metrics entirely
The `uint16_t` -> `size_t` in `packedSize` is because now
insert stats requests are bigger than `uint16_t`.
2023-07-26 10:01:27 +00:00
Francesco Mazzoli
2b1b1a1c15 Insert stats when shutting down 2023-07-17 12:27:07 +00:00
Francesco Mazzoli
3cc7310a6e Add histograms for all components in /stats 2023-07-17 08:56:09 +00:00
Francesco Mazzoli
ff9306f6e3 Add Xmon support to C++ code 2023-07-11 12:13:22 +00:00
Francesco Mazzoli
4e0e6fe8a8 Configurable CDC shard timeout
When running under valgrind, a small FullReadDirReq apparently
cannot be processed in 100ms, which is a bit concerning, but I'll
let it slide for now.
2023-07-04 08:05:42 +00:00