Francesco Mazzoli
bd278ff6f6
Better metrics for shard responses in CDC
2023-11-29 13:52:44 +00:00
Francesco Mazzoli
4453083aa7
Correctly record request id when picking up transactions after restart
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
a367858684
Drop entire CF at once, rather than one-by-one
A dry run of the production upgrade using a backup revealed that
dropping them one-by-one would take ages, since previously we kept
every single CDC request.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
7537bbc6cf
Remove useless line
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
fac014a864
Self-PR review, part 2
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
ba9424e224
Remove unordered_set
Almost certainly irrelevant, but it was bugging me
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
2eab012d76
Fix bug in poll check code
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c94ece50cf
Integer sanitizer stuff
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
59abb24a8e
Add ceiling on max update size
We don't want it to grow without bound, but we want to maximize
throughput (we'd like for fsync to not be a factor).
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
476009381a
Remove maximum enqueued requests limit
We already drop incoming requests that we're currently processing,
so I don't think this matters very much at the moment.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
c5562c7ca3
Parallelize CDC by directory
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
340e7f2f37
Harmonize addr-passing, add shuckle beacon and test it in kmod
2023-11-14 13:49:36 +00:00
Francesco Mazzoli
2ad278adaa
Add ubuntu image to build, use jemalloc in release build
I want to use the introspection capabilities of jemalloc, and it
should also be much faster. Preserve the alpine build for the Go
build; it's also really useful for testing inside the kmod.
2023-11-13 15:44:55 +00:00
Francesco Mazzoli
ad3c969772
Push full RocksDB stats to grafana
2023-11-09 16:48:51 +00:00
Francesco Mazzoli
f70c484883
Dump RocksDB full statistics to file
2023-11-09 14:12:54 +00:00
Francesco Mazzoli
057be91613
rocksDBStats -> rocksDBMetrics
2023-11-09 13:38:32 +00:00
Francesco Mazzoli
c5979a9d90
Expose some RocksDB stats
2023-11-09 13:23:49 +00:00
Francesco Mazzoli
03e9510255
Align xmon's app instances and systemd services
2023-11-08 14:36:58 +00:00
Francesco Mazzoli
afc4e78a62
Reduce default CDC queue size
2023-11-05 22:38:57 +00:00
Francesco Mazzoli
71556ce933
Switch to restech EggsFS rota
2023-11-03 14:23:44 +00:00
Francesco Mazzoli
64d400fcfe
Insert shard/cdc metrics at more regular intervals
2023-11-03 13:49:38 +00:00
Francesco Mazzoli
654c0d4db4
Report CDC queue size in grafana
2023-11-03 13:49:32 +00:00
Francesco Mazzoli
9e21969637
Slightly tighter error checks
2023-10-11 13:40:46 +01:00
Francesco Mazzoli
6726fff0fe
Better "innocuous error" handling in CDC
2023-10-04 18:12:15 +01:00
Francesco Mazzoli
440a78510e
Add concrete quiet windows to C++ alerts
This, together with the previous commits, fixes #72.
2023-10-02 23:06:40 +00:00
Francesco Mazzoli
59237ed673
Limit number of open RocksDB files
We got to the point where we had ~4k open SST files per shard, which
meant that we were eating up all the available FDs.
2023-09-30 11:08:35 +00:00
Francesco Mazzoli
2679ee7c80
Retry RocksDB transactions if appropriate
2023-09-30 10:44:40 +00:00
Francesco Mazzoli
1d4c4abafd
Correctly check that RocksDB txn succeeded
This was caught anyway by the fact that we check that the log index
is what we expect. Would have been very nasty otherwise.
The right thing to do is to check for `Status::TryAgain()` and
retry. `Status::Busy()` should never happen because, so far, we
never run transactions concurrently.
2023-09-30 09:51:26 +00:00
Francesco Mazzoli
02838e228f
Correct xmon app types
2023-09-28 11:53:12 +00:00
Francesco Mazzoli
762f047772
Add fsr17 and fsr18 to deployment
2023-09-19 12:56:34 +00:00
Francesco Mazzoli
77ac15af8d
Allow choosing the xmon env in C++ apps
2023-09-18 11:56:44 +00:00
Ivan Korostelev
7ec477ca9f
CDC.cpp: minor bugfix with using optional after reset()
Harmless in release builds, since the optional in question is POD and its destructor is a no-op.
2023-08-09 10:45:43 +00:00
Francesco Mazzoli
32e2a011ee
More grafana fixes
2023-08-08 09:28:07 +00:00
Francesco Mazzoli
467fcffefb
A few metrics fixes
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
e2246afc53
More tweaks to event loops
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
5117ddd16e
Add shard/CDC metrics
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
1922cf3c30
Factor out common looping patterns
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
93b212c665
Alert while initializing shard DB
2023-08-07 10:16:00 +00:00
Francesco Mazzoli
b370118e90
Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli
9ef3162882
Add error count to inspect how things failed
2023-08-03 06:53:35 +00:00
Francesco Mazzoli
acf4f129f6
Fix CDC txn status tracking
This might be a resolution to #38, although I'm not sure yet.
2023-08-02 13:49:38 +00:00
Francesco Mazzoli
63e2db0889
Cap maximum number of CDC requests
No point letting huge queues build -- especially now that we
deduplicate client requests.
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
fe2ce7aa17
See comments
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262
Do not alert/log on innocuous shard error in CDC
2023-08-01 13:41:18 +00:00
Francesco Mazzoli
e851457c52
Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
7dceb5fda5
More alerts shenanigans
2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d
More alert-related fixes
2023-07-27 13:54:51 +00:00
Francesco Mazzoli
6a52a961eb
Split CDC timings to distinguish queue time from exec time
2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f
Do not bump req ids when retrying requests in the CDC
Fixes #29.
The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli
f797663d8c
Transient alerts for EPERM errors on sendto
2023-07-27 07:31:34 +00:00