Francesco Mazzoli
c5562c7ca3
Parallelize CDC by directory
Fixes #66.
2023-11-29 11:08:07 +00:00
Francesco Mazzoli
057be91613
rocksDBStats -> rocksDBMetrics
2023-11-09 13:38:32 +00:00
Francesco Mazzoli
c5979a9d90
Expose some RocksDB stats
2023-11-09 13:23:49 +00:00
Francesco Mazzoli
03e9510255
Align xmon's app instances and systemd services
2023-11-08 14:36:58 +00:00
Francesco Mazzoli
71556ce933
Switch to restech EggsFS rota
2023-11-03 14:23:44 +00:00
Francesco Mazzoli
64d400fcfe
Insert shard/cdc metrics at more regular intervals
2023-11-03 13:49:38 +00:00
Francesco Mazzoli
654c0d4db4
Report CDC queue size in grafana
2023-11-03 13:49:32 +00:00
Francesco Mazzoli
6726fff0fe
Better "innocuous error" handling in CDC
2023-10-04 18:12:15 +01:00
Francesco Mazzoli
440a78510e
Add concrete quiet windows to C++ alerts
This, together with the previous commits, fixes #72.
2023-10-02 23:06:40 +00:00
Francesco Mazzoli
02838e228f
Correct xmon app types
2023-09-28 11:53:12 +00:00
Francesco Mazzoli
77ac15af8d
Allow choosing the xmon env in C++ apps
2023-09-18 11:56:44 +00:00
Ivan Korostelev
7ec477ca9f
CDC.cpp: minor bugfix with using optional after reset()
Harmless in release builds, since the optional in question holds a POD type and its destructor is a no-op.
2023-08-09 10:45:43 +00:00
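To illustrate the class of bug fixed in 7ec477ca9f: reading a `std::optional` after `reset()` is undefined behaviour even when the contained type is trivially destructible, because the optional is no longer engaged. The bytes usually survive, which is why it is harmless in practice, but the correct ordering is to copy out what is needed first and reset afterwards. `StepState` and `finishStep` are hypothetical names; the actual CDC.cpp code is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-step state held in an optional (not the real CDC type).
struct StepState {
    uint64_t txnId;
    uint32_t step;
};

// Buggy ordering (undefined behaviour):
//   state.reset();
//   return state->txnId;   // the optional is disengaged at this point
//
// Safe ordering: read what is needed while the value is alive, then reset.
uint64_t finishStep(std::optional<StepState>& state) {
    assert(state.has_value());
    uint64_t txnId = state->txnId; // copy out first
    state.reset();                 // only now drop the contained value
    return txnId;
}
```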
Francesco Mazzoli
32e2a011ee
More grafana fixes
2023-08-08 09:28:07 +00:00
Francesco Mazzoli
467fcffefb
A few metrics fixes
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
e2246afc53
More tweaks to event loops
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
5117ddd16e
Add shard/CDC metrics
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
1922cf3c30
Factor out common looping patterns
2023-08-08 09:21:35 +01:00
Francesco Mazzoli
93b212c665
Alert while initializing shard DB
2023-08-07 10:16:00 +00:00
Francesco Mazzoli
b370118e90
Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli
9ef3162882
Add error count to inspect how things failed
2023-08-03 06:53:35 +00:00
Francesco Mazzoli
acf4f129f6
Fix CDC txn status tracking
This might resolve #38, although I'm not sure yet.
2023-08-02 13:49:38 +00:00
Francesco Mazzoli
63e2db0889
Cap maximum number of CDC requests
No point letting huge queues build -- especially now that we
deduplicate client requests.
2023-08-01 21:17:23 +01:00
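A minimal sketch of the idea behind 63e2db0889: duplicates of an already-queued request are collapsed, and new requests are refused once a cap is reached, so the queue cannot grow without bound. It assumes requests are identified by a single 64-bit id and that duplicates are detected by that id alone; the real CDC dedup key and cap value are not shown here, and `BoundedDedupQueue` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_set>

// Hypothetical bounded FIFO that deduplicates by request id.
class BoundedDedupQueue {
public:
    explicit BoundedDedupQueue(size_t maxSize) : maxSize_(maxSize) {}

    enum class Push { Queued, Duplicate, Full };

    Push push(uint64_t reqId) {
        if (queued_.count(reqId)) { return Push::Duplicate; } // already waiting
        if (order_.size() >= maxSize_) { return Push::Full; }   // cap reached
        order_.push_back(reqId);
        queued_.insert(reqId);
        return Push::Queued;
    }

    // Pops the oldest request; returns false if the queue is empty.
    bool pop(uint64_t& reqId) {
        if (order_.empty()) { return false; }
        reqId = order_.front();
        order_.pop_front();
        queued_.erase(reqId);
        return true;
    }

    size_t size() const { return order_.size(); }

private:
    size_t maxSize_;
    std::deque<uint64_t> order_;           // FIFO order of requests
    std::unordered_set<uint64_t> queued_;  // ids currently in the queue
};
```

Deduplication is what makes a small cap reasonable: clients that resend the same request while waiting do not consume extra slots.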
Francesco Mazzoli
fe2ce7aa17
See comments
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262
Do not alert/log on innocuous shard error in CDC
2023-08-01 13:41:18 +00:00
Francesco Mazzoli
e851457c52
Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
7dceb5fda5
More alerts shenanigans
2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d
More alert-related fixes
2023-07-27 13:54:51 +00:00
Francesco Mazzoli
6a52a961eb
Split CDC timings to distinguish queue time from exec time
2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f
Do not bump req ids when retrying requests in the CDC
Fixes #29.
The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
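A sketch of why keeping the request id stable across retries can help, under the assumption that replies are matched to pending requests purely by id: a late reply to the first attempt still completes the request, instead of being dropped as unknown while the retry is still outstanding. `InFlight` is a hypothetical structure, not the CDC's actual bookkeeping, and issue #29 itself is not described in this log.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical table of in-flight requests keyed by request id.
class InFlight {
public:
    uint64_t send(const std::string& body) {
        uint64_t id = nextId_++;
        pending_.emplace(id, Entry{body, 1});
        return id;
    }

    // Retry: count the attempt but resend under the SAME id.
    void retry(uint64_t id) {
        auto it = pending_.find(id);
        if (it != pending_.end()) { it->second.attempts++; }
    }

    // Any reply carrying a pending id completes the request and returns
    // how many attempts were made; unknown ids (duplicates) are ignored.
    std::optional<uint32_t> onReply(uint64_t id) {
        auto it = pending_.find(id);
        if (it == pending_.end()) { return std::nullopt; }
        uint32_t attempts = it->second.attempts;
        pending_.erase(it);
        return attempts;
    }

private:
    struct Entry {
        std::string body;
        uint32_t attempts;
    };
    uint64_t nextId_ = 1;
    std::unordered_map<uint64_t, Entry> pending_;
};
```

Had the retry been sent under a fresh id, the late reply to the first attempt would match nothing and the request would have to wait for the retry's reply.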
Francesco Mazzoli
f797663d8c
Transient alerts for EPERM errors on sendto
2023-07-27 07:31:34 +00:00
Francesco Mazzoli
bf447408a6
Actually wait for things to finish terminating before reaping next one
Fixes #27. This is all rather clunky right now; it would be much
better to standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
dd39466daa
Insert CDC stats on shutdown
2023-07-26 20:41:35 +00:00
Francesco Mazzoli
999d2df52b
Do not alert for missing CDC request
This is totally normal if the CDC is restarted with queued
transactions.
2023-07-26 19:38:17 +00:00
Francesco Mazzoli
45b2618296
Temporarily put a stop to alert spam
2023-07-26 19:33:00 +00:00
Francesco Mazzoli
b0ff28dc44
Do not alert on error which can happen naturally in GC
2023-07-26 19:28:44 +00:00
Francesco Mazzoli
0fc80dfe0f
Remove additional CDC status fields
`status()` was racy anyway (the txn might have disappeared between
the first and second lookup), and these needs are better served by
the stats db/grafana.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
60554ec58d
Have bigger histograms, remove other metrics entirely
The `uint16_t` -> `size_t` change in `packedSize` is because insert
stats requests can now be larger than a `uint16_t` can represent.
2023-07-26 10:01:27 +00:00
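Why the return type of `packedSize` matters: a serialized size above 65535 bytes silently wraps modulo 2^16 when stored in a `uint16_t`, so a buffer sized from it would be far too small. The request layout below (a 4-byte bucket count followed by 8 bytes per bucket) is an assumption for illustration, not the real insert stats encoding; `InsertStatsReq` and `packedSizeAsU16` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical request carrying histogram buckets.
struct InsertStatsReq {
    std::vector<uint64_t> buckets;

    // Assumed encoding: 4-byte count + 8 bytes per bucket.
    size_t packedSize() const {
        return 4 + buckets.size() * 8;
    }
};

// What a uint16_t return type would yield: the same size, truncated mod 2^16.
uint16_t packedSizeAsU16(const InsertStatsReq& req) {
    return static_cast<uint16_t>(req.packedSize());
}
```

With 10000 buckets the true size is 80004 bytes, while the truncated value is 14468.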
Francesco Mazzoli
2b1b1a1c15
Insert stats when shutting down
2023-07-17 12:27:07 +00:00
Francesco Mazzoli
3cc7310a6e
Add histograms for all components in /stats
2023-07-17 08:56:09 +00:00
Francesco Mazzoli
ff9306f6e3
Add Xmon support to C++ code
2023-07-11 12:13:22 +00:00
Francesco Mazzoli
4e0e6fe8a8
Configurable CDC shard timeout
Under valgrind, even a small FullReadDirReq apparently cannot be
processed within 100ms, which is a bit concerning, but I'll let it
slide for now.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
dd78912c0c
More stuff as debug
2023-06-18 12:50:05 +00:00
Francesco Mazzoli
e26eeaede1
Add "mtu" field to requests that benefit from it
Not used right now, but this way we can easily start stuffing more
data in responses.
I also split off some arguments in `NewClient`. This is an unrelated
change: I wanted to pair the MTU with a single client, but then
realized that a global property is enough for now.
2023-06-15 11:57:05 +00:00
Francesco Mazzoli
d1e02e261b
Various QOL improvements
Also, try to avoid thundering herds on shuckle from the CDC/shards.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
d076941ce8
Simplify block write/fetch
And hopefully reduce the likelihood of bugs. On the write side, since
we now work less asynchronously, writes might be a bit slower, but I
think the simplification is worth it for now.
Also, fix/improve a bunch of other stuff.
2023-06-08 11:59:09 +00:00
Francesco Mazzoli
b041d14860
Add second ip/addr for CDC/shards too
This is one of the two data model/protocol changes I want to perform
before going into production, the other being file atime.
The kernel module does not take advantage of this yet, which is OK:
the rest of the code is reasonably tested, and the goal here is just
to make the protocol/data changes.
2023-06-05 12:14:14 +00:00
Francesco Mazzoli
a12a938c40
syslogify logs
2023-05-29 09:52:01 +00:00
Francesco Mazzoli
1458759534
Allow enabling shard/CDC debugging at runtime via USR2
2023-05-26 10:03:59 +00:00
Francesco Mazzoli
5bff9b8fae
Many, many changes -- tests pass, but FUSE is currently not present
The main thing that's added is full RS support, but a lot of things
were rejigged along the way. The tests are still a bit lacking,
and will be augmented in future commits.
2023-03-03 16:42:22 +00:00
Francesco Mazzoli
e1b8de02dc
More assorted improvements
2023-02-15 14:03:53 +00:00