Francesco Mazzoli
fe2ce7aa17
See comments
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262
Do not alert/log on innocuous shard error in CDC
2023-08-01 13:41:18 +00:00
Francesco Mazzoli
5146a80c2d
Use homegrown Xmon
...
I got annoyed at the old lib dropping requests when the queue gets
full. I could probably fix it, but this is almost certainly quicker.
2023-07-30 11:16:35 +00:00
Francesco Mazzoli
e851457c52
Do not re-insert requests in C++ xmon code
...
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
8e9f4f3d8b
Never die because of bad Xmon
...
It will alert if we're disconnected anyway, and when restarting
everything this causes crashes.
2023-07-28 08:08:03 +00:00
Francesco Mazzoli
7dceb5fda5
More alerts shenanigans
2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d
More alert-related fixes
2023-07-27 13:54:51 +00:00
Francesco Mazzoli
6a52a961eb
Split CDC timings to distinguish queue time from exec time
2023-07-27 13:14:12 +00:00
Francesco Mazzoli
889c04766f
Do not bump req ids when retrying requests in the CDC
...
Fixes #29.
The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli
f797663d8c
Transient alerts for EPERM errors on sendto
2023-07-27 07:31:34 +00:00
Francesco Mazzoli
bf447408a6
Actually wait for things to finish terminating before reaping next one
...
Fixes #27. This is all kind of clunky right now; it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
15e59b8e67
More logging when closing (see #27)
...
It seems that we get the SIGSEGV while closing the DB.
2023-07-26 21:09:29 +00:00
Francesco Mazzoli
dd39466daa
Insert CDC stats on shutdown
2023-07-26 20:41:35 +00:00
Francesco Mazzoli
999d2df52b
Do not alert for missing CDC request
...
This is totally normal if the CDC is restarted with queued
transactions.
2023-07-26 19:38:17 +00:00
Francesco Mazzoli
45b2618296
Temporarily put a stop to alert spam
2023-07-26 19:33:00 +00:00
Francesco Mazzoli
b0ff28dc44
Do not alert on error which can happen naturally in GC
2023-07-26 19:28:44 +00:00
Francesco Mazzoli
0fc80dfe0f
Remove additional CDC status fields
...
`status()` was racy anyway (the txn might have disappeared between
the first and second lookup), and these are better served by the
stats db/Grafana.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
d918df0fcc
Correctly return errors when failing to connect
...
Triggered by investigating
xmon: could not read message type: unexpected EOF, will reconnect
xmon: connected to xmon REDACTED
Undertaker: hard abort - running abort handlers
Uncaught exception thrown: SyscallException(Xmon.cpp@186, 9/EBADF=Bad file descriptor in void Xmon::run()): setsockopt
which caused crashes in shards/CDC.
2023-07-26 12:59:40 +00:00
Francesco Mazzoli
60554ec58d
Have bigger histograms, remove other metrics entirely
...
The `uint16_t` -> `size_t` change in `packedSize` is because insert
stats requests can now be larger than a `uint16_t` can represent.
2023-07-26 10:01:27 +00:00
Francesco Mazzoli
c2bd882cdc
Allow erasing blocks for decommissioned block services
...
Otherwise GC cannot run after disposing of a broken disk. This
commit also adds various safety checks regarding decommissioned
block services.
2023-07-24 19:03:16 +01:00
Francesco Mazzoli
5776bb6d34
Include duration in mean/stddev stat
2023-07-24 19:03:16 +01:00
Francesco Mazzoli
4dbb6c79ba
Fix bug in Xmon parsing (alert id is 8 bytes, not 4)
2023-07-24 07:40:49 +00:00
Francesco Mazzoli
18b01438d4
Have -short tests actually be short, split out longer tests
2023-07-22 20:17:53 +01:00
Francesco Mazzoli
fe14ec5c22
Aggregate mean/stddev stat into one, together with count
...
This makes more sense, since it lets us combine multiple stats together.
2023-07-22 20:17:53 +01:00
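To illustrate why carrying the count alongside mean/stddev makes stats combinable, here is a minimal Python sketch. It is not the repository's code: the `Stat` class and its field names are hypothetical, and it assumes the aggregate stores a running sum of squared deviations (`m2`) rather than the stddev itself. Updates use Welford's online algorithm, and `merge` uses the standard pairwise combination formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Stat:
    """Hypothetical aggregate: count, mean, and sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of (x - mean)^2 over all samples seen so far

    def add(self, x: float) -> None:
        # Welford's online update: avoids accumulating a large sum of squares.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Stat") -> "Stat":
        # Pairwise combination; only possible because both counts are known.
        n = self.count + other.count
        if n == 0:
            return Stat()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Stat(n, mean, m2)

    def stddev(self) -> float:
        # Population standard deviation; 0.0 for an empty aggregate.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0
```

Merging two aggregates built from disjoint samples should give the same mean and stddev as a single aggregate built from all samples, up to floating-point rounding.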
Francesco Mazzoli
37ce3be74c
Implement utime-like functions
...
Also, update atime when opening a file.
2023-07-21 06:28:48 +00:00
Francesco Mazzoli
441eebb514
Do not crash on bad shuckle response
2023-07-20 12:46:38 +00:00
Francesco Mazzoli
ce21016ad9
Fix mean/stddev calculation
2023-07-19 21:44:17 +00:00
Francesco Mazzoli
6aa670b481
Remove mean/stddev computation in C++
...
It's broken (also in Go), will fix in the following days.
2023-07-19 11:48:38 +00:00
Francesco Mazzoli
dce2961d7f
Re-insert xmon requests if we fail to write them
2023-07-18 16:11:01 +00:00
Francesco Mazzoli
6973ed9ff7
Reset xmon buffer before packing stuff in
2023-07-18 16:10:28 +00:00
Francesco Mazzoli
b4613bd47e
Fix other little stddev things
2023-07-18 14:43:29 +00:00
Francesco Mazzoli
5c849c0d96
Fix timings stddev overflow
...
This adds a couple of locks which could be avoided by being a bit
more clever, but almost certainly doesn't matter for now.
2023-07-18 14:37:37 +00:00
Francesco Mazzoli
283f3508b9
Add binary /api endpoint, use it to draw histograms
...
This makes /stats _a lot_ faster.
2023-07-18 12:34:57 +00:00
Francesco Mazzoli
2b1b1a1c15
Insert stats when shutting down
2023-07-17 12:27:07 +00:00
Francesco Mazzoli
dcb76a86c2
Fix _hours operator
2023-07-17 12:26:49 +00:00
Francesco Mazzoli
3cc7310a6e
Add histograms for all components in /stats
2023-07-17 08:56:09 +00:00
Francesco Mazzoli
2f7be11e29
Add query for single block service in shuckle
...
I thought I might need it for some upcoming migration improvements,
I probably don't, but still kinda nice to have.
2023-07-13 09:46:37 +00:00
Francesco Mazzoli
2f1385445b
Tighten up the mtime story for transient files
2023-07-12 12:52:50 +00:00
Francesco Mazzoli
d93df7ef42
Make tests pass for now
2023-07-12 12:22:40 +01:00
Francesco Mazzoli
53598c2fe9
Allow re-opening files for writing if we're already writing them
...
This makes `cp` work
2023-07-12 12:22:40 +01:00
Francesco Mazzoli
65174341a0
Drop MM after flushing out a transient file
2023-07-12 12:22:40 +01:00
Francesco Mazzoli
fe88efb1ce
Remove UB in xmon code
2023-07-11 14:15:33 +00:00
Francesco Mazzoli
ff9306f6e3
Add Xmon support to C++ code
2023-07-11 12:13:22 +00:00
Francesco Mazzoli
d5fea6c08c
Retry when block services are unavailable in kmod
2023-07-06 19:39:12 +01:00
Saulius Grusnys
0360ec85cf
Switch cutoff time to blockservice to 1h and set the deadline in shard to 2
2023-07-06 13:28:12 +01:00
Francesco Mazzoli
1a4301a499
Simplify go span read/write code, make it work with broken block services
...
And some other assorted changes.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
4e0e6fe8a8
Configurable CDC shard timeout
...
When running under valgrind, we seem unable to process a small
FullReadDirReq in 100ms, which is a bit concerning, but I'll let
it slide for now.
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
87d0e69f85
Port kmod to new FullReadDir request
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
f0add4d926
Remove C++ varint code, we don't use varints anymore
2023-07-04 08:05:42 +00:00
Francesco Mazzoli
e2dcd43fea
Fix bug in CreateLockedCurrentEdge logic
...
See comment in `msgs.go`. This would normally have required
entirely new transactions, but since we're not in production yet
I'm going to just change the schema and wipe the current FS.
This also adds in an unrelated change regarding more flexible
blacklisting, which will be required for some additional testing
I'm preparing.
2023-07-04 08:05:42 +00:00