Francesco Mazzoli
fe2ce7aa17
See comments
2023-08-01 21:17:23 +01:00
Francesco Mazzoli
a5eb12a262
Do not alert/log on innocuous shard error in CDC
2023-08-01 13:41:18 +00:00
Francesco Mazzoli
1ac282dc08
Update README.md
2023-08-01 13:51:55 +01:00
Francesco Mazzoli
0d1d579449
Update README.md
2023-08-01 13:40:22 +01:00
Francesco Mazzoli
023885a6f6
Smooth out histos slightly
2023-08-01 12:05:02 +00:00
Francesco Mazzoli
a7d119387c
Add buttons in /stat to quickly choose range
2023-08-01 11:47:52 +00:00
Francesco Mazzoli
baf1fd7278
Re-introduce longer timeouts in QEMU
...
Should fix #31
2023-07-31 13:00:04 +00:00
Francesco Mazzoli
9b38509362
Fix CDC request tracing
2023-07-31 12:59:24 +00:00
Francesco Mazzoli
1359a5d752
The timeouts seem to be some genuine errors, but I need more info to debug
2023-07-31 10:42:03 +00:00
Francesco Mazzoli
791d847f2b
Increase timeouts in QEMU
2023-07-31 08:46:45 +00:00
Francesco Mazzoli
03684fee13
Timings -> Histogram, I want to add other kinds of timings soon
2023-07-30 13:18:09 +00:00
Francesco Mazzoli
7ab9c78770
Explain some /stats details
2023-07-30 13:14:44 +00:00
Francesco Mazzoli
5146a80c2d
Use homegrown Xmon
...
I got annoyed at the old lib dropping requests when the queue gets
full. I could probably fix it, but this is almost certainly quicker.
2023-07-30 11:16:35 +00:00
Francesco Mazzoli
f3b0a9be7c
Correct accounting of header size when fetching blocks
...
Fixes #28
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
e851457c52
Do not re-insert requests in C++ xmon code
...
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
5fe63035a8
Fix kernel ci rsync
2023-07-29 16:18:27 +00:00
Francesco Mazzoli
76f581f194
Rationalize timeout handling
2023-07-28 15:01:55 +00:00
Francesco Mazzoli
41cf9b16f9
Much better /stats
2023-07-28 08:12:04 +00:00
Francesco Mazzoli
0ac6252832
Decouple shuckle registration and block counting
...
Block counting was getting slow, and it could cause block services
to be mistakenly marked as stale.
2023-07-28 08:09:55 +00:00
Francesco Mazzoli
8e9f4f3d8b
Never die because of bad Xmon
...
It will alert if we're disconnected anyway, and when restarting
everything this causes crashes.
2023-07-28 08:08:03 +00:00
Francesco Mazzoli
7dceb5fda5
More alerts shenanigans
2023-07-27 15:51:15 +00:00
Francesco Mazzoli
a01b1f036d
More alert-related fixes
2023-07-27 13:54:51 +00:00
Francesco Mazzoli
2c3c09180b
Ignore certain errors in shuckle connections
...
See <internal-repo/issues/22#issuecomment-22869>
2023-07-27 13:29:48 +00:00
Francesco Mazzoli
6a52a961eb
Split CDC timings to distinguish queue time from exec time
2023-07-27 13:14:12 +00:00
Francesco Mazzoli
fd132ca4b2
More CI tweaks 🥱
2023-07-27 12:44:44 +00:00
Francesco Mazzoli
85fd84af82
Fix logger confusion
2023-07-27 12:23:34 +00:00
Francesco Mazzoli
39ba351886
Improve CI log syncing yet again
2023-07-27 12:15:24 +00:00
Francesco Mazzoli
889c04766f
Do not bump req ids when retrying requests in the CDC
...
Fixes #29.
The additions to codegen are unrelated -- I was exploring a different
approach based on request equality and I decided to keep those
changes in since they might be useful anyhow.
2023-07-27 11:55:33 +00:00
Francesco Mazzoli
111bde371b
Finally remove duplication between shardreq.go and cdcreq.go
2023-07-27 09:54:49 +00:00
Francesco Mazzoli
f797663d8c
Transient alerts for EPERM errors on sendto
2023-07-27 07:31:34 +00:00
Francesco Mazzoli
bf447408a6
Actually wait for things to finish terminating before reaping next one
...
Fixes #27. This is all kind of clunky right now; it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
15e59b8e67
More logging when closing (see #27)
...
It seems that we get the SIGSEGV while closing the DB.
2023-07-26 21:09:29 +00:00
Francesco Mazzoli
401898ac98
Better /stats commentary.
2023-07-26 20:44:02 +00:00
Francesco Mazzoli
dd39466daa
Insert CDC stats on shutdown
2023-07-26 20:41:35 +00:00
Francesco Mazzoli
9979f54240
Ignore blockservices once they are decommissioned
...
This is so that we can restart the blockservice before removing the
mount, and have GC, and block erasure in general, work fine. It should
be fairly safe since we never allow the removal of the decommissioned
flag.
2023-07-26 20:41:27 +00:00
Francesco Mazzoli
999d2df52b
Do not alert for missing CDC request
...
This is totally normal if the CDC is restarted with queued
transactions.
2023-07-26 19:38:17 +00:00
Francesco Mazzoli
45b2618296
Temporarily put a stop to alert spam
2023-07-26 19:33:00 +00:00
Francesco Mazzoli
b0ff28dc44
Do not alert on error which can happen naturally in GC
2023-07-26 19:28:44 +00:00
Francesco Mazzoli
0fc80dfe0f
Remove additional CDC status fields
...
`status()` was racy anyway (the txn might be gone between the
first and second lookup) and these are better served by the stats
db/Grafana.
2023-07-26 19:21:24 +00:00
Francesco Mazzoli
6d07e5d97d
Fix stats with new histos
2023-07-26 13:18:43 +00:00
Francesco Mazzoli
d918df0fcc
Correctly return errors when failing to connect
...
Triggered by investigating:
    xmon: could not read message type: unexpected EOF, will reconnect
    xmon: connected to xmon REDACTED
    Undertaker: hard abort - running abort handlers
    Uncaught exception thrown: SyscallException(Xmon.cpp@186, 9/EBADF=Bad file descriptor in void Xmon::run()): setsockopt
which caused crashes in shards/CDC.
2023-07-26 12:59:40 +00:00
Francesco Mazzoli
3206d7a564
Fix syncing of kmod test logs
2023-07-26 11:46:24 +00:00
Francesco Mazzoli
e631096df7
Write shuckle stats before quitting
2023-07-26 10:01:27 +00:00
Francesco Mazzoli
60554ec58d
Have bigger histograms, remove other metrics entirely
...
The `uint16_t` -> `size_t` change in `packedSize` is needed because
insert stats requests are now larger than a `uint16_t` can represent.
2023-07-26 10:01:27 +00:00
Francesco Mazzoli
a35ea2dacf
Minor shuckle tweaks
2023-07-26 07:58:17 +00:00
Francesco Mazzoli
f381bb293f
Wait for block services first in tests
...
This ensures that when the shards ask for block services we always
get the full set, which means that we'll be able to write from the
get-go.
2023-07-26 07:58:17 +00:00
Francesco Mazzoli
0b7d432051
Use -verbose in kernel tests, and sync logs
2023-07-25 10:28:30 +00:00
Francesco Mazzoli
4b0554d9a9
Start block services correctly in eggsrun
2023-07-24 21:04:20 +00:00
Francesco Mazzoli
9ce88cd8ed
Don't do that, do the opposite of that
2023-07-24 18:18:48 +00:00
Francesco Mazzoli
704062ed5f
Store dead block services' ciphers, and use them
2023-07-24 18:10:25 +00:00