Commit Graph

23 Commits

Author SHA1 Message Date
Miroslav Crnic
30ee029f7e shuckle: make requests interruptable and pass timeout to all operations
This means that they'll be interrupted at shutdown, rather than holding everything up when shuckle is overloaded.
We also detect idle connection or slow transmitting data.
2024-04-02 18:15:29 +01:00
Miroslav Crnic
72c1acaea8 xmon: if too many alerts initialize appType to _parent 2024-03-15 19:39:41 +00:00
Francesco Mazzoli
531f989a06 Correct app type for quiet alert creation 2024-02-20 14:16:52 +00:00
Francesco Mazzoli
303421763a Allow to specify rota per alert in C++ 2024-02-20 12:59:42 +00:00
Francesco Mazzoli
0a6a0c8f24 Process CDC timeouts in a timely manner 2024-01-29 15:08:06 +00:00
Francesco Mazzoli
8075e99bb6 Graceful shard teardown
See <https://mazzo.li/posts/stopping-linux-threads.html> for tradeoffs
regarding how to terminate threads gracefully.

The goal of this work was for valgrind to work correctly, which in turn
was to investigate #141. It looks like I have succeeded:

    ==2715080== Warning: unimplemented fcntl command: 1036
    ==2715080== 20,052 bytes in 5,013 blocks are definitely lost in loss record 133 of 135
    ==2715080==    at 0x483F013: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==2715080==    by 0x3B708E: allocate (new_allocator.h:121)
    ==2715080==    by 0x3B708E: allocate (allocator.h:173)
    ==2715080==    by 0x3B708E: allocate (alloc_traits.h:460)
    ==2715080==    by 0x3B708E: _M_allocate (stl_vector.h:346)
    ==2715080==    by 0x3B708E: std::vector<Crc, std::allocator<Crc> >::_M_default_append(unsigned long) (vector.tcc:635)
    ==2715080==    by 0x42BF1C: resize (stl_vector.h:940)
    ==2715080==    by 0x42BF1C: ShardDBImpl::_fileSpans(rocksdb::ReadOptions&, FileSpansReq const&, FileSpansResp&) (shard/ShardDB.cpp:921)
    ==2715080==    by 0x420867: ShardDBImpl::read(ShardReqContainer const&, ShardRespContainer&) (shard/ShardDB.cpp:1034)
    ==2715080==    by 0x3CB3EE: ShardServer::_handleRequest(int, sockaddr_in*, char*, unsigned long) (shard/Shard.cpp:347)
    ==2715080==    by 0x3C8A39: ShardServer::step() (shard/Shard.cpp:405)
    ==2715080==    by 0x40B1E8: run (core/Loop.cpp:67)
    ==2715080==    by 0x40B1E8: startLoop(void*) (core/Loop.cpp:37)
    ==2715080==    by 0x4BEA258: start_thread (in /usr/lib/libpthread-2.33.so)
    ==2715080==    by 0x4D005E2: clone (in /usr/lib/libc-2.33.so)
    ==2715080==
    ==2715080==
    ==2715080== Exit program on first error (--exit-on-first-error=yes)
2024-01-08 15:41:22 +00:00
Francesco Mazzoli
8c172fd2e8 Tiny C++ xmon fix 2023-12-10 11:14:19 +00:00
Francesco Mazzoli
3394328000 Do not try to close xmon fd if we don't have one
Also, ignore errors if we can't close it. Fixes #134.
2023-12-09 14:50:51 +00:00
Francesco Mazzoli
38f3d54ecd Wait forever, rather than having timeouts
The goal here is to not have constant wakeups due to timeout. Do
not attempt to clean things up nicely before termination -- just
terminate instead. We can setup a proper termination system in
the future, I first want to see if this makes a difference.

Also, change xmon to use pipes for communication, so that it can
wait without timers as well.

Also, `write` directly for logging, so that we know the logs will
make it to the file after the logging call returns (since we now
do not have the chance to flush them afterwards).
2023-12-07 10:11:19 +00:00
Francesco Mazzoli
24d1588b21 Add quiet window for C++ alerts, too 2023-10-02 23:02:45 +00:00
Francesco Mazzoli
02838e228f Correct xmon app types 2023-09-28 11:53:12 +00:00
Francesco Mazzoli
1922cf3c30 Factor out common looping patterns 2023-08-08 09:21:35 +01:00
Francesco Mazzoli
63ed6a90fa Reconnect to xmon on expired heartbeat 2023-08-07 10:06:40 +00:00
Francesco Mazzoli
b370118e90 Rate limit binnable xmon requests
This involved clearly separating non-clearable and clearable alerts,
which simplifies the design and I think satisfies all our needs.
2023-08-05 23:41:10 +01:00
Francesco Mazzoli
5146a80c2d Use homegrown Xmon
I got annoyed at the old lib dropping requests when queue gets
full, I could probably fix but this is almost certainly quicker.
2023-07-30 11:16:35 +00:00
Francesco Mazzoli
e851457c52 Do not re-insert requests in C++ xmon code
It could mess up the ordering.
2023-07-30 10:58:58 +00:00
Francesco Mazzoli
8e9f4f3d8b Never die because of bad Xmon
It will alert if we're disconnected anyway, and when restarting
everything this causes crashes.
2023-07-28 08:08:03 +00:00
Francesco Mazzoli
7dceb5fda5 More alerts shenanigans 2023-07-27 15:51:15 +00:00
Francesco Mazzoli
bf447408a6 Actually wait for things to finish terminating before reaping next one
Fixes #27. This is all kind of clunky right now, it would be much
better to just standardize the `run()` function pattern.
2023-07-26 22:31:42 +00:00
Francesco Mazzoli
4dbb6c79ba Fix bug in Xmon parsing (alert id is 8 bytes, not 4) 2023-07-24 07:40:49 +00:00
Francesco Mazzoli
dce2961d7f Re-insert xmon requests if we fail to write them 2023-07-18 16:11:01 +00:00
Francesco Mazzoli
6973ed9ff7 Reset xmon buffer before packing stuff in 2023-07-18 16:10:28 +00:00
Francesco Mazzoli
ff9306f6e3 Add Xmon support to C++ code 2023-07-11 12:13:22 +00:00