1014 Commits

Author SHA1 Message Date
Miroslav Crnic aebcce4017 logsdb: fix assert for last released going backwards 2024-03-25 10:31:58 +00:00
Francesco Mazzoli 6182188511 Split out stale alerts, make them daytime 2024-03-22 11:59:10 +00:00
Francesco Mazzoli c143e48841 Add command to eggscli to read out kernel metrics 2024-03-22 11:34:28 +00:00
Francesco Mazzoli b903875459 Set new creation time when renaming
I thought this would not be necessary due to the fact that
we'd fill it in after revalidation, but we did encounter some
cases where this does not seem to happen.
2024-03-21 20:12:56 +00:00
Francesco Mazzoli 48f9123a5a Adjust attempts check, now we start from 1 2024-03-21 17:04:04 +00:00
Francesco Mazzoli 4b0dd25bdc Correctly record attempts in eggsfs_metadata_request
This got lost in the `net.c` refactor, and it caused the recovery
mechanism on repeated requests to fail.
2024-03-21 14:19:54 +00:00
Francesco Mazzoli 6b382c044b Fix race in async getattr 2024-03-21 14:10:54 +00:00
Francesco Mazzoli 3f4988bb32 Some more metadata debug logging 2024-03-21 14:08:59 +00:00
Saulius Grusnys 2157833680 explain why ftruncate needs to be disabled 2024-03-21 09:24:21 +00:00
Saulius Grusnys 8565726989 sysctl param to disable ftruncate support 2024-03-21 09:24:21 +00:00
Francesco Mazzoli 43e6c940b3 Make sure we have size information wherever we need it 2024-03-20 19:43:31 +00:00
Francesco Mazzoli be2d604d96 Stable shuckle alerts 2024-03-20 17:08:25 +00:00
Miroslav Crnic 7df0a5da89 shard: cli options now match migration phases for LogsDB, and support manual failover 2024-03-20 15:34:55 +00:00
Saulius Grusnys 6f816fb319 improve logging 2024-03-20 15:20:32 +00:00
Saulius Grusnys fd9079febf Rate limited shuckle endpoint to decom blockservices 2024-03-20 15:16:00 +00:00
Francesco Mazzoli 1cf299bfac Use atomics where appropriate 2024-03-20 13:21:18 +00:00
Francesco Mazzoli f85714dbba Use pthread_self() to get pthread thread id 2024-03-20 13:11:14 +00:00
Francesco Mazzoli d512e8d281 Escape file name in backlinks 2024-03-20 13:00:28 +00:00
Francesco Mazzoli 3a6e498664 Make some Loop methods static 2024-03-20 13:00:18 +00:00
Francesco Mazzoli 9bc7e209e4 Safer ShuckleSock 2024-03-20 11:33:39 +00:00
Francesco Mazzoli 66fe0a2621 Correct pthread_timedjoin_np handling 2024-03-20 11:13:26 +00:00
Francesco Mazzoli 8f1ba6361b Resist interruptions when joining threads 2024-03-20 10:32:42 +00:00
Francesco Mazzoli 66ccba6124 Forward termination signal to main thread 2024-03-20 10:32:42 +00:00
Francesco Mazzoli 488f096eb9 Stat files/directories speculatively on readdir
Also, split the timeouts for dentries and for stats. We generally
don't care if stats are out of date, but dentries should be up
to date.

The code leaves a few things to be desired:

* No attempt is made to only send stats when needed -- it is always
    done. It might be a good idea to instead wait for the first two
    stats to come back.

* There's quite a bit of code duplication.

* It's pretty wasteful to have so many different packets for the
    stats. It'd be much better to pack multiple requests and multiple
    responses in single packets.

    This could be done simply by allowing many requests to come
    in the same packet (just one after the other would be fine),
    and same for the responses. We can still use the protocol and
    request id to keep track of things anyway.
2024-03-19 20:29:23 +00:00
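
The packing idea described in 488f096eb9 (several requests placed one after the other in a single packet, told apart by protocol and request id) could be sketched roughly as below. This is only an illustration: the header layout (`HEADER`), the size budget (`MAX_DATAGRAM`), and the function names `pack_messages` and `unpack_messages` are all assumptions made for the example, not the actual wire format or code of this repository.

```python
import struct

# Hypothetical header: protocol tag (u32), request id (u64), body length (u16).
# This layout is an assumption for illustration, not the real wire format.
HEADER = struct.Struct("<IQH")
# Assumed payload budget for a 1500-byte MTU over IPv4/UDP.
MAX_DATAGRAM = 1472


def pack_messages(messages):
    """Greedily pack (protocol, request_id, body) tuples into datagrams."""
    datagrams = []
    current = bytearray()
    for protocol, request_id, body in messages:
        if HEADER.size + len(body) > MAX_DATAGRAM:
            raise ValueError("single message does not fit in one datagram")
        frame = HEADER.pack(protocol, request_id, len(body)) + body
        # Start a new datagram when the next frame would overflow this one.
        if current and len(current) + len(frame) > MAX_DATAGRAM:
            datagrams.append(bytes(current))
            current = bytearray()
        current += frame
    if current:
        datagrams.append(bytes(current))
    return datagrams


def unpack_messages(datagram):
    """Split one datagram back into (protocol, request_id, body) tuples."""
    messages = []
    offset = 0
    while offset < len(datagram):
        if offset + HEADER.size > len(datagram):
            raise ValueError("truncated header")
        protocol, request_id, length = HEADER.unpack_from(datagram, offset)
        offset += HEADER.size
        if offset + length > len(datagram):
            raise ValueError("truncated body")
        messages.append((protocol, request_id, datagram[offset:offset + length]))
        offset += length
    return messages
```

Because every frame carries its own request id, a receiver can match responses to outstanding requests regardless of how they were grouped into packets, which is the property the commit message relies on.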
Miroslav Crnic c25cb696b4 shard: remove protection that only replica 0 can be leader 2024-03-19 16:29:36 +00:00
Francesco Mazzoli b12cdf7507 Add replicas info to shuckle web ui 2024-03-19 15:55:18 +00:00
Francesco Mazzoli abd7131e88 Fix BlockServicesCacheDB init 2024-03-19 15:26:19 +00:00
Miroslav Crnic 37539e1c5e eggsdbtools: reduce logging, output stats 2024-03-19 15:15:49 +00:00
Miroslav Crnic 938c845a30 eggsdbtool: cli for shard db comparison 2024-03-19 15:00:01 +00:00
Francesco Mazzoli 6d9da0e595 Remove all remnants of block service cache in ShardDB
The previous code was pretty nasty: it reached into the `ShardDB`
column family from another class. All those keys have been deleted
anyway in production.
2024-03-19 14:27:33 +00:00
Miroslav Crnic a4c091c7b2 logsdb: log state at flush to have consistent view 2024-03-19 12:44:56 +00:00
Miroslav Crnic 5ce2efb88b shard: increase number of requests processed in loop when LogsDB is on 2024-03-18 18:06:19 +00:00
Miroslav Crnic 096b9cbe6a logsdb: fix for replication path 2024-03-18 17:29:49 +00:00
Miroslav Crnic 0b7d1c30d3 shard: turn on replication writes 2024-03-18 14:19:50 +00:00
Miroslav Crnic dfcabdba97 LogsDB: tweak catchup timeout 2024-03-18 12:00:27 +00:00
Miroslav Crnic c8cda7e4db logsdb: periodically log status 2024-03-18 09:44:47 +00:00
Miroslav Crnic 72c1acaea8 xmon: if too many alerts initialize appType to _parent 2024-03-15 19:39:41 +00:00
Miroslav Crnic 27faaa45ae ci: add ability to run with LogsDB, shard: add handling of LogsDB messages 2024-03-15 16:49:39 +00:00
Saulius Grusnys 74e81ca836 do not hit production shuckle by default from go apps 2024-03-15 08:46:07 +00:00
Saulius Grusnys e0dc93ded1 additional metrics in eggsblocks (#222) 2024-03-15 05:30:44 +00:00
Francesco Mazzoli 3db003a8f6 Fix bug in BlockServicesCacheDB initialization 2024-03-13 12:07:33 +00:00
Francesco Mazzoli 3fc466f197 Fix alert formatting 2024-03-13 12:04:40 +00:00
Miroslav Crnic ebcdcb650a shard: add support for resetting all data in LogsDB 2024-03-13 11:33:48 +00:00
Francesco Mazzoli 005121bcac Spin block service cache out of ShardDB
This started being a problem since the block service update log
entry does not fit in a UDP packet (it's about 100KB). I think this
approach makes more sense anyway. See the comment for `getCache()` for
gotchas.
2024-03-13 11:29:58 +00:00
Miroslav Crnic 52cc5c01df tests: ability to run functional tests in docker 2024-03-13 10:21:56 +00:00
Francesco Mazzoli 6968c25bc5 Allow : in metrics 2024-03-12 14:04:34 +00:00
Miroslav Crnic 13c5df0131 shard: fix name in xmon and add replica id to tag in metrics 2024-03-12 13:40:35 +00:00
Miroslav Crnic b240de53b5 shard: distributed log implementation and shard can use it with a flag set 2024-03-12 11:02:04 +00:00
Francesco Mazzoli d5fb66b694 Test mmap in CI 2024-03-11 15:35:44 +00:00
Francesco Mazzoli e96742c711 Implement readpage, and therefore allow mmap 2024-03-11 15:33:57 +00:00