703 Commits

Author SHA1 Message Date
Klaas van Schelven fbee32c79a Remove some 'maybe' comments for 'drop immediately' 2024-07-15 11:02:08 +02:00
Klaas van Schelven d5bfe70488 Comments on the finer points of quota 2024-07-15 11:00:20 +02:00
Klaas van Schelven d68aff05ca Quota 2024-07-15 09:37:36 +02:00
Klaas van Schelven c403d906cd stress-test: report on errors 2024-07-15 09:26:27 +02:00
Klaas van Schelven 927bed38d0 Annotate _prev_tup and a small 'fix'
'fix' in scare-quotes, because the previous implementation of first_chunk was
wrong, but never led to actually wrong outcomes, only to one recursive call
too many (for seconds, minutes)
2024-07-12 14:51:03 +02:00
Klaas van Schelven 0ef72a7461 Clarify name of test 2024-07-12 12:33:03 +02:00
Klaas van Schelven 49a395fb86 use the envelope_header's DSN if it is available 2024-07-12 10:41:16 +02:00
Klaas van Schelven b5321f3685 Notes on streaming 2024-07-12 10:12:07 +02:00
Klaas van Schelven 6767ea593a Fix port number in example nginx conf 2024-07-12 08:39:50 +02:00
Klaas van Schelven eb01c61947 Fix typo in comment 2024-07-10 14:00:59 +02:00
Klaas van Schelven 90a55e522b Remove 2 'untested behavior' notes
I _think_ I meant that I had never actually seen those code-paths in action
(i.e. the note was not about automated tests but rather any kind of visual
confirmation that it worked) but I have seen that now
2024-07-10 14:00:47 +02:00
Klaas van Schelven 1ed6522126 Clarify why get_pc_registry must be done before event-creation 2024-07-09 13:27:16 +02:00
Klaas van Schelven eb23d44962 Enforce a single pc_registry for a single ingesting process
Using a pid-file that's implied by the ingestion directory.

We do this in `get_pc_registry`, i.e. on the first request. This means that the
failure shows up on the first request handled by the 2nd process.

Why not on startup? Because we don't have a configtest or generic on-startup location
(yet). Making _that_ could be another source of fragility, and getting e.g. the nr
of processes might be non-trivial / config-dependent.
2024-07-09 13:14:27 +02:00
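A minimal sketch of the pid-file idea described in the commit above, not the project's actual code. The helper name `claim_ingestion_dir`, the exception `AlreadyRunning` and the file name `pc_registry.pid` are assumptions made for illustration; only the notion of a pid-file implied by the ingestion directory comes from the commit message.

```python
import os


class AlreadyRunning(Exception):
    """Raised when another process already owns the ingestion directory."""


def claim_ingestion_dir(ingestion_dir, pid=None):
    # Hypothetical helper: the pid-file path is implied by the directory.
    pid = os.getpid() if pid is None else pid
    pid_path = os.path.join(ingestion_dir, "pc_registry.pid")
    try:
        # O_EXCL makes creation atomic: only one process can create the file.
        fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(pid_path) as f:
            owner = int(f.read().strip() or 0)
        if owner != pid:
            raise AlreadyRunning(f"{pid_path} is held by pid {owner}")
        return pid_path
    with os.fdopen(fd, "w") as f:
        f.write(str(pid))
    return pid_path
```

Calling this lazily (on the first request) matches the behaviour described above: the second process only fails once it handles a request. The sketch does not deal with stale pid-files left behind by a crashed process.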
Klaas van Schelven e2cb6654cf Removed stray TODO 2024-07-09 11:12:22 +02:00
Klaas van Schelven edff0e219c PeriodCounter: remove event-based approach
Replacing it with passing the thresholds on each call to `inc`.

The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.

The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.

During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.

Other things I found:

* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
  also touched the DB for unmutes) so that was probably a separate bug (now fixed).

* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
  not use it yet). The idea was probably that this could be useful in the quota setting
  (a quota can become unmet after a while) but in fact it isn't useful, because when a
  quota becomes unmet you'd still need to check all quotas and OR them.

Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.
2024-07-09 09:31:36 +02:00
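The change described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the real `PeriodCounter`: the class name `SimplePeriodCounter`, the bucket formats and the `(period, limit)` threshold shape are assumptions, and only the current bucket is checked. What it shows is the core idea that thresholds are handed to `inc` by the caller rather than registered as listeners on the counter.

```python
from collections import defaultdict


class SimplePeriodCounter:
    """Counts events per (period, bucket); hypothetical, for illustration only."""

    FORMATS = {"minute": "%Y-%m-%d %H:%M", "hour": "%Y-%m-%d %H", "day": "%Y-%m-%d"}

    def __init__(self):
        self.counts = defaultdict(int)

    def inc(self, when, thresholds=()):
        # Count the event in every period bucket it falls into.
        for period, fmt in self.FORMATS.items():
            self.counts[(period, when.strftime(fmt))] += 1

        # Check the caller-supplied thresholds right away, in this process;
        # nothing is registered on the counter itself.
        reached = []
        for period, limit in thresholds:
            bucket = when.strftime(self.FORMATS[period])
            if self.counts[(period, bucket)] >= limit:
                reached.append((period, limit))
        return reached
```

Because the caller builds the list of thresholds right before each call, it no longer matters which process created or changed them, as long as all calls to `inc` happen in one process.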
Klaas van Schelven 41985cf507 Document performance implications of better connection closing 2024-07-08 11:18:46 +02:00
Klaas van Schelven 6a5521472a Eviction: notes on 'just-drop' saved 2024-07-08 10:52:26 +02:00
Klaas van Schelven c2ec150f52 Cost of connection.close and subsequent reopen documented 0.1.6 2024-07-08 09:53:15 +02:00
Klaas van Schelven 5120475683 Remove stray trailing quote from logs 2024-07-08 09:52:40 +02:00
Klaas van Schelven b6cc268333 Remove connection_close
this was never supposed to have been committed, mistake in c453ca00e5
2024-07-08 09:41:57 +02:00
Klaas van Schelven 2ddc33017a Add performance context_manager for timings
was useful during debugging
2024-07-08 09:40:47 +02:00
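For illustration, a timing context manager of the kind mentioned in the commit above might look like this. The name `time_block` and the optional `results` dict are assumptions; the commit message does not show the actual implementation.

```python
import time
from contextlib import contextmanager


@contextmanager
def time_block(label, results=None):
    # Hypothetical helper: measures wall-clock time spent inside the with-block.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if results is not None:
            results[label] = elapsed_ms
        print(f"{elapsed_ms:.2f}ms {label}")
```

Usage would be along the lines of `with time_block("get task"): ...` around the code being investigated.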
Klaas van Schelven 0e5bec721e Snappea: use the timed_sqlite_backend
Because the 5s query limit applies even more so for snappea
2024-07-08 09:38:01 +02:00
Klaas van Schelven 078a6504c5 Add 'immediate_semaphore' 2024-07-05 17:50:05 +02:00
Klaas van Schelven c453ca00e5 Snappea connection_close 2024-07-05 16:28:23 +02:00
Klaas van Schelven 1228199d96 Fix (example) PrintOnClose
because __setattr__ wasn't implemented in the delegator pattern, the autocommit
property would not be propagated to the delegate.
2024-07-05 16:00:59 +02:00
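The bug described in the commit above is a general property of the delegator pattern in Python: `__getattr__` alone forwards reads, but writes land on the wrapper unless `__setattr__` is forwarded too. A minimal sketch, assuming a wrapper class named `PrintOnCloseSketch` (the real `PrintOnClose` is not shown in the commit message):

```python
class PrintOnCloseSketch:
    """Wraps a connection-like object and prints when close() is called."""

    def __init__(self, delegate):
        # Bypass our own __setattr__ so the wrapper can store its delegate.
        object.__setattr__(self, "_delegate", delegate)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for delegated attributes.
        return getattr(self._delegate, name)

    def __setattr__(self, name, value):
        # Without this, `wrapper.autocommit = True` would only set an attribute
        # on the wrapper and never reach the delegate.
        setattr(self._delegate, name, value)

    def close(self):
        print("closing connection")
        return self._delegate.close()
```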
Klaas van Schelven 1eb65a7790 Release worker_semaphore when failing to create worker
exposed when playing around with arbitrary Tasks in a shell; this created
workers I could not run, which would put the foreman in a 'waiting for available threads'
mode.

I briefly looked at the rest of that loop to see whether more exception handling
is necessary, but TBH I don't think we can reasonably recover from e.g. task.delete()
failing (or at least I don't want to think about it now)
2024-07-05 15:58:15 +02:00
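The fix in the commit above follows a common pattern: if a slot is acquired before the worker is created, a failure during creation must give the slot back. A sketch under assumptions (the function `start_worker` and the `make_worker` callable are hypothetical; the real foreman loop is not shown here):

```python
import threading


def start_worker(worker_semaphore, make_worker):
    # Take a slot before creating the worker; a running worker is expected
    # to release the slot itself when it finishes.
    worker_semaphore.acquire()
    try:
        worker = make_worker()
    except Exception:
        # No worker exists that could release the slot later, so release it
        # here; otherwise the caller would wait on a slot that never frees up.
        worker_semaphore.release()
        raise
    worker.start()
    return worker
```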
Klaas van Schelven 259069f6e2 Explicit error message for malformed task name 2024-07-05 15:54:55 +02:00
Klaas van Schelven 97e8e6fe45 PoC for printing on db connection open/close 2024-07-05 13:01:55 +02:00
Klaas van Schelven 253380bf2f Foreman: document current understanding of connection.close() 2024-07-05 13:00:08 +02:00
Klaas van Schelven 7d32f27f00 Close connection on MainThread that we open ourselves in the debug server 2024-07-05 10:36:25 +02:00
Klaas van Schelven bf5d221a03 Snappea: fixes on 'atomic' call for Task-getting
prompted by the work in the previous commit; but somewhat separate from it
0.1.5
2024-07-04 14:05:04 +02:00
Klaas van Schelven 4daa6c9e09 Close database connections in snappea 2024-07-04 14:04:03 +02:00
Klaas van Schelven 14302783aa Eviction: Age based irrelevance with a base of 4 2024-07-02 09:03:17 +02:00
Klaas van Schelven b145ef6631 Eviction: 500 per-eviction is a hard-limit; even for lowered max-events 2024-07-01 15:02:39 +02:00
Klaas van Schelven ec01a64651 Evictions: delete_with_limit (don't overshoot) 2024-07-01 14:07:10 +02:00
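One way to delete with a limit, sketched with the standard-library `sqlite3` module. The table and column names are assumptions, and the project's own `delete_with_limit` may well be implemented differently (e.g. through the Django ORM). A subquery is used because plain `DELETE ... LIMIT` is only available in SQLite builds compiled with a specific option.

```python
import sqlite3


def delete_with_limit(conn, max_irrelevance, limit):
    # Delete at most `limit` rows above the given irrelevance; returns the
    # number of rows actually deleted.
    cursor = conn.execute(
        "DELETE FROM event WHERE id IN ("
        "  SELECT id FROM event WHERE irrelevance > ? LIMIT ?"
        ")",
        (max_irrelevance, limit),
    )
    return cursor.rowcount
```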
Klaas van Schelven 8f401dafd6 Document some findings (Friday's work) 2024-07-01 13:29:56 +02:00
Klaas van Schelven 471b69e956 Stress test: ability to generate random event types 2024-06-27 10:49:25 +02:00
Klaas van Schelven c5df10e9cf Stress test: ability to use multiple dsns (projects) 0.1.4 2024-06-27 09:52:10 +02:00
Klaas van Schelven e9ed7835c1 eviction_target bugfix; delete 'never_evict' if nothing else remains 2024-06-26 11:06:04 +02:00
Klaas van Schelven 833ebfe9ac Move code around & document it 2024-06-26 10:11:59 +02:00
Klaas van Schelven e4bad2c4f5 Fix existing tests (add field to factory call) 2024-06-26 10:03:25 +02:00
Klaas van Schelven f45995ce19 Eviction lowered target: no more lowering than 500 (for large quota) 2024-06-26 09:53:16 +02:00
Klaas van Schelven 9a96ab767a retention insights: don't ignore never_evict=True 2024-06-26 09:38:34 +02:00
Klaas van Schelven 653739a8f6 Eviction: use deletion counts to keep track of the work
This saves a query in the (small) loop (namely: selection counts of remaining items)

It also allows us to stop sooner (evict less).
2024-06-26 09:26:34 +02:00
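A rough sketch of the bookkeeping described in the commit above: the loop subtracts the number of rows each delete reports instead of re-counting what remains. The function `evict`, the `delete_batch` callable and the batch size default are assumptions for illustration; the real loop (which works through irrelevance levels) is not reproduced here.

```python
def evict(delete_batch, total, target, batch_size=500):
    # `delete_batch(limit)` is a hypothetical callable that deletes at most
    # `limit` rows and returns how many it actually deleted.
    remaining = total
    while remaining > target:
        deleted = delete_batch(min(batch_size, remaining - target))
        if deleted == 0:
            break  # nothing left that may be deleted
        remaining -= deleted  # no extra COUNT query needed
    return remaining
```

Because the remaining count is known after every delete, the loop can stop as soon as the target is reached rather than after a fixed amount of work.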
Klaas van Schelven fe6c955465 never_evict events that are a Historic Turning Point
Both for technical (foreign keys) and business reasons (these are events you
care about)
2024-06-24 22:50:00 +02:00
Klaas van Schelven adda019cef Add an index to the Event model for eviction
Unscientifically (n=1, changing circumstances), this improved times like so when the max was 10k:

* 573.56ms EVICT; down to 8813, max irr. from 15 to 13 in 171ms+402ms and 5+4 queries  (pre-index)
* 229.34ms EVICT; down to 7643, max irr. from 15 to 12 in 7ms+222ms and 5+7 queries    (post-index)

The order of the index was chosen because we have 3 types of queries in our algo:

* on Project -> irrelevance <= amount of work
* on Project, timestamp -> irrelevance <= observed irrelevances
* on Project, timestamp, irrelevance -> deletion
2024-06-24 14:29:01 +02:00
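To make the index-order reasoning in the commit above concrete, here is a small `sqlite3` sketch. The table layout, the index name and the exact column order `(project_id, timestamp, irrelevance)` are assumptions derived from the three query shapes listed in the commit message, not the actual Django model definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    " id INTEGER PRIMARY KEY,"
    " project_id INTEGER NOT NULL,"
    " timestamp TEXT NOT NULL,"
    " irrelevance INTEGER NOT NULL)"
)
# Assumed column order, following the three query shapes listed above.
conn.execute(
    "CREATE INDEX event_eviction_idx ON event (project_id, timestamp, irrelevance)"
)

# Ask SQLite how it would run a deletion-style selection.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM event WHERE project_id = ? AND timestamp <= ? AND irrelevance > ?",
    (1, "2024-06-24T14:29:01", 12),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

Putting the equality-filtered column first lets all three query shapes use a prefix of the same index.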
Klaas van Schelven 2bdc357a87 Eviction: logging 2024-06-24 13:58:45 +02:00
Klaas van Schelven 69a40480fd Retention/eviction: more small fixes/cleanup 2024-06-24 11:48:21 +02:00
Klaas van Schelven bdc6193214 Add tool to generate insight into retention (and fix bugs that that insight revealed) 2024-06-24 10:59:04 +02:00
Klaas van Schelven 63afba020a Eviction: 95% 'lowered target' 2024-06-24 09:24:03 +02:00