738 Commits

Author SHA1 Message Date
Klaas van Schelven 98a4973de0 Create Action tests.yml
Using GitHub's template for Django; adapting the Python versions to the currently supported ones
2024-08-21 08:39:23 +02:00
Klaas van Schelven 03f0a2f673 Add README 2024-08-21 08:30:46 +02:00
Klaas van Schelven 6560767764 Tailwind: Explicitly point to python file containing class 2024-08-19 22:12:20 +02:00
Klaas van Schelven ff48ffb977 Query counts for DEBUG=False contexts (i.e. our own 'playground' etc) 2024-08-19 15:14:24 +02:00
Klaas van Schelven 9aaec10f95 Fix in_app display for 'playground' install
(also done on sparrow)
2024-08-19 14:49:56 +02:00
Klaas van Schelven 01839a887d try to send more info when debugging my own install
(I've simultaneously updated 'playground' to do the same)
2024-08-19 14:27:12 +02:00
Klaas van Schelven d6c61c25bb When resolving, show history as 'resolved' (bugfix) 2024-08-19 11:43:14 +02:00
Klaas van Schelven 37b11e16ec ALLOWED_HOSTS: use {{ host }}
as per what's documented (commented out, i.e. privately) in the website:
getting nginx to play nicely for 443/default_server is 'hardish'
2024-08-19 10:56:00 +02:00
Klaas van Schelven 63417d555f Explain why we deal with SIGTERM as we do
(from memory, in response to the glib remarks in b09cfb21c3: 'as demanded by systemd')
2024-07-26 16:22:30 +02:00
Klaas van Schelven 22fdd8adae Docs moved to website 2024-07-26 15:37:16 +02:00
Klaas van Schelven 8cee902db4 WHITENOISE_USE_FINDERS to avoid collectstatic 2024-07-26 15:28:33 +02:00
Klaas van Schelven b76e474ef1 Navigation: fix for missing events
Now that we have eviction, events may disappear. Deal with it:

* event-specific 404 that still allows for navigation
* first/last buttons
* navigation to prev/next when prev/next is not just 1 step away
* don't use HttpRedirect for "lookup based" views
    in principle, the thing you were looking for might go missing in-between
    drawback: these URLs are not "stable"
2024-07-19 11:03:08 +02:00
Klaas van Schelven aec78f6318 Clarify a docstring 2024-07-18 15:13:08 +02:00
Klaas van Schelven c5dc3014ea Migrations: don't limit queries in runtime 2024-07-18 15:07:14 +02:00
Klaas van Schelven 63cfbb2acd Hide version info in the HTML source
so that we at least can quickly inspect it
2024-07-18 14:59:01 +02:00
Klaas van Schelven 3128392d9a Distinguish ingested_at and digested_at 2024-07-18 14:45:59 +02:00
Klaas van Schelven d23f1f0a3b Remove load_performance_fixture
in the belief that it was only (mostly?) useful in the context of
the pc-registry performance tests. (We do our other tests end-to-end
using the stress tests)
2024-07-18 13:17:45 +02:00
Klaas van Schelven 717a632b7d check_for_thresholds refactoring: 'metadata' is superfluous
because it was basically the input-tuple (in a different format)
2024-07-18 09:43:37 +02:00
Klaas van Schelven d731dee4f6 Note on stress testing 2024-07-18 09:34:32 +02:00
Klaas van Schelven e40e652722 Remove duplicate test-factory function 2024-07-18 09:33:50 +02:00
Klaas van Schelven b211ba4c1e Document possible way forward for counting all ingested events 2024-07-18 09:21:59 +02:00
Klaas van Schelven f48c48f7e5 Implement 429 for the deprecated 'store' endpoint too 2024-07-18 09:19:37 +02:00
Klaas van Schelven 36f74acb2d Document performance surprise 2024-07-18 09:18:11 +02:00
Klaas van Schelven 927587c132 Stress test: interruptible, still shows results 2024-07-17 17:33:30 +02:00
Klaas van Schelven ec0877edb7 Document yet another problem with 'real streaming' and Nginx 2024-07-17 17:13:30 +02:00
Klaas van Schelven 65ea181f37 vbc-unmute: reduce calls to the expensive check
as done in the previous commit for project quota
2024-07-17 15:33:15 +02:00
Klaas van Schelven 51a53c09a4 quota: check as little as possible & check-on-digest
Also fix various off-by-one errors with the help of tests
2024-07-17 14:48:19 +02:00
Klaas van Schelven 8849a3e44b Don't write to the DB on-ingest
In the previous commit I added the code for a small performance-experiment.
The results are (very) obvious: don't do this. Response times go through
the roof, and more importantly, the server becomes unreliable. Reason:
time-outs caused by waiting for the write-lock.
2024-07-16 16:39:12 +02:00
Klaas van Schelven 0c964cfcc8 Add project.ingested_event_count (input for performance-experiment) 2024-07-16 15:48:16 +02:00
Klaas van Schelven c01d332e18 Rename ingest_order to digest_order and clarify event_count
* issue.event_count to digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to digest_order

This is generally more correct/explicit, and is also in preparation
for doing work on-digest (which may or may not happen)
2024-07-16 15:23:40 +02:00
Klaas van Schelven d56a8663a7 Remove the periodCounter and the PC registry
direct consequence of switching to SQL-based counting
2024-07-16 15:08:05 +02:00
Klaas van Schelven 5ce840f62f Move period_utils to separate file 2024-07-15 14:38:35 +02:00
Klaas van Schelven 93365f4c8d Period-counting using SQL instead of custom-made (PoC)
The direct cause for this was the following observation: there was no mechanism
in place to safeguard counted events across evictions, i.e. the following order
of events was not accounted for:

* ingest/digest a bunch of events (PCs correctly updated)
* eviction (PC still correct)
* server/snappea restart (PC reloaded, but based on new events; not correct).

I thought about various approaches to fix this (e.g. snapshotting) but in the end
such approaches added even more complexity to the PC mechanism. I decided to first
check how non-performant the SQL route would be, and this PoC seems to say: just
go SQL.

There's also a small semantic change (probably in the direction of what you'd
expect), namely: the periods are no longer 'calendar' periods.
2024-07-15 14:28:13 +02:00
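
The semantic change noted in 93365f4c8d above (periods are no longer 'calendar' periods) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `event` table, its columns, both function names, and the use of integer timestamps are hypothetical, and `sqlite3` is used only because it ships with Python. It is not the project's actual query.

```python
import sqlite3


def count_rolling(conn, project_id, now, period_seconds):
    """Count events in the window (now - period_seconds, now]."""
    row = conn.execute(
        "SELECT COUNT(*) FROM event"
        " WHERE project_id = ? AND digested_at > ? AND digested_at <= ?",
        (project_id, now - period_seconds, now),
    ).fetchone()
    return row[0]


def count_calendar(conn, project_id, now, period_seconds):
    """Count events since the start of the calendar-aligned period containing `now`."""
    period_start = now - (now % period_seconds)
    row = conn.execute(
        "SELECT COUNT(*) FROM event"
        " WHERE project_id = ? AND digested_at >= ? AND digested_at <= ?",
        (project_id, period_start, now),
    ).fetchone()
    return row[0]


# Hypothetical schema and data: timestamps are plain seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (project_id INTEGER, digested_at INTEGER)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [(1, 3500), (1, 3590), (1, 3610), (2, 3605)],
)

rolling = count_rolling(conn, 1, now=3620, period_seconds=60)
calendar = count_calendar(conn, 1, now=3620, period_seconds=60)
```

With this data the rolling count for project 1 is 2 (events at 3590 and 3610), while the calendar-aligned count is 1 (only 3610 falls in the minute starting at 3600). Because the count is recomputed from the stored rows, it needs no in-memory state that has to survive a restart.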
Klaas van Schelven c42c85c050 Quota: only trigger when _over_ quota 2024-07-15 13:40:58 +02:00
Klaas van Schelven 89cb4a0594 Notes on PC before the code is removed 2024-07-15 11:11:21 +02:00
Klaas van Schelven fbee32c79a Remove some 'maybe' comments for 'drop immediately' 2024-07-15 11:02:08 +02:00
Klaas van Schelven d5bfe70488 Comments on the finer points of quota 2024-07-15 11:00:20 +02:00
Klaas van Schelven d68aff05ca Quota 2024-07-15 09:37:36 +02:00
Klaas van Schelven c403d906cd stress-test: report on errors 2024-07-15 09:26:27 +02:00
Klaas van Schelven 927bed38d0 Annotate _prev_tup and a small 'fix'
'fix' in scare-quotes, because the previous implementation of first_chunk was
wrong, but never led to actually wrong outcomes, only to one recursive call
too many (for seconds, minutes)
2024-07-12 14:51:03 +02:00
Klaas van Schelven 0ef72a7461 Clarify name of test 2024-07-12 12:33:03 +02:00
Klaas van Schelven 49a395fb86 use the envelope_header's DSN if it is available 2024-07-12 10:41:16 +02:00
Klaas van Schelven b5321f3685 Notes on streaming 2024-07-12 10:12:07 +02:00
Klaas van Schelven 6767ea593a Fix port number in example nginx conf 2024-07-12 08:39:50 +02:00
Klaas van Schelven eb01c61947 Fix typo in comment 2024-07-10 14:00:59 +02:00
Klaas van Schelven 90a55e522b Remove 2 'untested behavior' notes
I _think_ I meant that I had never actually seen those code-paths in action
(i.e. the note was not about automated tests but rather any kind of visual
confirmation that it worked) but I have seen that now
2024-07-10 14:00:47 +02:00
Klaas van Schelven 1ed6522126 Clarify why get_pc_registry must be done before event-creation 2024-07-09 13:27:16 +02:00
Klaas van Schelven eb23d44962 Enforce a single pc_registry for a single ingesting process
Using a pid-file that's implied by the ingestion directory.

We do this in `get_pc_registry`, i.e. on the first request. This means failure is
in the first request on the 2nd process.

Why not on startup? Because we don't have a configtest or generic on-startup location
(yet). Making _that_ could be another source of fragility, and getting e.g. the nr
of processes might be non-trivial / config-dependent.
2024-07-09 13:14:27 +02:00
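
The pid-file check described in eb23d44962 above could look roughly like the sketch below. This is an assumption-laden illustration, not the code from that commit: `claim_ingestion_dir`, `AlreadyClaimed`, and the file name `pc_registry.pid` are hypothetical names, and stale pid-files (left behind by a crashed process) are deliberately not handled.

```python
import os


class AlreadyClaimed(Exception):
    """Raised when a different process already claimed the ingestion directory."""


def claim_ingestion_dir(ingestion_dir, pid=None):
    """Claim `ingestion_dir` for a single process by writing a pid-file into it.

    Returns the pid-file path. Calling it again from the owning process is a no-op.
    """
    pid = os.getpid() if pid is None else pid
    pid_path = os.path.join(ingestion_dir, "pc_registry.pid")  # hypothetical name
    try:
        # O_EXCL makes creation atomic: only one process can create the file.
        fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(pid_path) as f:
            owner = int(f.read().strip())
        if owner != pid:
            raise AlreadyClaimed(
                f"{ingestion_dir} is already claimed by pid {owner}"
            )
        return pid_path
    with os.fdopen(fd, "w") as f:
        f.write(str(pid))
    return pid_path
```

Called lazily on the first request, as the commit message describes, the second process fails on its own first request rather than at startup.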
Klaas van Schelven e2cb6654cf Removed stray TODO 2024-07-09 11:12:22 +02:00
Klaas van Schelven edff0e219c PeriodCounter: remove event-based approach
Replacing it with passing the thresholds on each call to `inc`.

The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.

The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.

During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.

Other things I found:

* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
  also touched the DB for unmutes) so that was probably a separate bug (now fixed).

* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
  not use it yet). The idea was probably that this could be useful in the quota setting
  (a quota can become unmet after a while) but in fact it isn't useful, because when a
  quota becomes unmet you'd still need to check all quota and OR them.

Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.
2024-07-09 09:31:36 +02:00
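
The approach described in edff0e219c above (passing the thresholds on each call to `inc` instead of registering event listeners) can be sketched as follows. The class name, the `(name, limit)` tuple format, and the return value are assumptions for illustration only; the real `PeriodCounter` counted per period and its signature is not shown in this log.

```python
class SimpleCounter:
    """A stateless-with-respect-to-listeners counter: no callbacks are stored on it."""

    def __init__(self):
        self.total = 0

    def inc(self, thresholds=()):
        """Increment, then report which of the given thresholds are now reached.

        `thresholds` is an iterable of (name, limit) pairs that the caller
        collects right before the call, e.g. from the database.
        """
        self.total += 1
        return [name for name, limit in thresholds if self.total >= limit]


counter = SimpleCounter()
thresholds = [("unmute-after-3", 3)]  # hypothetical threshold, gathered by the caller
results = [counter.inc(thresholds) for _ in range(3)]
```

Here `results` is `[[], [], ["unmute-after-3"]]`. Since nothing is registered on the counter itself, a threshold created in another process (such as the GUI side) is picked up as long as the caller re-reads the thresholds before each `inc`.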