89 Commits

Author SHA1 Message Date
Klaas van Schelven 524f5ea45e Issue Tag display: for low event-counts, show more tags
and for high event-counts, display a warning about what is hidden
2025-03-31 09:56:31 +02:00
Klaas van Schelven cd7f3978cf Improve tag-overview performance
* denormalize IssueTag.key; this allows for key to be used in an index
  (issue, key, count).

* rewrite to grouping-first, per-key-query-second. i.e. reverts part of
  bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
  guessing correctly, and we don't dynamically determine that yet. Which
  means that (in the single query version) if you'd have a per-event value for
  some tag, you could end up iterating over as many values as there are events,
  which won't work.

* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.

* further denormalization (of key__key, of value__str) actually turns out to not
  be required for both the grouping and individual queries to be fast.

Performance tests, as always, against sqlite3.

--

Roads not taken/background

* This commit removes a future TODO that "A point _could_ be made for
  ['issue', '?value?' 'count']". I tried both versions of that index
  (against the group-then-query version, the only one which I trust)
  but without denormalization of key, I could not get it to be fast.

* I thought about a hybrid approach (for those keys with low counts of values
  do the single-query thing) but as it stands the extra complexity isn't worth
  it.

---
on the 1.2M events, 3 (user defined) tags / event test env this
basically lowers the time from "seconds" to "milliseconds".
2025-03-12 14:14:05 +01:00
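The "grouping-first, per-key-query-second" approach described in cd7f3978cf can be sketched as follows. This is a minimal illustration against sqlite3, not the project's actual code: the table `issuetag`, its columns, the index name and the helper `tag_overview` are all assumptions made for this example. The point is that each per-key query is bounded by a LIMIT, so a tag with one value per event cannot cause iteration over as many values as there are events.

```python
import sqlite3

# Hypothetical, simplified schema: `key` is stored directly on the row
# (denormalized) so that it can be part of the (issue, key, count) index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issuetag (issue_id INTEGER, key TEXT, value TEXT, count INTEGER);
    CREATE INDEX issuetag_issue_key_count ON issuetag (issue_id, key, count);
""")
conn.executemany(
    "INSERT INTO issuetag VALUES (?, ?, ?, ?)",
    [
        (1, "browser", "firefox", 70),
        (1, "browser", "chrome", 25),
        (1, "browser", "safari", 5),
        (1, "os", "linux", 60),
        (1, "os", "macos", 40),
    ],
)


def tag_overview(conn, issue_id, top_n=2):
    # Step 1 (grouping): find the distinct keys for this issue.
    keys = [
        row[0]
        for row in conn.execute(
            "SELECT key FROM issuetag WHERE issue_id = ? GROUP BY key ORDER BY key",
            (issue_id,),
        )
    ]
    # Step 2 (per-key query): fetch at most `top_n` values per key.
    overview = {}
    for key in keys:
        overview[key] = conn.execute(
            "SELECT value, count FROM issuetag"
            " WHERE issue_id = ? AND key = ?"
            " ORDER BY count DESC LIMIT ?",
            (issue_id, key, top_n),
        ).fetchall()
    return overview


overview = tag_overview(conn, 1)
```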
Klaas van Schelven 3ee6f29f9c tags: fix the indexes
this is the part I was able to do with careful reading (and rerunning the
tests); actual performance implications will be checked based on this
2025-03-07 20:59:21 +01:00
Klaas van Schelven f76d3f4f40 Merge branch 'main' into tag-search 2025-03-05 16:05:17 +01:00
Klaas van Schelven 381a5caae4 Issue.calculated_* fields: fix lengths
as in a717dd7374, but for Issue as well as Event.
The need for this was exposed by running the testsuite
against mysql; this commit fixes the tests.
2025-03-05 11:14:19 +01:00
Klaas van Schelven 0de8261440 Restore mostly_unique filter
botched in bbfee84c6a
2025-03-03 15:58:20 +01:00
Klaas van Schelven bbfee84c6a issue tags: single query rather than one-per-tag 2025-03-03 13:42:18 +01:00
Klaas van Schelven e6bc660731 Add note about per-key tag pages 2025-03-03 13:25:41 +01:00
Klaas van Schelven 1ae5bb3fd1 Tags: no cutoff when there are many
this idea was superseded by doing it explicitly in 00c49443eb
2025-03-03 13:23:52 +01:00
Klaas van Schelven 5930740e0b Tags: as a separate tab 2025-03-03 12:56:20 +01:00
Klaas van Schelven 124f90b403 'Issue Tags' box: show on all issue-related pages
now that it's no longer tied to the event...
2025-03-03 11:00:11 +01:00
Klaas van Schelven 00c49443eb Add 'mostly_unique' property to tags 2025-03-03 10:52:28 +01:00
Klaas van Schelven 7a30de3840 Issue Tags: select_related 2025-02-28 10:09:53 +01:00
Klaas van Schelven 60e25dac42 Issue Tags display: 'Other', sorting (WIP) 2025-02-28 09:48:01 +01:00
Klaas van Schelven 2c444c6e80 Display Issue (not event) tags in the RHS detail; WIP 2025-02-28 09:33:58 +01:00
Klaas van Schelven 10f8e10607 DB indexes for the issue-list (including filters)
simply by reasoning about what they should be; no performance testing (on the issue-list
and on the event-ingestion) was done for these
2025-02-18 10:32:06 +01:00
Klaas van Schelven 615d2da4c8 Cache stored_event_count (on Issue and Project)
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.

Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)

Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
to not "magically solve counting".

This commit also optimizes the .count() on the issue-detail event list (via
Paginator).

This commit also slightly changes the value passed as `stored_event_count` to
be used for `get_random_irrelevance` to be the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
2025-02-06 16:24:25 +01:00
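The caching idea in 615d2da4c8 (keep a stored count next to the row rather than running a COUNT on every digest) might look roughly like this. It is a sketch under assumptions, using sqlite3 rather than the project's ORM: the tables and the helpers `store_event` and `evict_events` are invented for illustration. The counter is updated in the same transaction as the insert or delete, so it cannot drift from the real row count.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issue (
        id INTEGER PRIMARY KEY,
        stored_event_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE event (id INTEGER PRIMARY KEY, issue_id INTEGER NOT NULL);
    INSERT INTO issue (id) VALUES (1);
""")


def store_event(conn, issue_id):
    # Insert and counter update commit together (or not at all).
    with conn:
        conn.execute("INSERT INTO event (issue_id) VALUES (?)", (issue_id,))
        conn.execute(
            "UPDATE issue SET stored_event_count = stored_event_count + 1 WHERE id = ?",
            (issue_id,),
        )


def evict_events(conn, issue_id, n):
    # Delete the n oldest events and lower the cached count by the
    # number of rows that were actually removed.
    with conn:
        cur = conn.execute(
            "DELETE FROM event WHERE id IN"
            " (SELECT id FROM event WHERE issue_id = ? ORDER BY id LIMIT ?)",
            (issue_id, n),
        )
        conn.execute(
            "UPDATE issue SET stored_event_count = stored_event_count - ? WHERE id = ?",
            (cur.rowcount, issue_id),
        )
        return cur.rowcount


for _ in range(5):
    store_event(conn, 1)
evicted = evict_events(conn, 1, 2)

cached = conn.execute("SELECT stored_event_count FROM issue WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM event WHERE issue_id = 1").fetchone()[0]
```

Reading `cached` is a primary-key lookup, whereas `actual` has to visit every matching row, which is the cost the commit message measures.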
Klaas van Schelven c42aa9118a Describe role of Grouping and how it relates to Issue
third time's a charm (5e5b53abed, 48307daa0f)
2025-01-31 15:25:18 +01:00
Klaas van Schelven 6497f482ae Correctly order Turningpoints (as per comment) 2024-12-16 22:04:03 +01:00
Klaas van Schelven 68f2e714d5 Fix resolve-from-list on MySQL
Mysteriously, "Truncated incorrect DOUBLE value". But we have no Double fields.
Answer: adding a value to a field (with "+") tries to convert to Double first
on MySQL. Using Concat solves it.

Showed up in all paths except "resolved by next".

Fix #14
2024-11-22 17:32:20 +01:00
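The class of problem fixed in 68f2e714d5 (a "+" on text being treated as numeric addition) can be shown without MySQL. The sketch below uses SQLite purely as a stand-in, which is an assumption of this example: SQLite also coerces text operands of "+" to numbers, although it does so silently instead of raising MySQL's "Truncated incorrect DOUBLE value". In Django the portable way to join strings is the `Concat` function from `django.db.models.functions`, which is the fix the commit names; the raw SQL here only illustrates why "+" is the wrong tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "+" is arithmetic: non-numeric text operands are coerced to 0.
added = conn.execute("SELECT 'abc' + 'def'").fetchone()[0]

# "||" is SQLite's string concatenation operator.
concatenated = conn.execute("SELECT 'abc' || 'def'").fetchone()[0]
```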
Klaas van Schelven db486adb35 Rewrite comments on 'reopen' and 'issue_is_regression' 2024-09-17 23:01:41 +02:00
Klaas van Schelven eb08bd562c When there's no (meaningful) release info, don't display it 2024-09-12 13:58:36 +02:00
Klaas van Schelven e59fd3a225 Implement 'occurs_in_last_release' 2024-09-12 09:49:22 +02:00
Klaas van Schelven 3128392d9a Distinguish ingested_at and digested_at 2024-07-18 14:45:59 +02:00
Klaas van Schelven 717a632b7d check_for_thresholds refactoring: 'metadata' is superfluous
because it was basically the input-tuple (in a different format)
2024-07-18 09:43:37 +02:00
Klaas van Schelven 65ea181f37 vbc-unmute: reduce calls to the expensive check
as done in the previous commit for project quota
2024-07-17 15:33:15 +02:00
Klaas van Schelven c01d332e18 Rename ingest_order to digest_order and clarify event_count
* issue.event_count to digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to digest_order

This is generally more correct/explicit, and is also in preparation
of doing work on-digest (which may or may not happen)
2024-07-16 15:23:40 +02:00
Klaas van Schelven 5ce840f62f Move period_utils to separate file 2024-07-15 14:38:35 +02:00
Klaas van Schelven 93365f4c8d Period-counting using SQL instead of custom-made (PoC)
The direct cause for this was the following observation: there was no mechanism
in place to safeguard counted events across evictions, i.e. the following order
of events was not accounted for:

* ingest/digest a bunch of events (PCs correctly updated)
* eviction (PC still correct)
* server/snappea restart (PC reloaded, but based on new events; not correct).

I thought about various approaches to fix this (e.g. snapshotting) but in the end
such approaches added even more complexity to the PC mechanism. I decided to first
check how non-performant the SQL route would be, and this PoC seems to say: just
go SQL.

There's also a small semantic change (probably in the direction of what you'd
expect), namely: the periods are no longer 'calendar' periods.
2024-07-15 14:28:13 +02:00
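To make the "just go SQL" idea in 93365f4c8d concrete, here is a minimal sketch of counting events in a rolling (non-calendar) period with a plain query. The table, the use of Unix timestamps in `digested_at`, and the helper `count_in_period` are assumptions for this example, not the project's schema. Because the count is derived from the stored rows each time, there is no in-memory state that can become wrong after an eviction followed by a restart.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified schema; timestamps are stored as Unix seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (issue_id INTEGER, digested_at REAL)")
conn.execute("CREATE INDEX event_issue_digested ON event (issue_id, digested_at)")

now = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)
ages = [
    timedelta(minutes=10),
    timedelta(minutes=50),
    timedelta(minutes=70),
    timedelta(hours=25),
]
conn.executemany(
    "INSERT INTO event VALUES (1, ?)",
    [((now - age).timestamp(),) for age in ages],
)


def count_in_period(conn, issue_id, now, period):
    # Rolling window: everything newer than `now - period`,
    # independent of where calendar hours or days begin.
    since = (now - period).timestamp()
    return conn.execute(
        "SELECT COUNT(*) FROM event WHERE issue_id = ? AND digested_at > ?",
        (issue_id, since),
    ).fetchone()[0]


last_hour = count_in_period(conn, 1, now, timedelta(hours=1))
last_day = count_in_period(conn, 1, now, timedelta(days=1))
```

With `now` at exactly 12:00, a calendar-hour counter would have just reset, while the rolling window still sees the two events from the preceding 60 minutes.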
Klaas van Schelven edff0e219c PeriodCounter: remove event-based approach
Replacing it with passing the thresholds on each call to `inc`.

The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.

The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.

During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.

Other things I found:

* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
  also touched the DB for unmutes) so that was probably a separate bug (now fixed).

* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
  not use it yet). The idea was probably that this could be useful in the quota setting
  (a quota can become unmet after a while) but in fact it isn't useful, because when a
  quota becomes unmet you'd still need to check all quota and OR them.

Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.
2024-07-09 09:31:36 +02:00
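The design change in edff0e219c (no registered listeners; the caller passes the thresholds to check on each call to `inc`) can be sketched like this. The class `RollingCounter`, the threshold tuple layout and the labels are all hypothetical and far simpler than the real PeriodCounter. What the sketch shows is that the counter holds no threshold state, so whichever process calls `inc` decides, at that moment, which checks apply.

```python
from collections import deque


class RollingCounter:
    """Hypothetical, simplified stand-in for a period counter."""

    def __init__(self):
        self._timestamps = deque()

    def inc(self, timestamp, thresholds=()):
        """Record one event and check the thresholds passed in by the caller.

        `thresholds` is an iterable of (period_seconds, nr_of_events, label);
        the labels of all thresholds that are met are returned.
        """
        self._timestamps.append(timestamp)
        met = []
        for period, nr_of_events, label in thresholds:
            in_period = sum(1 for t in self._timestamps if t > timestamp - period)
            if in_period >= nr_of_events:
                met.append(label)
        return met


counter = RollingCounter()
thresholds = [(60, 3, "3 per minute")]
results = [counter.inc(t, thresholds) for t in (0, 10, 20, 100)]
```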
Klaas van Schelven fe6c955465 never_evict events that are a Historic Turning Point
Both for technical (foreign keys) and business reasons (these are events you
care about)
2024-06-24 22:50:00 +02:00
Klaas van Schelven 5e2cc0575f Retention, small fixes (from Friday) 2024-06-23 22:20:18 +02:00
Klaas van Schelven cef1127e48 Make user-model swappable
I may just need this later, and doing it this late was already painful enough.
2024-05-29 10:22:57 +02:00
Klaas van Schelven 41a4913299 Implement SNAPPEA_TASK_ALWAYS_EAGER 2024-04-19 21:41:42 +02:00
Klaas van Schelven c50780ab4e Use atomic transactions in views 2024-04-18 13:15:46 +02:00
Klaas van Schelven d75bede5dd Show current status for issues 2024-04-16 21:54:36 +02:00
Klaas van Schelven d89e3d4dd5 Add 'next-materialized historic annotation 2024-04-16 09:31:12 +02:00
Klaas van Schelven 875f306079 Reduce queries of 'history' view
* select_related for users (which are displayed in many locations)
* use 'xxx_id' if that's all you need
2024-04-15 15:06:27 +02:00
Klaas van Schelven 8e44f7f68e Unmute reason: show in email alert 2024-04-15 10:17:18 +02:00
Klaas van Schelven ad93e22fff Fix the double-creating of TurningPoints for time-based-unmute 2024-04-15 09:55:22 +02:00
Klaas van Schelven 490899975b Add tests for TurningPoint creation
this also proves one existing bug: the double-creating of TurningPoints
for time-based-unmute
2024-04-15 09:51:30 +02:00
Klaas van Schelven 280bd2172b History page: 'mostly done' (a first setup) 2024-04-12 16:07:25 +02:00
Klaas van Schelven 1cf19c83d5 Various code-clarification 2024-04-12 08:38:46 +02:00
Klaas van Schelven 4dfefec468 denormalize/cache last_frame_* and transaction on Event and Issue
for performance, but also fixes:

* not just the 'last frame' but the 'last relevant frame' (in-app)
* truncation is properly done (matching the DB size, and for each of the fields)
2024-04-10 09:12:15 +02:00
Klaas van Schelven d46cb7f6e8 DB: unique_together and PositiveIntegerField 2024-04-09 12:34:29 +02:00
Klaas van Schelven 21c4904524 Implement friendly_id 2024-04-09 11:09:31 +02:00
Klaas van Schelven 652823f8c3 Store calculated type and value on issue and event and use these values in the templates 2024-04-08 15:30:41 +02:00
Klaas van Schelven 48307daa0f Introduce 'Grouping' data-modeling 2024-04-08 11:41:15 +02:00
Klaas van Schelven d94bfa8aa6 Log Messages: my first take
they should somehow show up in the title; in the interface it should be clear that we're
dealing with log messages (rather than exceptions)
2024-04-04 15:40:31 +02:00
Klaas van Schelven f69befd20a Harmonize displayed timestamps
I picked server-time as the thing most likely to be correct
2024-04-01 23:00:19 +02:00