99 Commits

Author SHA1 Message Date
Klaas van Schelven
354af7ea0a Fix issues as reported by bandit or mark as nosec
Nothing worrying, but good to have checked this regardless
and important to have a green pipeline.

Fix #175
2025-07-30 12:16:40 +02:00
Klaas van Schelven
308034aadd Issue-delete from the UI (in the list-view)
See #50
2025-07-04 21:25:57 +02:00
Klaas van Schelven
28b2ce0eaf Various models: .project SET_NULL => DO_NOTHING
Like e45c61d6f0, but for .project.

I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only so the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.

Now that we have explicit Project-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).

As a result, in the test for project deletion (which has deletes for many
of the altered models), the following 12 queries are no longer done:

```
SELECT "projects_project"."id", [..many fields..] FROM "projects_project" WHERE "projects_project"."id" = 1
DELETE FROM "projects_projectmembership" WHERE "projects_projectmembership"."project_id" IN (1)
DELETE FROM "alerts_messagingserviceconfig" WHERE "alerts_messagingserviceconfig"."project_id" IN (1)
UPDATE "releases_release" SET "project_id" = NULL WHERE "releases_release"."project_id" IN (1)
UPDATE "issues_issue" SET "project_id" = NULL WHERE "issues_issue"."project_id" IN (1)
UPDATE "issues_grouping" SET "project_id" = NULL WHERE "issues_grouping"."project_id" IN (1)
UPDATE "events_event" SET "project_id" = NULL WHERE "events_event"."project_id" IN (1)
UPDATE "tags_tagkey" SET "project_id" = NULL WHERE "tags_tagkey"."project_id" IN (1)
UPDATE "tags_tagvalue" SET "project_id" = NULL WHERE "tags_tagvalue"."project_id" IN (1)
UPDATE "tags_eventtag" SET "project_id" = NULL WHERE "tags_eventtag"."project_id" IN (1)
UPDATE "tags_issuetag" SET "project_id" = NULL WHERE "tags_issuetag"."project_id" IN (1)
```
2025-07-03 21:49:49 +02:00
Klaas van Schelven
6b9e4d8011 Project.delete_deferred(): first version (WIP)
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.

* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Projects which are delete-in-progress from the UI

* lack of DRY
* some unnessary work (needed in the Issue-context, but not here)
  is still being done.

See #50
2025-07-03 21:01:28 +02:00
Klaas van Schelven
2baa4446fd Issue deletion: event pre-deletes (storage, project-counts) 2025-07-03 13:18:28 +02:00
Klaas van Schelven
e45c61d6f0 Various models: .issue and .grouping; SET_NULL => DO_NOTHING
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only so the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.

Now that we have explicit Issue-deletion (deps-first, delayed, properly batched)
the SET_NULL behavior is always a no-op (but with cost in queries).

As a result, in the test for issue deletion (which has deletes for many
of the altered models), the following 8 queries are no longer done:

```
SELECT "issues_grouping"."id", [..many fields..] FROM "issues_grouping" WHERE "issues_grouping"."id" IN (1)
UPDATE "events_event" SET "grouping_id" = NULL WHERE "events_event"."grouping_id" IN (1)

[.. a few moments later..]

SELECT "issues_issue"."id", [..many fields..] FROM "issues_issue" WHERE "issues_issue"."id" = 'uuid'
UPDATE "issues_grouping" SET "issue_id" = NULL WHERE "issues_grouping"."issue_id" IN ('uuid')
UPDATE "issues_turningpoint" SET "issue_id" = NULL WHERE "issues_turningpoint"."issue_id" IN ('uuid')
UPDATE "events_event" SET "issue_id" = NULL WHERE "events_event"."issue_id" IN ('uuid')
UPDATE "tags_eventtag" SET "issue_id" = NULL WHERE "tags_eventtag"."issue_id" IN ('uuid')
UPDATE "tags_issuetag" SET "issue_id" = NULL WHERE "tags_issuetag"."issue_id" IN ('uuid')
```

(breaks the tests b/c of constraints and not always using factories; will fix next)
2025-07-03 11:33:58 +02:00
Klaas van Schelven
e5dbeae514 Issue.delete_deferred(): first version (WIP)
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.

* no tests yet.
* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Issues which are delete-in-progress from the UI
* file storage not yet cleaned up
* project issue counts not yet updated
* dangling tag values: no cleanup mechanism yet.

See #50
2025-06-27 12:52:59 +02:00
Klaas van Schelven
aad0f624f9 Fix: issue-list indexes must have project first
because we always filter by project before ordering;

the now-removed first_seen index was simply unused
2025-05-06 22:19:31 +02:00
Klaas van Schelven
49e6700d4a Grouping.grouping_key: hash it for the index 2025-05-06 11:32:19 +02:00
Klaas van Schelven
392f5a30be Add index for Grouping.grouping_key (and project) 2025-05-05 22:45:33 +02:00
Klaas van Schelven
524f5ea45e Issue Tag display: for low event-counts, show more tags
and for high event-counts, display a warning about what is hidden
2025-03-31 09:56:31 +02:00
Klaas van Schelven
cd7f3978cf Improve tag-overview performance
* denormalize IssueTag.key; this allows for key to be used in and index
  (issue, key, count).

* rewrite to grouping-first, per-key-query-second. i.e. reverts part of
  bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
  guessing correctly, and we don't dynamically determine that yet. Which
  means that (in the single query version) if you'd have a per-event value for
  some tag, you could end up iterating over as many values as there are events,
  which won't work.

* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.

* further denormalation (of key__key, of value__str) actually turns out to not
  be required for both the grouping and indivdual queries to be fast.

Performance tests, as always, against sqlite3.

--

Roads not taken/background

* This commit removes a future TODO that "A point _could_ be made for
  ['issue', '?value?' 'count']", I tried both versions of that index
  (against the group-then-query version, the only one which I trust)
  but without denormalization of key, I could not get it to be fast.

* I thought about a hybrid approach (for those keys with low counts of values
  do the single-query thing) but as it stands the extra complexity isn't worth
  it.

---
on the 1.2M events, 3 (user defined) tags / event test env this
basically lowers the time from "seconds" to "miliseconds".
2025-03-12 14:14:05 +01:00
Klaas van Schelven
3ee6f29f9c tags: fix the indexes
this is the part I was able to do with careful reading (and rerunning the
tests); actual performance implications will be checked based on this
2025-03-07 20:59:21 +01:00
Klaas van Schelven
f76d3f4f40 Merge branch 'main' into tag-search 2025-03-05 16:05:17 +01:00
Klaas van Schelven
381a5caae4 Issue.calculated_* fields: fix lengths
as in a717dd7374, but for Issue as well as Event.
The need for this was exposed by running the testsuite
against mysql; this commit fixes the tests.
2025-03-05 11:14:19 +01:00
Klaas van Schelven
0de8261440 Restore mostly_unique filter
botched in bbfee84c6a
2025-03-03 15:58:20 +01:00
Klaas van Schelven
bbfee84c6a issue tags: single query rather than one-per-tag 2025-03-03 13:42:18 +01:00
Klaas van Schelven
e6bc660731 Add note about per-key tag pages 2025-03-03 13:25:41 +01:00
Klaas van Schelven
1ae5bb3fd1 Tags: no cutoff when there are many
this idea was superceded by doing it explicitly in 00c49443eb
2025-03-03 13:23:52 +01:00
Klaas van Schelven
5930740e0b Tags: as a separate tab 2025-03-03 12:56:20 +01:00
Klaas van Schelven
124f90b403 'Issue Tags' box: show on all issue-related pages
now that it's no longer tied to the event...
2025-03-03 11:00:11 +01:00
Klaas van Schelven
00c49443eb Add 'mostly_unique' property to tags 2025-03-03 10:52:28 +01:00
Klaas van Schelven
7a30de3840 Issue Tags: select_related 2025-02-28 10:09:53 +01:00
Klaas van Schelven
60e25dac42 Issue Tags display: 'Other', sorting (WIP) 2025-02-28 09:48:01 +01:00
Klaas van Schelven
2c444c6e80 Display Issue (not event) tags in the RHS detail; WIP 2025-02-28 09:33:58 +01:00
Klaas van Schelven
10f8e10607 DB indexes for the issue-lits (including filters)
simply by reasoning about what they should be; no performance testing (on the issue-list
and on the event-ingestion) was done for these)
2025-02-18 10:32:06 +01:00
Klaas van Schelven
615d2da4c8 Chache stored_event_count (on Issue and Projet)
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.

Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)

Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
to not "magically solve counting".

This commit also optimizes the .count() on the issue-detail event list (via
Paginator).

This commit also slightly changes the value passed as `stored_event_count` to
be used for `get_random_irrelevance` to be the post-evication value. That won't
matter much in practice, but is slightly more correct IMHO.
2025-02-06 16:24:25 +01:00
Klaas van Schelven
c42aa9118a Describe role of Grouping and how it relates to Issue
third time's a charm (5e5b53abed, 48307daa0f)
2025-01-31 15:25:18 +01:00
Klaas van Schelven
6497f482ae Correctly order Turningpoints (as per comment) 2024-12-16 22:04:03 +01:00
Klaas van Schelven
68f2e714d5 Fix resolve-from-list on MySQL
Mysteriously, "Truncated incorrect DOUBLE value". But we have no Double fields.
Answer: adding a value to a field (with "+") tries to convert to Double first
on MySQL. Using Concat solves it.

Showed up in all paths exept "resolved by next".

Fix #14
2024-11-22 17:32:20 +01:00
Klaas van Schelven
db486adb35 Rewrite comments on 'reopen' and 'issue_is_regression' 2024-09-17 23:01:41 +02:00
Klaas van Schelven
eb08bd562c When there's no (meaningful) release info, don't display it 2024-09-12 13:58:36 +02:00
Klaas van Schelven
e59fd3a225 Implement 'occurs_in_last_release' 2024-09-12 09:49:22 +02:00
Klaas van Schelven
3128392d9a Distinguish ingested_at and digested_at 2024-07-18 14:45:59 +02:00
Klaas van Schelven
717a632b7d check_for_thresholds refactoring: 'metadata' is superfluous
because it was basically the input-tuple (in a different format)
2024-07-18 09:43:37 +02:00
Klaas van Schelven
65ea181f37 vbc-unmute: reduce calls to the expensive check
as done in the previous commit for project quota
2024-07-17 15:33:15 +02:00
Klaas van Schelven
c01d332e18 Rename ingest_order to digest_order and clarify event_count
* issue.event_count to digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to digest_order

This is generally more correct/explicit, and is also in preparation
of doing work on-digest (which may or may not happen)
2024-07-16 15:23:40 +02:00
Klaas van Schelven
5ce840f62f Move period_utils to separate file 2024-07-15 14:38:35 +02:00
Klaas van Schelven
93365f4c8d Period-counting using SQL instead of custom-made (PoC)
The direct cause for this was the following observation: there was no mechanism
in place to safeguard counted events across evictions, i.e. the following order
of events was not accounted for:

* ingest/digest a bunch of events (PCs correctly updated)
* eviction (PC still correct)
* server/snappea restart (PC reloaded, but based on new events. not correct).

I though about various approaches to fix this (e.g. snapshotting) but in the end
such approaches added even more complexity to the PC mechanism. I decided to first
check how non-performant the SQL route would be, and this PoC seems to say: just
go SQL.

There's also a small semantic change (probably in the direction of what you'd
expect), namely: the periods are no longer 'calendar' periods.
2024-07-15 14:28:13 +02:00
Klaas van Schelven
edff0e219c PeriodCounter: remove event-based approach
Replacing it with passing the thresholds on each call to `inc`.

The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.

The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.

During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.

Other things I found:

* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
  also touched the DB for unmutes) so that was probably a separate bug (now fixed).

* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
  not use it yet). The idea was probably that this could be useful in the quota setting
  (a quota can become unmet after a while) but in fact it isn't useful, because when a
  quota becomes unmet you'd still need to check all quota and OR them.

Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.
2024-07-09 09:31:36 +02:00
Klaas van Schelven
fe6c955465 never_evict events that are a Historic Turning Point
Both for technical (foreign keys) and business reasons (these are events you
care about)
2024-06-24 22:50:00 +02:00
Klaas van Schelven
5e2cc0575f Retention, small fixes (from Friday) 2024-06-23 22:20:18 +02:00
Klaas van Schelven
cef1127e48 Make user-model swappable
I may just need this later, and doing it this late was already painful enough.
2024-05-29 10:22:57 +02:00
Klaas van Schelven
41a4913299 Implement SNAPPEA_TASK_ALWAYS_EAGER 2024-04-19 21:41:42 +02:00
Klaas van Schelven
c50780ab4e Use atomic transactions in views 2024-04-18 13:15:46 +02:00
Klaas van Schelven
d75bede5dd Show current status for issues 2024-04-16 21:54:36 +02:00
Klaas van Schelven
d89e3d4dd5 Add 'next-materialized historic annotation 2024-04-16 09:31:12 +02:00
Klaas van Schelven
875f306079 Reduce queries of 'history' view
* select_related for users (which are displayed in many locations)
* use 'xxx_id' if that's all you need
2024-04-15 15:06:27 +02:00
Klaas van Schelven
8e44f7f68e Unmute reason: show in email alert 2024-04-15 10:17:18 +02:00
Klaas van Schelven
ad93e22fff Fix the double-creating of TurningPoints for time-based-unmute 2024-04-15 09:55:22 +02:00