Like e45c61d6f0, but for .project.
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only true to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.
Now that we have explicit Project-deletion (deps-first, delayed, properly batched),
the SET_NULL behavior is always a no-op (but one that still costs queries).
As a result, in the test for project deletion (which has deletes for many
of the altered models), the following 12 queries are no longer done:
```
SELECT "projects_project"."id", [..many fields..] FROM "projects_project" WHERE "projects_project"."id" = 1
DELETE FROM "projects_projectmembership" WHERE "projects_projectmembership"."project_id" IN (1)
DELETE FROM "alerts_messagingserviceconfig" WHERE "alerts_messagingserviceconfig"."project_id" IN (1)
UPDATE "releases_release" SET "project_id" = NULL WHERE "releases_release"."project_id" IN (1)
UPDATE "issues_issue" SET "project_id" = NULL WHERE "issues_issue"."project_id" IN (1)
UPDATE "issues_grouping" SET "project_id" = NULL WHERE "issues_grouping"."project_id" IN (1)
UPDATE "events_event" SET "project_id" = NULL WHERE "events_event"."project_id" IN (1)
UPDATE "tags_tagkey" SET "project_id" = NULL WHERE "tags_tagkey"."project_id" IN (1)
UPDATE "tags_tagvalue" SET "project_id" = NULL WHERE "tags_tagvalue"."project_id" IN (1)
UPDATE "tags_eventtag" SET "project_id" = NULL WHERE "tags_eventtag"."project_id" IN (1)
UPDATE "tags_issuetag" SET "project_id" = NULL WHERE "tags_issuetag"."project_id" IN (1)
```
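For illustration, a minimal sketch of what the per-FK change looks like (model
and field names assumed; DO_NOTHING as the assumed replacement, since explicit
deletion now guarantees the deps are gone first):
```python
from django.db import models


class Release(models.Model):
    # before: on_delete=models.SET_NULL, which issued the no-op UPDATEs above;
    # now that Project-deletion is explicit and deps-first, no cascade
    # behavior is needed on the FK itself (sketch, not the actual diff):
    project = models.ForeignKey(
        "projects.Project", null=True, on_delete=models.DO_NOTHING
    )
```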
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.
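Roughly, the shape of that scanner (a minimal sketch with hypothetical names
and a made-up `@snappea_task` decorator, not the actual implementation): delete
dependent rows in fixed-size batches, re-enqueue until nothing is left, then
delete the Project row itself:
```python
BATCH_SIZE = 500  # assumed value

# deps-first order; model imports assumed
DEPENDENCIES = [ProjectMembership, MessagingServiceConfig, Release, Issue]


@snappea_task  # hypothetical decorator for delayed (snappea) execution
def delete_project_deps(project_id):
    for model in DEPENDENCIES:
        pks = list(
            model.objects.filter(project_id=project_id)
            .values_list("pk", flat=True)[:BATCH_SIZE]
        )
        if pks:
            # one bounded batch per task-run keeps transactions small
            model.objects.filter(pk__in=pks).delete()
            delete_project_deps.delay(project_id)  # re-enqueue for the rest
            return
    # all dependencies gone: the Project row itself can be deleted cheaply
    Project.objects.filter(pk=project_id).delete()
```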
* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Projects which are delete-in-progress from the UI
* lack of DRY
* some unnecessary work (needed in the Issue-context, but not here)
is still being done.
See #50
I originally thought `SET_NULL` would be a good way to "do stuff later", but
that's only true to the degree that [1] updates are cheaper than deletes and [2]
2nd-order effects (further deletes in the dep-tree) are avoided.
Now that we have explicit Issue-deletion (deps-first, delayed, properly batched),
the SET_NULL behavior is always a no-op (but one that still costs queries).
As a result, in the test for issue deletion (which has deletes for many
of the altered models), the following 8 queries are no longer done:
```
SELECT "issues_grouping"."id", [..many fields..] FROM "issues_grouping" WHERE "issues_grouping"."id" IN (1)
UPDATE "events_event" SET "grouping_id" = NULL WHERE "events_event"."grouping_id" IN (1)
[.. a few moments later..]
SELECT "issues_issue"."id", [..many fields..] FROM "issues_issue" WHERE "issues_issue"."id" = 'uuid'
UPDATE "issues_grouping" SET "issue_id" = NULL WHERE "issues_grouping"."issue_id" IN ('uuid')
UPDATE "issues_turningpoint" SET "issue_id" = NULL WHERE "issues_turningpoint"."issue_id" IN ('uuid')
UPDATE "events_event" SET "issue_id" = NULL WHERE "events_event"."issue_id" IN ('uuid')
UPDATE "tags_eventtag" SET "issue_id" = NULL WHERE "tags_eventtag"."issue_id" IN ('uuid')
UPDATE "tags_issuetag" SET "issue_id" = NULL WHERE "tags_issuetag"."issue_id" IN ('uuid')
```
(breaks the tests because of constraints and because factories are not always used; will fix next)
Implemented using a batch-wise dependency-scanner in delayed
(snappea) style.
* no tests yet.
* no real point-of-entry in the (regular, non-admin) UI yet.
* no hiding of Issues which are delete-in-progress from the UI
* file storage not yet cleaned up
* project issue counts not yet updated
* dangling tag values: no cleanup mechanism yet.
See #50
* denormalize IssueTag.key; this allows key to be used in an index
(issue, key, count). (See the sketch after this list.)
* rewrite to grouping-first, per-key-query-second; i.e. this reverts part of
bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
guessing correctly, and we don't dynamically determine that yet. This
means that, in the single-query version, a tag with a per-event value
could end up iterating over as many values as there are events,
which won't work.
* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.
* further denormalization (of key__key, of value__str) actually turns out not
to be required for both the grouping and individual queries to be fast.
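A sketch of the resulting shape (model and helper names assumed from the prose
above; on_delete choices illustrative):
```python
from django.db import models


class IssueTag(models.Model):
    issue = models.ForeignKey("issues.Issue", on_delete=models.DO_NOTHING)
    value = models.ForeignKey("tags.TagValue", on_delete=models.DO_NOTHING)
    # denormalized from value.key, so it can take part in the index below
    key = models.ForeignKey("tags.TagKey", on_delete=models.DO_NOTHING)
    count = models.PositiveIntegerField(default=0)

    class Meta:
        indexes = [models.Index(fields=["issue", "key", "count"])]


def tags_for_issue(issue, per_key=10):
    # grouping-first: one query for the keys, then one (indexed, bounded)
    # query per key; never iterate over all values of a key
    key_ids = (
        IssueTag.objects.filter(issue=issue)
        .values_list("key_id", flat=True)
        .distinct()
    )
    return {
        key_id: IssueTag.objects.filter(issue=issue, key_id=key_id)
        .order_by("-count")[:per_key]
        for key_id in key_ids
    }
```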
Performance tests, as always, against sqlite3.
--
Roads not taken/background
* This commit removes a future TODO that "A point _could_ be made for
['issue', '?value?' 'count']". I tried both versions of that index
(against the group-then-query version, the only one which I trust),
but without denormalization of key I could not get it to be fast.
* I thought about a hybrid approach (for keys with low value-counts, do
the single-query thing) but as it stands the extra complexity isn't worth
it.
---
On the 1.2M-events, 3 (user-defined) tags-per-event test env this
basically lowers the time from "seconds" to "milliseconds".
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.
Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)
Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
not to "magically solve counting".
This commit also optimizes the .count() on the issue-detail event list (via
Paginator).
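For the Paginator part, a sketch (counter name assumed): `Paginator.count` is a
cached property in Django, so it can be seeded with the denormalized value and
the `COUNT(*)` query is never issued:
```python
from django.core.paginator import Paginator

paginator = Paginator(issue.event_set.all(), per_page=25)
# seed the cached property; `stored_event_count` is an assumed name for
# the denormalized (post-eviction) counter on Issue
paginator.count = issue.stored_event_count
page = paginator.get_page(page_number)
```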
This commit also slightly changes the value passed as `stored_event_count` (as
used by `get_random_irrelevance`) to be the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
Mysteriously: "Truncated incorrect DOUBLE value". But we have no Double fields.
Answer: on MySQL, adding a value to a field (with "+") first tries to convert
both sides to Double. Using Concat solves it.
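For illustration (model and field names hypothetical), the failing and the
working variant:
```python
from django.db.models import F, Value
from django.db.models.functions import Concat

# fails on MySQL with "Truncated incorrect DOUBLE value": "+" is numeric
# addition there, so MySQL first coerces the text column to DOUBLE
SomeModel.objects.filter(pk=pk).update(notes=F("notes") + extra)

# portable string concatenation:
SomeModel.objects.filter(pk=pk).update(notes=Concat(F("notes"), Value(extra)))
```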
Showed up in all paths except "resolved by next".
Fix #14
* issue.event_count to issue.digested_event_count
* event.ingest_order to event.digest_order
* issue.ingest_order to issue.digest_order
This is generally more correct/explicit, and is also in preparation
for doing work on-digest (which may or may not happen).
The direct cause for this was the following observation: there was no mechanism
in place to safeguard counted events across evictions, i.e. the following order
of events was not accounted for:
* ingest/digest a bunch of events (PCs correctly updated)
* eviction (PC still correct)
* server/snappea restart (PC reloaded, but based on the post-eviction events; not correct).
I thought about various approaches to fix this (e.g. snapshotting) but in the end
such approaches added even more complexity to the PC mechanism. I decided to first
check how non-performant the SQL route would be, and this PoC seems to say: just
go SQL.
There's also a small semantic change (probably in the direction of what you'd
expect), namely: the periods are no longer 'calendar' periods.
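A minimal sketch of the SQL route (field names assumed), which also shows the
semantic change: the count is over a rolling window ending "now", not a
calendar-aligned period:
```python
from datetime import timedelta

from django.utils import timezone


def event_count_in_window(issue, hours=24):
    # rolling window, not a calendar period; `digested_at` is an assumed
    # timestamp field on Event
    cutoff = timezone.now() - timedelta(hours=hours)
    return issue.event_set.filter(digested_at__gte=cutoff).count()
```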
Replacing it with passing the thresholds on each call to `inc`.
The event-based approach was broken in a multi-process setup (such as having a separate
gunicorn and snappea), because the unmute events would be registered GUI-side
(gunicorn), and the single process where the counting happened had a different PC
instance.
The solution is to get rid of the event-listener approach, and just make an inventory of
the threshold-checks that need to be done right before each call to `inc`. Because the
calls to `inc` happen in a single process (we [will] enforce this elsewhere) this fixes
the problem.
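In sketch form (signatures and threshold shape assumed), the new calling
convention: the caller assembles the relevant thresholds and hands them to
`inc`, which reports which of them are met:
```python
# hypothetical threshold shape: (period, nr_of_periods, gte-limit),
# e.g. "at least 100 events in the last hour"
thresholds = [
    ("hour", 1, 100),
    ("day", 7, 1000),
]

# all calls to inc() happen in a single process (enforced elsewhere),
# so the counting and the threshold-checks can no longer diverge
met = pc.inc(event.timestamp, thresholds=thresholds)  # e.g. [True, False]
if any(met):
    unmute(issue)  # hypothetical action
```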
During refactoring it became clear that this is probably a good idea anyway: many
comments about corner-cases could be removed.
Other things I found:
* The now-removed `_digest_event_python_postprocessing` did more than Python alone (it
also touched the DB for unmutes), so that was probably a separate bug (now fixed).
* In the event-listener-based code, I foresaw the need for `on_become_false` (but did
not use it yet). The idea was probably that this could be useful in the quota setting
(a quota can become unmet after a while), but in fact it isn't useful, because when a
quota becomes unmet you'd still need to check all quotas and OR them.
Tests have not been truly refactored (the new architecture probably points to a new
desired set of tests) but rather have been made to run in the simplest way possible.