Counting requires looking at all records, which is too expensive if you have
e.g. 1_000_000 issues.
Note that we take a different approach than the one for Events (where we
count-with-timeout). Reason for switching:
https://sqlite.org/forum/forumpost/fa65709226
For Events we have a known count for the non-query case (denormalized/counted
value), so we preserve what we had there. For Issues the trouble of keeping
counts right for muted/etc. is not (currently) worth it.
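A minimal sketch of one way to avoid scanning all records: count inside a
LIMIT-ed subquery, so that sqlite stops after `cap + 1` rows. This is an
assumption for illustration only; the message above does not spell out the
approach actually taken for Issues. The `issue` table and the `capped_count`
helper are hypothetical names.

```python
import sqlite3


def capped_count(conn, table, cap):
    """Return (count, is_capped); at most cap + 1 rows are visited.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    row = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT 1 FROM {table} LIMIT ?)",
        (cap + 1,),
    ).fetchone()
    n = row[0]
    if n > cap:
        # More rows exist than we are willing to count; report "cap+".
        return cap, True
    return n, False


# Hypothetical table, just enough to run the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issue (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO issue (id) VALUES (?)", [(i,) for i in range(2500)])

count, is_capped = capped_count(conn, "issue", 1000)
label = f"{count}+" if is_capped else str(count)
```

With a cap, a display can show something like "1000+" instead of an exact
number, at a cost that no longer grows with the table.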
This will hopefully help when getting issue reports from those who
have not set up dogfooding.
See [Dogfooding Bugsink](https://www.bugsink.com/docs/dogfooding/)
No need to calculate event_qs_count (which is potentially expensive) if it's
not used in display.
When counting was moved from the template to the view (in 1eea9268a5) it was
made unconditional; here we restore the earlier, conditional behavior.
When searching by tag, there is no need to join with Event; especially when
just counting results or determining first/last digest_order (for navigation).
(For the above "no need" to be actually true, digest_order was denormalized
into EventTag).
The above is implemented in `search_events_optimized`.
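A minimal sketch of the idea, not the actual `search_events_optimized`: with
`digest_order` available on the tag rows themselves, both the count and the
first/last bounds can be answered from that one table. The table and column
names below (`eventtag`, `issue_id`, `value_id`) and the `count_and_bounds`
helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eventtag (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,
    event_id INTEGER NOT NULL,
    value_id INTEGER NOT NULL,
    digest_order INTEGER NOT NULL  -- denormalized copy of the event's value
);
""")

# Six events on issue 1; events alternate between tag values 7 and 8.
rows = [(1, 10 + i, 7 if i % 2 == 0 else 8, i + 1) for i in range(6)]
conn.executemany(
    "INSERT INTO eventtag (issue_id, event_id, value_id, digest_order) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def count_and_bounds(conn, issue_id, value_id):
    """Count matches and fetch first/last digest_order without joining Event."""
    return conn.execute(
        "SELECT COUNT(*), MIN(digest_order), MAX(digest_order) "
        "FROM eventtag WHERE issue_id = ? AND value_id = ?",
        (issue_id, value_id),
    ).fetchone()


result = count_and_bounds(conn, 1, 7)
```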
Further improvements:
* the bounds of `digest_order` are fetched only once; for first/last this info
is reused.
* explicitly pass `event_qs_count` to the templates
* non-event pages used to calculate a "last event" to generate a tab with a
correct event.id; since we simply have the "last" idiom, better use that.
this also makes clear the "none" idiom was never needed, so we remove it again.
Results:
Locally (60K event DB, 30K events on largest issue) my testbatch now
runs in 25% of time (overall).
* The effect on the AND-ing is in fact very large (13% runtime remaining)
* The event details page is not noticeably improved.
* denormalize IssueTag.key; this allows for key to be used in an index
(issue, key, count).
* rewrite to grouping-first, per-key-query-second. i.e. reverts part of
bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always
guessing correctly, and we don't dynamically determine that yet. Which
means that (in the single query version) if you'd have a per-event value for
some tag, you could end up iterating over as many values as there are events,
which won't work.
* in tags.py, do the tab-check first to avoid doing the tag-calculation twice.
* further denormalization (of key__key, of value__str) actually turns out to not
be required for both the grouping and individual queries to be fast.
Performance tests, as always, against sqlite3.
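A minimal sketch of the grouping-first, per-key-query-second shape described
above, under assumed names: the `issuetag` table, its columns and the
`tag_overview` helper are hypothetical and only illustrate why the work per
key stays bounded even when a tag has one value per event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issuetag (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,
    key_id INTEGER NOT NULL,      -- denormalized key
    value_id INTEGER NOT NULL,
    "count" INTEGER NOT NULL
);
CREATE INDEX issuetag_issue_key_count ON issuetag (issue_id, key_id, "count");
""")

# Key 1 has two values; key 2 has a distinct value for every event.
rows = [(1, 1, 100, 5), (1, 1, 101, 2)]
rows += [(1, 2, 200 + i, 1) for i in range(10)]
conn.executemany(
    'INSERT INTO issuetag (issue_id, key_id, value_id, "count") '
    "VALUES (?, ?, ?, ?)",
    rows,
)


def tag_overview(conn, issue_id, top_n=3):
    """Group by key first, then run one LIMIT-ed query per key."""
    keys = conn.execute(
        "SELECT key_id, COUNT(*) FROM issuetag "
        "WHERE issue_id = ? GROUP BY key_id ORDER BY key_id",
        (issue_id,),
    ).fetchall()
    overview = {}
    for key_id, n_values in keys:
        top = conn.execute(
            'SELECT value_id, "count" FROM issuetag '
            "WHERE issue_id = ? AND key_id = ? "
            'ORDER BY "count" DESC, value_id LIMIT ?',
            (issue_id, key_id, top_n),
        ).fetchall()
        overview[key_id] = (n_values, top)
    return overview


overview = tag_overview(conn, 1)
```

The per-key LIMIT is what keeps a per-event tag from turning into an
iteration over as many values as there are events.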
--
Roads not taken/background
* This commit removes a future TODO that "A point _could_ be made for
['issue', '?value?' 'count']", I tried both versions of that index
(against the group-then-query version, the only one which I trust)
but without denormalization of key, I could not get it to be fast.
* I thought about a hybrid approach (for those keys with low counts of values
do the single-query thing) but as it stands the extra complexity isn't worth
it.
---
on the 1.2M events, 3 (user defined) tags / event test env this
basically lowers the time from "seconds" to "milliseconds".
Done by denormalizing EventTag.issue, and adding that into an index. Targets:
* get-event-within-query (when it's 'last' or 'first')
* .count (of search query results)
* min/max (for the first/prev/next/last buttons)
(The min/max query's performance was significantly improved by the addition of
the index, but the query was also rewritten into a simple SELECT rather than
MIN/MAX).
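A minimal sketch of what such a rewrite can look like, assuming the "simple
SELECT" is an ORDER BY ... LIMIT 1 lookup over the new index; the message does
not show the actual query. Table, column and index names, and the
`first_and_last` helper, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eventtag (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL,    -- denormalized from the event
    value_id INTEGER NOT NULL,
    digest_order INTEGER NOT NULL
);
CREATE INDEX eventtag_issue_value_order
    ON eventtag (issue_id, value_id, digest_order);
""")
conn.executemany(
    "INSERT INTO eventtag (issue_id, value_id, digest_order) VALUES (?, ?, ?)",
    [(1, 7, 1), (1, 8, 2), (1, 7, 3), (1, 8, 4), (1, 7, 5)],
)

# Only the sort direction is formatted in; all values are bound parameters.
BOUND_SQL = (
    "SELECT digest_order FROM eventtag "
    "WHERE issue_id = ? AND value_id = ? "
    "ORDER BY digest_order {direction} LIMIT 1"
)


def first_and_last(conn, issue_id, value_id):
    """Fetch both ends of the matching range with two single-row lookups."""
    params = (issue_id, value_id)
    first = conn.execute(BOUND_SQL.format(direction="ASC"), params).fetchone()
    last = conn.execute(BOUND_SQL.format(direction="DESC"), params).fetchone()
    return (first[0] if first else None, last[0] if last else None)


bounds = first_and_last(conn, 1, 7)
```

Each lookup can be served from one end of the index range for the given
(issue, value) pair, which is what the first/prev/next/last buttons need.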
When this code was written, I thought I had spectacularly improved performance.
I now believe this was based on an error in my measurements, but that this
still represents (mostly) an improvement, so I'll let it stand and will take
it from here in subsequent commits.
prompted by a user being confused about the number of events in their DB;
not 100% sure I'll keep this info here, but I'm introducing it for now
at least
In b76e474ef1, the event-navigation was changed into the next/prev idiom (I
think completely, i.e. also from the .html files, but did not check) but the
elif structure and error message did not fully reflect that (it still talked
about digest_order/id, but nav is now one of the primary methods)
I briefly considered removing the lookup-by-digest-order-only, but I figure it
may come in handy at some point (if only for users to directly edit the url)
and did not check whether this is actually unused.