Improve tag-overview performance

* denormalize IssueTag.key; this allows for key to be used in and index (issue, key, count). * rewrite to grouping-first, per-key-query-second. i.e. reverts part of bbfee84c6a. Reasoning: I don't want to rely on "mostly unique" always guessing correctly, and we don't dynamically determine that yet. Which means that (in the single query version) if you'd have a per-event value for some tag, you could end up iterating over as many values as there are events, which won't work. * in tags.py, do the tab-check first to avoid doing the tag-calculation twice. * further denormalation (of key__key, of value__str) actually turns out to not be required for both the grouping and indivdual queries to be fast. Performance tests, as always, against sqlite3. -- Roads not taken/background * This commit removes a future TODO that "A point _could_ be made for ['issue', '?value?' 'count']", I tried both versions of that index (against the group-then-query version, the only one which I trust) but without denormalization of key, I could not get it to be fast. * I thought about a hybrid approach (for those keys with low counts of values do the single-query thing) but as it stands the extra complexity isn't worth it. --- on the 1.2M events, 3 (user defined) tags / event test env this basically lowers the time from "seconds" to "miliseconds".
2026-05-07 23:39:59 -05:00 · 2025-03-11 22:06:27 +01:00
parent b031792784
commit cd7f3978cf
5 changed files with 42 additions and 31 deletions
@@ -1,7 +1,6 @@
 import json
 import uuid
 from functools import partial
-from itertools import groupby

 from django.db import models, transaction
 from django.db.models import F, Value
@@ -128,22 +127,25 @@ class Issue(models.Model):
    def _get_issue_tags(self, other_cutoff, other_label):
        result = []

-        all_issue_tags = (
-            IssueTag.objects
-            .filter(issue=self, value__key__mostly_unique=False)  # note: project is implied through issue
-            .order_by("value__key__key", "-count")
-            .select_related("value", "value__key"))
+        ds = self.tags \
+            .filter(key__mostly_unique=False)\
+            .values("key")\
+            .annotate(count_sum=models.Sum("count"))\
+            .distinct()\
+            .order_by("key__key")

-        # NOTE: at some point in the past we did grouping in SQL first; and then did 1 per-group query to get the
-        # the actual values. This was less efficient in terms of the number of queries, but had the advantage of quickly
-        # stopping work when there were very many values for a certain key. We now rely on the 'mostly_unique' property
-        # for the line below to not do way too much work; if it ever turns out to be so, we may even push the
-        # 'mostly_unique' property to be dynamically calculated (setting it to True when very many values are observed)
-        grouped = groupby(all_issue_tags, lambda x: x.value.key.key)
+        for d in ds:
+            issue_tags = [
+                issue_tag
+                for issue_tag in
+                (IssueTag.objects
+                 .filter(issue=self, key=d['key'])  # note: project is implied through issue
+                 .order_by("-count")
+                 .select_related("value", "key")[:other_cutoff + 1]  # +1 to see if we need to add "Other"
+                 )
+            ]

-        for key, group in grouped:
-            issue_tags = list(group)
-            total_seen = sum(issue_tag.count for issue_tag in issue_tags)
+            total_seen = d["count_sum"]

            seen_till_now = 0
            if len(issue_tags) > other_cutoff: