"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively,
limiting our digestion to ~3 events/s.
Note: this is a single-issue, single-project setup (presumably the cost would be
lower for more spread-out cases).
Note on indexes: Event already has indexes for both Project & Issue (though only
as the first column of a multi-column index). Without checking further: that does
not appear to "magically solve counting".
This commit also optimizes the .count() on the issue-detail event list (via
Paginator).
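A minimal sketch of why a stored counter beats counting, using plain `sqlite3` rather than the Django ORM. The table layout is hypothetical (only the name `stored_event_count` comes from this log); in Django the increment would typically be written as `Issue.objects.filter(pk=...).update(stored_event_count=F("stored_event_count") + 1)`. A stored value like this could also stand in for the paginator's `.count()`, though the exact mechanism used in this commit isn't spelled out here.

```python
import sqlite3

# Hypothetical schema: each issue row carries a denormalized event counter.
SCHEMA = """
CREATE TABLE issue (
    id INTEGER PRIMARY KEY,
    stored_event_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    issue_id INTEGER NOT NULL REFERENCES issue (id)
);
"""


def digest_event(conn: sqlite3.Connection, issue_id: int) -> None:
    """Store an event and bump the issue's counter in the same transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO event (issue_id) VALUES (?)", (issue_id,))
        conn.execute(
            "UPDATE issue SET stored_event_count = stored_event_count + 1 WHERE id = ?",
            (issue_id,),
        )


def counted_events(conn: sqlite3.Connection, issue_id: int) -> int:
    """The expensive variant: has to visit every event of the issue."""
    return conn.execute(
        "SELECT COUNT(*) FROM event WHERE issue_id = ?", (issue_id,)
    ).fetchone()[0]


def stored_events(conn: sqlite3.Connection, issue_id: int) -> int:
    """The cheap variant: a single primary-key lookup."""
    return conn.execute(
        "SELECT stored_event_count FROM issue WHERE id = ?", (issue_id,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO issue (id) VALUES (1)")
for _ in range(5):
    digest_event(conn, 1)
print(counted_events(conn, 1), stored_events(conn, 1))  # 5 5
```

The trade-off: every digest pays for one extra UPDATE, and anything that removes events (such as eviction) has to keep the counter in sync.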
This commit also slightly changes the value passed as `stored_event_count` to
`get_random_irrelevance`: it is now the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
Triggered by issue_event_list taking more than 5s on "emu" (my 1,500,000 event
test machine). Reason: sorting those events on a non-indexed field. Switching
to an indexed field solved it.
I then grepped for "ordering" and "order_by" and added indexes accordingly,
more or less indiscriminately (i.e. even on tables that are assumed to have
relatively few rows, such as Project & Team).
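To check whether a query sorts via an index or via a separate sort step, SQLite's `EXPLAIN QUERY PLAN` is handy. The sketch below assumes SQLite and uses a hypothetical `event` table with a hypothetical `digest_order` sort column; in Django the index would be declared as something like `models.Index(fields=["issue", "digest_order"])` in the model's `Meta.indexes`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "digest_order" stands in for whatever field the list sorts on.
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, issue_id INTEGER,"
    " digest_order INTEGER, data TEXT)"
)

QUERY = "SELECT * FROM event WHERE issue_id = ? ORDER BY digest_order DESC LIMIT 50"


def plan(sql: str) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]


before = plan(QUERY)
# Index whose column order matches the query's WHERE + ORDER BY.
conn.execute("CREATE INDEX event_issue_order ON event (issue_id, digest_order)")
after = plan(QUERY)

print(before)  # contains 'USE TEMP B-TREE FOR ORDER BY': an explicit sort step
print(after)   # a SEARCH using event_issue_order; no separate sort step
```

The DESC ordering doesn't need a DESC index: SQLite can walk the index backwards.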
## Goal
Reduce the number of migrations for _fresh installs_ of Bugsink. This implies: squash as
broadly as possible.
## How?
"throw-away-and-rerun". In particular, for a given app:
* throw away the migrations from some starting point up until and including the last one.
* run "makemigrations" for that app. Django will see what's missing and just redo it
* rename to 000n_b_squashed or similar.
* manually set a `replaces` list on the migration to the just-removed migrations
* manually check dependencies; check that they are:
  * as low as possible, e.g. an FK should only depend on existence. This reduces the
    risk of circular dependencies.
  * pointing to "original migrations", i.e. not to a just-created squashed migration.
    Because the squashed migrations "contain a lot", they increase the risk of circular
    dependencies.
* restore (git checkout) the thrown-away migrations
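The bookkeeping for the `replaces` step can be scripted. A minimal sketch with hypothetical helpers (`migration_names`, `replaces_list`) and made-up migration names; the output is meant to be pasted as `replaces = [...]` on the new squashed `Migration` class. Run it on the names before throwing the migrations away.

```python
from pathlib import Path


def migration_names(directory: Path) -> list[str]:
    """Names (without '.py') of the numbered migrations in an app's migrations/ dir."""
    return sorted(p.stem for p in directory.glob("[0-9][0-9][0-9][0-9]_*.py"))


def replaces_list(app_label: str, names: list[str], start: str) -> list[tuple[str, str]]:
    """The `replaces` value: `start` up to and including the last migration.

    Names containing "squashed" are skipped, so a squashed migration that is
    already present does not end up replacing itself.
    """
    names = sorted(n for n in names if "squashed" not in n)
    if start not in names:
        raise ValueError(f"{start!r} not among {names!r}")
    return [(app_label, name) for name in names[names.index(start):]]


names = ["0001_initial", "0002_event_foo", "0003_event_bar", "0004_event_baz"]
print(replaces_list("events", names, "0002_event_foo"))
# [('events', '0002_event_foo'), ('events', '0003_event_bar'), ('events', '0004_event_baz')]
```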
## Further tips
* "Some starting point" is often not 0000, but some higher number (see e.g. the outcome
in the present commit). Leaving the migrations for creation of base models (Event,
Issue, Project) in place saves you from a lot of circular dependency problems.
* Move db.sqlite3 out of the way to avoid superfluous warnings.
## RunPython worries
I grepped for RunPython in the replaced migrations, with the following results:
* phonehome's create_installation_id was copied over to the squashed migration.
* all others were ignored, because:
  * they "do something with events", i.e. they only have an effect when events are
    present. This means they are no-ops for _new installs_.
  * for existing installs, for any given app, they will only be missed (replaced) when
    the first replaced migration has not yet been executed.
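The grep itself is trivial; for completeness, a Python sketch of the same search. The file names and contents below are made up for the demo (only `create_installation_id` is taken from this log), and the regex simply looks for `RunPython(` as it appears in an `operations` list.

```python
import re
import tempfile
from pathlib import Path

# Matches e.g. "migrations.RunPython(forwards)" inside an operations list.
RUNPYTHON = re.compile(r"\bRunPython\(")


def migrations_with_runpython(paths: list[Path]) -> list[str]:
    """File names (among `paths`) that contain a RunPython operation."""
    return [p.name for p in paths if RUNPYTHON.search(p.read_text(encoding="utf-8"))]


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "0001_initial.py").write_text(
        "operations = [migrations.CreateModel(name='Installation')]\n", encoding="utf-8"
    )
    (d / "0002_create_id.py").write_text(
        "operations = [migrations.RunPython(create_installation_id)]\n", encoding="utf-8"
    )
    found = migrations_with_runpython(sorted(d.glob("*.py")))

print(found)  # ['0002_create_id.py']
```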
I used the following command (reading from the bottom) to establish that this means only
people that did a fresh install after 8ad6059722 (June 14, 2024), but before
c01d332e18 (July 16) _and then never did any upgrades_ would be affected. There are no
such people.

    git log --name-only \
        events/migrations/0004_event_irrelevance_for_retention.py \
        issues/migrations/0004_rename_event_count_issue_digested_event_count.py \
        phonehome/migrations/0001_initial.py \
        projects/migrations/0002_initial.py \
        teams/migrations/0001_initial.py

Note that the above observation will still be true for the next squash (assuming
squashing starts from the same migrations).
## Cleanup of the replaced migrations
Django says:
> Once you’ve squashed your migration, you should then commit it alongside the
> migrations it replaces and distribute this change to all running instances of your
> application, making sure that they run migrate to store the change in their database.
Given that I'm not in control of all running instances of my application, this means the
cleanup must not happen "too soon", and only after announcing a migration path ("update
to version X before updating to version Y").
## Roads not taken
Q: Why not just use squashmigrations? A: It didn't work reliably (for me), presumably b/c
of the high number of strongly interdependent apps in combination with some RunPython.
Seen after I was mostly done, not explored seriously (yet):
* https://github.com/3YOURMIND/django-replace-migrations
* https://pypi.org/project/django-squash/
* https://django-extensions.readthedocs.io/en/latest/delete_squashed_migrations.html
During user-testing, it turned out that people think something is wrong
when they see 'DivisionByZero', when this is in fact precisely what was
intended. Hopefully the new text removes this confusion.
Fixes #7
I could not find any documentation on what the "standard" is, but I know that
we'll pick it up just fine either way (because Django's UUID field does the
magic for us).
Given that it's impossible to set up your JS client with the dashes, they should
simply be removed.
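A quick check with Python's standard `uuid` module that both spellings parse to the same value. Assuming Django's `UUIDField` parses strings through `uuid.UUID` (as it appears to), this is why either form is picked up. The helper name `normalize_sentry_key` and the example key are hypothetical.

```python
import uuid

with_dashes = "3fa85f64-5717-4562-b3fc-2c963f66afa6"  # made-up example key
without_dashes = with_dashes.replace("-", "")

# uuid.UUID accepts both spellings and yields the same value.
assert uuid.UUID(with_dashes) == uuid.UUID(without_dashes)


def normalize_sentry_key(key: str) -> str:
    """Canonical dash-less, lowercase form of a key (hypothetical helper)."""
    return uuid.UUID(key).hex


print(normalize_sentry_key(with_dashes))  # 3fa85f6457174562b3fc2c963f66afa6
```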
This was probably about making Project.objects.get(id, sentry_key)
more efficient, but
* I don't have an indication it's a bottleneck
* It may very well be turned into a get-by-id, check-for-key idiom,
in which case the index won't help.
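For reference, the "get-by-id, check-for-key" idiom, sketched with plain `sqlite3` and a hypothetical `project` table. The lookup goes through the primary key only, and the key is compared afterwards in Python, so an index on `sentry_key` would not be used.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the primary key is indexed (implicitly).
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, sentry_key TEXT NOT NULL)")
conn.execute("INSERT INTO project VALUES (1, '3fa85f6457174562b3fc2c963f66afa6')")
conn.commit()


def get_project(project_id: int, sentry_key: str) -> Optional[tuple]:
    """Fetch by primary key, then check the key; None if either doesn't match."""
    row = conn.execute(
        "SELECT id, sentry_key FROM project WHERE id = ?", (project_id,)
    ).fetchone()
    if row is None or row[1] != sentry_key:
        return None
    return row
```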
For now, sticking to "how it actually works" is more important than "what we
might need in the future". We don't have an admin-group with actually handed
out perms yet, and we do have the recommendation to create a superuser in the
installation docs. So let's just explicitly check for that.
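What such an explicit superuser check could look like as a view decorator. This is a framework-free sketch: `PermissionDenied` is a local stand-in for Django's exception of that name, `superuser_required` and `admin_only_view` are hypothetical, and the request/user objects are faked with `SimpleNamespace`. In Django, `user_passes_test(lambda u: u.is_superuser)` from `django.contrib.auth.decorators` covers similar ground, though it redirects to the login page instead of raising.

```python
from functools import wraps
from types import SimpleNamespace


class PermissionDenied(Exception):
    """Local stand-in for django.core.exceptions.PermissionDenied."""


def superuser_required(view):
    """Only let the view run for requests whose user is a superuser."""
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        if not request.user.is_superuser:
            raise PermissionDenied()
        return view(request, *args, **kwargs)
    return wrapper


@superuser_required
def admin_only_view(request):
    return "ok"


admin_request = SimpleNamespace(user=SimpleNamespace(is_superuser=True))
normal_request = SimpleNamespace(user=SimpleNamespace(is_superuser=False))
print(admin_only_view(admin_request))  # ok
```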
(I found these checks while working on actual breakage in the previous commit)
In the previous commit I added the code for a small performance experiment.
The results are (very) obvious: don't do this. Response times go through
the roof, and more importantly, the server becomes unreliable. Reason:
time-outs caused by waiting for the write-lock.
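A small illustration of the write-lock problem, assuming SQLite (the log doesn't say which database the experiment ran on). SQLite allows one writer at a time; a second writer either waits up to its `timeout` or fails with "database is locked". Here the timeout is 0 so the failure shows immediately; with a real timeout, the waiting shows up as slow responses and, under enough load, as time-outs.

```python
import os
import sqlite3
import tempfile

# A file-backed database: separate ":memory:" connections would not share data.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

writer = sqlite3.connect(path, isolation_level=None)  # autocommit; BEGIN is explicit
writer.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, data TEXT)")
writer.execute("BEGIN IMMEDIATE")  # takes the database's single write lock
writer.execute("INSERT INTO event (data) VALUES ('a')")

# A second connection that is not willing to wait (timeout=0) for that lock.
other = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    other.execute("INSERT INTO event (data) VALUES ('b')")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

writer.execute("COMMIT")  # lock released; other writers can proceed again
writer.close()
other.close()
print(error)  # database is locked
```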