This command to store all events on the local filesystem was useful
while 'scaffolding' (getting my hands on some initial event data in
the early days of Bugsink), but it was never meant as a permanent tool.
An (optional) way to store the `event_data` (full event as JSON)
outside the DB. This is expected to be useful for larger setups,
because it gives you:
* A more portable database (e.g. backups); depending on event size,
the impact on your DB is ~50x.
* Fewer worries about hitting "physical" limits (e.g. disk size, max
file size) for your DB.
Presumably (more testing will happen going forwards) it will:
* Speed up migrations (especially on sqlite, which does full table
copies)
* Speed up event ingestion(?)
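
A minimal sketch of what storing `event_data` outside the DB could look like,
assuming one JSON file per event in a single directory. The class name
`FileEventStorage`, its methods and the file layout are hypothetical and only
illustrate the idea; they are not the actual implementation.

```python
import json
from pathlib import Path


class FileEventStorage:
    """Hypothetical storage backend: one JSON file per event, named by event id."""

    def __init__(self, basepath):
        self.basepath = Path(basepath)
        self.basepath.mkdir(parents=True, exist_ok=True)

    def _path(self, event_id):
        # Event ids are assumed to be hex strings (UUIDs); anything else is
        # refused so that a crafted id cannot point outside basepath.
        if not event_id or not all(c in "0123456789abcdef-" for c in event_id):
            raise ValueError("unexpected event id: %r" % (event_id,))
        return self.basepath / (event_id + ".json")

    def store(self, event_id, event_data):
        with open(self._path(event_id), "w", encoding="utf-8") as f:
            json.dump(event_data, f)

    def load(self, event_id):
        with open(self._path(event_id), encoding="utf-8") as f:
            return json.load(f)

    def delete(self, event_id):
        # Called on eviction, so that files do not outlive their DB row.
        self._path(event_id).unlink()
```

With something like this, the DB row only needs the event id to find the full
event again, which is what keeps the database itself small.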
Further improvements in this commit:
* `delete_with_limit` was removed; this removes one tie-in to MySQL/Sqlite
(See #21 for this bullet)
It was optional in anticipation of other methods of eviction, but YAGNI,
and the idea of evicting in batches of 500 is baked in quite hard (for
good reasons).
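
For illustration, a sketch of batched eviction that does not rely on
`DELETE ... LIMIT` (which is not available on every backend): select a batch of
ids first, then delete exactly those. The table name `event`, the
`irrelevance` column and the function name are assumptions for this example,
shown with the stdlib `sqlite3` module rather than the Django ORM.

```python
import sqlite3

BATCH_SIZE = 500  # the batch size mentioned above


def evict_in_batches(conn, max_irrelevance, batch_size=BATCH_SIZE):
    """Delete events with irrelevance > max_irrelevance, one batch at a time.

    Returns the total number of deleted rows.
    """
    total = 0
    while True:
        # Step 1: pick at most batch_size ids (portable, no DELETE ... LIMIT).
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM event WHERE irrelevance > ? LIMIT ?",
            (max_irrelevance, batch_size))]
        if not ids:
            return total
        # Step 2: delete exactly those ids and commit, keeping transactions short.
        placeholders = ",".join("?" * len(ids))
        conn.execute("DELETE FROM event WHERE id IN (%s)" % placeholders, ids)
        conn.commit()
        total += len(ids)
```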
Avoiding any (1406, "Data too long for column ...") on MySQL.
For the 'plainly provided' fields I followed the documented maximums which are
also our DB maximums. For calculated_* I harmonized with what Sentry &
GlitchTip both do (and which was already partially reflected in the code), i.e.
128 and 1024.
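
A sketch of the truncation idea: cut string values down to the column maximum
before saving, so the DB never sees an over-long value. The mapping of 128 to
`calculated_type` and 1024 to `calculated_value` is an assumption made for this
example, as is the helper name `truncate_fields`.

```python
# Assumed maximums; the field names are illustrative.
MAX_LENGTHS = {"calculated_type": 128, "calculated_value": 1024}


def truncate_fields(fields, max_lengths=MAX_LENGTHS):
    """Return a copy of fields with known string fields cut to their maximum."""
    result = {}
    for name, value in fields.items():
        if name in max_lengths and isinstance(value, str):
            # Slicing counts characters, not bytes.
            value = value[:max_lengths[name]]
        result[name] = value
    return result
```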
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.
Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)
Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
to not "magically solve counting".
This commit also optimizes the .count() on the issue-detail event list (via
Paginator).
This commit also slightly changes the value passed as `stored_event_count` to
be used for `get_random_irrelevance` to be the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
"in normal environments" this shouldn't be necessary. But I recently played with
(backwards) migrations, and thus violated the expectation that "adding RunPython to the
squashed migrations will not be necessary because no one will be having events _and_
using the squashed migration". (I still think that general idea holds, but it won't if
you move back in time through the migrations explicitly.)
I considered adding a warning to the *_b_* migrations, but in the end considered it
unnecessary, since I would be the only user of such a warning. (Just adding the warning to
the *_b_* migrations would be enough, since moving back through the RunPython-route is
already impossible: 'backwards' is not implemented in those.) To list them:
```
git ls-files | grep py | xargs grep RunPython -l
```
## Goal
Reduce the number of migrations for _fresh installs_ of Bugsink. This implies: squash as
broadly as possible.
## How?
"throw-away-and-rerun". In particular, for a given app:
* throw away the migrations from some starting point up until and including the last one.
* run "makemigrations" for that app. Django will see what's missing and just redo it
* rename to 000n_b_squashed or similar.
* manually set a `replaces` list on the migration to the just-removed migrations
* manually check dependencies; check that they are:
    * as low as possible, e.g. an FK should only depend on existence. This reduces the
      risk of circular dependencies.
    * pointing to "original migrations", i.e. not to a just-created squashed migration.
      Because the squashed migrations "contain a lot", they increase the risk of circular
      dependencies.
* restore (git checkout) the thrown-away migrations
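
The "manually set a `replaces` list" step can be partly automated. Django
expects `replaces` to be a list of `(app_label, migration_name)` tuples; the
sketch below builds that list from the file names of the thrown-away
migrations. The helper `build_replaces` and its parameters are hypothetical,
and it assumes the usual `NNNN_name.py` naming.

```python
import re


def build_replaces(app_label, migration_filenames, first_replaced):
    """Return the (app_label, migration_name) tuples for a `replaces` list.

    migration_filenames: file names of the migrations before squashing.
    first_replaced: number of the first migration to throw away (the
                    "starting point", often not 0).
    """
    replaces = []
    for filename in sorted(migration_filenames):
        match = re.match(r"^(\d{4})_\w+\.py$", filename)
        if match is None:
            continue  # skips __init__.py and other non-migration files
        if int(match.group(1)) >= first_replaced:
            replaces.append((app_label, filename[:-len(".py")]))
    return replaces
```

The output can be pasted into the squashed migration as `replaces = [...]`;
the dependencies still need the manual check described above.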
## Further tips:
* "Some starting point" is often not 0000, but some higher number (see e.g. the outcome
in the present commit). Leaving the migrations for creation of base models (Event,
Issue, Project) in place saves you from a lot of circular dependency problems.
* Move db.sqlite3 out of the way to avoid superfluous warnings.
## RunPython worries
I grepped for RunPython in the replaced migrations, with the following results:
* phonehome's create_installation_id was copied-over to the squashed migration.
* all others were ignored, because:
    * they "do something with events", i.e. only when events are present will they have
      an effect. This means they are no-ops for _new installs_.
    * for existing installs, for any given app, they will only be missed (replaced) when
      the first replaced migration is not yet executed.
I used the following command (reading from the bottom) to establish that this means only
people that did a fresh install after 8ad6059722 (June 14, 2024), but before
c01d332e18 (July 16) _and then never did any upgrades_ would be affected. There are no
such people.
```
git log --name-only \
    events/migrations/0004_event_irrelevance_for_retention.py \
    issues/migrations/0004_rename_event_count_issue_digested_event_count.py \
    phonehome/migrations/0001_initial.py \
    projects/migrations/0002_initial.py \
    teams/migrations/0001_initial.py
```
Note that the above observation will still be true for the next squash (assuming
squashing starts at the same starting migrations).
## Cleanup of the replaced migrations
Django says:
> Once you’ve squashed your migration, you should then commit it alongside the
> migrations it replaces and distribute this change to all running instances of your
> application, making sure that they run migrate to store the change in their database.
Given that I'm not in control of all running instances of my application, this means the
cleanup must not happen "too soon", and only after announcing a migration path ("update
to version X before updating to version Y").
## Roads not taken
Q: Why not just do squashmigrations? A: It didn't work reliably (for me), presumably b/c
of the high number of strongly interdependent apps in combination with some RunPython.
Seen after I was mostly done, not explored seriously (yet):
* https://github.com/3YOURMIND/django-replace-migrations
* https://pypi.org/project/django-squash/
* https://django-extensions.readthedocs.io/en/latest/delete_squashed_migrations.html
An event always has a single (automatically calculated) Grouping associated with it.
We add this info to the Event model (we'll soon display it in the UI, and as per the
now-removed comment it's simply the consistent thing to do)
* As per the spec 'takes precedence'
* Also fixes the reported bug on Laravel, which apparently doesn't send event_id
as part of the event payload.
* Fixes the envelope tests (they were doing nothing when I moved the
data samples around recently)
* Adds a 'no event_id in data, but yes in envelope' case to those tests.
* Adds handling to send_json such that we can send envelopes when the event_id
is missing from the event data.
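
A sketch of the lookup order, assuming 'takes precedence' refers to the
`event_id` in the envelope headers winning over the one in the event payload.
In the envelope format the first line is a JSON object with the envelope
headers. The function names are hypothetical.

```python
import json


def parse_envelope_headers(envelope_bytes):
    """Parse the first line of an envelope as its JSON headers."""
    first_line = envelope_bytes.split(b"\n", 1)[0]
    return json.loads(first_line) if first_line.strip() else {}


def resolve_event_id(envelope_headers, event_data):
    """Prefer the envelope's event_id; fall back to the one in the payload."""
    event_id = envelope_headers.get("event_id") or event_data.get("event_id")
    if event_id is None:
        raise ValueError("no event_id in envelope headers or event data")
    return event_id
```

This covers the reported case: a payload without `event_id` is fine as long as
the envelope headers carry one.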