Commit Graph

144 Commits

Author SHA1 Message Date
Klaas van Schelven f76d3f4f40 Merge branch 'main' into tag-search 2025-03-05 16:05:17 +01:00
Klaas van Schelven f25d693804 Add postgres to GitHub CI
A first step towards (experimental) postgres support, see #21
2025-03-05 12:20:51 +01:00
Klaas van Schelven b98c1d4f44 MySQL recover from IntegrityError
as we do for the sqlite case

the need for this was revealed by running the testsuite against mysql
2025-03-05 11:04:41 +01:00
Klaas van Schelven 4cde74d7cb Event search: first version 2025-03-04 13:51:56 +01:00
Klaas van Schelven a00a815261 Merge branch 'main' into tag-search 2025-03-03 15:02:13 +01:00
Klaas van Schelven c8ecf508de Tags: on event details page show calculated tags
(not just the explicitly provided ones)
2025-03-03 11:29:07 +01:00
Klaas van Schelven adf92f6b1b make_consistent: update has_releases when needed
See #50
2025-03-03 09:06:31 +01:00
Klaas van Schelven e10c1bf7ca Remove 'store_events' command
this command to store all events on the local filesystem was useful
while 'scaffolding' (getting my hands on some initial event data in
the early days of Bugsink), but it was never meant as a permanent tool
2025-02-27 15:14:05 +01:00
Klaas van Schelven 7a19e2d277 Tags; deducing tags; search on tags; WIP 2025-02-27 13:12:49 +01:00
Klaas van Schelven 4b7ed8f4ec Rename get_contexts_enriched_with_ua
to more closely match what's going on
2025-02-26 18:20:52 +01:00
Klaas van Schelven 073bc7aaec make_consistent: add --dry-run 2025-02-19 12:06:16 +01:00
Klaas van Schelven f4250d2db8 make_consistent: document possible way forward 2025-02-19 11:56:21 +01:00
Klaas van Schelven 92425890cd make_consistent: take 'points to missing' into account while deleting
also: switch a number of deletions from in-the-loop deletions to
delete-using-sql.
2025-02-19 11:42:44 +01:00
Klaas van Schelven 14bc3688c7 retention: deletion counts, more defensive idiom
the dict as returned by Django won't contain 'events.Event' if none are deleted;
no bug was observed for this line, but it's fixed anyway for good measure
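A minimal sketch of the defensive idiom, assuming only the documented return shape of Django's `QuerySet.delete()`, i.e. `(total, {model_label: count})`. The helper name `deleted_count` is hypothetical and not taken from the Bugsink code.

```python
def deleted_count(delete_result, label="events.Event"):
    """Return how many objects of `label` a Django-style delete() removed."""
    # delete() returns (total, {model_label: count}); labels for which nothing
    # was deleted are absent from the dict, so use .get() with a default of 0
    # instead of indexing (which would raise KeyError).
    _total, per_model = delete_result
    return per_model.get(label, 0)
```

Usage would look like `deleted_count(some_queryset.delete())`; passing literal tuples works just as well for testing.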
2025-02-17 21:36:08 +01:00
Klaas van Schelven 3a1fe9acec allow long-running queries on 'nuke' and 'make_consistent' 2025-02-17 21:32:39 +01:00
Klaas van Schelven 5766fb8485 Make_consistent: run in a transaction
'Nice consistency you got here. Be a shame if anything happened to it.'
2025-02-17 21:11:44 +01:00
Klaas van Schelven e37274c9aa nuke_events: improvements
* better name
* better confirmation box
* more complete deletion (turning points, groupings)
* run in transaction
2025-02-17 21:00:12 +01:00
Klaas van Schelven b4bf6d01c3 Make import lazy for performance reasons ('cold start') 2025-02-17 13:31:03 +01:00
Klaas van Schelven d3342f2671 Rename to event-storage for consistency 2025-02-14 17:08:04 +01:00
Klaas van Schelven 212882e65d Add cleanup_eventstorage command 2025-02-14 17:06:56 +01:00
Klaas van Schelven bfd6610f83 Add migrate_to_current_eventstore command 2025-02-14 14:34:29 +01:00
Klaas van Schelven 3ccef7fd50 FileEventStorage: create dir on-demand; fix and add tests 2025-02-12 21:19:18 +01:00
Klaas van Schelven 5559fba754 Introduce FileEventStorage
An (optional) way to store the `event_data` (full event as JSON)
outside the DB. This is expected to be useful for larger setups,
because it gives you:

* A more portable database (e.g. for backups); depending on event size,
  the impact on your DB is ~50x.
* Less worries about hitting "physical" limits (e.g. disk size, max
  file size) for your DB.

Presumably (more testing will happen going forward) it will:

* Speed up migrations (especially on sqlite, which does full table
  copies)
* Speed up event ingestion(?)

Further improvements in this commit:

* `delete_with_limit` was removed; this removes one tie-in to MySQL/SQLite
  (see #21 for this bullet)
2025-02-12 17:11:24 +01:00
Klaas van Schelven 4921f5a05d Remove retention/eviction test-scripts
we've been having 'the real thing' for a while, no more need for
'scaffolding'
2025-02-12 09:55:28 +01:00
Klaas van Schelven b3eb5acc04 Retention/eviction: add a minimal test 2025-02-12 09:53:36 +01:00
Klaas van Schelven 9f61602fc1 Retention, internal: make max_event_count non-optional
It was optional in anticipation of other methods of eviction, but YAGNI,
and the idea of evicting in batches of 500 is baked in quite hard (for
good reasons).
2025-02-12 09:00:12 +01:00
Klaas van Schelven 14d8e9e2fb Fix: remove missed line
Should have been removed in cc861c8ba3
2025-02-11 11:00:23 +01:00
Klaas van Schelven cc861c8ba3 Remove 2 fields that were "temporary [..] to get a sense of the shape of the data" 2025-02-09 21:12:25 +01:00
Klaas van Schelven a717dd7374 Truncate input-data that exceeds max_length
Avoiding any (1406, "Data too long for column ...") on MySQL.

For the 'plainly provided' fields I followed the documented maximums which are
also our DB maximums. For calculated_* I harmonized with what Sentry &
GlitchTip both do (and which was already partially reflected in the code), i.e.
128 and 1024.
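A minimal sketch of the truncation step, assuming input arrives as a plain dict of field values. The field names in `MAX_LENGTHS` and the mapping of 128/1024 onto them are hypothetical; only the two numbers come from the commit message.

```python
# Hypothetical field-to-limit mapping; the commit only states that 128 and 1024
# are the maxima used for the calculated_* fields.
MAX_LENGTHS = {"calculated_type": 128, "calculated_value": 1024}


def truncate_fields(fields, max_lengths=MAX_LENGTHS):
    """Return a copy of `fields` with each string cut to its column's max_length."""
    result = dict(fields)
    for name, max_length in max_lengths.items():
        value = result.get(name)
        # MySQL raises (1406, "Data too long for column ...") instead of silently
        # truncating, so cut the value ourselves before saving.
        if isinstance(value, str) and len(value) > max_length:
            result[name] = value[:max_length]
    return result
```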
2025-02-08 21:21:55 +01:00
Klaas van Schelven 561c1d324a event.data getters
in preparation for scenarios where the dumped data is not stored in the DB
2025-02-07 17:09:36 +01:00
Klaas van Schelven 12d7ce5629 Flake8: for migrations _just_ ignore the whitespace errors
this helps catch some _real_ errors while saving us from having to format
automatically generated code
2025-02-06 16:41:43 +01:00
Klaas van Schelven 615d2da4c8 Cache stored_event_count (on Issue and Project)
"possibly expensive" turned out to be "actually expensive". On 'emu', with 1.5M
events, the counts take 85 and 154 ms for Project and Issue respectively;
bottlenecking our digestion to ~3 events/s.

Note: this is single-issue, single-project (presumably, the cost would be lower
for more spread-out cases)

Note on indexes: Event already has indexes for both Project & Issue (though as
the first item in a multi-column index). Without checking further: that appears
to not "magically solve counting".

This commit also optimizes the .count() on the issue-detail event list (via
Paginator).

This commit also slightly changes the value passed as `stored_event_count` to
be used for `get_random_irrelevance` to be the post-eviction value. That won't
matter much in practice, but is slightly more correct IMHO.
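The general caching technique can be sketched as follows: keep a counter column on the parent row and update it in the same transaction as every insert and eviction, so reading it is a primary-key lookup rather than a `COUNT(*)` over all events. This sketch swaps in the stdlib `sqlite3` module and plain SQL instead of Django's ORM; the table layout and function names are assumptions, not Bugsink's schema.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (
            id INTEGER PRIMARY KEY,
            stored_event_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE event (id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL);
        INSERT INTO project (id) VALUES (1);
        """
    )
    return conn


def digest_event(conn, project_id):
    # insert the event and bump the cached counter in one transaction
    with conn:
        conn.execute("INSERT INTO event (project_id) VALUES (?)", (project_id,))
        conn.execute(
            "UPDATE project SET stored_event_count = stored_event_count + 1 WHERE id = ?",
            (project_id,),
        )


def evict_events(conn, project_id, event_ids):
    # delete events and subtract the number actually deleted from the counter
    if not event_ids:
        return
    placeholders = ",".join("?" for _ in event_ids)
    with conn:
        cur = conn.execute(
            f"DELETE FROM event WHERE project_id = ? AND id IN ({placeholders})",
            (project_id, *event_ids),
        )
        conn.execute(
            "UPDATE project SET stored_event_count = stored_event_count - ? WHERE id = ?",
            (cur.rowcount, project_id),
        )


def stored_event_count(conn, project_id):
    # a primary-key lookup instead of SELECT COUNT(*) FROM event WHERE ...
    row = conn.execute(
        "SELECT stored_event_count FROM project WHERE id = ?", (project_id,)
    ).fetchone()
    return row[0]
```

The trade-off is the usual one for denormalized counters: reads become cheap, but every code path that adds or removes events must keep the counter in sync.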
2025-02-06 16:24:25 +01:00
Klaas van Schelven 95e8aba23e make_consistent: set never_evict
"In normal environments" this shouldn't be necessary. But I recently played with
(backwards) migrations, and thus violated the expectation that "adding RunPython to the
squashed migrations will not be necessary because no one will be having events _and_
using the squashed migration". (I still think that general idea holds, but it won't if
you move back in time through the migrations explicitly.)

I considered adding a warning to the *_b_* migrations, but in the end considered it
unnecessary, since I would be the only user of such a warning. (Adding the warning to the
*_b_* migrations alone would be enough, since moving back through the RunPython route is
already impossible: 'backwards' is not implemented in those migrations.) The migrations
that use RunPython can be listed with:

```
git ls-files | grep '\.py$' | xargs grep -l RunPython
```
2025-02-06 14:04:41 +01:00
Klaas van Schelven 0b42d3ff1e Semi-manual squash-migrations
## Goal

Reduce the number of migrations for _fresh installs_ of Bugsink. This implies: squash as
broadly as possible.

## How?

"throw-away-and-rerun". In particular, for a given app:

* throw away the migrations from some starting point up until and including the last one.
* run "makemigrations" for that app. Django will see what's missing and just redo it
* rename to 000n_b_squashed or similar.
* manually set a `replaces` list on the migration to the just-removed migrations
* manually check dependencies; check that they are:
    * as low as possible, e.g. an FK should only depend on existence. this reduces the
      risk of circular dependencies.
    * pointing to "original migrations", i.e. not to a just-created squashed migration.
      because the squashed migrations "contain a lot" they increase the risk of circular
      dependencies.
* restore (git checkout) the thrown-away migrations

## Further tips:

* "Some starting point" is often not 0000, but some higher number (see e.g. the outcome
  in the present commit). Leaving the migrations for creation of base models (Event,
  Issue, Project) in place saves you from a lot of circular dependency problems.
* Move db.sqlite3 out of the way to avoid superfluous warnings.

## RunPython worries

I grepped for RunPython in the replaced migrations, with the following results:

* phonehome's create_installation_id was copied-over to the squashed migration.
* all others were ignored, because:
    * they "do something with events", i.e. only when events are present will they have
      an effect. This means they are no-ops for _new installs_.
    * for existing installs, for any given app, they will only be missed (replaced) when
      the first replaced migration is not yet executed.

I used the following command (reading from the bottom) to establish that this means only
people that did a fresh install after 8ad6059722 (June 14, 2024), but before
c01d332e18 (July 16) _and then never did any upgrades_ would be affected. There are no
such people.

```
git log --name-only \
    events/migrations/0004_event_irrelevance_for_retention.py \
    issues/migrations/0004_rename_event_count_issue_digested_event_count.py \
    phonehome/migrations/0001_initial.py \
    projects/migrations/0002_initial.py \
    teams/migrations/0001_initial.py
```

Note that the above observation will still be true for the next squash (assuming the
squash starts at the same starting migrations).

## Cleanup of the replaced migrations

Django says:

> Once you’ve squashed your migration, you should then commit it alongside the
> migrations it replaces and distribute this change to all running instances of your
> application, making sure that they run migrate to store the change in their database.

Given that I'm not in control of all running instances of my application, this means the
cleanup must not happen "too soon", and only after announcing a migration path ("update
to version X before updating to version Y").

## Roads not taken

Q: Why not just do squashmigrations? A: It didn't work reliably (for me), presumably b/c
of the high number of strongly interdependent apps in combination with some RunPython.

Seen after I was mostly done, not explored seriously (yet):

* https://github.com/3YOURMIND/django-replace-migrations
* https://pypi.org/project/django-squash/
* https://django-extensions.readthedocs.io/en/latest/delete_squashed_migrations.html
2025-02-03 16:06:17 +01:00
Klaas van Schelven 0ec809cbb3 Simplify migration deps and document them 2025-02-03 14:04:44 +01:00
Klaas van Schelven 9ee623de6b Add Event.grouping field and fill it
An event always has a single (automatically calculated) Grouping associated with it.
We add this info to the Event model (we'll soon display it in the UI, and as per the
now-removed comment it's simply the consistent thing to do)
2025-01-31 16:16:07 +01:00
Klaas van Schelven cf3b588eb7 Setting of value: outside 'try'
if inside, an error in self.get_tenant() means the finally block runs for a value
that was never set, and then fails itself.

Also: make the 'del' more defensive by making it a 'pop'
2025-01-29 13:37:31 +01:00
Klaas van Schelven a54ec29433 Reraise rather than assert-for-nonexception
this expresses more clearly what the intention is, and the result is more clear too
(one less level of exception-chaining)
2025-01-17 17:04:20 +01:00
Klaas van Schelven a5bc27032a Visualize trimmed data ('x items trimmed')
Fix #18

Similar to [the request for the same feature in Sentry](https://github.com/getsentry/sentry/issues/68426)

SDK-side complaints:

* https://github.com/getsentry/sentry-python/issues/377
* https://github.com/getsentry/sentry-python/issues/805
* https://github.com/getsentry/sentry-python/issues/1041
* https://github.com/getsentry/sentry-python/issues/1105
* https://github.com/getsentry/sentry-python/issues/2121
* https://github.com/getsentry/sentry-python/issues/2682
* https://github.com/getsentry/sentry-python/issues/3209
* https://github.com/getsentry/sentry-python/issues/3634
* https://github.com/getsentry/sentry-python/issues/3740
2024-12-18 17:05:26 +01:00
Klaas van Schelven b597d91af7 Become robust for lack of .values key in exception
Fix #16
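A minimal sketch of reading exception values leniently, assuming `event_data` is the parsed event dict and that `exception`, when present, is a dict. The helper name is hypothetical.

```python
def get_exception_values(event_data):
    # Be lenient: a missing "exception", or an "exception" without a "values"
    # key, yields an empty list instead of raising KeyError.
    exception = event_data.get("exception") or {}
    return exception.get("values") or []
```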
2024-12-16 11:22:32 +01:00
Klaas van Schelven fce40f0581 Add some cleanup of objects to make_consistent command 2024-12-13 16:07:01 +01:00
Klaas van Schelven 1455d4dbeb UA parsing: deal with lists
See #13
2024-11-22 09:21:36 +01:00
Klaas van Schelven d7b46265d1 User Agent parsing should never crash the event-view
See #13
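A minimal sketch of both ideas (dealing with lists, never crashing the view). It assumes request headers may arrive either as a dict or as a list of `[name, value]` pairs, which is an assumption about what "lists" refers to here. `parse` stands in for whatever UA parser is used; both function names are hypothetical.

```python
def get_user_agent(headers):
    # Assumption: headers may be a dict or a list of [name, value] pairs.
    if isinstance(headers, dict):
        items = headers.items()
    elif isinstance(headers, list):
        items = [p for p in headers if isinstance(p, (list, tuple)) and len(p) == 2]
    else:
        return None
    for name, value in items:
        # header names are matched case-insensitively
        if isinstance(name, str) and name.lower() == "user-agent":
            return value if isinstance(value, str) else None
    return None


def safe_parse_user_agent(headers, parse):
    # Any failure in extracting or parsing the UA yields None rather than an
    # exception, so rendering the event page never depends on the UA parser.
    try:
        ua = get_user_agent(headers)
        return parse(ua) if ua else None
    except Exception:
        return None
```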
2024-11-15 10:55:16 +01:00
Klaas van Schelven 3222c0d85e Fix the tests (missing initial data in transactiontestcase) 2024-11-15 10:14:01 +01:00
Klaas van Schelven c56611bc82 Note that MySQL supports DELETE w/ LIMIT too 2024-10-09 09:58:47 +02:00
Klaas van Schelven f2a78fed9d Use the envelope's event_id when using the envelope endpoint
* As per the spec 'takes precedence'
* Also fixes the reported bug on Laravel, which apparently doesn't send event_id
  as part of the event payload.
* Fixes the envelope tests (they had been doing nothing since I moved the
  data samples around recently)
* Adds a 'no event_id in data, but yes in envelope' test to those tests.
* Adds handling to send_json such that we can send envelopes when the event_id
  is missing from the event data.
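A minimal sketch of the precedence rule for an envelope carrying a single `event` item. It assumes the newline-delimited envelope layout (envelope header line, item header line, payload, where an optional `length` in the item header gives the payload size in bytes); the function name is hypothetical, and multi-item envelopes are out of scope.

```python
import json


def parse_event_envelope(raw: bytes):
    """Return (event_id, event_data) for an envelope with one 'event' item."""
    header_line, _, rest = raw.partition(b"\n")
    envelope_headers = json.loads(header_line)
    item_header_line, _, rest = rest.partition(b"\n")
    item_headers = json.loads(item_header_line)
    if item_headers.get("type") != "event":
        raise ValueError("this sketch only handles a single 'event' item")
    if "length" in item_headers:
        # explicit length: the payload is exactly that many bytes
        payload = rest[: item_headers["length"]]
    else:
        # implicit length: the payload runs until the next newline
        payload = rest.partition(b"\n")[0]
    event_data = json.loads(payload)
    # the envelope header's event_id takes precedence over the payload's
    event_id = envelope_headers.get("event_id") or event_data.get("event_id")
    return event_id, event_data
```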
2024-09-18 11:36:47 +02:00
Klaas van Schelven 278630b529 Show Browser and OS info, and Contexts more generally 2024-09-16 16:55:34 +02:00
Klaas van Schelven eec8d51491 Remove various non-TODOs
either already done, or more of a 'this is a way this code could potentially
evolve in the future' (but not a 'we must do this')
2024-09-13 10:05:22 +02:00
Klaas van Schelven ff618099dc event-model TODOs: reorganize and reflect that these are not TODOs for now 2024-09-13 09:51:30 +02:00
Klaas van Schelven 1bfac5d8c6 assertEquals -> assertEqual (Python 3.12)
<<insert remarks about fashion police>>
2024-08-21 08:49:49 +02:00