From fb66b04be97ef2924eb0ddd476170a63fbd6af89 Mon Sep 17 00:00:00 2001
From: Klaas van Schelven
Date: Thu, 23 May 2024 14:18:22 +0200
Subject: [PATCH] Document playground.bugsink.com performance findings

---
 DESIGN-performance.md | 56 +++++++++++++++++++++++++++++++++++++++++++
 ingest/filestore.py   |  3 ++-
 2 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 DESIGN-performance.md

diff --git a/DESIGN-performance.md b/DESIGN-performance.md
new file mode 100644
index 0000000..85f5cf8
--- /dev/null
+++ b/DESIGN-performance.md
@@ -0,0 +1,56 @@

## Some thoughts on performance

Now that we have playground.bugsink.com, I was able to get some real data on that system too.

I suppose the most "interesting" finding is that the ~30 events/s I can handle seem to be entirely limited by the
(https?) nginx stack.

This also means that, in this setup, snappea is able to deal with "postponed" work basically as fast as the frontend
can deliver it, i.e. there is no actual backlog, which raises some serious(?) questions about snappea in this setup.

Some things I played with (more or less in the order I did them):

* try to remove the (physical) network from the equation by doing local loopback
* use compression (brotli) to avoid network overhead (see the sender sketch below)
* compare with my local laptop
* drop actual handling of the request, i.e. just do a `request.read(); return HttpResponse()` (sketched below)
* remove nginx from the equation and just connect on `:8000`

Some numbers:

All measurements are with a single ~50k event.

* Starting point is ~30/s: local to playground; actual (non-immediate) handling of events. Varying the number of
  gunicorn and snappea workers doesn't seem to do much.

* Local loopback on playground.bugsink.com: ~21/s, i.e. slower. Presumably: the cost of running the stress test on
  the same machine.

* Local loopback on playground.bugsink.com, but dropping the request on the floor: ~25/s.

* Compressing as brotli and doing local -> playground: ~18/s. Surprisingly, the cost of unpacking is larger than the
  advantage of having less data to deal with.

* Locally (laptop), I got to ~280/s with actual handling turned on. This is where I (slightly) outrun snappea.

* Locally with drop-to-floor I got to ~455/s. Noteworthy: this is not even twice as fast as the "real" (postponed)
  handling, i.e. we're already close to our limits with that.

* Turning off nginx, local -> playground: ~146/s. Noteworthy: this is the only thing on playground that helped me go
  faster. But we don't actually want to recommend that, of course. Also: this is the only setup where I was able to
  outrun snappea (for a short while). Note that tuning the number of threads for gunicorn / the stress test matters
  here (I used 25).

* Playground locally w/o nginx and w/ drop-to-floor: ~400/s. Noteworthy: very close to what I get on my laptop.

Some conclusions:

* 30/s is still "a lot"; that's 2.5M/day or 77M/month, which is _more_ than the maximum Sentry allows you to select on
  the pricing page. (50M maxes out at $5,795.50 prepaid per month.)

* Still, the above raises some questions about whether snappea is worth it in this setup. Counterpoints (stability,
  predictability, the fact that there may be other slow async things) still apply.

* I never really got a chance to tune my setup. I did raise gunicorn workers to "enough to deal with the number of
  threads", which was in the 16-32 range. But with snappea never building a backlog, the number of workers is not
  material to the performance.
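To make two of the measured variants concrete, here are a couple of sketches. First, the "drop the request on the floor" variant: a minimal sketch assuming a plain Django view (the view name is hypothetical, not the actual Bugsink ingest view). It reads the body so the connection is properly drained, then returns immediately without parsing or storing anything:

```python
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def ingest_drop_on_floor(request):
    # Drain the request body; leaving it unread could stall the connection
    # and distort the measurement.
    request.read()
    return HttpResponse()
```

Second, a sketch of the brotli-compressed sender used in the "local -> playground" compression test, under the assumption that the server decompresses `Content-Encoding: br` (which the measurement above implies it does); `send_event` and its parameters are made up for illustration:

```python
import brotli
import requests


def send_event(url, event_bytes, headers=None):
    # Compress the (~50k) event payload client-side; the server pays the
    # cost of decompression, which turned out to outweigh the bandwidth win.
    compressed = brotli.compress(event_bytes)
    return requests.post(
        url,
        data=compressed,
        headers={"Content-Encoding": "br", **(headers or {})},
    )
```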
diff --git a/ingest/filestore.py b/ingest/filestore.py
index c7c6178..1c64470 100644
--- a/ingest/filestore.py
+++ b/ingest/filestore.py
@@ -4,6 +4,7 @@ from bugsink.app_settings import get_settings
 def get_filename_for_event_id(event_id):
     # TODO: the idea of having some levels of directories here (to avoid too many files in a single dir) is not yet
-    # implemented.
+    # implemented. Counterpoint: when doing stress tests, it was quite hard to get a serious backlog going (snappea
+    # was well able to play catch-up), so this might not be necessary.
     return os.path.join(get_settings().INGEST_STORE_BASE_DIR, event_id)
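For reference, a hypothetical sketch of the "levels of directories" idea the TODO mentions (not implemented; the function name and the `levels`/`width` parameters are made up): shard on the leading characters of the hex event_id so no single directory grows unboundedly:

```python
import os


def get_sharded_filename_for_event_id(base_dir, event_id, levels=2, width=2):
    # e.g. levels=2, width=2: "deadbeef..." -> <base_dir>/de/ad/deadbeef...
    shards = [event_id[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base_dir, *shards, event_id)
```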