From fb66b04be97ef2924eb0ddd476170a63fbd6af89 Mon Sep 17 00:00:00 2001
From: Klaas van Schelven
Date: Thu, 23 May 2024 14:18:22 +0200
Subject: [PATCH] Document playground.bugsink.com performance findings

---
 DESIGN-performance.md | 56 +++++++++++++++++++++++++++++++++++++++++++
 ingest/filestore.py   |  3 ++-
 2 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 DESIGN-performance.md

diff --git a/DESIGN-performance.md b/DESIGN-performance.md
new file mode 100644
index 0000000..85f5cf8
--- /dev/null
+++ b/DESIGN-performance.md
@@ -0,0 +1,56 @@

## Some thoughts on performance

Now that we have playground.bugsink.com, I was able to get some real data on that system too.

I suppose the most "interesting" finding is that the ~30 events/s I can handle seem to be entirely limited by the
(https?) nginx stack.

This also means that, in this setup, snappea is able to deal with "postponed" work basically as fast as the frontend
can deliver it, i.e. there is no actual backlog, which raises some serious(?) questions about snappea in this setup.

Some things I played with (more or less in the order I did them):

* try to remove the (physical) network from the equation by doing local loopback
* use compression (brotli) to avoid network overhead (see the sender sketch below)
* compare with my local laptop
* drop actual handling of the request, i.e. just do a `request.read(); return HttpResponse()` (sketched below)
* remove nginx from the equation and just connect on `:8000`

Some numbers:

All measurements are with a single ~50k event.

* Starting point is ~30/s: local to playground; actual (non-immediate) handling of events. Varying the number of
  gunicorn and snappea workers doesn't seem to do much.

* Local loopback on playground.bugsink.com: ~21/s, i.e. slower. Presumably: the cost of running the stress test on
  the same machine.

* Local loopback on playground.bugsink.com, but dropping the request on the floor: ~25/s.

* Compressing as brotli and doing local -> playground: ~18/s. Surprisingly, the cost of unpacking is larger than the
  advantage of having less data to deal with.

* Locally (laptop), I got to ~280/s with actual handling turned on. This is where I (slightly) outrun snappea.

* Locally with drop-to-floor I got to ~455/s. Noteworthy: this is not even twice as fast as the "real" (postponed)
  handling, i.e. we're already close to our limits with that.

* Turning off nginx, local -> playground: ~146/s. Noteworthy: this is the only thing on playground that helped me go
  faster. But we don't actually want to recommend that, of course. Also: this is the only setup where I was able to
  outrun snappea (for a short while). Note that tuning the number of threads for gunicorn / the stress test matters
  here (I used 25).

* Playground locally w/o nginx and w/ drop-to-floor: ~400/s. Noteworthy: very close to what I get on my laptop.

Some conclusions:

* 30/s is still "a lot"; that's 2.5M/day or 77M/month, which is _more_ than the maximum Sentry allows you to select on
  the pricing page. (50M maxes out at $5,795.50 prepaid per month.)

* Still, the above raises some questions about whether snappea is worth it in this setup. Counterpoints (stability,
  predictability, the fact that there may be other slow async things) still apply.

* I never really got a chance to tune my setup. I did raise gunicorn workers to "enough to deal with the number of
  threads", which was in the 16-32 range. But with snappea never building a backlog, the number of workers is not
  material to the performance.
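To make two of the measured variants concrete, here are a couple of sketches. First, the "drop the request on the floor" variant: a minimal sketch assuming a plain Django view (the view name is hypothetical, not the actual Bugsink ingest view). It reads the body so the connection is properly drained, then returns immediately without parsing or storing anything:

```python
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def ingest_drop_on_floor(request):
    # Drain the request body; leaving it unread could stall the connection
    # and distort the measurement.
    request.read()
    return HttpResponse()
```

Second, a sketch of the brotli-compressed sender used in the "local -> playground" compression test, under the assumption that the server decompresses `Content-Encoding: br` (which the measurement above implies it does); `send_event` and its parameters are made up for illustration:

```python
import brotli
import requests


def send_event(url, event_bytes, headers=None):
    # Compress the (~50k) event payload client-side; the server pays the
    # cost of decompression, which turned out to outweigh the bandwidth win.
    compressed = brotli.compress(event_bytes)
    return requests.post(
        url,
        data=compressed,
        headers={"Content-Encoding": "br", **(headers or {})},
    )
```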
diff --git a/ingest/filestore.py b/ingest/filestore.py
index c7c6178..1c64470 100644
--- a/ingest/filestore.py
+++ b/ingest/filestore.py
@@ -4,6 +4,7 @@ from bugsink.app_settings import get_settings
 def get_filename_for_event_id(event_id):
     # TODO: the idea of having some levels of directories here (to avoid too many files in a single dir) is not yet
-    # implemented.
+    # implemented. Counterpoint: when doing stress tests, it was quite hard to get a serious backlog going (snappea
+    # was well able to play catch-up), so this might not be necessary.
     return os.path.join(get_settings().INGEST_STORE_BASE_DIR, event_id)
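For reference, a hypothetical sketch of the "levels of directories" idea the TODO mentions (not implemented; the function name and the `levels`/`width` parameters are made up): shard on the leading characters of the hex event_id so no single directory grows unboundedly:

```python
import os


def get_sharded_filename_for_event_id(base_dir, event_id, levels=2, width=2):
    # e.g. levels=2, width=2: "deadbeef..." -> <base_dir>/de/ad/deadbeef...
    shards = [event_id[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base_dir, *shards, event_id)
```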