Compare commits

6 Commits

Author | SHA1 | Message | Date
Bhagya Amarasinghe | 85e4af22c9 | docs: detail current envoy route coverage | 2026-03-26 20:20:30 +05:30
Bhagya Amarasinghe | dd0bcc7bf5 | fix: expand envoy demo coverage | 2026-03-26 19:53:51 +05:30
Bhagya Amarasinghe | b843efd74e | docs: add envoy poc meeting brief | 2026-03-26 18:41:12 +05:30
Bhagya Amarasinghe | 1dfa97f341 | docs: add envoy poc demo runbook | 2026-03-26 18:35:12 +05:30
Bhagya Amarasinghe | c8f31770b8 | fix: improve envoy rate-limit burst checks | 2026-03-26 18:08:29 +05:30
Bhagya Amarasinghe | ab6a90b109 | chore: add envoy rate-limit poc support | 2026-03-25 11:40:17 +05:30
6 changed files with 1241 additions and 0 deletions

@@ -0,0 +1,85 @@
# Envoy Rate-Limit POC Route Inventory
This document maps the current Redis-backed rate-limit surface to the Envoy Gateway staging POC for `formbricks/internal#1483`.
## Gateway-managed in the POC
### IP-keyed public traffic
- `auth.login`
  - App config: `rateLimitConfigs.auth.login`
  - App behavior: `10 / 15 minutes`
  - Gateway POC: `POST /api/auth/callback/credentials`
  - Gateway note: approximated as `40 / hour` because Envoy Gateway global rate limits only support whole-unit windows, so a 15-minute window cannot be expressed directly.
- `auth.verifyEmail`
  - App config: `rateLimitConfigs.auth.verifyEmail`
  - App behavior: `10 / hour`
  - Gateway POC: `POST /api/auth/callback/token`
- `api.client`
  - App config: `rateLimitConfigs.api.client`
  - App behavior: `100 / minute`
  - Gateway POC:
    - `^/api/v1/client/[^/]+/(environment|responses(?:/[^/]+)?|displays|user)$`
    - `^/api/v2/client/[^/]+/responses(?:/[^/]+)?$`
    - `^/api/v2/client/[^/]+/displays$`
- `storage.upload`
  - App config: `rateLimitConfigs.storage.upload`
  - App behavior: `5 / minute`
  - Gateway POC:
    - `POST ^/api/v1/client/[^/]+/storage$`
    - `POST ^/api/v2/client/[^/]+/storage$`
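The `40 / hour` figure for `auth.login` comes from scaling the app-side limit to a one-hour window. A minimal sketch of that arithmetic, assuming a hypothetical helper named `scale_to_hour` that is not part of the POC scripts:

```bash
# Hypothetical helper (not in the repo): scale "<requests> per <window_minutes>"
# to the equivalent request count for a one-hour window.
scale_to_hour() {
  # Integer arithmetic; assumes the window length divides 60 evenly.
  echo $(( $1 * 60 / $2 ))
}

scale_to_hour 10 15   # app-side auth.login limit; prints 40
```

The hourly limit keeps the same average rate but admits a larger burst inside any single 15-minute span, which is why the app-side limit remains the stricter of the two.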
### Header-keyed API traffic
- `api.v1`
  - App config: `rateLimitConfigs.api.v1`
  - App behavior: `100 / minute`
  - Gateway POC:
    - `^/api/v1/management/` when `x-api-key` is present
    - `^/api/v1/webhooks/` when `x-api-key` is present
- `storage.upload`
  - App config: `rateLimitConfigs.storage.upload`
  - App behavior: `5 / minute`
  - Gateway POC:
    - `POST /api/v1/management/storage` when `x-api-key` is present
- `storage.delete`
  - App config: `rateLimitConfigs.storage.delete`
  - App behavior: `5 / minute`
  - Gateway POC:
    - `DELETE ^/storage/[^/]+/(public|private)/.+$` when `x-api-key` is present
## Left in the app on purpose
- `rateLimitConfigs.auth.signup`
- `rateLimitConfigs.auth.forgotPassword`
- profile email update actions
- follow-up dispatch
- link survey email sending
- license recheck
- user/session/org keyed authenticated flows
- all runtime logic in:
  - `apps/web/app/lib/api/with-api-logging.ts`
  - `apps/web/modules/auth/lib/authOptions.ts`
  - `apps/web/modules/core/rate-limit/rate-limit-configs.ts`
## Negative controls
- `/api/v1/client/og` must stay unthrottled at the gateway layer.
- `/api/v2/health` stays outside the gateway path for the staging POC.
- `OPTIONS` stays unthrottled because Envoy policy rules only match the explicitly listed methods.
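To see why `/api/v1/client/og` falls outside the V1 client policy, the route pattern can be tried against sample paths. This is only a sketch: the pattern below is a POSIX ERE translation of the regex listed above (`(?:...)` rewritten as `(...)`), `matches_v1_client` is a hypothetical helper, and `env123` / `resp456` are made-up IDs.

```bash
# ERE translation of the V1 client pattern; POSIX ERE has no non-capturing
# groups, so "(?:...)" becomes "(...)".
V1_CLIENT_ERE='^/api/v1/client/[^/]+/(environment|responses(/[^/]+)?|displays|user)$'

# Hypothetical helper: succeeds when the path matches the V1 client pattern.
matches_v1_client() {
  printf '%s\n' "$1" | grep -Eq "$V1_CLIENT_ERE"
}

for path in \
  /api/v1/client/env123/environment \
  /api/v1/client/env123/responses/resp456 \
  /api/v1/client/og
do
  if matches_v1_client "$path"; then
    echo "matched:     $path"
  else
    echo "not matched: $path"
  fi
done
```

The `og` path has no segment after the one that would be the environment ID, so the pattern cannot match it and the gateway policy does not apply.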
## How to interpret failures
- Gateway `429`
  - look for `x-envoy-ratelimited`
  - body will not use the Formbricks `code: "too_many_requests"` JSON shape
- App `429`
  - V1 responses use `apps/web/app/lib/api/response.ts`
  - V2 responses use `apps/web/modules/api/v2/lib/response.ts`
  - V3 responses use `apps/web/app/api/v3/lib/response.ts`
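The distinction above can be checked mechanically once a response has been saved to disk. The following is a sketch under stated assumptions: `classify_429` is a hypothetical helper, the headers and body are assumed to have been written to separate files (for example with `curl -D <headers> -o <body>`), and only the presence of the `x-envoy-ratelimited` header is checked, not its value.

```bash
# Hypothetical helper: classify_429 <header_file> <body_file>
# Prints "app", "gateway", or "unknown".
classify_429() {
  if grep -q '"code":"too_many_requests"' "$2"; then
    # The body carries the Formbricks JSON error shape.
    echo app
  elif grep -qi '^x-envoy-ratelimited:' "$1"; then
    # Envoy marks responses it rate-limited itself with this header.
    echo gateway
  else
    echo unknown
  fi
}
```

The app check runs first so that an app-generated `429` passing through Envoy is not misattributed to the gateway.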

scripts/rate-limit/DEMO.md Normal file

@@ -0,0 +1,237 @@
# Envoy POC Demo Runbook
This runbook is for a live staging demo of the Envoy Gateway rate-limit POC.
## Demo Goal
Show four things:
1. the selected staging routes now traverse the gateway path
2. public client traffic is rate-limited at the gateway
3. API-key-authenticated management traffic is rate-limited at the gateway
4. excluded routes remain unthrottled by the gateway policy set
## Required Inputs
- `ENVIRONMENT_ID`
  - staging environment ID
- `API_KEY`
  - single-environment staging API key
## Demo Script
Use [demo.sh](demo.sh), which lives next to this runbook in `scripts/rate-limit/`.
Supported modes:
- `preflight`
- `public`
- `management`
- `negative`
- `evidence`
- `all`
### Full Demo
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
ENVIRONMENT_ID='<environment_id>' \
API_KEY='<api_key>' \
PUBLIC_COUNT=125 \
PUBLIC_CONCURRENCY=20 \
MANAGEMENT_COUNT=200 \
MANAGEMENT_CONCURRENCY=40 \
NEGATIVE_COUNT=25 \
NEGATIVE_CONCURRENCY=10 \
scripts/rate-limit/demo.sh all
```
### Step-by-Step
Preflight:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
ENVIRONMENT_ID='<environment_id>' \
API_KEY='<api_key>' \
scripts/rate-limit/demo.sh preflight
```
Public route demo:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
ENVIRONMENT_ID='<environment_id>' \
PUBLIC_COUNT=125 \
PUBLIC_CONCURRENCY=20 \
scripts/rate-limit/demo.sh public
```
Management API-key demo:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
API_KEY='<api_key>' \
MANAGEMENT_COUNT=200 \
MANAGEMENT_CONCURRENCY=40 \
scripts/rate-limit/demo.sh management
```
Excluded-route demo:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
NEGATIVE_COUNT=25 \
NEGATIVE_CONCURRENCY=10 \
scripts/rate-limit/demo.sh negative
```
Recent Envoy log evidence:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
LOG_WINDOW=5m \
scripts/rate-limit/demo.sh evidence
```
## Recommended Live Sequence
Use this order:
1. `preflight`
2. `public`
3. `management`
4. `negative`
5. `evidence`
This gives you a complete story:
- the traffic path is on Envoy
- public traffic is blocked at the gateway
- API-key traffic is blocked at the gateway
- excluded routes remain open
- Envoy logs confirm the decisions server-side
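The recommended order can also be driven by a small wrapper that stops at the first failing step. This is a sketch only: `run_demo_sequence` is a hypothetical function, and the default script path assumes the command is run from the repository root.

```bash
# Hypothetical wrapper: run the demo modes in the recommended order and stop
# at the first step that fails. The optional first argument overrides the
# path to demo.sh.
run_demo_sequence() {
  demo_script="${1:-scripts/rate-limit/demo.sh}"
  for mode in preflight public management negative evidence; do
    "$demo_script" "$mode" || return 1
  done
}
```

With `ENVIRONMENT_ID` and `API_KEY` exported, calling `run_demo_sequence` walks through the five steps in order. Unlike the `all` mode, it ends with the standalone `evidence` step.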
## What To Say During The Demo
### 1. Gateway Path Is Active
The preflight step should report:
- `status=200 source=gateway` for `v1-client-environment`
- `status=200 source=gateway` for `management-api-key`
That proves the response is coming through the Envoy path rather than directly from the old app ingress path.
### 2. Public Client Route Is Rate-Limited At The Gateway
The public burst step targets:
- `GET /api/v1/client/<environment_id>/environment`
Success criteria:
- the summary contains `status=429 source=gateway`
- the summary prints `gateway_429s=<n>` with `n > 0`
### 3. API-Key Management Route Is Rate-Limited At The Gateway
The management burst step targets:
- `GET /api/v1/management/me`
Success criteria:
- the summary contains `status=429 source=gateway`
- the summary prints `gateway_429s=<n>` with `n > 0`
### 4. Excluded Health Route Is Not Rate-Limited
The excluded-route step targets:
- `GET /api/v2/health`
Success criteria:
- the summary contains no `429` responses
- `gateway_429s=0`
- `app_429s=0`
### 5. Live Envoy Evidence
The evidence step prints matching Envoy log lines for:
- `formbricks-stage-v1-client`
- `formbricks-stage-v1-management`
- `request_rate_limited`
That gives you an infrastructure-side proof in addition to the client-side summary.
## Expected Caveat
Staging can still show intermittent `500` or `503` responses under high burst load on the environment route.
For the demo, this does **not** invalidate the POC if:
- the preflight shows `source=gateway`
- the burst summary shows `status=429 source=gateway`
That means the gateway path and rate-limiting policy are working, and the remaining issue is staging stability on the upstream route under burst load.
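The two criteria above can be checked against captured output. A minimal sketch, assuming the preflight and burst output were saved to files and using a hypothetical helper named `demo_passes`:

```bash
# Hypothetical check: demo_passes <preflight_output_file> <burst_output_file>
# Succeeds when the preflight shows a gateway-tagged response and the burst
# shows at least one gateway 429. Any 500/503 lines in the burst are ignored.
demo_passes() {
  grep -q 'source=gateway' "$1" && grep -q 'status=429 source=gateway' "$2"
}
```

Because the check only looks for positive gateway evidence, intermittent upstream `500` or `503` lines do not change the result.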
## Useful Supporting Commands
Show one direct public probe:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
ENVIRONMENT_ID='<environment_id>' \
COUNT=1 \
scripts/rate-limit/burst-test.sh v1-client-environment
```
Show one direct management probe:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
API_KEY='<api_key>' \
COUNT=1 \
scripts/rate-limit/burst-test.sh management-api-key
```
Show the excluded route probe:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
COUNT=1 \
scripts/rate-limit/burst-test.sh v2-health
```
Show recent Envoy route hits during the demo:
```bash
kubectl logs -n formbricks-stage deploy/formbricks-stage-envoy -c envoy --since=2m | \
rg 'formbricks-stage-v1-client|formbricks-stage-v1-management|request_rate_limited'
```
## Routes To Avoid In The Demo
Do not use the storage upload scenarios in the live demo.
The current dummy payloads intentionally trigger validation `400`s, which makes the demo noisy and does not cleanly demonstrate gateway limiting.

@@ -0,0 +1,359 @@
# Envoy Rate-Limit POC Meeting Brief
This brief is for the meeting about the current Envoy rate-limiting POC, what it does today, what is still missing, and what the next development steps should be before production rollout.
## Objective
Align on:
1. what the current staging POC actually proves
2. what is still unstable or incomplete
3. what the productionization path should be
4. which next engineering steps to prioritize
## Current Scope
The current POC is:
- Kubernetes-native
- Envoy Gateway based
- running on EKS staging
- enforcing rate limits in parallel with existing app-side Redis rate limits
It is **not** yet a production-ready rollout.
## Current Architecture
For the selected staging routes, the path is now:
- Cloudflare
- staging ALB
- Envoy Gateway
- Formbricks staging web service
The old catch-all app ingress still serves the rest of the app directly. Only selected API prefixes are routed through Envoy for this POC.
## Current Envoy-Covered Route Set
There are two different scopes to keep separate:
1. routes that currently traverse Envoy on staging
2. routes that currently have an active Envoy rate-limit policy
### Routes Currently Routed Through Envoy
These prefixes are currently sent to Envoy first on staging:
- `/api/auth/callback`
- `/api/v1/client`
- `/api/v2/client`
- `/api/v1/management`
- `/api/v1/webhooks`
- `/storage`
The ALB health check path `/health` is also wired through Envoy so the staging Envoy service can be health-checked cleanly.
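As a quick way to reason about which requests reach Envoy at all, the prefix list can be written as a shell check. This is an illustrative sketch: `routed_through_envoy` is a hypothetical helper, and matching on whole path segments is an assumption about how the staging ALB rules are configured.

```bash
# Hypothetical helper: succeeds when the path falls under one of the prefixes
# that staging sends to Envoy first. Segment-boundary matching is assumed.
routed_through_envoy() {
  case "$1" in
    /health) return 0 ;;
    /api/auth/callback|/api/auth/callback/*) return 0 ;;
    /api/v1/client|/api/v1/client/*) return 0 ;;
    /api/v2/client|/api/v2/client/*) return 0 ;;
    /api/v1/management|/api/v1/management/*) return 0 ;;
    /api/v1/webhooks|/api/v1/webhooks/*) return 0 ;;
    /storage|/storage/*) return 0 ;;
    *) return 1 ;;
  esac
}

routed_through_envoy /api/v1/client/og && echo "og: routed through Envoy"
routed_through_envoy /api/v2/health || echo "v2 health: served by the old ingress"
```

Being routed through Envoy does not by itself mean a path is rate-limited, which is the second scope described above.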
### Routes Currently Rate-Limited By Envoy
The active `BackendTrafficPolicy` resources currently cover these route groups:
- auth callbacks by client IP:
  - `POST /api/auth/callback/credentials`
    - `40 / hour` at the gateway
    - this is an approximation of the stricter app-side `10 / 15 min` limit, because Envoy Gateway global rate limits only support whole-unit windows
  - `POST /api/auth/callback/token`
    - `10 / hour`
- V1 client routes by client IP:
  - `POST /api/v1/client/{environmentId}/storage`
    - `5 / min`
  - `GET|POST|PUT|PATCH|DELETE /api/v1/client/{environmentId}/environment`
    - `100 / min`
  - `GET|POST|PUT|PATCH|DELETE /api/v1/client/{environmentId}/responses`
    - `100 / min`
  - `GET|POST|PUT|PATCH|DELETE /api/v1/client/{environmentId}/responses/{responseId}`
    - `100 / min`
  - `GET|POST|PUT|PATCH|DELETE /api/v1/client/{environmentId}/displays`
    - `100 / min`
  - `GET|POST|PUT|PATCH|DELETE /api/v1/client/{environmentId}/user`
    - `100 / min`
- V2 client routes by client IP:
  - `POST|PUT /api/v2/client/{environmentId}/responses`
    - `100 / min`
  - `POST|PUT /api/v2/client/{environmentId}/responses/{responseId}`
    - `100 / min`
  - `POST /api/v2/client/{environmentId}/displays`
    - `100 / min`
  - `POST /api/v2/client/{environmentId}/storage`
    - `5 / min`
- V1 management routes by `x-api-key`:
  - `POST /api/v1/management/storage`
    - `5 / min`
  - `GET|POST|PUT|PATCH|DELETE /api/v1/management/*`
    - `100 / min`
- V1 webhooks routes by `x-api-key`:
  - `GET|POST|PUT|PATCH|DELETE /api/v1/webhooks/*`
    - `100 / min`
- storage delete by `x-api-key`:
  - `DELETE /storage/{environmentId}/{public|private}/...`
    - `5 / min`
### Explicitly Not Covered By Envoy Rate Limiting
Important examples that are **not** currently rate-limited by Envoy:
- `/api/v2/health`
  - not routed through Envoy in the current POC
  - this is the negative-control route used in the demo
- `/api/v1/client/og`
  - routed under the broader `/api/v1/client` prefix, but not matched by the active V1 client rate-limit regex
- routes outside the listed prefixes above
  - still go straight through the old staging app ingress
## Relevant PRs
- Formbricks app support PR: [formbricks#7583](https://github.com/formbricks/formbricks/pull/7583)
- Infra POC PR: [infra#145](https://github.com/formbricks/infra/pull/145)
- GitOps staging ingress-order PR: [gitops#70](https://github.com/formbricks/gitops/pull/70)
## What Works Today
### 1. Gateway Pathing Is Working
We validated that staging requests for the selected routes now traverse Envoy.
Evidence:
- the burst tooling reports `source=gateway`
- Envoy access logs show real routed traffic for:
  - `formbricks-stage-v1-client`
  - `formbricks-stage-v1-management`
### 2. Gateway Rate Limiting Is Working
We validated gateway `429`s on staging for:
- public client route:
  - `GET /api/v1/client/[environmentId]/environment`
- API-key route:
  - `GET /api/v1/management/me`
Evidence:
- demo/burst output shows `status=429 source=gateway`
- Envoy logs show:
  - `response_code: 429`
  - `response_code_details: request_rate_limited`
  - `response_flags: RL`
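To pull only the rate-limited entries out of a stream of Envoy access-log lines, a filter along these lines can be used. It is a sketch: `rate_limited_lines` is a hypothetical helper, and the patterns assume JSON-formatted access logs in which the flag appears as `"response_flags":"RL"`.

```bash
# Hypothetical filter: read Envoy access-log lines on stdin and print only
# the ones that record a gateway rate-limit decision.
rate_limited_lines() {
  grep -E 'request_rate_limited|"response_flags":"RL"'
}

# Example use against the staging Envoy deployment:
#   kubectl logs -n formbricks-stage deploy/formbricks-stage-envoy -c envoy --since=2m \
#     | rate_limited_lines
```

Like `grep`, the filter exits non-zero when nothing matches, so it can also serve as a pass/fail check in a script.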
### 3. Shared ALB Routing Issue Was Fixed
The initial POC looked broken because traffic was still bypassing Envoy. The cause was shared-ALB ingress ordering.
The fix was to reorder the ingress rules on the shared ALB:
- the Envoy ingress now takes higher priority
- the old catch-all staging ingress now takes lower priority
That fix is now represented in:
- `platform/core-eks/envoy-gateway.tf` in the `infra` repository
- `formbricks/values-stage.yaml` in the `gitops` repository
## What Is Not Clean Yet
### 1. Intermittent Burst Instability Still Exists
Under high-concurrency bursts, the environment route can still produce intermittent non-rate-limit failures.
Observed behavior:
- external staging path:
  - expected gateway `429`s
  - intermittent `503`s
- direct in-cluster through Envoy:
  - `98 x 200`
  - `40 x 429`
  - `2 x 500`
- direct in-cluster to the app service, bypassing Envoy:
  - `99 x 200`
  - `41 x 429`
  - `0 x 503`
Interpretation:
- the rate-limiting path is working
- there is also an upstream app instability on the environment route under burst
- the external `503`s are a secondary symptom on top of that upstream instability
### 2. Environment Route Is the Main Hotspot
The problematic route is:
- `apps/web/app/api/v1/client/[environmentId]/environment/route.ts`
It depends on:
- `apps/web/app/api/v1/client/[environmentId]/environment/lib/environmentState.ts`
- `apps/web/app/api/v1/client/[environmentId]/environment/lib/data.ts`
- `packages/cache/src/service.ts`
### 3. Redis/Cache Errors Are Visible During the Burst Window
During the same period, the staging app logs show repeated Redis cache failures for:
- `fb:env:<environmentId>:state`
Examples seen:
- `Cache get operation failed`
- `Cache set operation failed`
This strongly suggests the route stability problem is entangled with the cache path on the environment-state endpoint.
### 4. Staging Has Only One App Replica
The staging Formbricks deployment is still a single replica.
Implication:
- if that pod stalls or responds slowly under burst load, there is no buffer
- liveness/readiness probe failures immediately translate into a noisy external path
## Demo Status
The demo is ready.
Use:
- `scripts/rate-limit/DEMO.md`
- `scripts/rate-limit/demo.sh`
Recommended demo focus:
1. prove gateway pathing with one-request probes
2. prove public-route gateway `429`s
3. prove API-key-route gateway `429`s
Do not use storage upload scenarios in the demo right now. The current dummy payloads produce validation `400`s and make the demonstration noisy.
## Suggested Meeting Narrative
### What the POC proves
- We can run Kubernetes-native gateway rate limiting in front of Formbricks on staging.
- We can enforce both IP-keyed and API-key-keyed limits at the gateway.
- We can do this without removing the existing app-side Redis rate limits.
### What the POC does not yet prove
- that the current selected upstream routes are stable enough under burst for production
- that the same setup is production-ready operationally
- that the GKE/KSA side is solved
## Production Gaps
Before production rollout, the main gaps are:
### 1. Fix the environment route instability
Priority: highest
Why:
- this is the route most likely to be called at scale
- it already shows intermittent app `500`s under burst
- those propagate into external `503`s
### 2. Improve upstream resilience
At minimum:
- increase staging replica count for realistic soak testing
- verify HPA behavior
- verify probe behavior under burst load
### 3. Harden observability
Need clearer signals for:
- gateway `429`s
- upstream `500`s
- external `503`s
- Redis/cache failures on hot environment-state keys
### 4. Merge and stabilize the routing source of truth
The ALB ordering fix depends on the GitOps ingress-order change being merged and synced.
### 5. Decide production rollout shape
Open choices:
- keep app-side Redis rate limits in parallel initially
- or later remove overlap for routes fully covered by the gateway
For now, parallel mode is the safer production introduction.
## Recommended Next Engineering Steps
### Short Term
1. Merge the GitOps ingress-order fix so staging routing does not drift.
2. Investigate and fix intermittent `500`s on the environment endpoint.
3. Increase staging app replicas to reduce single-pod fragility during validation.
4. Re-run burst and soak tests after the route fix.
### Medium Term
1. Define the initial production route set.
2. Add production-grade monitoring and alerting around Envoy and upstream route health.
3. Run a controlled rollout in production with app-side Redis limits still active in parallel.
### Later
1. Extend the same pattern to GKE/KSA if desired.
2. Revisit whether app-side overlap should be removed for gateway-managed routes.
## Decisions To Drive In The Meeting
These are the concrete decisions worth getting:
1. Is the current staging POC accepted as proof of concept, despite the known upstream instability?
2. Should we prioritize fixing the environment route before any production discussion?
3. Should staging be moved to 2 replicas before further validation?
4. Should the first production rollout keep app Redis limits active in parallel?
5. Which route set should be included in phase 1 production rollout?
6. Is GKE/KSA explicitly phase 2, or should it be planned in parallel?
## Command Summary
Full demo:
```bash
cd /path/to/formbricks  # root of your local formbricks checkout
HOST=https://staging.app.formbricks.com \
ENVIRONMENT_ID='<environment_id>' \
API_KEY='<api_key>' \
./scripts/rate-limit/demo.sh all
```
Show recent gateway evidence:
```bash
kubectl logs -n formbricks-stage deploy/formbricks-stage-envoy -c envoy --since=2m | \
rg 'formbricks-stage-v1-client|formbricks-stage-v1-management|request_rate_limited'
```
Show upstream app errors:
```bash
kubectl logs -n formbricks-stage deploy/formbricks -c formbricks --since=10m | \
rg 'Cache get operation failed|Cache set operation failed|Error in GET /api/v1/client/\[environmentId\]/environment|V1 API Error Details'
```
## Bottom Line
The Envoy POC is successful as a staging proof of gateway-based rate limiting.
The next step is **not** to redesign the gateway path again. The next step is to harden the upstream environment route and staging resilience so the validated gateway path can be taken seriously as a production candidate.

@@ -0,0 +1,58 @@
# Rate-Limit Burst Checks
These scripts are for validating the Envoy Gateway staging POC without changing runtime behavior in the app.
For a live demo flow, use `scripts/rate-limit/DEMO.md` and `scripts/rate-limit/demo.sh`.
## What the script reports
For each request it prints:
- request number
- scenario name
- HTTP status
- response source guess
`source=gateway` means the response included `x-envoy-ratelimited`.
For the staging Envoy POC, the script also treats the standard `x-ratelimit-*` headers and empty-body `429`
responses as gateway hits, because those are the headers currently visible on the live gateway path.
`source=app` means the response body matched the Formbricks `too_many_requests` JSON shape.
`source=unknown` means the response was neither of those and should be inspected manually.
## Required environment variables
- `HOST`
  - optional; defaults to `https://staging.app.formbricks.com`
- `ENVIRONMENT_ID`
  - required for client API scenarios and for the `storage-delete-api-key` scenario
- `API_KEY`
  - required for management, webhooks, and storage-delete scenarios
## Optional environment variables
- `COUNT`
  - number of requests to send (default `20`)
- `CONCURRENCY`
  - number of in-flight requests to run in parallel (default `1`)
- `SLEEP_SECONDS`
  - delay in seconds after each request (default `0`)
- `RESPONSE_ID`
  - used by the `v2-responses-put` scenario (default `envoy-poc-response`)
- `WEBHOOK_ID`
  - used by the `webhooks-api-key` scenario (default `envoy-poc-webhook`)
- `FILE_KEY`
  - used by the `storage-delete-api-key` scenario (default `envoy-poc-file.txt`)
## Example
```bash
HOST=https://staging.app.formbricks.com \
ENVIRONMENT_ID='<environment_id>' \
COUNT=110 \
CONCURRENCY=20 \
scripts/rate-limit/burst-test.sh v1-client-environment
```

scripts/rate-limit/burst-test.sh Executable file

@@ -0,0 +1,218 @@
#!/usr/bin/env bash
set -euo pipefail

SCENARIO="${1:-}"
HOST="${HOST:-https://staging.app.formbricks.com}"
ENVIRONMENT_ID="${ENVIRONMENT_ID:-}"
API_KEY="${API_KEY:-}"
COUNT="${COUNT:-20}"
CONCURRENCY="${CONCURRENCY:-1}"
SLEEP_SECONDS="${SLEEP_SECONDS:-0}"
RESPONSE_ID="${RESPONSE_ID:-envoy-poc-response}"
WEBHOOK_ID="${WEBHOOK_ID:-envoy-poc-webhook}"
FILE_KEY="${FILE_KEY:-envoy-poc-file.txt}"

if [[ -z "$SCENARIO" ]]; then
  echo "usage: scripts/rate-limit/burst-test.sh <scenario>" >&2
  exit 1
fi

require_env_id() {
  if [[ -z "$ENVIRONMENT_ID" ]]; then
    echo "ENVIRONMENT_ID is required for scenario '$SCENARIO'" >&2
    exit 1
  fi
}

require_api_key() {
  if [[ -z "$API_KEY" ]]; then
    echo "API_KEY is required for scenario '$SCENARIO'" >&2
    exit 1
  fi
}

METHOD="GET"
URL=""
BODY=""
CONTENT_TYPE=""
EXTRA_HEADERS=()

case "$SCENARIO" in
  login)
    METHOD="POST"
    URL="$HOST/api/auth/callback/credentials"
    BODY="email=rate-limit%40example.com&password=wrong-password"
    CONTENT_TYPE="application/x-www-form-urlencoded"
    ;;
  verify-token)
    METHOD="POST"
    URL="$HOST/api/auth/callback/token"
    BODY="token=invalid-token"
    CONTENT_TYPE="application/x-www-form-urlencoded"
    ;;
  v1-client-environment)
    require_env_id
    URL="$HOST/api/v1/client/$ENVIRONMENT_ID/environment"
    ;;
  v1-client-storage)
    require_env_id
    METHOD="POST"
    URL="$HOST/api/v1/client/$ENVIRONMENT_ID/storage"
    BODY='{}'
    CONTENT_TYPE="application/json"
    ;;
  v2-responses-post)
    require_env_id
    METHOD="POST"
    URL="$HOST/api/v2/client/$ENVIRONMENT_ID/responses"
    BODY='{}'
    CONTENT_TYPE="application/json"
    ;;
  v2-responses-put)
    require_env_id
    METHOD="PUT"
    URL="$HOST/api/v2/client/$ENVIRONMENT_ID/responses/$RESPONSE_ID"
    BODY='{}'
    CONTENT_TYPE="application/json"
    ;;
  v2-displays-post)
    require_env_id
    METHOD="POST"
    URL="$HOST/api/v2/client/$ENVIRONMENT_ID/displays"
    BODY='{}'
    CONTENT_TYPE="application/json"
    ;;
  v2-client-storage)
    require_env_id
    METHOD="POST"
    URL="$HOST/api/v2/client/$ENVIRONMENT_ID/storage"
    BODY='{}'
    CONTENT_TYPE="application/json"
    ;;
  v2-health)
    URL="$HOST/api/v2/health"
    ;;
  management-api-key)
    require_api_key
    URL="$HOST/api/v1/management/me"
    EXTRA_HEADERS+=("x-api-key: $API_KEY")
    ;;
  management-storage-api-key)
    require_api_key
    METHOD="POST"
    URL="$HOST/api/v1/management/storage"
    BODY='{}'
    CONTENT_TYPE="application/json"
    EXTRA_HEADERS+=("x-api-key: $API_KEY")
    ;;
  webhooks-api-key)
    require_api_key
    URL="$HOST/api/v1/webhooks/$WEBHOOK_ID"
    EXTRA_HEADERS+=("x-api-key: $API_KEY")
    ;;
  storage-delete-api-key)
    require_env_id
    require_api_key
    METHOD="DELETE"
    URL="$HOST/storage/$ENVIRONMENT_ID/public/$FILE_KEY"
    EXTRA_HEADERS+=("x-api-key: $API_KEY")
    ;;
  *)
    echo "unknown scenario: $SCENARIO" >&2
    exit 1
    ;;
esac

TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT

run_request() {
  local i="$1"
  local header_file
  local body_file
  local status_code
  local source
  local header_summary
  local has_gateway_headers="false"
  local header
  local -a curl_args

  header_file="$TMP_DIR/$i.headers"
  body_file="$TMP_DIR/$i.body"

  curl_args=(
    -sS
    -D "$header_file"
    -o "$body_file"
    -X "$METHOD"
  )

  if [[ -n "$CONTENT_TYPE" ]]; then
    curl_args+=(-H "content-type: $CONTENT_TYPE")
  fi

  # Bash before 4.4 with `set -u` fails on "${arr[@]}" for an empty array, so
  # guard the loop. The length form ${#arr[@]} is safe and takes no default.
  if [[ ${#EXTRA_HEADERS[@]} -gt 0 ]]; then
    for header in "${EXTRA_HEADERS[@]}"; do
      curl_args+=(-H "$header")
    done
  fi

  if [[ -n "$BODY" ]]; then
    curl_args+=(--data "$BODY")
  fi

  status_code="$(curl "${curl_args[@]}" -w '%{http_code}' "$URL")"

  # Classify the response: app JSON body first, then gateway headers,
  # then the empty-body 429 fallback.
  source="unknown"
  if rg -q '"code":"too_many_requests"' "$body_file"; then
    source="app"
  else
    if rg -qi '^(x-envoy-ratelimited|x-ratelimit-limit|x-ratelimit-remaining|x-ratelimit-reset):' "$header_file"; then
      has_gateway_headers="true"
    fi
    if [[ "$has_gateway_headers" == "true" ]]; then
      source="gateway"
    elif [[ "$status_code" == "429" && ! -s "$body_file" ]]; then
      source="gateway"
    fi
  fi

  printf '%03d scenario=%s status=%s source=%s\n' "$i" "$SCENARIO" "$status_code" "$source"

  if [[ "$status_code" == "429" ]]; then
    header_summary="$(
      {
        tr -d '\r' < "$header_file" |
          rg -i '^(x-envoy-ratelimited|x-ratelimit-limit|x-ratelimit-remaining|x-ratelimit-reset|content-type|retry-after):' |
          paste -sd '; ' -
      } || true
    )"
    printf ' headers: %s\n' "${header_summary:-<none>}"
  fi

  if [[ "$SLEEP_SECONDS" != "0" ]]; then
    sleep "$SLEEP_SECONDS"
  fi
}
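
if (( CONCURRENCY <= 1 )); then
  for i in $(seq 1 "$COUNT"); do
    run_request "$i"
  done
else
  pids=()
  for i in $(seq 1 "$COUNT"); do
    run_request "$i" &
    pids+=("$!")
    # Keep at most CONCURRENCY requests in flight by waiting on the oldest one.
    if (( ${#pids[@]} >= CONCURRENCY )); then
      wait "${pids[0]}"
      pids=("${pids[@]:1}")
    fi
  done
  # Same empty-array guard as above for Bash before 4.4 under `set -u`.
  if (( ${#pids[@]} > 0 )); then
    for pid in "${pids[@]}"; do
      wait "$pid"
    done
  fi
fi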

scripts/rate-limit/demo.sh Executable file

@@ -0,0 +1,284 @@
#!/usr/bin/env bash
set -euo pipefail

MODE="${1:-all}"
HOST="${HOST:-https://staging.app.formbricks.com}"
ENVIRONMENT_ID="${ENVIRONMENT_ID:-}"
API_KEY="${API_KEY:-}"
PUBLIC_COUNT="${PUBLIC_COUNT:-125}"
PUBLIC_CONCURRENCY="${PUBLIC_CONCURRENCY:-20}"
MANAGEMENT_COUNT="${MANAGEMENT_COUNT:-200}"
MANAGEMENT_CONCURRENCY="${MANAGEMENT_CONCURRENCY:-40}"
NEGATIVE_COUNT="${NEGATIVE_COUNT:-25}"
NEGATIVE_CONCURRENCY="${NEGATIVE_CONCURRENCY:-10}"
LOG_WINDOW="${LOG_WINDOW:-5m}"

WORKDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BURST_SCRIPT="$WORKDIR/burst-test.sh"
TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT

usage() {
  cat <<'EOF'
usage: scripts/rate-limit/demo.sh [preflight|public|management|negative|evidence|all]

Required environment variables:
  ENVIRONMENT_ID          Staging environment ID for public client route checks
  API_KEY                 Single-environment staging API key for management route checks

Optional environment variables:
  HOST                    Defaults to https://staging.app.formbricks.com
  PUBLIC_COUNT            Defaults to 125
  PUBLIC_CONCURRENCY      Defaults to 20
  MANAGEMENT_COUNT        Defaults to 200
  MANAGEMENT_CONCURRENCY  Defaults to 40
  NEGATIVE_COUNT          Defaults to 25
  NEGATIVE_CONCURRENCY    Defaults to 10
  LOG_WINDOW              Defaults to 5m
EOF
}

require_env_id() {
  if [[ -z "$ENVIRONMENT_ID" ]]; then
    echo "ENVIRONMENT_ID is required" >&2
    exit 1
  fi
}

require_api_key() {
  if [[ -z "$API_KEY" ]]; then
    echo "API_KEY is required" >&2
    exit 1
  fi
}

section() {
  printf '\n== %s ==\n' "$1"
}

# Run a command, echoing its output and saving a copy to the given file.
run_and_capture() {
  local output_file="$1"
  shift
  "$@" | tee "$output_file"
}

# Count burst output lines per (status, source) pair.
summarize_output() {
  local output_file="$1"
  awk '
    /scenario=/ {
      status = ""
      source = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^status=/) {
          status = substr($i, 8)
        }
        if ($i ~ /^source=/) {
          source = substr($i, 8)
        }
      }
      if (status != "" && source != "") {
        counts[status "|" source]++
      }
    }
    END {
      for (key in counts) {
        split(key, parts, "|")
        printf "status=%s source=%s count=%d\n", parts[1], parts[2], counts[key]
      }
    }
  ' "$output_file" | sort
}

print_summary_insights() {
  local output_file="$1"
  local gateway_429_count
  local app_429_count
  local unknown_429_count
  local server_error_count

  gateway_429_count="$(count_matches 'status=429 source=gateway' "$output_file")"
  app_429_count="$(count_matches 'status=429 source=app' "$output_file")"
  unknown_429_count="$(count_matches 'status=429 source=unknown' "$output_file")"
  server_error_count="$(count_matches 'status=5[0-9][0-9] source=' "$output_file")"

  echo "gateway_429s=$gateway_429_count"
  echo "app_429s=$app_429_count"
  echo "unknown_429s=$unknown_429_count"
  echo "server_errors=$server_error_count"
}

# Print the number of matching lines, or 0 when rg finds none.
count_matches() {
  local pattern="$1"
  local input_file="$2"
  local count
  count="$(rg -c "$pattern" "$input_file" 2>/dev/null || true)"
  echo "${count:-0}"
}

assert_gateway_probe() {
  local output_file="$1"
  if ! rg -q 'source=gateway' "$output_file"; then
    echo "Expected a gateway-tagged response in probe output, but none was found." >&2
    exit 1
  fi
}

assert_gateway_rate_limit() {
  local output_file="$1"
  if ! rg -q 'status=429 source=gateway' "$output_file"; then
    echo "Expected at least one gateway 429 in burst output, but none was found." >&2
    exit 1
  fi
}

assert_no_429() {
  local output_file="$1"
  if rg -q 'status=429 source=' "$output_file"; then
    echo "Expected no 429s in excluded-route output, but at least one was found." >&2
    exit 1
  fi
}

show_envoy_log_evidence() {
  local pattern="$1"
  section "Recent Envoy Evidence"
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not available; skipping live Envoy log evidence."
    return
  fi
  if ! kubectl logs -n formbricks-stage deploy/formbricks-stage-envoy -c envoy --since="$LOG_WINDOW" 2>/dev/null | \
    rg "$pattern" | \
    rg 'request_rate_limited|response_flags":"RL"'; then
    echo "No matching Envoy log lines found in the last $LOG_WINDOW."
  fi
}

print_known_caveat() {
  cat <<'EOF'
Known staging caveat:
- intermittent 500/503 responses can still appear under high burst load on the environment route
- this is a staging stability issue on top of the Envoy POC, not a sign that the gateway path is bypassed
- the demo still passes if you see gateway-tagged 429 responses
EOF
}

run_preflight() {
  require_env_id
  require_api_key

  section "Preflight"
  echo "Host: $HOST"
  echo "Environment ID: $ENVIRONMENT_ID"
  echo "API key: provided"

  section "Public Route Probe"
  public_probe_output="$TMP_DIR/public-probe.txt"
  run_and_capture \
    "$public_probe_output" \
    env HOST="$HOST" ENVIRONMENT_ID="$ENVIRONMENT_ID" COUNT=1 "$BURST_SCRIPT" v1-client-environment
  assert_gateway_probe "$public_probe_output"

  section "Management Route Probe"
  management_probe_output="$TMP_DIR/management-probe.txt"
  run_and_capture \
    "$management_probe_output" \
    env HOST="$HOST" API_KEY="$API_KEY" COUNT=1 "$BURST_SCRIPT" management-api-key
  assert_gateway_probe "$management_probe_output"

  print_known_caveat
}

run_public_demo() {
  require_env_id

  section "Public IP Demo"
  echo "Route: GET /api/v1/client/$ENVIRONMENT_ID/environment"
  echo "Expected: gateway 429 after threshold"

  public_output="$TMP_DIR/public-burst.txt"
  run_and_capture \
    "$public_output" \
    env HOST="$HOST" ENVIRONMENT_ID="$ENVIRONMENT_ID" COUNT="$PUBLIC_COUNT" CONCURRENCY="$PUBLIC_CONCURRENCY" \
    "$BURST_SCRIPT" v1-client-environment

  section "Public IP Summary"
  summarize_output "$public_output"
  print_summary_insights "$public_output"
  assert_gateway_rate_limit "$public_output"
  show_envoy_log_evidence 'formbricks-stage-v1-client'
}

run_management_demo() {
  require_api_key

  section "API Key Demo"
  echo "Route: GET /api/v1/management/me"
  echo "Expected: gateway 429 after threshold"

  management_output="$TMP_DIR/management-burst.txt"
  run_and_capture \
    "$management_output" \
    env HOST="$HOST" API_KEY="$API_KEY" COUNT="$MANAGEMENT_COUNT" CONCURRENCY="$MANAGEMENT_CONCURRENCY" \
    "$BURST_SCRIPT" management-api-key

  section "API Key Summary"
  summarize_output "$management_output"
  print_summary_insights "$management_output"
  assert_gateway_rate_limit "$management_output"
  show_envoy_log_evidence 'formbricks-stage-v1-management'
}

run_negative_demo() {
  section "Excluded Route Demo"
  echo "Route: GET /api/v2/health"
  echo "Expected: no 429 responses because this route is excluded from the gateway policy set"

  negative_output="$TMP_DIR/negative-burst.txt"
  run_and_capture \
    "$negative_output" \
    env HOST="$HOST" COUNT="$NEGATIVE_COUNT" CONCURRENCY="$NEGATIVE_CONCURRENCY" \
    "$BURST_SCRIPT" v2-health

  section "Excluded Route Summary"
  summarize_output "$negative_output"
  print_summary_insights "$negative_output"
  assert_no_429 "$negative_output"
}

run_evidence_only() {
  show_envoy_log_evidence 'formbricks-stage-v1-client|formbricks-stage-v1-management'
}

case "$MODE" in
  preflight)
    run_preflight
    ;;
  public)
    run_public_demo
    ;;
  management)
    run_management_demo
    ;;
  negative)
    run_negative_demo
    ;;
  evidence)
    run_evidence_only
    ;;
  all)
    run_preflight
    run_public_demo
    run_management_demo
    run_negative_demo
    ;;
  -h|--help|help)
    usage
    ;;
  *)
    usage >&2
    exit 1
    ;;
esac