matt
9e14814acb
Feat: OLAP Payload Cutover Job ( #2618 )
...
* feat: migration
* feat: queries
* feat: overwrite queries
* fix: bug
* feat: first pass
* fix: more olap job wiring
* fix: signature
* fix: refs to a bunch of funcs
* feat: job
* fix: table names
* fix: span name
* chore: lint
* fix: redundant error check
* fix: naming
* fix: handle nil external id
* fix: order payload partitions descending
* fix: param for limiting which partitions get processed
* fix: olap
2025-12-09 12:33:07 -05:00
matt
35d1cff963
refactor: simplify external store signature ( #2616 )
2025-12-08 14:53:52 -05:00
matt
bede3efe0d
Feat: Process all old partitions in a loop ( #2613 )
...
* feat: process old partitions in a loop
* fix: param
* fix: query return
* feat: add spans
* fix: naming
2025-12-08 11:00:24 -05:00
matt
34090a71f2
fix: add validation ( #2610 )
2025-12-05 14:30:05 -05:00
matt
7e48ac7d02
Fix: Leasing for payload job ( #2609 )
...
* refactor: acquire a lease instead of an advisory lock
* refactor: partition dates
* fix: single query to acquire / extend
* fix: explicit alias
* fix: unwind
* fix: hwere clause
* fix: handle no rows
* fix: lease bug
* fix: rm debug
* fix: comment for clarity
* fix: syntax that doesn't actually matter
* fix: error
2025-12-05 13:55:59 -05:00
matt
18940869ae
Feat: Job for payload cutovers to external ( #2586 )
...
* feat: initial payload cutover job
* refactor: fix a couple things
* feat: start wiring up writes
* feat: only run job if external store is enabled
* fix: add some notes, add loop
* feat: function for reading out payloads
* fix: date handling, logging
* feat: remove wal and immediate offloads
* feat: advisory lock
* feat: partition swap logic
* fix: rm debug
* fix: add todo
* fix: sql cleanup
* fix: sql cleanup, ii
* chore: nuke a bunch of WAL stuff
* chore: more wal
* feat: trigger for crud opts
* feat: drop trigger + function in swapover
* feat: move autovac to later
* feat: use unlogged table initially
* feat: update migration
* fix: drop trigger
* fix: use insert + on conflict
* fix: types
* refactor: clean up a bit
* fix: panic
* fix: detach partition before dropping
* feat: configurable batch size
* feat: offset tracking in the db
* feat: explicitly lock
* fix: down migration
* fix: bug
* fix: offset handling
* fix: try explicit ordering of the insert
* fix: lock location
* fix: do less stuff after locking
* fix: ordering
* fix: dont drop and recreate if temp table exists
* fix: explicitly track completed status
* fix: table name
* fix: dont use unlogged table
* fix: rm todos
* chore: lint
* feat: configurable delay
* fix: use date as pk instead of varchar
* fix: daily job
* fix: hack check constraint to speed up partition attach
* fix: syntax
* fix: syntax
* fix: drop constraint after attaching
* fix: syntax
* fix: drop triggers properly
* fix: factor out insert logic
* refactor: factor out loop logic
* refactor: factor out job preparation work
* fix: ordering
* fix: run the job more often
* fix: use `WithSingletonMode`
* fix: singleton mode sig
* fix: env var cleanup
* fix: overwrite sig
* fix: re-enable immediate offloads with a flag
* fix: order, offload at logic
* feat: add count query to compare
* fix: row-level triggers, partition time bug
* fix: rm todo
* fix: for true
* fix: handle lock not acquired
* fix: handle error
* fix: comment
2025-12-05 10:54:26 -05:00
abelanger5
9dabe7d902
feat: dlq for dispatcher queues ( #2600 )
...
* feat: dlq for dispatcher queues
* reduce dispatcher message ttl to 20 seconds
* rename dispatcher queue for clarity
* add error logs when dead lettering
* address comment
2025-12-04 14:19:01 -05:00
Mohammed Nafees
cf18b31218
Initialize concurrency keys slice for replayed tasks ( #2549 )
...
* make sure concurrency keys for replayed tasks are initialized as expected
* no need for useless allocs
* revert change for insertTasks
* panic catch
* uncomment fix
2025-12-04 22:29:30 +05:30
abelanger5
d071a1c36b
fix: prevent large worker gRPC stream backlogs ( #2597 )
...
* fix: prevent large worker backlogs
* add config value
* add doc for troubleshooting
2025-12-03 17:15:43 -05:00
Gabe Ruttner
1e2a587b21
fix: GetLatestWorkflowVersionForWorkflows ( #2590 )
...
* fix query
* gen
2025-12-02 05:14:08 -08:00
Sid Premkumar
709dd89a18
Add gzip compression ( #2539 )
...
* Add gzip compression init
* revert
* Feat: Initial cross-domain identify setup (#2533 )
* feat: initial setup
* fix: factor out
* chore: lint
* fix: xss vuln
* feat: set up properly
* fix: lint
* fix: key
* fix: keys, cleanup
* Fix: use sessionStorage instead of localStorage (#2541 )
* chore(deps): bump golang.org/x/crypto from 0.44.0 to 0.45.0 (#2545 )
Bumps [golang.org/x/crypto](https://github.com/golang/crypto ) from 0.44.0 to 0.45.0.
- [Commits](https://github.com/golang/crypto/compare/v0.44.0...v0.45.0 )
---
updated-dependencies:
- dependency-name: golang.org/x/crypto
dependency-version: 0.45.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml (#2547 )
Bumps [google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml](https://github.com/google/osv-scanner-action ) from 2.2.4 to 2.3.0.
- [Release notes](https://github.com/google/osv-scanner-action/releases )
- [Commits](https://github.com/google/osv-scanner-action/compare/v2.2.4...v2.3.0 )
---
updated-dependencies:
- dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml
dependency-version: 2.3.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Go SDK] Resubscribe and get a new listener stream when gRPC connections fail (#2544 )
* fix listener cache issue to resubscribe when erroring out
* worker retry message clarification (#2543 )
* add another retry layer and add comments
* fix loop logic
* make listener channel retry
* Compression test utils, and add log to indicate its enabled
* clean + fix
* more fallbacks
* common pgxpool afterconnect method (#2553 )
* remove
* lint
* lint
* add cpu monitor during test
* fix background monitor and lint
* Make envvar to disable compression
* cleanup monitoring
* PR Feedback
* Update paths in compression tests + bump package versions
* path issue on test script
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: matt <mrkaye97@gmail.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mohammed Nafees <hello@mnafees.me >
2025-11-26 17:14:38 -05:00
matt
be9c7df026
Fix: Noisy Payload Error ( #2561 )
...
* fix: noisy error
* fix: only error if the task is completed but has no payload
2025-11-26 17:04:37 -05:00
Gabe Ruttner
3e5f737ef5
fix: query optimization get latest workflow version ( #2576 )
2025-11-26 08:56:20 -08:00
Gabe Ruttner
c920d54519
analyze v1 lookup table ( #2568 )
...
Co-authored-by: matt <mrkaye97@gmail.com >
2025-11-25 17:25:40 -05:00
matt
727a8fe470
Fix: OLAP Task Event Dual Write Bug ( #2572 )
...
* fix: task events bug
* fix: fallback bug
* fix: simplfiy test
2025-11-25 17:24:56 -05:00
matt
8350cb2205
Revert "optimize UUID sqlchelpers ( #2532 )" ( #2571 )
...
This reverts commit 9a09105e52 .
2025-11-25 12:10:34 -05:00
Mohammed Nafees
9a09105e52
optimize UUID sqlchelpers ( #2532 )
2025-11-24 16:50:21 +01:00
Mohammed Nafees
7bb3e1da8d
common pgxpool afterconnect method ( #2553 )
2025-11-21 14:55:04 +01:00
Mohammed Nafees
f66fe63ad0
[Go SDK] Resubscribe and get a new listener stream when gRPC connections fail ( #2544 )
...
* fix listener cache issue to resubscribe when erroring out
* worker retry message clarification (#2543 )
* add another retry layer and add comments
* fix loop logic
* make listener channel retry
2025-11-20 19:13:24 +01:00
abelanger5
2249ef3b79
fix: small scheduler optimizations ( #2426 )
...
* fix: actually increment snapshot count
* add a context with timeout to wrap replenish
2025-11-17 15:45:14 -05:00
matt
62a163d835
Fix: Revert n+1 queries on the list API ( #2531 )
...
* feat: revert query
* feat: revert n+1 query
* feat: revert another n+1 query
* fix: payloads
2025-11-17 10:54:05 -05:00
Mohammed Nafees
49b11b2548
Fix seq scan in PollCronSchedules query ( #2524 )
...
* fix seq scan
* new CTE
* fmt
2025-11-14 17:15:39 +01:00
Mohammed Nafees
8d47de193b
Attempt to fix pgx multi dimensional slice reflection error #1 ( #2523 )
...
* multi dim slice pgx reflection error
* make sure to maintain the cardinality
* fix nil
2025-11-14 16:54:26 +01:00
Mohammed Nafees
f97171f245
[Go SDK] Case on worker labels for durable tasks ( #2511 )
...
* fix durable task worker labels
* fix assignment
2025-11-12 18:32:58 +01:00
Jishnu
e82915b626
feat: add pagination support for V1LogLineList ( #2354 )
...
* feat: pagination for v1 loglines list
* add: sqlc v1 query for loglines count
* add: generated rest-client changes for python sdk
* refactor: frontend logs UI with pagination elements
* add: ts-sdk logline pagination, py logline list pagination docstring
* feat: add since queryparam for v1logline, add infinitescroll pagination on FE
* add custom polling for logs refresh on FE, remove inefficient default refresh logic
* add since queryparam of v1logline to all rest-clients
* refactor: remove offset query param, add until query param(v1logline)
* remove pagination from v1loglinelist
* fix: inconsistent scroll behaviour, add pagination response schema on v1loglist
* add: infinite scroll behavior for smooth log scrolling; prefetch next page logs in advance
* fix: pagination scroll, when task is running, remove stale pagination data when logs tab inactive
* chore: lint
* chore: lint
---------
Co-authored-by: mrkaye97 <mrkaye97@gmail.com >
2025-11-07 17:38:29 +01:00
matt
2824646ad7
Immediate Payload Offloads OLAP Wiring ( #2492 )
...
* feat: payload store updates for immediate offloads
* feat: handle immediate offloads
* feat: start wiring up immediate offloads
* fix: get rid of payload store return
* feat: start immediate offloads work
* fix: event trigger put call
* fix: dynamic payload put depending on if offload worked
* fix: rm put
* fix: write event payload from the right place
* fix: dummy id for task events to prevent duplication issues with the tasks themselves
* fix: rm comments
* fix: rm unused struct
* fix: enabled wal
* fix: rm `RETURNING`
* fix: small cleanup
* fix: wal issue
2025-11-07 17:38:10 +01:00
Mohammed Nafees
c5496184be
pass labels to durable worker ( #2504 )
2025-11-07 16:10:01 +01:00
matt
7fe9806f5d
Feat: Configurable OLAP status update size limits ( #2499 )
...
* feat: configurable status updates
* fix: config
* fix: wiring
* feat: export limits from olap
* fix: param drilling
2025-11-06 13:37:40 -05:00
Mohammed Nafees
57ad1af68d
fix: deadlocks on trigger, olap prometheus background worker, otel improvements ( #2475 )
...
* print error log temporarily
* casing
* only for create-monitoring-event
* rate limit iterator
* add a debugger
* remove rate limiter
* improve otel on trigger
* cache probability stuff
* track misses
* move down one ln
* default
* Fix: Pass tx down into payload retrieve (#2483 )
* [Python] Feat: Dataclass Support (#2476 )
* fix: prevent lifespan error from hanging worker
* fix: handle cleanup
* feat: dataclass outputs
* feat: dataclasses
* feat: incremental dataclass work
* feat: dataclass tests
* fix: lint
* fix: register wf
* fix: ugh
* chore: changelog
* fix: validation issue
* fix: none check
* fix: lint
* fix: error type
* chore: regenerate examples (#2477 )
Co-authored-by: GitHub Action <action@github.com >
* feat: add health and metrics api on typescript sdk worker (#2457 )
* feat: add health and metrics api on typescript sdk worker
add: prom-client to fetch metrics data
add: track health status of worker across different states
* refactor: keep prom-client as optional dependency
* refactor: remove async import of prom-client
* chore: update package version for ts sdk
* fix: lint
* fix: lint, const enum
---------
Co-authored-by: mrkaye97 <mrkaye97@gmail.com >
* Update frontend onboarding steps (#2478 )
* Update frontend onboarding steps
* Update sidebar as well
* Fix Go SDK cron inputs (#2481 )
* cron input in Go SDK
* add example
* fix: pass tx down to retrieve
* fix: attempt 2, another pool use
* fix: spans and debugging for task statuses
* attempted hotfix on olap statuses
* process tenants in parallel in prom worker
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com >
Co-authored-by: Jishnu <jishnun789@gmail.com >
Co-authored-by: Sid Premkumar <sid.premkumar@gmail.com >
Co-authored-by: Mohammed Nafees <hello@mnafees.me >
Co-authored-by: Alexander Belanger <alexander@hatchet.run >
* move debugger package, clean up init
* remove probability factor logic
* remove debug
* fix: debugger instantiation
---------
Co-authored-by: Alexander Belanger <alexander@hatchet.run >
Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com >
Co-authored-by: mrkaye97 <mrkaye97@gmail.com >
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com >
Co-authored-by: Jishnu <jishnun789@gmail.com >
Co-authored-by: Sid Premkumar <sid.premkumar@gmail.com >
2025-11-04 09:05:44 +01:00
Mohammed Nafees
861e205171
Fix Go SDK cron inputs ( #2481 )
...
* cron input in Go SDK
* add example
2025-11-02 18:00:23 +01:00
abelanger5
7603b5ef39
feat: add grpc otel spans, better tx debugging ( #2474 )
...
* feat: add grpc otel spans, better tx debugging
* fix: ctx
2025-10-31 18:55:42 +01:00
matt
c33091e815
fix: include payload partitions in olap partitions to drop ( #2472 )
2025-10-31 10:39:39 +01:00
matt
99544bbd4e
Fix: read payloads from payload store for event API ( #2471 )
...
* fix: read payloads from payload store
* debug: add log
* debug: more log lines
* fix: bug
* fix: rm debug lines
* fix: comment loc
2025-10-31 00:57:36 +01:00
matt
4700c42183
fix: re-enable writes ( #2469 )
2025-10-31 00:11:43 +01:00
abelanger5
3a27bdf7cb
fix: don't send expiry alert on internal proxy tokens ( #2468 )
2025-10-30 23:17:56 +01:00
Mohammed Nafees
1aabbe3e94
Run cleanup on more tables ( #2467 )
...
* cleanup more tables
* use task retention period
* use task retention period
* cleanup
* fix query
2025-10-30 23:17:36 +01:00
Mohammed Nafees
bc3dc53433
no need to check for partitions when updating them ( #2466 )
2025-10-30 22:13:46 +01:00
Mohammed Nafees
b58359d7b3
Do not run cleanup on v1_workflow_concurrency_slot ( #2463 )
...
* do not run cleanup on v1_concurrency_slot
* fix health endpoints for engine
2025-10-30 15:34:50 +01:00
Mohammed Nafees
91cdb28ddf
Logs for liveness and readiness endpoints + PG conn stats ( #2460 )
...
* error logs for liveness and readiness endpoints with pg information
* use context timeout of 3 seconds
* context
2025-10-30 14:35:02 +01:00
abelanger5
745918ba2c
fix: reduce status update limits from 10k -> 1k ( #2462 )
...
* reduce status update limits from 10k -> 1k
* remove comment
2025-10-30 14:34:03 +01:00
Sid Premkumar
4f7a8da580
Add support for non-wal payload store logic to skip main db ( #2445 )
2025-10-29 07:24:11 +01:00
Mohammed Nafees
f1eccfddf4
[hotfix] Fix running task stats without concurrency keys ( #2452 )
...
* fix task stats running
* formatting
* if block fix
2025-10-28 22:19:52 +01:00
Mohammed Nafees
56eb054a1e
New tenant task stats endpoint ( #2433 )
...
* tenant workflow stats endpoint
* not olap but task
* lint
* enable rate limiting on endpoint
* fix SQL query
* spelling
* lesser CTEs
* fix query
* rename to task
* update query
* fix nil pointer
* typed API object
* queues have counts
2025-10-28 16:52:19 +01:00
Mohammed Nafees
54701e87d0
Retry RMQ messages indefinitely with aggressive logging after 5 retries ( #2448 )
...
* aggressively log errors when rmq retry more than 5 times
* revisit comments
* new vars and fix integration test
* fix test
* log only after 5 retries
2025-10-28 16:51:39 +01:00
abelanger5
e1fdeeaf1c
fix: payload performance ( #2441 )
...
* change some olap flush settings
* increase timeouts for payload wal
* fix: improve performance of payload wal metrics
* slight updates
* more small tweaks
* undo some olap changes, don't offload some payloads
* remove double reads
* try reducing wal poll limit
* analyze v1_dag
* move partition method
2025-10-23 17:45:49 -04:00
Mohammed Nafees
cf5c5989ff
add vars to tune concurrency poller ( #2428 )
2025-10-23 11:36:12 -04:00
abelanger5
1f35782b59
fix: move err check to before len check ( #2437 )
2025-10-21 19:24:19 -04:00
matt
c6e154fd03
Feat: OLAP Payloads ( #2410 )
...
* feat: olap payloads table
* feat: olap queue messages for payload puts
* feat: wire up writes on task write
* driveby: add + ignore psql-connect
* fix: down migration
* fix: use external id for pk
* fix: insert query
* fix: more external ids
* fix: bit more cleanup
* feat: dags
* fix: the rest of the refs
* fix: placeholder uuid
* fix: write external ids
* feat: wire up messages over the queue
* fix: panic
* Revert "fix: panic"
This reverts commit c0adccf2ea .
* Revert "feat: wire up messages over the queue"
This reverts commit 36f425f3c1 .
* fix: rm unused method
* fix: rm more
* fix: rm cruft
* feat: wire up failures
* feat: start wiring up completed events
* fix: more wiring
* fix: finish wiring up completed event payloads
* fix: lint
* feat: start wiring up external ids in the core
* feat: olap pub
* fix: add returning
* fix: wiring
* debug: log lines for pubs
* fix: external id writes
* Revert "debug: log lines for pubs"
This reverts commit fe430840bd .
* fix: rm sample
* debug: rm pub buffer param
* Revert "debug: rm pub buffer param"
This reverts commit b42a5cacbb .
* debug: stuck queries
* debug: more logs
* debug: yet more logs
* fix: rename BulkRetrieve -> Retrieve
* chore: lint
* fix: naming
* fix: conn leak in putpayloads
* fix: revert debug
* Revert "debug: more logs"
This reverts commit 95da7de64f .
* Revert "debug: stuck queries"
This reverts commit 8fda64adc4 .
* feat: improve getters, olap getter
* fix: key type
* feat: first pass at pulling olap payloads from the payload store
* fix: start fixing bugs
* fix: start reworking `includePayloads` param
* fix: include payloads wiring
* feat: analyze for payloads
* fix: simplify writes more + write event payloads
* feat: read out event payloads
* feat: env vars for dual writes
* refactor: clean up task prop drilling a bit
* feat: add include payloads params to python for tests
* fix: tx commit
* fix: dual writes
* fix: not null constraint
* fix: one more
* debug: logging
* fix: more debugging, tweak function sig
* fix: function sig
* fix: refs
* debug: more logging
* debug: more logging
* debug: fix condition
* debug: overwrite properly
* fix: revert debug
* fix: rm more drilling
* fix: comments
* fix: partitioning jobs
* chore: ver
* fix: bug, docs
* hack: dummy id and inserted at for payload offloads
* fix: bug
* fix: no need to handle offloads for task event data
* hack: jitter + current ts
* fix: short circuit
* fix: offload payloads in a tx
* fix: uncomment sampling
* fix: don't offload if external store is disabled
* chore: gen sqlc
* fix: migration
* fix: start reworking types
* fix: couple more
* fix: rm unused code
* fix: drill includePayloads down again
* fix: silence annoying error in some cases
* fix: always store payloads
* debug: use workflow run id for input
* fix: improve logging
* debug: logging on retrieve
* debug: task input
* fix: use correct field
* debug: write even null payloads to limit errors
* debug: hide error lines
* fix: quieting more errors
* fix: duplicate example names, remove print lines
* debug: add logging for olap event writes
* hack: immediate event offloads and cutovers
* fix: rm log line
* fix: import
* fix: short circuit events
* fix: duped names
2025-10-20 09:09:49 -04:00
Mohammed Nafees
8f57989730
fix race condition in child spawn ( #2429 )
2025-10-17 16:56:41 +02:00
Mohammed Nafees
e2b1f1353e
Fix OTel span attribute naming convention ( #2409 )
...
* rename spans according to convention
* low cardinality
2025-10-16 18:43:40 +02:00