Gabe Ruttner
1e2a587b21
fix: GetLatestWorkflowVersionForWorkflows ( #2590 )
...
* fix query
* gen
2025-12-02 05:14:08 -08:00
Sid Premkumar
709dd89a18
Add gzip compression ( #2539 )
...
* Add gzip compression init
* revert
* Feat: Initial cross-domain identify setup (#2533 )
* feat: initial setup
* fix: factor out
* chore: lint
* fix: xss vuln
* feat: set up properly
* fix: lint
* fix: key
* fix: keys, cleanup
* Fix: use sessionStorage instead of localStorage (#2541 )
* chore(deps): bump golang.org/x/crypto from 0.44.0 to 0.45.0 (#2545 )
Bumps [golang.org/x/crypto](https://github.com/golang/crypto ) from 0.44.0 to 0.45.0.
- [Commits](https://github.com/golang/crypto/compare/v0.44.0...v0.45.0 )
---
updated-dependencies:
- dependency-name: golang.org/x/crypto
dependency-version: 0.45.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml (#2547 )
Bumps [google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml](https://github.com/google/osv-scanner-action ) from 2.2.4 to 2.3.0.
- [Release notes](https://github.com/google/osv-scanner-action/releases )
- [Commits](https://github.com/google/osv-scanner-action/compare/v2.2.4...v2.3.0 )
---
updated-dependencies:
- dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml
dependency-version: 2.3.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [Go SDK] Resubscribe and get a new listener stream when gRPC connections fail (#2544 )
* fix listener cache issue to resubscribe when erroring out
* worker retry message clarification (#2543 )
* add another retry layer and add comments
* fix loop logic
* make listener channel retry
* Compression test utils, and add log to indicate its enabled
* clean + fix
* more fallbacks
* common pgxpool afterconnect method (#2553 )
* remove
* lint
* lint
* add cpu monitor during test
* fix background monitor and lint
* Make envvar to disable compression
* cleanup monitoring
* PR Feedback
* Update paths in compression tests + bump package versions
* path issue on test script
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: matt <mrkaye97@gmail.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mohammed Nafees <hello@mnafees.me >
2025-11-26 17:14:38 -05:00
matt
be9c7df026
Fix: Noisy Payload Error ( #2561 )
...
* fix: noisy error
* fix: only error if the task is completed but has no payload
2025-11-26 17:04:37 -05:00
Gabe Ruttner
3e5f737ef5
fix: query optimization get latest workflow version ( #2576 )
2025-11-26 08:56:20 -08:00
Gabe Ruttner
c920d54519
analyze v1 lookup table ( #2568 )
...
Co-authored-by: matt <mrkaye97@gmail.com >
2025-11-25 17:25:40 -05:00
matt
727a8fe470
Fix: OLAP Task Event Dual Write Bug ( #2572 )
...
* fix: task events bug
* fix: fallback bug
* fix: simplfiy test
2025-11-25 17:24:56 -05:00
matt
8350cb2205
Revert "optimize UUID sqlchelpers ( #2532 )" ( #2571 )
...
This reverts commit 9a09105e52 .
2025-11-25 12:10:34 -05:00
Mohammed Nafees
9a09105e52
optimize UUID sqlchelpers ( #2532 )
2025-11-24 16:50:21 +01:00
Mohammed Nafees
7bb3e1da8d
common pgxpool afterconnect method ( #2553 )
2025-11-21 14:55:04 +01:00
Mohammed Nafees
f66fe63ad0
[Go SDK] Resubscribe and get a new listener stream when gRPC connections fail ( #2544 )
...
* fix listener cache issue to resubscribe when erroring out
* worker retry message clarification (#2543 )
* add another retry layer and add comments
* fix loop logic
* make listener channel retry
2025-11-20 19:13:24 +01:00
abelanger5
2249ef3b79
fix: small scheduler optimizations ( #2426 )
...
* fix: actually increment snapshot count
* add a context with timeout to wrap replenish
2025-11-17 15:45:14 -05:00
matt
62a163d835
Fix: Revert n+1 queries on the list API ( #2531 )
...
* feat: revert query
* feat: revert n+1 query
* feat: revert another n+1 query
* fix: payloads
2025-11-17 10:54:05 -05:00
Mohammed Nafees
49b11b2548
Fix seq scan in PollCronSchedules query ( #2524 )
...
* fix seq scan
* new CTE
* fmt
2025-11-14 17:15:39 +01:00
Mohammed Nafees
8d47de193b
Attempt to fix pgx multi dimensional slice reflection error #1 ( #2523 )
...
* multi dim slice pgx reflection error
* make sure to maintain the cardinality
* fix nil
2025-11-14 16:54:26 +01:00
Mohammed Nafees
f97171f245
[Go SDK] Case on worker labels for durable tasks ( #2511 )
...
* fix durable task worker labels
* fix assignment
2025-11-12 18:32:58 +01:00
Jishnu
e82915b626
feat: add pagination support for V1LogLineList ( #2354 )
...
* feat: pagination for v1 loglines list
* add: sqlc v1 query for loglines count
* add: generated rest-client changes for python sdk
* refactor: frontend logs UI with pagination elements
* add: ts-sdk logline pagination, py logline list pagination docstring
* feat: add since queryparam for v1logline, add infinitescroll pagination on FE
* add custom polling for logs refresh on FE, remove inefficient default refresh logic
* add since queryparam of v1logline to all rest-clients
* refactor: remove offset query param, add until query param(v1logline)
* remove pagination from v1loglinelist
* fix: inconsistent scroll behaviour, add pagination response schema on v1loglist
* add: infinite scroll behavior for smooth log scrolling; prefetch next page logs in advance
* fix: pagination scroll, when task is running, remove stale pagination data when logs tab inactive
* chore: lint
* chore: lint
---------
Co-authored-by: mrkaye97 <mrkaye97@gmail.com >
2025-11-07 17:38:29 +01:00
matt
2824646ad7
Immediate Payload Offloads OLAP Wiring ( #2492 )
...
* feat: payload store updates for immediate offloads
* feat: handle immediate offloads
* feat: start wiring up immediate offloads
* fix: get rid of payload store return
* feat: start immediate offloads work
* fix: event trigger put call
* fix: dynamic payload put depending on if offload worked
* fix: rm put
* fix: write event payload from the right place
* fix: dummy id for task events to prevent duplication issues with the tasks themselves
* fix: rm comments
* fix: rm unused struct
* fix: enabled wal
* fix: rm `RETURNING`
* fix: small cleanup
* fix: wal issue
2025-11-07 17:38:10 +01:00
Mohammed Nafees
c5496184be
pass labels to durable worker ( #2504 )
2025-11-07 16:10:01 +01:00
matt
7fe9806f5d
Feat: Configurable OLAP status update size limits ( #2499 )
...
* feat: configurable status updates
* fix: config
* fix: wiring
* feat: export limits from olap
* fix: param drilling
2025-11-06 13:37:40 -05:00
Mohammed Nafees
57ad1af68d
fix: deadlocks on trigger, olap prometheus background worker, otel improvements ( #2475 )
...
* print error log temporarily
* casing
* only for create-monitoring-event
* rate limit iterator
* add a debugger
* remove rate limiter
* improve otel on trigger
* cache probability stuff
* track misses
* move down one ln
* default
* Fix: Pass tx down into payload retrieve (#2483 )
* [Python] Feat: Dataclass Support (#2476 )
* fix: prevent lifespan error from hanging worker
* fix: handle cleanup
* feat: dataclass outputs
* feat: dataclasses
* feat: incremental dataclass work
* feat: dataclass tests
* fix: lint
* fix: register wf
* fix: ugh
* chore: changelog
* fix: validation issue
* fix: none check
* fix: lint
* fix: error type
* chore: regenerate examples (#2477 )
Co-authored-by: GitHub Action <action@github.com >
* feat: add health and metrics api on typescript sdk worker (#2457 )
* feat: add health and metrics api on typescript sdk worker
add: prom-client to fetch metrics data
add: track health status of worker across different states
* refactor: keep prom-client as optional dependency
* refactor: remove async import of prom-client
* chore: update package version for ts sdk
* fix: lint
* fix: lint, const enum
---------
Co-authored-by: mrkaye97 <mrkaye97@gmail.com >
* Update frontend onboarding steps (#2478 )
* Update frontend onboarding steps
* Update sidebar as well
* Fix Go SDK cron inputs (#2481 )
* cron input in Go SDK
* add example
* fix: pass tx down to retrieve
* fix: attempt 2, another pool use
* fix: spans and debugging for task statuses
* attempted hotfix on olap statuses
* process tenants in parallel in prom worker
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com >
Co-authored-by: Jishnu <jishnun789@gmail.com >
Co-authored-by: Sid Premkumar <sid.premkumar@gmail.com >
Co-authored-by: Mohammed Nafees <hello@mnafees.me >
Co-authored-by: Alexander Belanger <alexander@hatchet.run >
* move debugger package, clean up init
* remove probability factor logic
* remove debug
* fix: debugger instantiation
---------
Co-authored-by: Alexander Belanger <alexander@hatchet.run >
Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com >
Co-authored-by: mrkaye97 <mrkaye97@gmail.com >
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com >
Co-authored-by: Jishnu <jishnun789@gmail.com >
Co-authored-by: Sid Premkumar <sid.premkumar@gmail.com >
2025-11-04 09:05:44 +01:00
Mohammed Nafees
861e205171
Fix Go SDK cron inputs ( #2481 )
...
* cron input in Go SDK
* add example
2025-11-02 18:00:23 +01:00
abelanger5
7603b5ef39
feat: add grpc otel spans, better tx debugging ( #2474 )
...
* feat: add grpc otel spans, better tx debugging
* fix: ctx
2025-10-31 18:55:42 +01:00
matt
c33091e815
fix: include payload partitions in olap partitions to drop ( #2472 )
2025-10-31 10:39:39 +01:00
matt
99544bbd4e
Fix: read payloads from payload store for event API ( #2471 )
...
* fix: read payloads from payload store
* debug: add log
* debug: more log lines
* fix: bug
* fix: rm debug lines
* fix: comment loc
2025-10-31 00:57:36 +01:00
matt
4700c42183
fix: re-enable writes ( #2469 )
2025-10-31 00:11:43 +01:00
abelanger5
3a27bdf7cb
fix: don't send expiry alert on internal proxy tokens ( #2468 )
2025-10-30 23:17:56 +01:00
Mohammed Nafees
1aabbe3e94
Run cleanup on more tables ( #2467 )
...
* cleanup more tables
* use task retention period
* use task retention period
* cleanup
* fix query
2025-10-30 23:17:36 +01:00
Mohammed Nafees
bc3dc53433
no need to check for partitions when updating them ( #2466 )
2025-10-30 22:13:46 +01:00
Mohammed Nafees
b58359d7b3
Do not run cleanup on v1_workflow_concurrency_slot ( #2463 )
...
* do not run cleanup on v1_concurrency_slot
* fix health endpoints for engine
2025-10-30 15:34:50 +01:00
Mohammed Nafees
91cdb28ddf
Logs for liveness and readiness endpoints + PG conn stats ( #2460 )
...
* error logs for liveness and readiness endpoints with pg information
* use context timeout of 3 seconds
* context
2025-10-30 14:35:02 +01:00
abelanger5
745918ba2c
fix: reduce status update limits from 10k -> 1k ( #2462 )
...
* reduce status update limits from 10k -> 1k
* remove comment
2025-10-30 14:34:03 +01:00
Sid Premkumar
4f7a8da580
Add support for non-wal payload store logic to skip main db ( #2445 )
2025-10-29 07:24:11 +01:00
Mohammed Nafees
f1eccfddf4
[hotfix] Fix running task stats without concurrency keys ( #2452 )
...
* fix task stats running
* formatting
* if block fix
2025-10-28 22:19:52 +01:00
Mohammed Nafees
56eb054a1e
New tenant task stats endpoint ( #2433 )
...
* tenant workflow stats endpoint
* not olap but task
* lint
* enable rate limiting on endpoint
* fix SQL query
* spelling
* lesser CTEs
* fix query
* rename to task
* update query
* fix nil pointer
* typed API object
* queues have counts
2025-10-28 16:52:19 +01:00
Mohammed Nafees
54701e87d0
Retry RMQ messages indefinitely with aggressive logging after 5 retries ( #2448 )
...
* aggressively log errors when rmq retry more than 5 times
* revisit comments
* new vars and fix integration test
* fix test
* log only after 5 retries
2025-10-28 16:51:39 +01:00
abelanger5
e1fdeeaf1c
fix: payload performance ( #2441 )
...
* change some olap flush settings
* increase timeouts for payload wal
* fix: improve performance of payload wal metrics
* slight updates
* more small tweaks
* undo some olap changes, don't offload some payloads
* remove double reads
* try reducing wal poll limit
* analyze v1_dag
* move partition method
2025-10-23 17:45:49 -04:00
Mohammed Nafees
cf5c5989ff
add vars to tune concurrency poller ( #2428 )
2025-10-23 11:36:12 -04:00
abelanger5
1f35782b59
fix: move err check to before len check ( #2437 )
2025-10-21 19:24:19 -04:00
matt
c6e154fd03
Feat: OLAP Payloads ( #2410 )
...
* feat: olap payloads table
* feat: olap queue messages for payload puts
* feat: wire up writes on task write
* driveby: add + ignore psql-connect
* fix: down migration
* fix: use external id for pk
* fix: insert query
* fix: more external ids
* fix: bit more cleanup
* feat: dags
* fix: the rest of the refs
* fix: placeholder uuid
* fix: write external ids
* feat: wire up messages over the queue
* fix: panic
* Revert "fix: panic"
This reverts commit c0adccf2ea .
* Revert "feat: wire up messages over the queue"
This reverts commit 36f425f3c1 .
* fix: rm unused method
* fix: rm more
* fix: rm cruft
* feat: wire up failures
* feat: start wiring up completed events
* fix: more wiring
* fix: finish wiring up completed event payloads
* fix: lint
* feat: start wiring up external ids in the core
* feat: olap pub
* fix: add returning
* fix: wiring
* debug: log lines for pubs
* fix: external id writes
* Revert "debug: log lines for pubs"
This reverts commit fe430840bd .
* fix: rm sample
* debug: rm pub buffer param
* Revert "debug: rm pub buffer param"
This reverts commit b42a5cacbb .
* debug: stuck queries
* debug: more logs
* debug: yet more logs
* fix: rename BulkRetrieve -> Retrieve
* chore: lint
* fix: naming
* fix: conn leak in putpayloads
* fix: revert debug
* Revert "debug: more logs"
This reverts commit 95da7de64f .
* Revert "debug: stuck queries"
This reverts commit 8fda64adc4 .
* feat: improve getters, olap getter
* fix: key type
* feat: first pass at pulling olap payloads from the payload store
* fix: start fixing bugs
* fix: start reworking `includePayloads` param
* fix: include payloads wiring
* feat: analyze for payloads
* fix: simplify writes more + write event payloads
* feat: read out event payloads
* feat: env vars for dual writes
* refactor: clean up task prop drilling a bit
* feat: add include payloads params to python for tests
* fix: tx commit
* fix: dual writes
* fix: not null constraint
* fix: one more
* debug: logging
* fix: more debugging, tweak function sig
* fix: function sig
* fix: refs
* debug: more logging
* debug: more logging
* debug: fix condition
* debug: overwrite properly
* fix: revert debug
* fix: rm more drilling
* fix: comments
* fix: partitioning jobs
* chore: ver
* fix: bug, docs
* hack: dummy id and inserted at for payload offloads
* fix: bug
* fix: no need to handle offloads for task event data
* hack: jitter + current ts
* fix: short circuit
* fix: offload payloads in a tx
* fix: uncomment sampling
* fix: don't offload if external store is disabled
* chore: gen sqlc
* fix: migration
* fix: start reworking types
* fix: couple more
* fix: rm unused code
* fix: drill includePayloads down again
* fix: silence annoying error in some cases
* fix: always store payloads
* debug: use workflow run id for input
* fix: improve logging
* debug: logging on retrieve
* debug: task input
* fix: use correct field
* debug: write even null payloads to limit errors
* debug: hide error lines
* fix: quieting more errors
* fix: duplicate example names, remove print lines
* debug: add logging for olap event writes
* hack: immediate event offloads and cutovers
* fix: rm log line
* fix: import
* fix: short circuit events
* fix: duped names
2025-10-20 09:09:49 -04:00
Mohammed Nafees
8f57989730
fix race condition in child spawn ( #2429 )
2025-10-17 16:56:41 +02:00
Mohammed Nafees
e2b1f1353e
Fix OTel span attribute naming convention ( #2409 )
...
* rename spans according to convention
* low cardinality
2025-10-16 18:43:40 +02:00
Mohammed Nafees
d9268c7270
Cleanup job for old and invalid entries ( #2378 )
...
* auto run table cleanup
* batched cleanup of tables
* address PR comments
* fix timeout
* update queries
* fix shouldContinue
* also call cleanup for v1_workflow_concurrency_slot
* fix comment
* comment fix
2025-10-16 16:51:08 +02:00
matt
aa38c6d2df
fix: payload fallback for child runs ( #2421 )
2025-10-15 16:16:51 -04:00
abelanger5
b16be655be
feat: stateful polling intervals ( #2417 )
...
* initial pass on stateful intervals
* pr review comments + add evict expired idempotency keys
* fix: goroutine leak and name vars better
* fix some cleanup logic
2025-10-15 11:40:22 -04:00
matt
5b5adcb8ed
Feat: Scheduled run detail view, bulk cancel / replay with pagination helper ( #2416 )
...
* feat: endpoint for listing external ids
* feat: wire up external id list
* chore: regen api
* feat: py sdk wrapper
* fix: since type
* fix: log
* fix: improve defaults for statuses
* feat: docs
* feat: docs
* fix: rm extra file
* feat: add id column to scheduled runs
* feat: side panel for scheduled runs
* fix: side panel header pinned
* fix: border + padding
* chore: gen
* chore: lint
* chore: changelog, version
* fix: spacing of cols
* fix: empty webhook resource limit
* fix: tsc
* fix: sort organizations and tenants alphabetically
2025-10-15 11:36:45 -04:00
Mohammed Nafees
a750ce950d
Introduce vars to tune ANALYZE job gocron run intervals ( #2407 )
...
* introduce cars to tune ANALYZE job gocron run intervals
* update config doc
* fix assignment
2025-10-10 11:02:10 +02:00
Mohammed Nafees
0695db820c
Use UTC for all pgx connections and check for database TZ ( #2398 )
...
* set utc for all pgx sessions
* helper func
* also accept IANA Etc/UTC
2025-10-09 10:54:27 +02:00
matt
d677cb2b08
feat: gzip compression for large payloads, persistent OLAP writes ( #2368 )
...
* debug: remove event pub
* add additional spans to publish message
* debug: don't publish payloads
* fix: persistent messages on olap
* add back other payloads
* remove pub buffers temporarily
* fix: correct queue
* hacky partitioning
* add back pub buffers to scheduler
* don't send no worker events
* add attributes for queue name and message id to publish
* add back pub buffers to grpc api
* remove pubs again, no worker writes though
* task processing queue hashes
* remove payloads again
* gzip compression over 5kb
* add back task controller payloads
* add back no worker requeueing event, with expirable lru cache
* add back pub buffers
* remove hash partitioned queues
* small fixes
* ignore lru cache top fn
* config vars for compression, disable by default
---------
Co-authored-by: Alexander Belanger <alexander@hatchet.run >
2025-10-08 11:44:04 -04:00
matt
c48a3211b5
Feat: Immediate Payload Offloads ( #2375 )
...
* feat: modify operations
* feat: attempt 1 at doing the cutover + the offload in the same query
* fix: operation write
* debug: add some print lines
* fix: check constraint
* fix: select records to offload properly
* fix: fn
* feat: add second table to hold queued cutovers
* fix: start reworking queries
* fix: select
* fix: missing cols
* fix: for update
* fix: query name for finalize
* feat: cut over query finalizer
* feat: query for writes into cutover queue
* feat: add query for cut over polling
* feat: add cutover job
* fix: rm operations
* feat: write cutover queue items at the same time as setting payload keys
* fix: simplify into single query
* fix: revert debug
* chore: lint
* fix: don't remove operation column yet
* feat: refactor into struct of opts and make job intervals configurable
* fix: add analyze for payload table
* fix: schema copy paste
* fix: drop fk
* feat: add an index to help with poll performance for a short while
* fix: simplify poll ordering
* fix: simplify more
* fix: ctx
Co-authored-by: Mohammed Nafees <hello@mnafees.me >
* Feat: Task Event and DAG Payloads (#2370 )
* feat: initial work on task event payloads
* fix: iterator
* feat: wire up task events
* fix: backwards compat
* fix: migrations
* fix: duplication
* fix: col
* fix: add timestamptz col
* fix: overwrite
* fix: rm debugging
* fix: revert debugging
* fix: rm unused cols
* fix: spelling
* fix: use `current_timestamp` as default
* feat: dual writes for payloads
* fix: improve debug lines
* debug: add log
* debug: always write
* fix: make annoying log debug level
* fix: rm debug lines
* fix: add comment
* feat: dag payloads
* fix: index
* fix: migration ver
* fix: error msg
Co-authored-by: abelanger5 <belanger@sas.upenn.edu >
* fix: create, then set default
* fix: inserted at copy paste
* fix: n+1 query
* fix: another n+1 query
* fix: rm unused singleton retrieve
---------
Co-authored-by: abelanger5 <belanger@sas.upenn.edu >
---------
Co-authored-by: Mohammed Nafees <hello@mnafees.me >
Co-authored-by: abelanger5 <belanger@sas.upenn.edu >
2025-10-08 11:22:34 -04:00
matt
8fd90a29a6
Feat: Pausable Crons ( #2395 )
...
* feat: update query, patch route
* feat: api for update
* fix: simplify ui a bit
* feat: wire up fe
* feat: improve copy, spinners
* fix: invert naming to avoid horrible double negative
* fix: improve handling of optional types
* fix: last bits of naming
* feat: persist enabled flag across workflow versions properly
* fix: update spinner
2025-10-08 11:12:14 -04:00