Commit Graph

34 Commits

Author SHA1 Message Date
abelanger5
e1fdeeaf1c fix: payload performance (#2441)
* change some olap flush settings

* increase timeouts for payload wal

* fix: improve performance of payload wal metrics

* slight updates

* more small tweaks

* undo some olap changes, don't offload some payloads

* remove double reads

* try reducing wal poll limit

* analyze v1_dag

* move partition method
2025-10-23 17:45:49 -04:00
Mohammed Nafees
d9268c7270 Cleanup job for old and invalid entries (#2378)
* auto run table cleanup

* batched cleanup of tables

* address PR comments

* fix timeout

* update queries

* fix shouldContinue

* also call cleanup for v1_workflow_concurrency_slot

* fix comment

* comment fix
2025-10-16 16:51:08 +02:00
abelanger5
b16be655be feat: stateful polling intervals (#2417)
* initial pass on stateful intervals

* pr review comments + add evict expired idempotency keys

* fix: goroutine leak and name vars better

* fix some cleanup logic
2025-10-15 11:40:22 -04:00
matt
c48a3211b5 Feat: Immediate Payload Offloads (#2375)
* feat: modify operations

* feat: attempt 1 at doing the cutover + the offload in the same query

* fix: operation write

* debug: add some print lines

* fix: check constraint

* fix: select records to offload properly

* fix: fn

* feat: add second table to hold queued cutovers

* fix: start reworking queries

* fix: select

* fix: missing cols

* fix: for update

* fix: query name for finalize

* feat: cut over query finalizer

* feat: query for writes into cutover queue

* feat: add query for cut over polling

* feat: add cutover job

* fix: rm operations

* feat: write cutover queue items at the same time as setting payload keys

* fix: simplify into single query

* fix: revert debug

* chore: lint

* fix: don't remove operation column yet

* feat: refactor into struct of opts and make job intervals configurable

* fix: add analyze for payload table

* fix: schema copy paste

* fix: drop fk

* feat: add an index to help with poll performance for a short while

* fix: simplify poll ordering

* fix: simplify more

* fix: ctx

Co-authored-by: Mohammed Nafees <hello@mnafees.me>

* Feat: Task Event and DAG Payloads (#2370)

* feat: initial work on task event payloads

* fix: iterator

* feat: wire up task events

* fix: backwards compat

* fix: migrations

* fix: duplication

* fix: col

* fix: add timestamptz col

* fix: overwrite

* fix: rm debugging

* fix: revert debugging

* fix: rm unused cols

* fix: spelling

* fix: use `current_timestamp` as default

* feat: dual writes for payloads

* fix: improve debug lines

* debug: add log

* debug: always write

* fix: make annoying log debug level

* fix: rm debug lines

* fix: add comment

* feat: dag payloads

* fix: index

* fix: migration ver

* fix: error msg

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* fix: create, then set default

* fix: inserted at copy paste

* fix: n+1 query

* fix: another n+1 query

* fix: rm unused singleton retrieve

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

---------

Co-authored-by: Mohammed Nafees <hello@mnafees.me>
Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2025-10-08 11:22:34 -04:00
matt
8ae760dd15 fix: revert partition pruning (#2295) 2025-09-12 17:33:13 -04:00
matt
c759da79aa Feat: Partition pruning for ListTaskParentOutputs, lookup index for v1_payload_wal (#2294)
* fix: small query rework, partition pruning for `ListTaskParentOutputs`

* feat: migration for adding index

* fix: copilot comments
2025-09-12 13:18:08 -04:00
matt
92843bb277 Feat: Payload Store Repository (#2047)
* feat: add table for storing payloads

* feat: add payload type enum

* feat: gen sqlc

* feat: initial sql impl

* feat: add payload store repo to shared

* feat: add overwrite

* fix: impl

* feat: bulk op

* feat: initial wiring of inputs for task triggers

* feat: wire up dag matches

* feat: create V1TaskWithPayload and use it everywhere

* fix: couple bugs

* fix: clean up types

* fix: overwrite

* fix: rm input from replay

* fix: move payload store to shared repo

* fix: schema

* refactor: repo setup

* refactor: repos

* fix: gen

* chore: lint

* fix: rename

* feat: naming, write dag inputs

* fix: more naming, trigger bug

* fix: dual writes for now

* fix: pass in tx

* feat: initial work on offloader

* feat: improve external offloader

* fix: some refs

* add withExternalHandler

* fix: improve impl of external store

* feat: implement offloading, fix other impls

* feat: add query to update JSON

* fix: implement offloading + updating records in payloads table

* feat: add WAL table

* feat: add queries for polling WAL and evicting

* feat: wire up writes into WAL

* fix: get job working

* refactor: improve types

* fix: infinite loop

* feat: improve offloading logic to run in two separate txes

* refactor: rework how overrides work

* fix: lint

* fix: migration number

* fix: migration

* fix: migration version

* fix: revert back to reading payloads out

* fix: fall back to previous input, part i

* fix: input fallback

* fix: add back input to replay

* fix: input fallback in dispatcher

* fix: nil check

* feat: advisory locks, part i

* fix: no skip locked

* feat: hash partitioned wal table

* fix: modify queries a bit, tweak crud enum

* fix: pk order, function to find tenants

* feat: wal processing

* fix: only write wal if an external store is enabled, fix offloading logic

* fix: spacing

* feat: schema cleanup

* fix: rm external store loc name

* fix: set content to null when offloading

* fix: cleanup, naming

* fix: pass overwrite payload store along

* debug: add some logging

* Revert "debug: add some logging"

This reverts commit 43e71eadf1.

* fix: typo

* fix: add offloatAt to store opts for offloading

* fix: handle leasing with advisory lock

* fix: struct def

* fix: requeue on payloads not found

* fix: rm hack for triggers

* fix: revert empty input on write

* fix: write input

* feat: env var for enabling / disabling dual writes

* feat: wire up dual writes

* fix: comments

* feat: generics!

* fix: panic from type cast

* fix: migration

* fix: generic

* fix: hack for T key in map

* fix: cleanup
2025-09-12 09:53:01 -04:00
matt
f385964fcc Fix: Scheduled runs race w/ idempotency key check (#2077)
* feat: create table for storing key

* feat: is_filled col

* feat: idempotency repo

* fix: handle filling

* fix: improve queries

* feat: check if was created already before triggering

* fix: handle partitions

* feat: improve schema

* feat: initial idempotency key claiming impl

* fix: db

* fix: sql fmt

* feat: crazy query

* fix: downstream

* fix: queries

* fix: query bug

* fix: migration rename

* fix: couple small issues

* feat: eviction job

* fix: copilot comments

* fix: index name

* fix: rm comment
2025-09-12 07:54:42 -04:00
Mohammed Nafees
1a2891154e Periodically run ANALYZE on v1_task and v1_task_event (#2236)
* analyze v1_task and v1_task_event tables periodically

* copy pasta
2025-09-02 11:07:05 -04:00
abelanger5
f62142f74d fix: explicit ordering in ReleaseTasks and lock parent slots (#2201)
* fix: explicit ordering in ReleaseTasks and lock parent slots

* fix: IN instead of =

* fix: gen diff
2025-08-26 11:06:55 -04:00
abelanger5
2a8ba155fa fix: match and cancel newest/in progress deadlocks (#2190) 2025-08-25 12:54:08 -04:00
abelanger5
1407594902 fix: move rate limited queue items off the main queue (#2155)
* fix: move rate limited queue items off the main queue

* preserve FIFO behavior on queues

* fix unit tests, address pr comments

* fix: generated

* rename table
2025-08-18 11:31:21 -04:00
matt
3dcd6059c8 Fix: Partition pruning for PreflightCheckTasksForReplay (#2029)
* feat: partition pruning for PreflightCheckTasksForReplay

* fix: use 1d as placeholder

* fix: use current time instead

* fix: pass inserted ats through correctly

* fix: try adding a CTE

* fix: query cleanup
2025-07-21 20:30:59 +02:00
Mohammed Nafees
cbc962ea2b Ensure table partitions exist for tomorrow (#1880)
* add method to ensure table partitions exist for tomorrow

* fix formatting

* add generated tasks sql go

* proper lint for new sql query

* run pre commit check to fix lint issues
2025-06-20 01:57:32 +05:30
abelanger5
33d1bf60d6 revert: removing replay logic (#1864)
* revert: removing input from replay

* add to replayopt as well

* add a comment
2025-06-13 18:28:22 -04:00
Gabe Ruttner
68de72d534 Ops disableable replay (#1855)
* try lock

* revert

* Update pkg/repository/v1/scheduler_concurrency.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update pkg/repository/v1/scheduler_concurrency.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* both strats

* disable

* remove input

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-12 15:25:38 -04:00
abelanger5
c1a0783fd8 fix: one more issue in dag populator (#1698)
* fix: one more issue in dag populator

* fix: order by
2025-05-08 20:26:30 -04:00
abelanger5
3d31d206e8 fix: select the max retry count from the DAG data (#1697) 2025-05-08 13:53:15 -04:00
abelanger5
dacf48180b feat: sampling (#1592)
* feat: sampling

* Update internal/services/controllers/v1/olap/controller.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* docs: sampling

* sampling -> trace sampling

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-23 17:28:58 -04:00
Matt Kaye
80137736af Feat: Priority (#1513)
* feat: initial work wiring up priorities

* fix: add default to default prio in the db

* feat: wire priority through api on wf creation

* feat: extend python test

* feat: priority for scheduled workflows

* feat: wire priority through python api

* feat: more wiring priority through the api

* feat: I think it works?

* feat: e2e test for priority

* it works!

* feat: expand tests for default priorities

* feat: e2e scheduling test

* fix: skip broken test for now

* fix: lint

* feat: add priority columns to cron and schedule ref tables

* feat: update inserts to include prio

* feat: wire up more apis

* feat: more wiring

* feat: wire up more rest api fields

* chore: cruft

* fix: more wiring

* fix: lint

* chore: gen + wire up priorities

* fix: retries

* fix: try changing fixture scope

* chore: bump version again

* feat: send priority with action payload

* fix: generate script

* Feat  priority ts (#1518)

* feat: initial work wiring up priorities

* fix: add default to default prio in the db

* feat: wire priority through api on wf creation

* feat: extend python test

* feat: priority for scheduled workflows

* feat: wire priority through python api

* feat: more wiring priority through the api

* feat: I think it works?

* feat: e2e test for priority

* it works!

* feat: expand tests for default priorities

* feat: e2e scheduling test

* chore: minor version for priority

* fix: skip broken test for now

* fix: lint

* feat: add priority columns to cron and schedule ref tables

* feat: update inserts to include prio

* feat: wire up more apis

* feat: more wiring

* feat: wire up more rest api fields

* chore: cruft

* fix: more wiring

* fix: lint

* chore: gen + wire up priorities

* fix: increase timeout

* fix: retries

* fix: try changing fixture scope

* chore: generate

* fix: set schedule priority

* feat: priority

* fix: move priority to wf

* release: 1.2.0

* rm log

* fix: import

* fix: add priority to step

---------

Co-authored-by: mrkaye97 <mrkaye97@gmail.com>

* fix: add dummy runs to priority test to prevent race conditions

* fix: non-breaking field

* fix: gen

* feat: initial pass at docs

* feat: priority in go sdk

* feat: initial work on go example

* fix: doc examples

* fix: proofread

* chore: version

* feat: go sdk

* fix: lint

* fix: declarations and add back RunAsChild

* fix: child workflows

* fix: namespace

* fix: faster child workflows

* fix: sticky

* add back run as child

---------

Co-authored-by: Gabe Ruttner <gabriel.ruttner@gmail.com>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2025-04-14 16:22:00 -04:00
abelanger5
6813ab1c75 fix: streaming order improvements, go sdk stability (#1536)
* fix: streaming order improvements, go sdk stability

* fix: improve replay query
2025-04-11 13:02:47 -04:00
abelanger5
29a7258e5c fix: match condition writes and retry counts on failure (#1507) 2025-04-08 13:34:33 -04:00
abelanger5
d4e489996c fix: v1 edge cases on concurrency, go SDK, parent outputs (#1497)
* fix: v1 edge cases on concurrency, go SDK, parent outputs

* fix: overflow on queue metrics

* revert changes to DAG

* fix: remove prefix on error for Result method

* cleanup schema, fix migrations

* fix panic edge case
2025-04-07 08:19:13 -04:00
abelanger5
6ab9c70d80 fix: LockSignalCreatedEvents performance (#1476)
* fix: LockSignalCreatedEvents performance, part 2

* fix: query

* fix: LockSignalCreatedEvents

* Revert "fix: LockSignalCreatedEvents"

This reverts commit cce5af242f.

* fix: LockSignalCreatedEvents

* fix: whitespace

* Update run.ts

* Update worker.ts

* revert example

---------

Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com>
2025-04-03 05:55:20 -07:00
Matt Kaye
58d54703b2 Feat: Non-Retryable Exceptions (#1456)
* feat: add boolean flag to proto

* feat: initial wiring up priorities and non-retryables

* fix: query

* fix: cruft comment

* fix: rm priority changes

* feat: python side

* feat: tests for non-retrying workflows

* feat: expand tests

* chore: generate ts

* feat: add name prop to wf

* feat(go-sdk): non retryable error

* feat: start implementing ts

* cleanup: simplify to raising a specific error

* fix: simplify ts

* feat: ts examples

* feat: ver

* feat: docs

* fix: tests + linters

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2025-04-01 15:34:43 -04:00
abelanger5
61e8e95212 fix: improve performance of signal events query (#1468)
* fix: improve performance of signal events query

* fix: run python + ts tests on engine changes

---------

Co-authored-by: mrkaye97 <mrkaye97@gmail.com>
2025-04-01 15:34:31 -04:00
abelanger5
8604f649bf fix(v1): timeout/reassign ambiguous tenant_id reference (#1381)
* fix: timeouts and reassignment error

* fix: alias reference
2025-03-21 11:48:02 -04:00
abelanger5
2333090751 fix: cancellations, failures, and retries edge case (#1377) 2025-03-20 17:27:50 -04:00
abelanger5
21bd707ba6 fix(v1): improved query plans for replay and task outputs, reassignment + timeout tweaks (#1354)
* don't call parent output task when not necessary

* help query planner by refactoring replay task

* fix: use failed task pathway for reassignments and timeouts
2025-03-17 14:10:32 -04:00
abelanger5
5c647e247e chore(v1): small improvements to replay/parent task lookup (#1346)
* small tweaks to replay/parent task lookup

* some more improvements
2025-03-15 09:15:57 -04:00
abelanger5
7ad251df26 fix: recursive queries should use PKs (#1345) 2025-03-14 13:13:27 -04:00
abelanger5
4cbde4405a fix: more v1 bug bashing (#1334) 2025-03-13 17:13:04 -04:00
abelanger5
ac968e94b8 fix: concurrency issues and a few small improvements (#1324) 2025-03-12 16:30:34 -04:00
abelanger5
1f2096313d feat: v1 engine (#1318) 2025-03-11 14:57:13 -04:00