Commit Graph

368 Commits

Author SHA1 Message Date
abelanger5 6813ab1c75 fix: streaming order improvements, go sdk stability (#1536)
* fix: streaming order improvements, go sdk stability

* fix: improve replay query
2025-04-11 13:02:47 -04:00
abelanger5 29a7258e5c fix: match condition writes and retry counts on failure (#1507) 2025-04-08 13:34:33 -04:00
Gabe Ruttner dc757a36b5 fix: filter scope for bulk ops (#1503)
* fix: scope

* fix: default filter and both filter and ids

* release: 1.1.5
2025-04-07 15:48:54 -07:00
Gabe Ruttner bc72465b65 fix: list and cancel (#1502)
* fix: list

* fix: optional since

* fix: scope

* fix: keep nil ptr

* release: 1.1.3

* Update sdks/typescript/src/v1/client/features/runs.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-07 09:03:32 -04:00
abelanger5 d4e489996c fix: v1 edge cases on concurrency, go SDK, parent outputs (#1497)
* fix: v1 edge cases on concurrency, go SDK, parent outputs

* fix: overflow on queue metrics

* revert changes to DAG

* fix: remove prefix on error for Result method

* cleanup schema, fix migrations

* fix panic edge case
2025-04-07 08:19:13 -04:00
abelanger5 5c985c3f49 fix: set schedule timeout on task level (#1492) 2025-04-03 22:46:33 -04:00
Matt Kaye 58d54703b2 Feat: Non-Retryable Exceptions (#1456)
* feat: add boolean flag to proto

* feat: initial wiring up priorities and non-retryables

* fix: query

* fix: cruft comment

* fix: rm priority changes

* feat: python side

* feat: tests for non-retrying workflows

* feat: expand tests

* chore: generate ts

* feat: add name prop to wf

* feat(go-sdk): non retryable error

* feat: start implementing ts

* cleanup: simplify to raising a specific error

* fix: simplify ts

* feat: ts examples

* feat: ver

* feat: docs

* fix: tests + linters

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2025-04-01 15:34:43 -04:00
abelanger5 b03a8d2666 improve ttl cache on pgmq (#1438)
* improve ttl cache on pgmq

* fix: panic
2025-03-28 09:27:12 -07:00
abelanger5 5eec4a7bea fix: improves error handling and retries in dispatcher (#1441)
* fix: improves error handling and retries in dispatcher

* rm toolchain
2025-03-28 07:41:44 -07:00
abelanger5 024af22cbe small layout improvements (#1437)
* small layout improvements

* fix alerter relative date
2025-03-27 18:30:48 -07:00
abelanger5 9c389ee5e3 fix: get values for checksum after ordering (#1414) 2025-03-26 11:56:22 -07:00
abelanger5 cff2b37a6a fix: checksums on workflow versions (#1410) 2025-03-26 08:00:39 -07:00
Matt Kaye 5062bf1e3e V1 SDKs and Docs (#1361)
New SDKs and docs for the v1 release.
2025-03-25 15:45:07 -07:00
abelanger5 a20ab2de65 fix(v1): add exponential backoff for internal retries (#1399) 2025-03-25 09:14:15 -07:00
abelanger5 c54bf9266c feat(v1): tenant limits (#1388)
* feat(v1): tenant limits

* fix: migration

* fix: kill metered cache
2025-03-23 19:03:55 -07:00
abelanger5 00c4bbff09 feat(v1): new gRPC API endpoints (#1367)
* wip: api contracts

* feat: implement put workflow version endpoint

* add support for match existing data, get scaffolding in place for additional triggers

* create additional matches

* feat: durable sleep, user event matching

* update protos

* fix: working poc of user events, durable sleep

* add migration

* fix: migration column

* feat: durable event listener

* fix: skip overrides

* fix: input -> output
2025-03-23 18:58:20 -07:00
abelanger5 7d1244fc3b fix: limit alerted runs in emails and case on task vs workflow run (#1374) 2025-03-19 23:49:49 +00:00
abelanger5 e91047d7b3 feat: add back tenant alerting to v1 (#1372) 2025-03-19 17:50:42 -04:00
abelanger5 f25c408d5c fix: reassignments consistent with v0 behavior (#1360) 2025-03-18 09:17:31 -04:00
abelanger5 21bd707ba6 fix(v1): improved query plans for replay and task outputs, reassignment + timeout tweaks (#1354)
* don't call parent output task when not necessary

* help query planner by refactoring replay task

* fix: use failed task pathway for reassignments and timeouts
2025-03-17 14:10:32 -04:00
abelanger5 5c647e247e chore(v1): small improvements to replay/parent task lookup (#1346)
* small tweaks to replay/parent task lookup

* some more improvements
2025-03-15 09:15:57 -04:00
abelanger5 677fe2d328 fix: spawn workflows should handle on failure properly, lite improvements (#1336) 2025-03-13 22:02:03 -04:00
abelanger5 4cbde4405a fix: more v1 bug bashing (#1334) 2025-03-13 17:13:04 -04:00
abelanger5 ac968e94b8 fix: concurrency issues and a few small improvements (#1324) 2025-03-12 16:30:34 -04:00
abelanger5 1f2096313d feat: v1 engine (#1318) 2025-03-11 14:57:13 -04:00
Gabe Ruttner 234b010ff6 fix: hash (#1309) 2025-03-07 07:35:05 -08:00
abelanger5 9d1c40ae1f fix: order DAG steps before inserting (#1268) 2025-02-13 07:33:56 -08:00
Gabe Ruttner 158b56c43b feat: add retry count param (#1236) 2025-01-29 07:19:19 -08:00
Gabe Ruttner c8bcf9ae8c fix: link (#1235) 2025-01-29 06:28:10 -08:00
Gabe Ruttner 0e91542d87 wip: backoff state (#1225)
* wip: backoff state

* fix: retry state and step run start condition

* fix: missing key

* fix: gen

* chore: squash migration

* chore: rm todos

* ops: upgrade proto
2025-01-28 19:16:12 +00:00
Sean Reilly 190f3f984a clean up rabbit mq session stuff, add a quick ack and error processin… (#1197)
* clean up rabbit mq session stuff, add a quick ack and error processing for AddMessage

* bit more paranoid about getting stuck in chans

* first pass at locking the message to deal with the failed states better

* clean up the access to ready for the mq

* make sure we don't block sending this ack
2025-01-23 16:06:02 -08:00
abelanger5 61ae067014 fix: race condition on err in pgmq (#1198) 2025-01-18 16:20:24 +00:00
Matt Kaye 9efd56c7de Feat: Propagate Error Through Context (#1193)
* feat: add query to fetch upstream errors from db

* fix: return many

* feat: propagate errors through `input`

* fix: implement the method to get the errors out

* fix: query cleanup

* feat: rename errors

* fix: col names

* fix: key name in the json

* feat: add method to context to get failed step errors

* fix: add 👀

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* feat: add error log if not errors

* fix: logger

* fix: simplify query

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2025-01-17 21:49:13 -05:00
Sean Reilly a8dd33c61f Feature - configurable logging backend (#1188)
* allow us to configure different repos

* make the struct contents public

* pass in config values to new log repo

* rename functions - possibly breaking changes so let's discuss

* make the logging backend configurable

* fix tests

* don't allow calls to WithAdditionalConfig

* cleanup

* replace sc with server

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* rename sc to server

* add a LRU cache for the step run lookup

* let's not use an expirable cache and just use the regular one - we cannot close the go func in expirable

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2025-01-17 15:34:10 -08:00
Gabe Ruttner 49000e5c65 Fix webhook stop healthcheck (#1163)
* fix: concurrent map writes

* fix: cancel healthcheck on move

* fix: cancel healthcheck on move

* revert: remove unneeded check
2025-01-08 09:42:58 -05:00
Gabe Ruttner e92146816f fix: webhook workers on rebalance (#1162)
* fix: log ui

* fix: partition handling and unregister

* fix: concurrent cleanup

* feat: op pool

* fix: run or continue partition id

* fix: return false out of check
2025-01-07 10:54:15 -08:00
Sean Reilly 9e961ac196 Feature add version info (#1154)
* adding a /version endpoint for the engine and a /api/v1/version endpoint for the API

* make the security optional so we don't get redirected for having auth

* lint

* upgrade protoc to the latest available version on brew

* use useQuery and clean up html
2025-01-06 10:50:04 -08:00
abelanger5 a237f90450 fix: circuit breaker for dispatcher reassignment (#1144) 2024-12-20 16:00:23 -05:00
abelanger5 b383ae8047 Improve handling of result size in dispatcher (#1133)
* Improve handling of result size in dispatcher

* small if case

* 3MB as var
2024-12-18 16:56:07 -05:00
abelanger5 23dc410552 fix: make retries with exp backoff atomic, and fix issues related to cancelling states (#1132)
* fix: exp backoff retries and cancelling states

* fix flaky concurrency test
2024-12-18 19:32:08 +00:00
abelanger5 dcb67a1dac feat: postgres-backed message queue (#1119) 2024-12-18 09:00:54 -05:00
abelanger5 c696263d20 fix: don't cancel context on failed sends (#1129) 2024-12-18 02:02:58 +00:00
abelanger5 e12e700980 feat: CANCEL_NEWEST strategy and make cancel in progress more reliable (#1127) 2024-12-18 01:40:14 +00:00
Sean Reilly cbc2526c0b add a monitoring probe (#1108)
* add a monitoring probe

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-12-17 15:55:50 -05:00
Sean Reilly 9943452490 Make round robin enqueueing atomic (#1085) 2024-12-17 15:18:20 -05:00
Sean Reilly e32f353587 Speed up the delete worker query (#1103)
* add an index on lastHeartbeatAt and don't do highly related actions concurrently

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-12-12 20:49:22 -05:00
abelanger5 94d14336aa feat(go-sdk): blocking worker (#1106) 2024-12-12 20:42:13 -05:00
abelanger5 4c74a62183 refactor(repository): improve usability of repository (#1114)
* refactor(repository): consolidate repository buffers, create pattern for callbacks, consolidate queries

* fix: spelling

* fix: clean up cache
2024-12-11 18:45:02 -05:00
Gabe Ruttner 44ffe1d66c fix: panic (#1105) 2024-12-09 15:50:36 +00:00
abelanger5 1499668df9 fix: duplicate cron expressions only cause a single trigger (#1101) 2024-12-06 16:02:37 -05:00