329 Commits

Author SHA1 Message Date
abelanger5
aebcf0bb0c fix: boundary conditions on 1-second rate limiters (#1379) 2025-03-20 21:44:08 +00:00
abelanger5
2333090751 fix: cancellations, failures, and retries edge case (#1377) 2025-03-20 17:27:50 -04:00
abelanger5
e91047d7b3 feat: add back tenant alerting to v1 (#1372) 2025-03-19 17:50:42 -04:00
abelanger5
f25c408d5c fix: reassignments consistent with v0 behavior (#1360) 2025-03-18 09:17:31 -04:00
abelanger5
d7812e6847 fix(v1): use inserted_at to join dags, tasks in olap queries (#1358)
* fix: use inserted_at to join dags, tasks in olap queries

* proper olap query
2025-03-17 16:59:45 -04:00
abelanger5
87132bf7ca fix: rate limit queries (#1357) 2025-03-17 15:43:10 -04:00
abelanger5
21bd707ba6 fix(v1): improved query plans for replay and task outputs, reassignment + timeout tweaks (#1354)
* don't call parent output task when not necessary

* help query planner by refactoring replay task

* fix: use failed task pathway for reassignments and timeouts
2025-03-17 14:10:32 -04:00
Gabe Ruttner
3670b94fc4 Feat v1 UI tweaks (#1344)
* fix: drop uncached loader

* feat: upgrade modal

* add beta

* hacky feature flag

* fix: build

* refetch interval

* 5s

* stop flashing on load

* lint

* fix: map

* fix: last redir

* nil check

* small styling and wording things, change default canUpgrade -> true

* switch link to github discussion

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2025-03-15 09:23:32 -04:00
abelanger5
5c647e247e chore(v1): small improvements to replay/parent task lookup (#1346)
* small tweaks to replay/parent task lookup

* some more improvements
2025-03-15 09:15:57 -04:00
abelanger5
7ad251df26 fix: recursive queries should use PKs (#1345) 2025-03-14 13:13:27 -04:00
abelanger5
48fdc4a7b7 fix: high deadlock counts on rate limits (#1343) 2025-03-14 13:13:17 -04:00
abelanger5
436f0b8699 remove order by in olap query (#1342) 2025-03-14 15:21:18 +00:00
abelanger5
4b8cefc957 fix: replay query (#1337) 2025-03-14 08:54:03 -04:00
abelanger5
677fe2d328 fix: spawn workflows should handle on failure properly, lite improvements (#1336) 2025-03-13 22:02:03 -04:00
Matt Kaye
94192c02c8 fix: dag metadata query (#1333) 2025-03-13 17:13:16 -04:00
abelanger5
4cbde4405a fix: more v1 bug bashing (#1334) 2025-03-13 17:13:04 -04:00
abelanger5
ac968e94b8 fix: concurrency issues and a few small improvements (#1324) 2025-03-12 16:30:34 -04:00
Alexander Belanger
1950a0796d fix: limit subqueries 2025-03-12 13:24:41 -04:00
abelanger5
afd853e223 v1 hotfixes (#1320)
* fix: when grpcInsecure is set to true with no internal client overrides, use TLS strategy=none

* fix: invites
2025-03-11 16:18:07 -04:00
abelanger5
1f2096313d feat: v1 engine (#1318) 2025-03-11 14:57:13 -04:00
Gabe Ruttner
234b010ff6 fix: hash (#1309) 2025-03-07 07:35:05 -08:00
Gabe Ruttner
e518d16e42 fix: stable order (#1308)
* fix: stable order

* fix: query
2025-03-07 06:18:28 -08:00
abelanger5
b8d1a35ea2 fix: don't join on old workflow version id (#1288) 2025-02-25 16:22:41 -05:00
abelanger5
6974066040 fix: scheduled workflows query (#1287)
* fix: scheduled workflows query

* add deletedAt
2025-02-25 15:28:29 -05:00
Gabe Ruttner
2ebd6598f9 fix: schedule latest version (#1263) 2025-02-13 10:35:26 -05:00
abelanger5
9d1c40ae1f fix: order DAG steps before inserting (#1268) 2025-02-13 07:33:56 -08:00
Gabe Ruttner
ed0bf34a1f fix: propagate context (#1264) 2025-02-12 12:17:44 -08:00
Gabe Ruttner
46c4b367b1 fix: propagate retry (#1258) 2025-02-10 10:40:24 -08:00
Gabe Ruttner
30c1a979ac Fix: backoff bugs (#1248)
* fix: cancel backoff state

* fix: two paths for retry

* lint
2025-02-05 06:57:05 -08:00
Gabe Ruttner
5cda236e5d fix: addl meta unmarshal deep obj (#1245)
* fix: addl meta unmarshal

* fix: ci

* fix: ci

* fix: ci
2025-02-04 10:59:36 -08:00
abelanger5
9b30a3c5a3 fix: make extension less memory intensive (#1241) 2025-01-31 10:28:53 -05:00
Gabe Ruttner
3185a6740d Optimization scheduler memory (#1240)
* memory optimizations

* revert mu

* trace

* revert trace

* chore: lint
2025-01-31 09:48:48 -05:00
Gabe Ruttner
ffa0e2782e fix: memory (#1237)
* fix: memory

* fix: rip

* simplify structs

* fix: unassigned
2025-01-29 19:55:30 -05:00
Gabe Ruttner
0e91542d87 wip: backoff state (#1225)
* wip: backoff state

* fix: retry state and step run start condition

* fix: missing key

* fix: gen

* chore: squash migration

* chore: rm todos

* ops: upgrade proto
2025-01-28 19:16:12 +00:00
Gabe Ruttner
13024c09bd Feat canceled state (#1228)
* feat: add cancel state to event list

* ops: db conns

* feat: add cancelled status to wfr

* feat: mark cancelled workflow runs
2025-01-28 10:31:04 -08:00
abelanger5
769fed7d97 feat(go-sdk): adds preset labels on workers for autoscaling (#1195)
* feat(go-sdk): adds preset labels on workers for autoscaling

* fix: env var consistency

---------

Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com>
2025-01-28 14:58:41 +00:00
Matt Kaye
fc9ff0eb05 Feat: Sample Sentries in the Engine (#1209)
* feat: sample sentries in the engine

* set sample rate via env var

* fix: propagate sample rate through config

* fix: bind env
2025-01-23 17:41:41 -05:00
Matt Kaye
0f1cbd98ca Fix: Change which exceptions retry (#1194)
* fix: change retryable exceptions

* fix: add back resource exhausted
2025-01-22 14:54:32 -05:00
abelanger5
b691117b67 fix: increase workflow run pop timeout, fix broken concurrency query (#1189)
* increase pop timeout

* fix: dunno

* fix: improve query

* fix: whitespace

* fix: extend timeout and make pop query pseudo-random
2025-01-21 18:24:39 -05:00
Matt Kaye
9efd56c7de Feat: Propagate Error Through Context (#1193)
* feat: add query to fetch upstream errors from db

* fix: return many

* feat: propagate errors through `input`

* fix: implement the method to get the errors out

* fix: query cleanup

* feat: rename errors

* fix: col names

* fix: key name in the json

* feat: add method to context to get failed step errors

* fix: add 👀

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* feat: add error log if not errors

* fix: logger

* fix: simplify query

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2025-01-17 21:49:13 -05:00
Sean Reilly
a8dd33c61f Feature - configurable logging backend (#1188)
* allow us to configure different repos

* make the struct contents public

* pass in config values to new log repo

* rename functions - possibly breaking changes so lets discuss

* make the logging backend configurable

* fix tests

* don't allow calls to WithAdditionalConfig

* cleanup

* replace sc with server

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* rename sc to server

* add a LRU cache for the step run lookup

* let's not use an expirable cache and just use the regular one - we cannot close the go func in expirable

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2025-01-17 15:34:10 -08:00
Sean Reilly
c2248c08ab Fix security headers and emails (#1181)
* add a bunch of default headers

* add a check on the emails so we don't resend if we have a valid invite in future

* lets people invite for a new role

* add in some logging so we have more visibility on what is happening here

* Add a limit to the number of pending invites a user can have. Add comments for the various headers
2025-01-17 15:06:26 -08:00
abelanger5
75657a109e fix: hard sticky strategy with no desired worker id (#1186) 2025-01-14 09:12:29 -08:00
abelanger5
332ccb77cf fix: don't exit early out of queuer (#1184)
* fix: don't exit early out of queuer

* rm unused file
2025-01-13 19:54:25 -08:00
abelanger5
042411c3e6 fix: run workflows which are scheduled on older versions (#1180) 2025-01-10 17:05:08 -05:00
Sean Reilly
f3bb937a55 fix: two bugs in how concurrency works for Round Robin (#1164)
* fix: make sure we never have more than maxRuns of a workflowRun even with other unfull groups, fix bug where inconsistent ordering of workflow runs allowed more runs than maxRuns

* compile the comment

* let's error out in the test when we fail
2025-01-10 16:13:46 -05:00
abelanger5
7bf189272a fix: add guard to min id query (#1174) 2025-01-10 09:26:52 -05:00
abelanger5
b867171269 fix: timeout race condition (#1175) 2025-01-10 09:26:38 -05:00
Gabe Ruttner
df75ddb611 fix: fifo (#1173)
* fix: serially try assign

* fix: ensure queue sort
2025-01-09 19:33:41 -05:00
Gabe Ruttner
cc0a8db4fd feat: expose stepId on context (#1169)
* feat: expose stepId on context

* fix: test context
2025-01-08 16:12:25 -05:00