Commit Graph

186 Commits

Author SHA1 Message Date
abelanger5 840e590312 fix: frontend improvements (#905)
* fix: set time range properly on reload

* fix: small button to show queue counts for now
2024-09-24 19:08:41 -04:00
Gabe Ruttner f98d3277b7 fix: trunc large payloads (#903)
* fix: trunc large payloads

* let's send the stepRuns and steps with output back on the WorkflowRunGet

* fix: times

* fix: rm unsafe

* rename to GetStepRunsForJobRunsWithOutput so we know we might potentially be getting a very large result set

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-09-24 22:52:00 +00:00
Sean Reilly 5811929928 feat: bulk inserts of events (#887)
* progress commit of bulk inserts

* in_flight: Add changes to metering finish the bulk insert

* remove an attempt to override enforce limits

* merge in PR fixes

* update docs to add in an additional section in the User guide to describe pushing single events and pushing multiple events

* run lint fix

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-09-23 09:19:39 -07:00
abelanger5 baf13bd577 fix: duration int -> bigint (#902) 2024-09-23 08:30:16 -07:00
abelanger5 ad12f658da fix: have refresh timeout use timeout queue item (#898) 2024-09-23 05:41:06 -07:00
abelanger5 0204929b02 fix: concurrency key performance (#894) 2024-09-19 21:28:08 -04:00
abelanger5 d61f37de14 fix: queue properly on internal retry (#891) 2024-09-19 20:11:47 +00:00
Sean Reilly 15c50f46b5 Partial PR - need to generate SDK - Add endpoint to get the total free worker slots for a worker and the … (#857)
* Add endpoint to get the total free worker slots for a worker and the max runs

* update to use WorkerSempahoreCount instead of checking stepRunId

* modify the query for the new table and change the interface

* bump golangci-lint make changes to name of returned data

* revert the simple example

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-09-19 10:11:16 -07:00
abelanger5 d23e5d9963 feat: expression-based concurrency keys (#889)
* feat: expression-based concurrency keys

* fix: build

* fix: typos

* fix: gen

* fix: migration

* fix: remove print statements

* fix: reassignment bugs, retries on closed transport, pr review
2024-09-19 10:32:22 -04:00
Steinway Wu 44d03af852 fix: propagating additional metadata for child workflows (#882)
* fix: propagating additional metadata for child workflows

* add unit test
2024-09-19 13:28:46 +00:00
abelanger5 55eb63d9a4 fix: replay without group keys and status updates (#883) 2024-09-16 16:59:34 -04:00
Gabe Ruttner 2379e3638a fix: reset on replay (#875) 2024-09-16 17:01:51 +00:00
Gabe Ruttner af9ed49f1e fix: events list view (#878)
* fix: filter by event id

* fix: run count

* feat: filter by id api

* feat: filter by Event Id

* chore: default page is runs

* feat: cancel event runs

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-09-16 16:46:31 +00:00
Gabe Ruttner c64c62f66a feat: improved workflow run details page (#821)
* wip: rip prisma

* wip

* wip

* fix: lint

* wip

* wip

* gen

* wip

* wip

* fix trigger

* hide overview

* revert db changes

* feat: wrap up frontend changes and perf

* chore: generate

* chore: frontend build

* fix: workflow transformer

* fix: avoid race conditions on simultaneous parent completions

* fix: 2025 started

* feat: toast for replay/cancel

* fix: toast

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-09-16 15:39:49 +00:00
abelanger5 893637cb0f fix: improve LinkStepRunParents to prevent usage of temp files (#874) 2024-09-12 17:19:58 -04:00
abelanger5 bed2cb559a fix: add back sem slots, without row contention (#868)
* fix: add back sem slots, without row contention

* fix: serialize queue step runs to prevent dirty reads

* remove serializable for now

* statement timeouts on create workflow run

* statement timeout for reassign

* proper migration + cleanup

* remove old tables and code

* fix: worker slot state

* remove last unused table from workers
2024-09-11 20:47:49 +00:00
abelanger5 f4c5cd973e feat: more efficient step run timeouts (#863) 2024-09-10 18:23:11 -04:00
abelanger5 b635c875f6 fix: race conditions on release slot (#858)
* fix: race conditions on release slot

* better engine logs for ci

* fix: improve cancellation

* better debug logs and increase timeout
2024-09-10 14:22:32 -04:00
abelanger5 9efcebe6af fix: better logic for multiple restricted domains (#860) 2024-09-10 12:07:55 -04:00
abelanger5 77ab9460d3 fix: unset statement timeouts for session, print more debug info (#861) 2024-09-10 12:07:47 -04:00
abelanger5 a1324d43db fix: improve status updates and event writes (#855)
* fix: improve status updates and event writes

* chore: better debug line
2024-09-09 18:09:44 +00:00
abelanger5 e59ce100e3 fix: cleanup worker assign events + limit semaphore query (#854) 2024-09-09 11:11:44 -04:00
abelanger5 23daa89ba0 fix: list initial step runs query + better serialized queue operations (#851)
* fix: list initial step runs

* fix: better tenant operation logic
2024-09-08 19:29:11 -04:00
abelanger5 478c897035 fix: proper migration to counts, startable step runs query (#850) 2024-09-08 12:48:51 -04:00
abelanger5 70855ecc6b refactor: bulk step run updates (#847)
* wip: v4 of queue

* fix: correct query for updating counts

* tmp: save migration files

* feat: wrap up initial queue

* fix compilation

* fix: reassigns

* temp: refactor on step runs

* feat: bulk step run updates and processing

* fix: cancellation

* address review comments
2024-09-06 17:11:19 -04:00
abelanger5 891514b461 feat: queue v4 (#842)
* wip: v4 of queue

* fix: correct query for updating counts

* tmp: save migration files

* feat: wrap up initial queue

* fix compilation

* fix: reassigns
2024-09-06 16:12:22 -04:00
abelanger5 7308876776 fix: use separate database pool for queueing, statement timeouts on tx (#839)
* fix: different queue pool and statement timeouts on step runs

* fix: implement prepareTx

* fix: defer rollback properly

* fix: race condition
2024-09-03 21:07:26 +00:00
abelanger5 b5014f6b3d chore: more visibility and debug lines for queues (#836)
* chore: more visibility and debug options for queues

* better debug lines on queue repo

* don't log so much in load test
2024-08-29 14:49:24 -04:00
abelanger5 17b7e84876 fix: delete queue items when no longer used (#831) 2024-08-28 17:12:31 -04:00
abelanger5 263eaf069b feat: pass otel through msgqueue (#802)
* feat: pass otel through msgqueue

* feat: more spans on scheduling

* otel increase batch size
2024-08-28 14:45:02 +00:00
abelanger5 6317f86793 refactor: consolidate partition logic (#826)
* refactor: consolidate partition logic

* fix: race on scheduler

* fix: move partition uuid to db query

* fix: generate
2024-08-27 15:28:53 -04:00
Gabe Ruttner 526e7ef308 feat: expose priority queue (#814)
* feat: workflow default priority

* feat: write priority on run

* feat: propagate to queue

* chore: squash migrations

* chore: generate
2024-08-26 14:11:28 -04:00
Gabe Ruttner ee5d86796f fix: required affinity (#812)
* fix: required affinity

* chore: rm dead code
2024-08-23 15:19:29 -04:00
Gabe Ruttner 53be615d5f Enhancement webhook usability (#807)
* feat: secret copier

* feat: improved form

* fix: quotes

* wip: improved flow

* feat: health check logging

* fix: page design

* fix: hard delete, no upsert

* fix: reset modal state

* fix: empty text

* fix: worker state

* fix: update only token

* fix: dont delete name

* fix: logs component

* fix: sort order

* chore: build

* fix: webhook worker cleanup

* chore: squash migrations

* Update api-contracts/openapi/paths/webhook-worker/webhook-worker.yaml

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* chore: rename

* fix: wrong query

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2024-08-23 10:09:09 -04:00
abelanger5 dd8a4144cb fix: hard sticky assignment to workers when no desired worker id (#809) 2024-08-23 07:42:52 -04:00
abelanger5 7a3c06884f fix: don't queue cancelled step runs (#805)
* fix: cancelled step runs should not be assigned

* check cancellations before planning

* remove reassign logic from controller
2024-08-22 11:21:49 -04:00
abelanger5 e40d77d9d7 fix: reassignment should reset scheduling timeout (#804) 2024-08-22 10:07:10 -04:00
Gabe Ruttner 9bea55438a Fix webhook healthcheck race (#797)
* fix: race

* fix: partition no rows

* chore: move to workers tab

* feat: redirect empty worker path to all

* chore: add worker type and webhook id

* fix: upsert webhook worker

* fix: update by webhookId

* fix: only stub on create

* feat: url on worker

* chore: migration version

* fix: move

* fix: upsert

* fix: upsert

* chore: fix migration

* fix: migrations

* chore: generate
2024-08-21 19:23:24 +00:00
Gabe Ruttner 46956a4153 fix: sort (#803) 2024-08-21 17:55:15 +00:00
Gabe Ruttner 651be542c3 fix: migration state (#800)
* fix: migration state

* almost there...

* fix: hack for constraints

* chore: lint
2024-08-21 17:07:00 +00:00
abelanger5 6b7e6de4c4 fix: use single queue limit (#801) 2024-08-21 12:46:05 -04:00
abelanger5 84f7334a06 feat: add windows for metrics and selector (#794)
* feat: add windows for metrics and selector

* better placeholder
2024-08-20 15:29:32 +00:00
abelanger5 67357cfa64 fix: add correct indexes for get group key runs, improve queries (#786)
* fix: add correct indexes for get group key runs

* chore: generate

* fix: hash
2024-08-20 09:31:30 -04:00
Gabe Ruttner 31e5b70441 fix: filter on joins (#790) 2024-08-19 13:21:31 -04:00
abelanger5 dd50515aeb fix: cancelling status on frontend (#779)
* fix: cancelling status on frontend

* fix: slot grid and dedupe on slots
2024-08-13 17:01:26 +00:00
Gabe Ruttner 4ea4712d4d refactor: performance and throughput (#756)
Refactors the queueing logic to be fairly balanced between actions, with each action backed by a separate FIFO queue. Also adds support for priority queueing and custom queues, though those aren't exposed on the API layer yet. Improves throughput to more than 5000 tasks/second on a single queue.

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-08-12 14:38:47 +00:00
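
Note on #756: the description above mentions balancing fairly between actions, with one FIFO queue per action. As a rough illustration of that idea only (this is not the code from the commit; the class name FairQueue and its methods are hypothetical, and the real scheduler is database-backed), a minimal in-memory sketch might look like this:

```python
from collections import OrderedDict, deque


class FairQueue:
    """Hypothetical sketch: round-robin over per-action FIFO queues."""

    def __init__(self):
        # Maps action name -> FIFO deque of tasks. Insertion order of the
        # dict doubles as the round-robin order between actions.
        self._queues = OrderedDict()

    def push(self, action, task):
        # Each action gets its own FIFO queue, created on first use.
        self._queues.setdefault(action, deque()).append(task)

    def pop(self):
        # Return (action, task) from the action at the front of the
        # rotation, or None when every queue is empty.
        if not self._queues:
            return None
        action, queue = next(iter(self._queues.items()))
        task = queue.popleft()
        if queue:
            # Still has work: send this action to the back of the rotation
            # so a busy action cannot starve the others.
            self._queues.move_to_end(action)
        else:
            del self._queues[action]
        return action, task


if __name__ == "__main__":
    fq = FairQueue()
    for t in (1, 2, 3):
        fq.push("a", t)
    fq.push("b", 10)
    while (item := fq.pop()) is not None:
        print(item)
```

In this sketch a single task for action "b" is served right after the first task for "a", instead of waiting behind all of "a"'s backlog. Priority queueing and custom queues, which the commit also mentions, are left out.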
Viktor Szépe 0948598749 Fix typos (#775) 2024-08-10 10:58:33 +00:00
abelanger5 652f604873 fix: add max msg size as env var (#759) 2024-08-01 09:25:05 -04:00
Gabe Ruttner b4670af138 Fix qos otel config (#754)
* feat: otel trace id ratio

* feat: rabbitmq qos

* feat: requeue limit

* fix: tests
2024-07-30 18:11:10 -04:00
Gabe Ruttner b802f9f45f feat: stream by addl meta (#751)
* feat: prop schedule and run

* wip

* fix: filter wfrid

* feat: hangup

* chore: rm debug log

* chore: func name

* fix: cancelled payload

* fix: load

* fix: cleanup the cahce

* fix: single proto

* fix: key -> val

* chore: case

* chore: rm dead code

* chore: rm dead code

* feat: go and docs

* fix: docs
2024-07-29 19:09:51 +00:00