13 Commits

Author SHA1 Message Date
abelanger5
1f2096313d feat: v1 engine (#1318) 2025-03-11 14:57:13 -04:00
abelanger5
dcb67a1dac feat: postgres-backed message queue (#1119) 2024-12-18 09:00:54 -05:00
abelanger5
67a96d7166 feat(throughput): single process per queue (#956)
* feat(throughput): single process per queue

* fix data race

* fix: golint and data race on load test

* wrap up initial v2 scheduler

* fix: more debug logs and tighten channel logic/blocking sends

* improved casing on dispatcher and lease manager

* fix: data race on min id

* increase wait on load test, fix data race

* fix: trylock -> lock

* clean up queue when no longer in set

* fix: clean up cache on exit

* ensure cleanup is only called once

* address review comments
2024-10-15 11:05:19 -04:00
Gabe Ruttner
4ea4712d4d refactor: performance and throughput (#756)
Refactors the queueing logic so that work is balanced fairly between actions, with each action backed by a separate FIFO queue. Also adds support for priority queueing and custom queues, though those aren't exposed on the API layer yet. Improves throughput to more than 5000 tasks/second on a single queue.

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-08-12 14:38:47 +00:00
abelanger5
7c3ddfca32 feat: api server extensions (#614)
* feat: allow extending the api server

* chore: move internal packages to pkg

* chore: update db_gen.go

* fix: expose auth

* fix: move logger to pkg

* fix: don't generate gitignore for prisma client

* fix: allow extensions to register their own api spec

* feat: expose pool on server config

* fix: nil pointer exception on empty opts

* fix: run.go file
2024-06-19 09:36:13 -04:00
abelanger5
7b7fbe3668 fix: update Requeue and Reassign logic to fix performance degradation when many events are queued (#310)
The requeue and reassign logic did not limit the number of step runs to requeue, so when events accumulated with no worker present, memory spiked and database query latency became very high. This commit limits the number of step runs returned by the requeue and reassign queries, and also properly locks step run rows for these queries so that only a step run in a PENDING or PENDING_ASSIGNMENT state can be requeued.

It also improves performance of the `AssignStepRunToWorker` query and ensures that `maxRuns` on workers is always respected through the introduction of a `WorkerSemaphore` model. The semaphore value is decremented when a step run is assigned and incremented when a step run reaches a final state.

Co-authored-by: Luca Steeb <contact@luca-steeb.com>

* Update controller.go

---------

Co-authored-by: steebchen <contact@luca-steeb.com>
2024-04-01 12:33:18 -04:00
Luca Steeb
8183dd509a test(rampup): add load ramp up test (#273)
* test(rampup): add load ramp up test

* disable debug logging

* actual implementation

* refactor

* max acceptable schedule

* check for non-executed events

* fixes

* chore: set log level to error in engine tests

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2024-03-31 19:14:30 -04:00
Luca Steeb
9b68115fb5 refactor: cleanup functions in api + worker (#192) 2024-03-02 00:37:02 +07:00
Luca Steeb
ae4841031b feat(engine): standalone tests and engine teardown (#172) 2024-02-28 00:15:25 +07:00
abelanger5
df3f540748 feat: add retries to the engine and SDKs (#171)
This PR adds support for retrying failed step runs against the engine and SDKs. This was tested up to 30 retries per step run, with both failure and success at the 30th step run. Each SDK now has a configurable `retries` parameter for steps when declaring a workflow.
2024-02-16 13:00:22 -05:00
Luca Steeb
00111d823c test(load): add load tests CLI & e2e tests (#157) 2024-02-16 23:47:34 +07:00
abelanger5
52fde1e704 feat: dag-style execution (#108)
* feat: dag-style execution

* docs: update to reflect new context

* ensure no cycles

* remove example cycle

* linting

* lint and small fixes

* update deferred rollback

* last rollback handling

* unset max issues

* fix requeue edge case
2024-01-16 11:31:24 -05:00
abelanger5
ac0c4e934a fix: rabbitmq concurrent processing (#92) 2024-01-09 21:15:19 -05:00
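
Note on 7b7fbe3668: the message describes a `WorkerSemaphore` value that is decremented when a step run is assigned and incremented when a step run reaches a final state, so that `maxRuns` is never exceeded. The following is only a minimal in-memory sketch of that counting idea in Python. The class shape and the method names `try_assign` and `release` are hypothetical and not taken from the repository, where the semaphore is a database model.

```python
class WorkerSemaphore:
    """Hypothetical in-memory stand-in for a per-worker slot counter."""

    def __init__(self, max_runs: int) -> None:
        if max_runs < 0:
            raise ValueError("max_runs must be non-negative")
        self.max_runs = max_runs
        self.slots = max_runs  # free slots remaining on this worker

    def try_assign(self) -> bool:
        """Take one slot for a step run; refuse when the worker is full."""
        if self.slots == 0:
            return False
        self.slots -= 1
        return True

    def release(self) -> None:
        """Return one slot when a step run reaches a final state."""
        if self.slots < self.max_runs:
            self.slots += 1


# Usage: a worker with maxRuns = 2 accepts two step runs, then refuses a third
# until one of the running step runs finishes.
sem = WorkerSemaphore(max_runs=2)
first = sem.try_assign()
second = sem.try_assign()
third = sem.try_assign()   # refused: no free slots
sem.release()              # one step run reached a final state
fourth = sem.try_assign()  # accepted again
```

In a database-backed version the check and the decrement would need to happen atomically, for example inside a single transaction. That detail is an assumption here and is not spelled out in the commit message.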