* feat(throughput): single process per queue
* fix data race
* fix: golint and data race on load test
* wrap up initial v2 scheduler
* fix: more debug logs and tighten channel logic/blocking sends
* improved casing on dispatcher and lease manager
* fix: data race on min id
* increase wait on load test, fix data race
* fix: trylock -> lock
* clean up queue when no longer in set
* fix: clean up cache on exit
* ensure cleanup is only called once
* address review comments
Refactors the queueing logic to be fairly balanced between actions, with each action backed as a separate FIFO queue. Also adds support for priority queueing and custom queues, though those aren't exposed on the API layer yet. Improves throughput to be > 5000 tasks/second on a single queue.
---------
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
* feat: allow extending the api server
* chore: remove internal packages to pkg
* chore: update db_gen.go
* fix: expose auth
* fix: move logger to pkg
* fix: don't generate gitignore for prisma client
* fix: allow extensions to register their own api spec
* feat: expose pool on server config
* fix: nil pointer exception on empty opts
* fix: run.go file
Logic for requeueing and reassigning did not limit the number of step runs to requeue, so when events accumulate with no worker present it causes memory to spike along with a very high query latency on the database. This commit limits the number of step runs returned in the requeue and reassign queries, and also properly locks step run rows for these queries so only a step run in a PENDING or PENDING_ASSIGNMENT state can be requeued.
It also improves performance of the `AssignStepRunToWorker` query and ensures that `maxRuns` on workers are always respected through the introduction of a `WorkerSemaphore` model. The value gets decremented when a step run is assigned and incremented when a step run is in a final state.
Co-authored-by: Luca Steeb <contact@luca-steeb.com>
* Update controller.go
---------
Co-authored-by: steebchen <contact@luca-steeb.com>
This PR adds support for retrying failed step runs against the engine and SDKs. This was tested up to 30 retries per step run, with both failure and success at the 30th step run. Each SDK now has a `retries` configurable param for steps when declaring a workflow.
* feat: dag-style execution
* docs: update to reflect new context
* ensure no cycles
* remove example cycle
* linting
* lint and small fixes
* update deferred rollback
* last rollback handling
* unset max issues
* fix requeue edge case