* feat: runtime signature
* feat: add sdk runtime to worker model
* feat: post runtime
* feat: expose sdk version on worker
* feat: go inf
* chore: gen
* chore: migrations and generation
* fix: simpler runtime
* feat: hatchet sdk ver
* fix: rm debug line
* add a serial write for step run events
* update other problematic queries
* tmp: don't upsert queue
* add SerialBuffer to the config
* revert the change to config
* fix: add back queue upsert
* add statement timeout to upsert queue
---------
Co-authored-by: Sean Reilly <sean@hatchet.run>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
- Simplifies the architecture for splitting engine services into separate components. The three supported services are now `grpc-api`, `scheduler`, and `controllers`. Only the `grpc-api` service needs to be exposed to workers; the other two can run as unexposed services.
- Fixes a set of bugs and race conditions in the `v2` scheduler.
- Adds a `lastActive` time to the `Queue` table and includes a migration which sets this `lastActive` time for queues from the most recent 24 hours. Effectively, this means that the maximum scheduling time in a queue is 24 hours.
- Rewrites the `ListWorkflowsForEvent` query to improve performance and select far fewer rows.
* rejig the query for creating multiple sticky states
* fix: sticky strategy of soft and improve query
* fix: sort method was using indexes that didn't necessarily correspond to original indexes, leading to inconsistent behavior
---------
Co-authored-by: Sean Reilly <sean@hatchet.run>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
* make it so the bulk example succeeds
* make the bulk workflows work a little harder
* add some ordering to mitigate deadlocks
* fix: link step run parents bad query, improvements to locking
* add timed mutex and telemetry
* remove for update on cancel
---------
Co-authored-by: Sean Reilly <sean@hatchet.run>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
* feat(throughput): single process per queue
* fix data race
* fix: golint and data race on load test
* wrap up initial v2 scheduler
* fix: more debug logs and tighten channel logic/blocking sends
* improved casing on dispatcher and lease manager
* fix: data race on min id
* increase wait on load test, fix data race
* fix: trylock -> lock
* clean up queue when no longer in set
* fix: clean up cache on exit
* ensure cleanup is only called once
* address review comments
* (wip) handle step run updates without deferred updates
* refactor: buffered writes of step run statuses
* fix: add more safety on tenant pools
* add configurable flush period, remove wait for started
* flush immediately if last flush time plus flush period is in the past
* feat: add configurable flush interval/max items
* feat: add callbacks for workflow run completed
* add tenant id to resolve row
* add finishedBefore, finishedAfter to workflow runs query
* add more callbacks
* feat: tenant ids and loggers in callback
* feat: workflow run metrics frontend
* fix: frontend build
* fix: trunc large payloads
* let's send the stepRuns and steps with output back on the WorkflowRunGet
* fix: times
* fix: rm unsafe
* rename to GetStepRunsForJobRunsWithOutput so we know we might potentially be getting a very large result set
---------
Co-authored-by: Sean Reilly <sean@hatchet.run>
* progress commit of bulk inserts
* in_flight: Add changes to metering finish the bulk insert
* remove an attempt to override enforce limits
* merge in PR fixes
* update docs to add in an additional section in the User guide to describe pushing single events and pushing multiple events
* run lint fix
---------
Co-authored-by: Sean Reilly <sean@hatchet.run>
* Add endpoint to get the total free worker slots for a worker and the max runs
* update to use WorkerSemaphoreCount instead of checking stepRunId
* modify the query for the new table and change the interface
* bump golangci-lint, make changes to name of returned data
* revert the simple example
---------
Co-authored-by: Sean Reilly <sean@hatchet.run>
* fix: filter by event id
* fix: run count
* feat: filter by id api
* feat: filter by Event Id
* chore: default page is runs
* feat: cancel event runs
---------
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
* fix: add back sem slots, without row contention
* fix: serialize queue step runs to prevent dirty reads
* remove serializable for now
* statement timeouts on create workflow run
* statement timeout for reassign
* proper migration + cleanup
* remove old tables and code
* fix: worker slot state
* remove last unused table from workers