56 Commits

Author SHA1 Message Date
Gabe Ruttner 4ea4712d4d refactor: performance and throughput (#756)
Refactors the queueing logic to balance fairly between actions, with each action backed by a separate FIFO queue. Also adds support for priority queueing and custom queues, though these are not yet exposed on the API layer. Improves throughput to more than 5,000 tasks/second on a single queue.

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-08-12 14:38:47 +00:00
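The fair balancing described in #756 (one FIFO queue per action, served in turn) can be illustrated with a small in-memory sketch. This is not the engine's implementation; the class name `FairActionQueues`, its methods, and the action names in the usage note are hypothetical, and priority and custom queues are left out.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class FairActionQueues:
    """Round-robin scheduler over one FIFO queue per action (illustrative only)."""

    def __init__(self) -> None:
        # One FIFO queue of task ids per action name.
        self._queues: Dict[str, Deque[str]] = {}
        # Rotation order of the actions that currently have pending tasks.
        self._order: Deque[str] = deque()

    def enqueue(self, action: str, task: str) -> None:
        queue = self._queues.setdefault(action, deque())
        if not queue:
            # The action had no pending work, so it joins the end of the rotation.
            self._order.append(action)
        queue.append(task)

    def dequeue_batch(self, limit: int) -> List[Tuple[str, str]]:
        """Take up to `limit` tasks, one per action per turn."""
        batch: List[Tuple[str, str]] = []
        while self._order and len(batch) < limit:
            action = self._order.popleft()
            queue = self._queues[action]
            batch.append((action, queue.popleft()))
            if queue:
                # Still has work: go to the back of the rotation.
                self._order.append(action)
        return batch
```

With three tasks queued for a hypothetical `send_email` action and one for `resize_image`, a batch of three interleaves the two actions instead of draining `send_email` first, so a busy action cannot starve a quiet one.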
Gabe Ruttner b802f9f45f feat: stream by addl meta (#751)
* feat: prop schedule and run

* wip

* fix: filter wfrid

* feat: hangup

* chore: rm debug log

* chore: func name

* fix: cancelled payload

* fix: load

* fix: cleanup the cache

* fix: single proto

* fix: key -> val

* chore: case

* chore: rm dead code

* chore: rm dead code

* feat: go and docs

* fix: docs
2024-07-29 19:09:51 +00:00
abelanger5 a245151d91 feat: add workflow kind to workflow versions (#750)
* feat: support workflow kinds

* chore: generate
2024-07-29 12:07:34 -07:00
Gabe Ruttner fd947cb5bc feat: go worker assignment (#741)
* feat: create worker with label

* feat: worker context

* feat: dynamic labels

* feat: affinity

* fix: ptr

* fix: nil labels

* feat: sticky dag

* feat: sticky docs

* feat: sticky children

* chore: lint

* fix: tests

* fix: possibly nil workerId

* chore: cleanup unneeded pointers
2024-07-26 10:19:11 -07:00
Luca Steeb a51681ddc6 fix(webhooks): use PUT for healthcheck (#644) 2024-06-26 10:39:11 -04:00
Luca Steeb 1490d88954 feat: webhook workers (#542)
Adds serverless support via the concept of webhook workers. Allows any webhook to be registered as a serverless endpoint for executing a step.
2024-06-25 17:06:43 -04:00
abelanger5 7c3ddfca32 feat: api server extensions (#614)
* feat: allow extending the api server

* chore: move internal packages to pkg

* chore: update db_gen.go

* fix: expose auth

* fix: move logger to pkg

* fix: don't generate gitignore for prisma client

* fix: allow extensions to register their own api spec

* feat: expose pool on server config

* fix: nil pointer exception on empty opts

* fix: run.go file
2024-06-19 09:36:13 -04:00
Gabe Ruttner b728616161 feat: improve reassign and timeout behavior and visibility (#484)
* feat: create step run event

* fix: shorten reassign heartbeat

* feat: add reassign event

* feat: fail timeout instead of cancel

* chore: squash migration

* chore: clarify copy

* docs: improve timeouts doc

* chore: linting

* chore: generate

* fix: test

* fix: send cancellation signal on timeout failure

* fix: rm retry check

* chore: update migration for release

---------

Co-authored-by: Alexander Belanger <belanger@sas.upenn.edu>
2024-05-14 16:47:00 -04:00
Gabe Ruttner 0586af8e8c Feat add rate limit durations (#466)
* feat: add day, week, month, year durations

* feat: add options to client
2024-05-08 19:35:39 -04:00
Gabe Ruttner c8b59d1cc1 Feat add additional meta to trigger (#458)
* feat: add additional metadata to sdk

* docs: metadata

* fix: build

* feat: filter gif
2024-05-08 10:07:28 -04:00
Gabe Ruttner fa07400159 feat: event and workflow run metadata (#446)
Adds additional user-defined metadata to events and workflow runs.
2024-05-06 17:10:33 -04:00
abelanger5 7543a0c2a5 add jobs which always run on failure (#445)
* (wip) prisma schema

* feat: on-failure steps

* chore: address changes from PR review

* chore: bump migration number
2024-05-06 15:39:22 -04:00
abelanger5 4ce1dd8632 feat: multi-workflow runs listener on a single endpoint
* new api-contract for workflow run events

* feat: initial implementation for new subscribe listener

* fix: sync issues and send workflow runs immediately

* refactor: add context to all engine db queries, fix deadlocking query

* fix: use new ctx for deleting dispatcher and ticker

* add cancellation reasons

* fix: docs linting

---------

Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com>
2024-04-18 20:55:11 -04:00
Gabe Ruttner f43f32283c feat(py/go): namespaces (#354)
* feat: namespaced python

* wip: namespaced go

* fix: service name

* fix: tests

* feat: client WithNamespace

* feat: namespace example

* feat: namespaced event triggers

* docs: namespace docs

* chore: linting

---------

Co-authored-by: gabriel ruttner <gabe@hatchet.run>
2024-04-15 11:26:05 -07:00
Alexander Belanger 75172c4a05 fix: retries cause semaphore to go to zero 2024-04-15 13:50:16 -04:00
Luca Steeb 2f7483a3be test(timeout): add test for timeout (#344) 2024-04-11 00:50:24 +07:00
Luca Steeb 32aebd97b4 test(cancellation): add test for cancelled run (#339) 2024-04-10 23:58:18 +07:00
abelanger5 d6004bbe0c fix: stale queries in update semaphore (#333)
* fix: stale queries in update semaphore

* fix: set correct alias for semaphore update
2024-04-04 12:58:32 -04:00
abelanger5 066b3c5b71 feat(engine): initial rate-limiting engine implementation (#324)
* feat(engine): initial rate-limiting engine implementation

* fixes and implement go sdk rate limiting
2024-04-02 10:53:03 -04:00
Gabe Ruttner d8b6843dec feat: streaming events (#309)
* feat: add stream event model

* docs: how to work with db models

* feat: put stream event

* chore: rm comments

* feat: add stream resource type

* feat: enqueue stream event

* fix: contracts

* feat: protos

* chore: set properties correctly for typing

* fix: stream example

* chore: rm old example

* fix: async on

* fix: bytea type

* fix: worker

* feat: put stream data

* feat: stream type

* fix: correct queue

* feat: streaming payloads

* fix: cleanup

* fix: validation

* feat: example file streaming

* chore: rm unused query

* fix: tenant check and read only consumer

* fix: check tenant-steprun relation

* Update prisma/schema.prisma

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>

* chore: generate protos

* chore: rename migration

* release: 0.20.0

* feat(go-sdk): implement streaming in go

---------

Co-authored-by: gabriel ruttner <gabe@hatchet.run>
Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2024-04-01 15:46:21 -04:00
abelanger5 7b7fbe3668 fix: update Requeue and Reassign logic to fix performance degradation when many events are queued (#310)
The requeue and reassign logic did not limit the number of step runs to requeue, so when events accumulated with no worker present, memory spiked and database query latency became very high. This commit limits the number of step runs returned by the requeue and reassign queries, and properly locks step run rows for these queries so that only a step run in a PENDING or PENDING_ASSIGNMENT state can be requeued.

It also improves the performance of the `AssignStepRunToWorker` query and ensures that `maxRuns` on workers is always respected by introducing a `WorkerSemaphore` model. The value is decremented when a step run is assigned and incremented when a step run reaches a final state.

Co-authored-by: Luca Steeb <contact@luca-steeb.com>

* Update controller.go

---------

Co-authored-by: steebchen <contact@luca-steeb.com>
2024-04-01 12:33:18 -04:00
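The semaphore bookkeeping described in #310 (decrement on assignment, increment on a final state) can be sketched in memory as follows. This is only an illustration under stated assumptions: the real `WorkerSemaphore` is a database model, the final-state names in `FINAL_STATES` are assumed, and the method names and the per-run guard against releasing a slot twice are hypothetical additions.

```python
from typing import Set

# Assumed names for final step run states; the source does not list them.
FINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


class WorkerSemaphore:
    """Tracks how many more step runs a worker may accept (illustrative only)."""

    def __init__(self, max_runs: int) -> None:
        if max_runs < 1:
            raise ValueError("max_runs must be at least 1")
        self.max_runs = max_runs
        self.slots = max_runs          # remaining capacity
        self._active: Set[str] = set() # step runs currently holding a slot

    def try_assign(self, step_run_id: str) -> bool:
        """Decrement the semaphore if capacity remains; refuse otherwise."""
        if self.slots == 0 or step_run_id in self._active:
            return False
        self._active.add(step_run_id)
        self.slots -= 1
        return True

    def on_status_change(self, step_run_id: str, status: str) -> None:
        """Increment the semaphore once when a step run reaches a final state."""
        if status in FINAL_STATES and step_run_id in self._active:
            self._active.remove(step_run_id)
            self.slots += 1
```

A worker created with `max_runs=2` accepts two step runs, refuses a third, and accepts again only after one of the first two reports a final state.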
Luca Steeb 8183dd509a test(rampup): add load ramp up test (#273)
* test(rampup): add load ramp up test

* disable debug logging

* actual implementation

* refactor

* max acceptable schedule

* check for non-executed events

* fixes

* chore: set log level to error in engine tests

---------

Co-authored-by: abelanger5 <belanger@sas.upenn.edu>
2024-03-31 19:14:30 -04:00
abelanger5 77e5d2b77c feat(go-sdk): spawnWorkflow method and get up to speed with other sdks (#297)
* feat(go-sdk): spawnWorkflow method and get up to speed with other sdks

* fix: manual trigger example

* fix: linting errors

* fix: double serialization from go sdk

* fix: spawn workflow logic and procedural example

* test(e2e): add procedural test

* fix: panic in e2e test

* fix: e2e test preparation

* fix: api server url in test.yml

* fix: load test server url

* chore: make num children configurable

* address pr review
2024-03-29 14:07:39 -07:00
abelanger5 092f54c64f refactor: separate api and engine repositories, change ticker logic (#281)
* refactor: separate api and engine repositories, change ticker logic

* fix: nil error blocks

* fix: run migration on load test

* fix: generate db package in load test

* fix: test.yml

* fix: add pnpm to load test

* fix: don't lock CTEs with columns that don't get updated

* fix: update heartbeat for worker every 4 seconds, not 5

* chore: remove dead code

* chore: update python sdk

* chore: add back telemetry attributes
2024-03-21 14:10:34 -04:00
abelanger5 65224753c1 fix(go-sdk): support tls strategy of none, with docs (#269)
* fix(go-sdk): support tls strategy of none, with docs

* chore: errorf -> sprintf in examples

* Apply suggestions from code review

Co-authored-by: Luca Steeb <contact@luca-steeb.com>

* fix: remove time from example

---------

Co-authored-by: Luca Steeb <contact@luca-steeb.com>
2024-03-18 14:02:53 -04:00
Luca Steeb 713b8c95c6 fix: eliminate remaining race conditions (#220) 2024-03-02 23:47:50 +07:00
Luca Steeb 9b68115fb5 refactor: cleanup functions in api + worker (#192) 2024-03-02 00:37:02 +07:00
Luca Steeb ae4841031b feat(engine): standalone tests and engine teardown (#172) 2024-02-28 00:15:25 +07:00
abelanger5 6ea38a99f2 feat: support maxRuns parameter on workers (#195)
* feat: round robin queueing

* feat: configurable max runs per worker

* fix: address PR review

* docs for max runs and group round robin
2024-02-26 00:48:46 -05:00
abelanger5 2d625fec81 feat: round robin queueing (#194) 2024-02-26 00:16:40 -05:00
abelanger5 df3f540748 feat: add retries to the engine and SDKs (#171)
This PR adds support for retrying failed step runs against the engine and SDKs. This was tested up to 30 retries per step run, with both failure and success at the 30th step run. Each SDK now has a `retries` configurable param for steps when declaring a workflow.
2024-02-16 13:00:22 -05:00
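The `retries` parameter introduced in #171 can be pictured as a bounded retry loop around a step. The sketch below is not any SDK's API: `run_with_retries` and its signature are hypothetical, and it assumes that `retries` counts the attempts made after the first one, which the source does not spell out.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[int], T], retries: int) -> Tuple[T, int]:
    """Run `step`, retrying up to `retries` times after the first failure.

    `step` receives the 1-based attempt number. Returns the result and the
    number of attempts used; re-raises the last error once retries run out.
    """
    if retries < 0:
        raise ValueError("retries must be non-negative")
    attempt = 0
    while True:
        attempt += 1
        try:
            return step(attempt), attempt
        except Exception:
            if attempt > retries:
                # Assumption: total attempts = 1 initial run + `retries` retries.
                raise
```

For example, a step that fails on its first two attempts succeeds with `retries=2` on the third attempt, while `retries=1` surfaces the second failure to the caller.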
Luca Steeb 00111d823c test(load): add load tests CLI & e2e tests (#157) 2024-02-16 23:47:34 +07:00
abelanger5 c2ea09f375 feat: step reruns from the dashboard (#143) 2024-02-03 01:26:09 -05:00
abelanger5 82d7995343 feat: manual triggers and give clients a hook into step run events (#141)
* feat: pubsub for clients, more qol stuff

* fix: generate sqlc files

* chore: linting and comments
2024-02-02 12:52:34 -05:00
abelanger5 aed11c3958 feat: workflow visualization and qol improvements (#140)
* feat: workflow visualization and qol improvements

* fix: npm build
2024-02-02 01:35:05 -05:00
abelanger5 d63b66a837 feat: concurrency groups (#135)
* first pass at moving controllers around

* feat: concurrency limits for strategy CANCEL_IN_PROGRESS

* fix: linting

* chore: bump python sdk version
2024-01-30 00:00:28 -05:00
abelanger5 78685d0098 feat(security): multiple encryption options, API tokens, easier setup (#125)
* (wip) encryption

* feat: api tokens

* chore: add api token generation command

* fix: e2e tests

* chore: set timeout for e2e job

* fix: e2e tests, remove client-side certs

* chore: address PR review comments

* fix: token tests

* chore: address review comments and fix tests
2024-01-26 15:38:36 -05:00
Luca Steeb 8b379ee9d1 feat(events): add workflow filter (#114)
* feat(events): add workflow filter

* cast to uuid

---------

Co-authored-by: Alexander Belanger <belanger@sas.upenn.edu>
2024-01-21 22:33:58 -05:00
abelanger5 52fde1e704 feat: dag-style execution (#108)
* feat: dag-style execution

* docs: update to reflect new context

* ensure no cycles

* remove example cycle

* linting

* lint and small fixes

* update deferred rollback

* last rollback handling

* unset max issues

* fix requeue edge case
2024-01-16 11:31:24 -05:00
abelanger5 752d5b0ab7 feat: support passing inputs to scheduled workflows (#104)
* fix: usage of RegisterAction

* make registered actions callable

* chore: update yaml example

* docs: register action documented

* feat: support input to scheduled workflows

* add worker

* add client

* docs: add input to schedule workflow

* chore: generate with updated protoc

* chore: sqlc generate
2024-01-12 00:27:34 -05:00
abelanger5 1f7baacb94 Fix usage of RegisterAction and update docs (#103)
* fix: usage of RegisterAction

* make registered actions callable

* chore: update yaml example

* docs: register action documented
2024-01-11 18:42:20 -05:00
Luca Steeb 6d8c7ab073 feat(test): introduce basic e2e tests (#97) 2024-01-11 13:36:15 -05:00
abelanger5 7011416cb7 fix: simple example (#95) 2024-01-10 11:05:54 -05:00
abelanger5 ac0c4e934a fix: rabbitmq concurrent processing (#92) 2024-01-09 21:15:19 -05:00
abelanger5 fe76c724d1 feat: add ticker reassignment to engine (#86) 2024-01-08 14:11:30 -05:00
abelanger5 62445dc37f feat: support one-time scheduled workflows (#84)
* feat: support one-time scheduled workflows

* refactor: move schedule out of workflow trigger def

* docs: add scheduling workflows section

* docs: update creating workflow

* only cancel schedules that are in the future
2024-01-08 10:03:32 -05:00
abelanger5 7d1cf5400f feat: add middleware to worker and services (#82)
* feat: add middleware support

* docs: add middleware to docs

* chore: remove commented code

* address review comments
2024-01-05 08:35:19 -05:00
abelanger5 65c165caf6 fix: slack example and rendering bug (#79) 2024-01-04 08:28:49 -05:00
abelanger5 ff30027ae7 feat: update go-sdk workflow syntax to make it less verbose (#78)
* feat: update go-sdk workflow syntax to make it less verbose

* docs: make triggers and definitions more clear
2024-01-03 17:14:12 -05:00
abelanger5 76d38d1af9 fix: allow rendering step runs with - in name (fixes #69) (#74)
* fix: allow rendering step runs with - in name (fixes #69)

* remove debug lines

* include worker fix
2024-01-03 14:48:18 -05:00