Commit Graph

31 Commits

Author SHA1 Message Date
abelanger5
b8352bcaca config: allow buffer settings to be configurable (#1649) 2025-05-01 07:13:30 -04:00
abelanger5
2c1f1f4808 test: improve Go testing harness (#1631)
* test: improves testing harness for engine

* update CI test

* fix: race condition in test

* make tests more stable

* cleanup pub and sub buffers

* fix: goleak on rampup test

* feat: matrix tests for engine
2025-04-29 10:55:16 -04:00
abelanger5
ef6668a8c3 fix: go signature and docs (#1561)
* fix: go signature and docs

* Update examples/v1/workflows/concurrency-rr.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-04-15 17:31:48 -04:00
abelanger5
6813ab1c75 fix: streaming order improvements, go sdk stability (#1536)
* fix: streaming order improvements, go sdk stability

* fix: improve replay query
2025-04-11 13:02:47 -04:00
abelanger5
b03a8d2666 improve ttl cache on pgmq (#1438)
* improve ttl cache on pgmq

* fix: panic
2025-03-28 09:27:12 -07:00
abelanger5
a20ab2de65 fix(v1): add exponential backoff for internal retries (#1399) 2025-03-25 09:14:15 -07:00
abelanger5
ac968e94b8 fix: concurrency issues and a few small improvements (#1324) 2025-03-12 16:30:34 -04:00
abelanger5
1f2096313d feat: v1 engine (#1318) 2025-03-11 14:57:13 -04:00
Sean Reilly
190f3f984a clean up rabbit mq session stuff, add a quick ack and error processin… (#1197)
* clean up rabbit mq session stuff, add a quick ack and error processing for AddMessage

* bit more paranoid about getting stuck in chans

* first pass at locking the message to deal with the failed states better

* clean up the access to ready for the mq

* make sure we don't block sending this ack
2025-01-23 16:06:02 -08:00
abelanger5
61ae067014 fix: race condition on err in pgmq (#1198) 2025-01-18 16:20:24 +00:00
abelanger5
dcb67a1dac feat: postgres-backed message queue (#1119) 2024-12-18 09:00:54 -05:00
abelanger5
2cdee59aea refactor: optimize v0.50.0 release (#975)
- Simplifies architecture for splitting engine services into different components. The three supported services are now `grpc-api`, `scheduler`, and `controllers`. The `grpc-api` service is the only one which needs to be exposed for workers. The other two can run as unexposed services.
- Fixes a set of bugs and race conditions in the `v2` scheduler
- Adds a `lastActive` time to the `Queue` table and includes a migration which sets this `lastActive` time for the most recent 24 hours of queues. Effectively this means that the max scheduling time in a queue is 24 hours. 
- Rewrites the `ListWorkflowsForEvent` query to improve performance and select far fewer rows.
2024-10-23 12:05:16 +00:00
abelanger5
95558138a4 chore: improve throughput, remove deadlocks (#949)
* add otel to pub

* temporarily remove tenant id exchange

* fix: increase internal queue throughput

* fix: remove potential deadlocking

* rollback hash factor multiplier

* fix: batch update issues

* fix: rm unneeded locks

* move disable tenant pubsub to an env var

---------

Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com>
2024-10-10 08:54:34 -04:00
abelanger5
8939c94f63 fix: send fewer messages to job queue when it's not necessary (#932)
* handle started at differently

* fix: start job runs in workflows controller

* fix: keep job runs around for backwards compat
2024-10-03 07:39:06 -04:00
abelanger5
c3fa2c57f3 fix: don't need acks on queue checks (#926) 2024-10-02 00:52:02 +00:00
abelanger5
5f5e1e8a88 refactor: use shared tenant listener for messages (#911)
* refactor: use shared tenant listener per tenant exchange

* fix: remove subscription properly
2024-09-26 14:54:11 -04:00
abelanger5
9d69e4d192 fix: use read-only message queue (#897)
* fix: use read-only message queue

* set very high qos for read-heavy queue
2024-09-24 18:30:43 -04:00
abelanger5
0204929b02 fix: concurrency key performance (#894) 2024-09-19 21:28:08 -04:00
abelanger5
263eaf069b feat: pass otel through msgqueue (#802)
* feat: pass otel through msgqueue

* feat: more spans on scheduling

* otel increase batch size
2024-08-28 14:45:02 +00:00
Gabe Ruttner
4ea4712d4d refactor: performance and throughput (#756)
Refactors the queueing logic to be fairly balanced between actions, with each action backed as a separate FIFO queue. Also adds support for priority queueing and custom queues, though those aren't exposed on the API layer yet. Improves throughput to be > 5000 tasks/second on a single queue. 

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-08-12 14:38:47 +00:00
Viktor Szépe
0948598749 Fix typos (#775) 2024-08-10 10:58:33 +00:00
Gabe Ruttner
b4670af138 Fix qos otel config (#754)
* feat: otel trace id ratio

* feat: rabbitmq qos

* feat: requeue limit

* fix: tests
2024-07-30 18:11:10 -04:00
abelanger5
5538196169 fix: correct lengths on random.Generate (#638) 2024-06-25 15:12:59 -04:00
Luca Steeb
b6dcb4e7e9 refactor(random): refactor random string generation (#633) 2024-06-24 23:44:03 +01:00
abelanger5
7c3ddfca32 feat: api server extensions (#614)
* feat: allow extending the api server

* chore: remove internal packages to pkg

* chore: update db_gen.go

* fix: expose auth

* fix: move logger to pkg

* fix: don't generate gitignore for prisma client

* fix: allow extensions to register their own api spec

* feat: expose pool on server config

* fix: nil pointer exception on empty opts

* fix: run.go file
2024-06-19 09:36:13 -04:00
abelanger5
b0b2e26952 feat: hatchet-lite (#560)
* feat: hatchet-lite mvp

* fix: init shadow db

* fix: install atlas

* fix: correct env

* fix: wait for db ready

* fix: remove name flag

* fix: add hatchet-lite to build
2024-06-06 14:03:53 -04:00
abelanger5
ff90533458 fix: only close rabbitmq channels if they are open (#402) 2024-04-22 05:35:30 -04:00
abelanger5
347bc5dd53 feat: rabbitmq connection pooling (#387)
* feat: add rabbitmq connection pool and remove non-fatal worker errors

* chore: go mod tidy

* fix: release pool after opening channel

* fix: make sure channel is closed after all tasks return on subscribe

* fix: don't loop endlessly
2024-04-16 16:45:03 -04:00
abelanger5
08f0864046 fix: retry rabbitmq connections properly and retry published messages (#369) 2024-04-10 15:48:06 -04:00
abelanger5
7b7fbe3668 fix: update Requeue and Reassign logic to fix performance degradation when many events are queued (#310)
Logic for requeueing and reassigning did not limit the number of step runs to requeue, so when events accumulate with no worker present it causes memory to spike along with a very high query latency on the database. This commit limits the number of step runs returned in the requeue and reassign queries, and also properly locks step run rows for these queries so only a step run in a PENDING or PENDING_ASSIGNMENT state can be requeued.

It also improves performance of the `AssignStepRunToWorker` query and ensures that `maxRuns` on workers are always respected through the introduction of a `WorkerSemaphore` model. The value gets decremented when a step run is assigned and incremented when a step run is in a final state. 

Co-authored-by: Luca Steeb <contact@luca-steeb.com>

* Update controller.go

---------

Co-authored-by: steebchen <contact@luca-steeb.com>
2024-04-01 12:33:18 -04:00
abelanger5
c66f97c856 fix: deadlocks on workers and tickers (#241)
* chore: add sentry support to engine

* fix: deadlocks on workers and tickers

* refactor: reduce prisma calls in engine

* trigger

* fix: remove some tenant lookups

* feat: dlx and renamed taskqueue -> msgqueue

* refactor: get group key run logic

* fix: retry counts on messages and concurrency edge cases

* fix: rabbitmq integration tests

* feat: add consumer timeouts

---------

Co-authored-by: Luca Steeb <contact@luca-steeb.com>
2024-03-12 00:45:18 -04:00