hatchet

mirror of https://github.com/hatchet-dev/hatchet.git synced 2025-12-21 08:40:10 -06:00

Author	SHA1	Message	Date
abelanger5	9dabe7d902	feat: dlq for dispatcher queues (#2600 ) * feat: dlq for dispatcher queues * reduce dispatcher message ttl to 20 seconds * rename dispatcher queue for clarity * add error logs when dead lettering * address comment	2025-12-04 14:19:01 -05:00
abelanger5	3f5c243325	fix: move check for large payloads to after json.Marshal (#2594 )	2025-12-02 11:45:37 -05:00
abelanger5	d906a441d4	fix: ensure that slow worker doesn't interrupt dispatcher, guard large RabbitMQ pubs (#2591 ) * ensure that slow worker doesn't interrupt dispatcher * fix: large payload pub issues * add comments * fix: review comments	2025-12-02 09:54:54 -05:00
Mohammed Nafees	54701e87d0	Retry RMQ messages indefinitely with aggressive logging after 5 retries (#2448 ) * aggressively log errors when rmq retry more than 5 times * revisit comments * new vars and fix integration test * fix test * log only after 5 retries	2025-10-28 16:51:39 +01:00
Mohammed Nafees	e2b1f1353e	Fix OTel span attribute naming convention (#2409 ) * rename spans according to convention * low cardinality	2025-10-16 18:43:40 +02:00
matt	d677cb2b08	feat: gzip compression for large payloads, persistent OLAP writes (#2368 ) * debug: remove event pub * add additional spans to publish message * debug: don't publish payloads * fix: persistent messages on olap * add back other payloads * remove pub buffers temporarily * fix: correct queue * hacky partitioning * add back pub buffers to scheduler * don't send no worker events * add attributes for queue name and message id to publish * add back pub buffers to grpc api * remove pubs again, no worker writes though * task processing queue hashes * remove payloads again * gzip compression over 5kb * add back task controller payloads * add back no worker requeueing event, with expirable lru cache * add back pub buffers * remove hash partitioned queues * small fixes * ignore lru cache top fn * config vars for compression, disable by default --------- Co-authored-by: Alexander Belanger <alexander@hatchet.run>	2025-10-08 11:44:04 -04:00
Mohammed Nafees	ed40a82dbb	Include `tenant_id` in OTel spans wherever possible (#2382 )	2025-10-03 18:16:16 +02:00
abelanger5	2edeeb10ea	feat: max channels for rabbitmq (#2365 ) * feat: max conns for rabbitmq * rename conns -> chans	2025-09-30 08:49:45 -04:00
abelanger5	733feedbff	fix: use separate connections for pub and sub (#2358 ) * use separate connections for pub and sub * Update internal/msgqueue/v1/rabbitmq/rabbitmq.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-09-29 14:29:45 -04:00
matt	025f42af74	Debug: Error log if we send >10mb message over the internal queue (#2345 ) * fix: send error log if we try to send message > 10mb * feat: add some span attributes * fix: span attribute names * fix: cleanup * fix: add message id	2025-09-25 18:15:35 -04:00
matt	92843bb277	Feat: Payload Store Repository (#2047 ) * feat: add table for storing payloads * feat: add payload type enum * feat: gen sqlc * feat: initial sql impl * feat: add payload store repo to shared * feat: add overwrite * fix: impl * feat: bulk op * feat: initial wiring of inputs for task triggers * feat: wire up dag matches * feat: create V1TaskWithPayload and use it everywhere * fix: couple bugs * fix: clean up types * fix: overwrite * fix: rm input from replay * fix: move payload store to shared repo * fix: schema * refactor: repo setup * refactor: repos * fix: gen * chore: lint * fix: rename * feat: naming, write dag inputs * fix: more naming, trigger bug * fix: dual writes for now * fix: pass in tx * feat: initial work on offloader * feat: improve external offloader * fix: some refs * add withExternalHandler * fix: improve impl of external store * feat: implement offloading, fix other impls * feat: add query to update JSON * fix: implement offloading + updating records in payloads table * feat: add WAL table * feat: add queries for polling WAL and evicting * feat: wire up writes into WAL * fix: get job working * refactor: improve types * fix: infinite loop * feat: improve offloading logic to run in two separate txes * refactor: rework how overrides work * fix: lint * fix: migration number * fix: migration * fix: migration version * fix: revert back to reading payloads out * fix: fall back to previous input, part i * fix: input fallback * fix: add back input to replay * fix: input fallback in dispatcher * fix: nil check * feat: advisory locks, part i * fix: no skip locked * feat: hash partitioned wal table * fix: modify queries a bit, tweak crud enum * fix: pk order, function to find tenants * feat: wal processing * fix: only write wal if an external store is enabled, fix offloading logic * fix: spacing * feat: schema cleanup * fix: rm external store loc name * fix: set content to null when offloading * fix: cleanup, naming * fix: pass overwrite payload store along * debug: add some logging * Revert "debug: add some logging" This reverts commit `43e71eadf1`. * fix: typo * fx: add offloatAt to store opts for offloading * fix: handle leasing with advisory lock * fix: struct def * fix: requeue on payloads not found * fix: rm hack for triggers * fix: revert empty input on write * fix: write input * feat: env var for enabling / disabling dual writes * feat: wire up dual writes * fix: comments * feat: generics! * fix: panic from type cast * fix: migration * fix: generic * fix: hack for T key in map * fix: cleanup	2025-09-12 09:53:01 -04:00
Mohammed Nafees	89e6d00a8f	Add telemetry around task statuses in controller (#2090 ) * add telemetry around task statuses in controller * fixes * more fixes	2025-08-06 08:41:54 -04:00
abelanger5	1abb2a20e7	fix: hatchet-lite connection leakage and improve listen/notify performance (#1924 ) * fix: hatchet-lite connection leakage and improve listen/notify performance * fix: cancel mq listener * remove event deps * skip webhook test for now	2025-06-30 17:13:09 -04:00
Matt Kaye	e62f7edab3	Fix: Streaming Bugs (#1913 ) * fix: bug with json parsing failing * fix: hang up on cancel and fail * fix: pub stream events even if tenant pubs are disabled * fix: condition * fix: eq	2025-06-26 16:22:56 -04:00
abelanger5	b8352bcaca	config: allow buffer settings to be configurable (#1649 )	2025-05-01 07:13:30 -04:00
abelanger5	2c1f1f4808	test: improve Go testing harness (#1631 ) * test: improves testing harness for engine * update CI test * fix: race condition in test * make tests more stable * cleanup pub and sub buffers * fix: goleak on rampup test * feat: matrix tests for engine	2025-04-29 10:55:16 -04:00
abelanger5	ef6668a8c3	fix: go signature and docs (#1561 ) * fix: go signature and docs * Update examples/v1/workflows/concurrency-rr.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-04-15 17:31:48 -04:00
abelanger5	6813ab1c75	fix: streaming order improvements, go sdk stability (#1536 ) * fix: streaming order improvements, go sdk stability * fix: improve replay query	2025-04-11 13:02:47 -04:00
abelanger5	b03a8d2666	improve ttl cache on pgmq (#1438 ) * improve ttl cache on pgmq * fix: panic	2025-03-28 09:27:12 -07:00
abelanger5	a20ab2de65	fix(v1): add exponential backoff for internal retries (#1399 )	2025-03-25 09:14:15 -07:00
abelanger5	ac968e94b8	fix: concurrency issues and a few small improvements (#1324 )	2025-03-12 16:30:34 -04:00
abelanger5	1f2096313d	feat: v1 engine (#1318 )	2025-03-11 14:57:13 -04:00
Sean Reilly	190f3f984a	clean up rabbit mq session stuff, add a quick ack and error processin… (#1197 ) * clean up rabbit mq session stuff, add a quick ack and error processing for AddMessage * bit more paranoid about getting stuck in chans * first pass at locking the message to deal with the failed states better * clean up the access to ready for the mq * make sure we don't block sending this ack	2025-01-23 16:06:02 -08:00
abelanger5	61ae067014	fix: race condition on err in pgmq (#1198 )	2025-01-18 16:20:24 +00:00
abelanger5	dcb67a1dac	feat: postgres-backed message queue (#1119 )	2024-12-18 09:00:54 -05:00
abelanger5	2cdee59aea	refactor: optimize v0.50.0 release (#975 ) - Simplifies architecture for splitting engine services into different components. The three supported services are now `grpc-api`, `scheduler`, and `controllers`. The `grpc-api` service is the only one which needs to be exposed for workers. The other two can run as unexposed services. - Fixes a set of bugs and race conditions in the `v2` scheduler - Adds a `lastActive` time to the `Queue` table and includes a migration which sets this `lastActive` time for the most recent 24 hours of queues. Effectively this means that the max scheduling time in a queue is 24 hours. - Rewrites the `ListWorkflowsForEvent` query to improve performance and select far fewer rows.	2024-10-23 12:05:16 +00:00
abelanger5	95558138a4	chore: improve throughput, remove deadlocks (#949 ) * add otel to pub * temporarily remove tenant id exchange * fix: increase internal queue throughput * fix: remove potential deadlocking * rollback hash factor multiplier * fix: batch update issues * fix: rm unneeded locks * move disable tenant pubsub to an env var --------- Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com>	2024-10-10 08:54:34 -04:00
abelanger5	8939c94f63	fix: send fewer messages to job queue when it's not necessary (#932 ) * handle started at differently * fix: start job runs in workflows controller * fix: keep job runs around for backwards compat	2024-10-03 07:39:06 -04:00
abelanger5	c3fa2c57f3	fix: don't need acks on queue checks (#926 )	2024-10-02 00:52:02 +00:00
abelanger5	5f5e1e8a88	refactor: use shared tenant listener for messages (#911 ) * refactor: use shared tenant listener per tenant exchange * fix: remove subscription properly	2024-09-26 14:54:11 -04:00
abelanger5	9d69e4d192	fix: use read-only message queue (#897 ) * fix: use read-only message queue * set very high qos for read-heavy queue	2024-09-24 18:30:43 -04:00
abelanger5	0204929b02	fix: concurrency key performance (#894 )	2024-09-19 21:28:08 -04:00
abelanger5	263eaf069b	feat: pass otel through msgqueue (#802 ) * feat: pass otel through msgqueue * feat: more spans on scheduling * otel increase batch size	2024-08-28 14:45:02 +00:00
Gabe Ruttner	4ea4712d4d	refactor: performance and throughput (#756 ) Refactors the queueing logic to be fairly balanced between actions, with each action backed as a separate FIFO queue. Also adds support for priority queueing and custom queues, though those aren't exposed on the API layer yet. Improves throughput to be > 5000 tasks/second on a single queue. --------- Co-authored-by: Alexander Belanger <alexander@hatchet.run>	2024-08-12 14:38:47 +00:00
Viktor Szépe	0948598749	Fix typos (#775 )	2024-08-10 10:58:33 +00:00
Gabe Ruttner	b4670af138	Fix qos otel config (#754 ) * feat: otel trace id ratio * feat: rabbitmq qos * feat: requeue limit * fix: tests	2024-07-30 18:11:10 -04:00
abelanger5	5538196169	fix: correct lengths on random.Generate (#638 )	2024-06-25 15:12:59 -04:00
Luca Steeb	b6dcb4e7e9	refactor(random): refactor random string generation (#633 )	2024-06-24 23:44:03 +01:00
abelanger5	7c3ddfca32	feat: api server extensions (#614 ) * feat: allow extending the api server * chore: remove internal packages to pkg * chore: update db_gen.go * fix: expose auth * fix: move logger to pkg * fix: don't generate gitignore for prisma client * fix: allow extensions to register their own api spec * feat: expose pool on server config * fix: nil pointer exception on empty opts * fix: run.go file	2024-06-19 09:36:13 -04:00
abelanger5	b0b2e26952	feat: hatchet-lite (#560 ) * feat: hatchet-lite mvp * fix: init shadow db * fix: install atlas * fix: correct env * fix: wait for db ready * fix: remove name flag * fix: add hatchet-lite to build	2024-06-06 14:03:53 -04:00
abelanger5	ff90533458	fix: only close rabbitmq channels if they are open (#402 )	2024-04-22 05:35:30 -04:00
abelanger5	347bc5dd53	feat: rabbitmq connection pooling (#387 ) * feat: add rabbitmq connection pool and remove non-fatal worker errors * chore: go mod tidy * fix: release pool after opening channel * fix: make sure channel is closed after all tasks return on subscribe * fix: don't loop endlessly	2024-04-16 16:45:03 -04:00
abelanger5	08f0864046	fix: retry rabbitmq connections properly and retry published messages (#369 )	2024-04-10 15:48:06 -04:00
abelanger5	7b7fbe3668	fix: update `Requeue` and `Reassign` logic to fix performance degradation when many events are queued (#310 ) Logic for requeueing and reassigning did not limit the number of step runs to requeue, so when events accumulate with no worker present it causes memory to spike along with a very high query latency on the database. This commit limits the number of step runs returned in the requeue and reassign queries, and also properly locks step run rows for these queries so only a step run in a PENDING or PENDING_ASSIGNMENT state can be requeued. It also improves performance of the `AssignStepRunToWorker` query and ensures that `maxRuns` on workers are always respected through the introduction of a `WorkerSemaphore` model. The value gets decremented when a step run is assigned and incremented when a step run is in a final state. Co-authored-by: Luca Steeb <contact@luca-steeb.com> * Update controller.go --------- Co-authored-by: steebchen <contact@luca-steeb.com>	2024-04-01 12:33:18 -04:00
abelanger5	c66f97c856	fix: deadlocks on workers and tickers (#241 ) * chore: add sentry support to engine * fix: deadlocks on workers and tickers * refactor: reduce prisma calls in engine * trigger * fix: remove some tenant lookups * feat: dlx and renamed taskqueue -> msgqueue * refactor: get group key run logic * fix: retry counts on messages and concurrency edge cases * fix: rabbitmq integration tests * feat: add consumer timeouts --------- Co-authored-by: Luca Steeb <contact@luca-steeb.com>	2024-03-12 00:45:18 -04:00

45 Commits