hatchet

mirror of https://github.com/hatchet-dev/hatchet.git synced 2026-02-19 14:49:10 -06:00

Author	SHA1	Message	Date
abelanger5	d071a1c36b	fix: prevent large worker gRPC stream backlogs (#2597 ) * fix: prevent large worker backlogs * add config value * add doc for troubleshooting	2025-12-03 17:15:43 -05:00
matt	7fe9806f5d	Feat: Configurable OLAP status update size limits (#2499 ) * feat: configurable status updates * fix: config * fix: wiring * feat: export limits from olap * fix: param drilling	2025-11-06 13:37:40 -05:00
Mohammed Nafees	ed4c0327ce	[hotfix] Meaningful casing for engine liveness and readiness probes (#2465 ) * more fixes for engine live and ready probes * rename * no need to set it to false * fix casing health check * log only when not shutting down	2025-10-30 20:24:33 +01:00
Mohammed Nafees	b58359d7b3	Do not run cleanup on `v1_workflow_concurrency_slot` (#2463 ) * do not run cleanup on v1_concurrency_slot * fix health endpoints for engine	2025-10-30 15:34:50 +01:00
Mohammed Nafees	e2b1f1353e	Fix OTel span attribute naming convention (#2409 ) * rename spans according to convention * low cardinality	2025-10-16 18:43:40 +02:00
Mohammed Nafees	a750ce950d	Introduce vars to tune `ANALYZE` job gocron run intervals (#2407 ) * introduce vars to tune ANALYZE job gocron run intervals * update config doc * fix assignment	2025-10-10 11:02:10 +02:00
Gabe Ruttner	f59ebd6c47	feat: analytics events (#2171 ) * feat: analytics events * review comments	2025-08-22 05:41:17 -07:00
Mohammed Nafees	793df41ccb	Deploy HyperDX locally via docker-compose and add traces to task controller (#2058 ) * deploy Jaeger locally and add traces to task controller * use Jaeger v2 * add SERVER_OTEL_COLLECTOR_AUTH * fix PR comments * fix span name	2025-07-29 16:24:38 +02:00
abelanger5	27435a72d6	feat: option to disable logging (#2030 )	2025-07-21 16:53:11 +02:00
Mohammed Nafees	ef498a6235	Introduce tenant Prometheus metrics (#1875 ) * introduce tenant workflow completed metric * expose tenant prom metrics via handler * fix workflow and worker id in metrics * correctly add workflow metrics from workflow controller * use olap DB to gather information for workflow completion * fix prom metrics endpoint for tenant * workflow name from external id * simplify tenant registry based metrics * add docs for prometheus metrics * fix docs lint * run prettier fix * WIP metrics work * use federate prom server URL to proxy metrics * implement workflow duration histogram metric * separate prom stack docker compose * fix duration metrics calls * move scheduler metrics to prom tenant specific file * update docs for prom metrics * fix lint * use proper indices to query for durations * reorg tenant metrics * fix lint for doc * update docs with promql examples and casing around prom metrics enabled * update prom server url * fix lint * enabled prom metrics for v1 only from controller	2025-06-27 11:46:31 -04:00
Gabe Ruttner	68de72d534	Ops disableable replay (#1855 ) * try lock * revert * Update pkg/repository/v1/scheduler_concurrency.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update pkg/repository/v1/scheduler_concurrency.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * both strats * disable * remove input --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-06-12 15:25:38 -04:00
Gabe Ruttner	1261509755	configurable task ops jitter (#1800 ) * configurable task ops jitter * single config, configurable poll * revert timeout * fix correct param	2025-06-02 16:02:01 -04:00
Gabe Ruttner	1421c826ad	Feat configurable olap jitter (#1759 ) * jitter * times * configurable olap jitter and interval	2025-05-21 11:01:00 -04:00
abelanger5	8f9ae4ecf2	fix: make stripped payload size configurable (#1685 )	2025-05-07 09:13:07 -04:00
abelanger5	d4ba9c761d	feat: pause internal controllers (#1670 ) * feat: pause internal controllers * improve controller active logic	2025-05-03 18:19:34 -04:00
abelanger5	ffbeafc204	revert: add back testing harness (#1659 ) * re-add new testing harness * add healthcheck port and pick random grpc port to listen on * feat: parallel load tests and faster tests * make parallelism = 5 * fix: lint * add linter to pre * fix: add back rampup fixes * reduce matrix on PR, add matrix to pre-release step * make load tests less likely to block * make limit strategy group round robin * uncomment lines	2025-05-01 15:22:30 -04:00
abelanger5	dacf48180b	feat: sampling (#1592 ) * feat: sampling * Update internal/services/controllers/v1/olap/controller.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * docs: sampling * sampling -> trace sampling --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-04-23 17:28:58 -04:00
abelanger5	9aead7ab68	feat: global prometheus metrics (#1568 ) * feat: global prometheus metrics * configure prom with env vars, clean up metrics * add histogram and docs * update port	2025-04-17 15:11:38 -04:00
abelanger5	c54bf9266c	feat(v1): tenant limits (#1388 ) * feat(v1): tenant limits * fix: migration * fix: kill metered cache	2025-03-23 19:03:55 -07:00
abelanger5	00c4bbff09	feat(v1): new gRPC API endpoints (#1367 ) * wip: api contracts * feat: implement put workflow version endpoint * add support for match existing data, get scaffolding in place for additional triggers * create additional matches * feat: durable sleep, user event matching * update protos * fix: working poc of user events, durable sleep * add migration * fix: migration column * feat: durable event listener * fix: skip overrides * fix: input -> output	2025-03-23 18:58:20 -07:00
abelanger5	e91047d7b3	feat: add back tenant alerting to v1 (#1372 )	2025-03-19 17:50:42 -04:00
abelanger5	1f2096313d	feat: v1 engine (#1318 )	2025-03-11 14:57:13 -04:00
Sean Reilly	a8dd33c61f	Feature - configurable logging backend (#1188 ) * allow us to configure different repos * make the struct contents public * pass in config values to new log repo * rename functions - possibly breaking changes so let's discuss * make the logging backend configurable * fix tests * don't allow calls to WithAdditionalConfig * cleanup * replace sc with server Co-authored-by: abelanger5 <belanger@sas.upenn.edu> * rename sc to server * add a LRU cache for the step run lookup * let's not use an expirable cache and just use the regular one - we cannot close the go func in expirable --------- Co-authored-by: abelanger5 <belanger@sas.upenn.edu>	2025-01-17 15:34:10 -08:00
Sean Reilly	9e961ac196	Feature add version info (#1154 ) * adding a /version endpoint for the engine and a /api/v1/version endpoint for the API * make the security optional so we don't get redirected for having auth * lint * upgrade protoc to the latest available version on brew * use useQuery and clean up html	2025-01-06 10:50:04 -08:00
abelanger5	a9936ef687	fix: set otel insecure flag for all telemetry instantiations (#999 )	2024-10-30 17:34:36 -04:00
abelanger5	7ece86dfff	fix: start scheduler if old config is used (#989 )	2024-10-24 10:52:57 -04:00
abelanger5	2cdee59aea	refactor: optimize v0.50.0 release (#975 ) - Simplifies architecture for splitting engine services into different components. The three supported services are now `grpc-api`, `scheduler`, and `controllers`. The `grpc-api` service is the only one which needs to be exposed for workers. The other two can run as unexposed services. - Fixes a set of bugs and race conditions in the `v2` scheduler - Adds a `lastActive` time to the `Queue` table and includes a migration which sets this `lastActive` time for the most recent 24 hours of queues. Effectively this means that the max scheduling time in a queue is 24 hours. - Rewrites the `ListWorkflowsForEvent` query to improve performance and select far fewer rows.	2024-10-23 12:05:16 +00:00
abelanger5	0ec434d62e	feat: allow insecure option for otel collector address (#971 ) * feat: allow insecure option for otel collector address * cast to lower	2024-10-16 20:16:22 +00:00
abelanger5	67a96d7166	feat(throughput): single process per queue (#956 ) * feat(throughput): single process per queue * fix data race * fix: golint and data race on load test * wrap up initial v2 scheduler * fix: more debug logs and tighten channel logic/blocking sends * improved casing on dispatcher and lease manager * fix: data race on min id * increase wait on load test, fix data race * fix: trylock -> lock * clean up queue when no longer in set * fix: clean up cache on exit * ensure cleanup is only called once * address review comments	2024-10-15 11:05:19 -04:00
Sean Reilly	27736fa30f	bulk insert buffering (#913 ) Adds bulk inserts to event writes, and adds a generic buffer which can be used by future batch implementations.	2024-10-03 16:26:12 -04:00
abelanger5	bfb11cac51	fix: always use retention on queues, optional data/worker (#916 )	2024-09-27 14:23:14 -04:00
abelanger5	b5014f6b3d	chore: more visibility and debug lines for queues (#836 ) * chore: more visibility and debug options for queues * better debug lines on queue repo * don't log so much in load test	2024-08-29 14:49:24 -04:00
abelanger5	6317f86793	refactor: consolidate partition logic (#826 ) * refactor: consolidate partition logic * fix: race on scheduler * fix: move partition uuid to db query * fix: generate	2024-08-27 15:28:53 -04:00
Gabe Ruttner	4ea4712d4d	refactor: performance and throughput (#756 ) Refactors the queueing logic to be fairly balanced between actions, with each action backed as a separate FIFO queue. Also adds support for priority queueing and custom queues, though those aren't exposed on the API layer yet. Improves throughput to be > 5000 tasks/second on a single queue. --------- Co-authored-by: Alexander Belanger <alexander@hatchet.run>	2024-08-12 14:38:47 +00:00
Gabe Ruttner	b4670af138	Fix qos otel config (#754 ) * feat: otel trace id ratio * feat: rabbitmq qos * feat: requeue limit * fix: tests	2024-07-30 18:11:10 -04:00
Gabe Ruttner	b802f9f45f	feat: stream by addl meta (#751 ) * feat: prop schedule and run * wip * fix: filter wfrid * feat: hangup * chore: rm debug log * chore: func name * fix: cancelled payload * fix: load * fix: cleanup the cache * fix: single proto * fix: key -> val * chore: case * chore: rm dead code * chore: rm dead code * feat: go and docs * fix: docs	2024-07-29 19:09:51 +00:00
Gabe Ruttner	ad29edb44f	fix: partitioned semaphore resolver (#731 ) * fix: partition and improve query * feat: paginate until done * chore: address comments * fix: write partitions	2024-07-18 11:06:25 -04:00
Gabe Ruttner	b7cec9ec53	feat: soft delete (#717 ) * feat: soft delete workflows and versions * feat: filter soft deletes wf and wfr * feat: filter events and step runs * fix: query * fix: query * chore: generate * wip * chore: squash migrations * chore: separate retention into new service * feat: regularly clean up * chore: migrations * fix: tests * fix: queries * fix: ambiguous * fix: refs * fix: ambiguous id * fix: remove update from * fix: soft delete * fix: cleanup retention scheduler * fix: has more query * chore: gen * fix: query * fix: table	2024-07-18 09:06:05 -04:00
abelanger5	8f8f3ad287	fix: reduce max throughput of requeue (#713 ) * fix: reduce max throughput of requeue * fix: reassign query * fix: move step run timeout to partition model * fix: partitioning queries and index * better logs on requeue * fix: inactive rebalance and get step run for engine query * fix: correct inactive queries	2024-07-12 14:03:55 -04:00
abelanger5	c2debe62d8	fix: add back deprecated service names and fix webhook worker query (#660 )	2024-06-27 08:01:02 -04:00
abelanger5	f2c6bc1f44	feat: tenant partitioning (#649 ) * feat: tenant partitioning * fix: rebalance inactive partitions, split into separate partitioner * fix: shutdown partitioner scheduler properly * update config options * fix: config options linting	2024-06-26 21:06:51 +00:00
Gabe Ruttner	a8d42819ea	feat: check security service (#639 ) * feat: check security service * feat: propagate version * feat: with ident * fix: lint * chore: generate * fix: change domain * fix: panic recover * fix: migrations * fix: hash * fix: dont check in tests	2024-06-26 16:26:29 -04:00
abelanger5	d19e299d1e	refactor: make engine runnable with config instead of loader (#640 ) * refactor: make hatchet-engine runnable programmatically * feat: export teardown name and fn	2024-06-26 08:14:30 -04:00
Luca Steeb	1490d88954	feat: webhook workers (#542 ) Adds serverless support via the concept of webhook workers. Allows any webhook to be registered as a serverless endpoint for executing a step.	2024-06-25 17:06:43 -04:00
abelanger5	7c3ddfca32	feat: api server extensions (#614 ) * feat: allow extending the api server * chore: remove internal packages to pkg * chore: update db_gen.go * fix: expose auth * fix: move logger to pkg * fix: don't generate gitignore for prisma client * fix: allow extensions to register their own api spec * feat: expose pool on server config * fix: nil pointer exception on empty opts * fix: run.go file	2024-06-19 09:36:13 -04:00
Gabe Ruttner	bbc4e58dd9	feat: limits (#559 ) * feat: workflow run limits * fix: resource exhausted 429 * feat: event limit * feat: worker limit * fix: sensible error * fix: pb * feat: expose limits api * feat: default limits * feat: add enable alert option * feat: slack and email alerts * fix: cron interval * feat: make metered util * wip: schedules and crons * chore: squash migration * fix: select or insert * fix: remove unfinished meter * chore: atlas migration * fix: template format * fix: shared ErrResourceExhausted * feat: cache * fix: limit can be nil * fix: clarification * fix: close meter ticker * fix: friendly error for child workflows	2024-06-07 10:57:57 -07:00
abelanger5	68a79fe071	fix: handle nil input more gracefully (#486 )	2024-05-13 13:07:41 -04:00
abelanger5	b50ed62924	feat: alerting from slack and email (#461 ) * feat: alerting. implements slack alerting, email, and refactors tenant settings to make them more manageable * chore: generate * chore: generate sqlc after migrate	2024-05-08 10:04:58 -04:00
abelanger5	e0d363e796	chore: intercept grpc errors and don't send internal to client (#370 )	2024-04-10 19:03:18 -04:00
Gabe Ruttner	d8b6843dec	feat: streaming events (#309 ) * feat: add stream event model * docs: how to work with db models * feat: put stream event * chore: rm comments * feat: add stream resource type * feat: enqueue stream event * fix: contracts * feat: protos * chore: set properties correctly for typing * fix: stream example * chore: rm old example * fix: async on * fix: bytea type * fix: worker * feat: put stream data * feat: stream type * fix: correct queue * feat: streaming payloads * fix: cleanup * fix: validation * feat: example file streaming * chore: rm unused query * fix: tenant check and read only consumer * fix: check tenant-steprun relation * Update prisma/schema.prisma Co-authored-by: abelanger5 <belanger@sas.upenn.edu> * chore: generate protos * chore: rename migration * release: 0.20.0 * feat(go-sdk): implement streaming in go --------- Co-authored-by: gabriel ruttner <gabe@hatchet.run> Co-authored-by: abelanger5 <belanger@sas.upenn.edu>	2024-04-01 15:46:21 -04:00


72 Commits