Commit Graph

110 Commits

Author SHA1 Message Date
Gabe Ruttner 4eaa9e7fd9 feat: configurable internal retry (#1049)
* feat: configurable internal retry

* fix: bump default to 3
2024-11-15 09:19:24 -05:00
Sean Reilly 9a5acc5179 modify the Event created at to be a clock_timestamp instead of a transaction timestamp so we maintain ordering of inserted events - also extend the length of the timestamp so we have enough significant bits (#1044)
* add the migration for the timestamp and clock

* regenerate

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-11-14 11:15:45 -08:00
abelanger5 780496e7fb fix: prevent infinite reassign loop (#1028) 2024-11-07 17:28:12 +00:00
Gabe Ruttner c531c36870 fix: filter-cancel-cases (#1027)
* fix: filter-cancel-cases

* fix: case CANCELLED_BY_CONCURRENCY_LIMIT
2024-11-07 11:18:50 -05:00
Alexander Belanger 5b59af076e fix: cancellation status propagation and minimap view 2024-11-07 11:13:14 -05:00
Gabe Ruttner 3871df01ee fix: dont bump deleted (#1024) 2024-11-06 16:11:36 -05:00
Gabe Ruttner 5759311574 fix: ratelimit and invalid output blocking queue (#1023)
* fix: rm unused offending code, handle unacked

* fix: handle invalid outputs

* fix: dont reset failed

* fix: case on json err

* fix: completed step run ids

* fix: scope
2024-11-06 18:21:22 +00:00
Gabe Ruttner 1003a1f5e7 fix: filter alert runs by failure only (#1001)
* fix: filter runs by failure only

* fix: post-lookup filter

* fix: filtered failures

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-11-01 11:46:27 +00:00
Gabe Ruttner 44addbb47e Feat scheduled improvements (#992)
* wip: stub schedule page

* wip: stub list

* fix: 2025 bug...

* feat: wip cron list

* feat: addl meta

* feat: expose metadata column

* feat: sort and created at

* cron to recurring

* scheduled: with statuses

* fix: links

* feat: expose schedule ids

* feat: delete run

* fix: remove search

* feat: filterable scheduled

* fix: remove broken features

* chore: lint

* rm metadata for now

* chore: lint

* chore: recurring to cron job

* fix: review comments

* fix: populator
2024-11-01 07:16:20 -04:00
Gabe Ruttner 4932e7f863 Feat sdk runtime (#942)
* feat: runtime signature

* feat: add sdk runtime to worker model

* feat: post runtime

* feat: expose sdk version on worker

* feat: go inf

* chore: gen

* chore: migrations and generation

* fix: simpler runtime

* feat: hatchet sdk ver

* fix: rm debug line
2024-10-28 13:47:12 -07:00
Sean Reilly 9f4b63817d add a serial write for step run events (#990)
* add a serial write for step run events

* update other problematic queries

* tmp: don't upsert queue

* add SerialBuffer to the config

* revert the change to config

* fix: add back queue upsert

* add statement timeout to upsert queue

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-10-25 16:56:38 +00:00
abelanger5 509542b804 fix: duplicate assignments in queuer (#993)
* wip: individual mutexes for actions

* tmp: debug panic

* remove debug code

* remove deadlocks package and don't write unassigned events

* fix: race condition in scheduler and add internal retries

* fix: data race
2024-10-25 16:52:43 +00:00
abelanger5 718d8f59c9 fix: rewrite queries for checking child workflows (#983)
* rewrite queries for child workflows

* add index

* fix: remove tenant id where it's not needed
2024-10-23 19:18:26 -04:00
abelanger5 dd5bc90497 fix: more efficient step run events, reduce caching on queue (#981) 2024-10-23 16:23:59 -04:00
Sean Reilly 35b115cb4f don't need to filter on tenant id for step runs & some debug for buffers (#980)
Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-10-23 15:04:11 -04:00
abelanger5 2cdee59aea refactor: optimize v0.50.0 release (#975)
- Simplifies architecture for splitting engine services into different components. The three supported services are now `grpc-api`, `scheduler`, and `controllers`. The `grpc-api` service is the only one which needs to be exposed for workers. The other two can run as unexposed services.
- Fixes a set of bugs and race conditions in the `v2` scheduler
- Adds a `lastActive` time to the `Queue` table and includes a migration which sets this `lastActive` time for the most recent 24 hours of queues. Effectively, this means that the maximum scheduling time in a queue is 24 hours.
- Rewrites the `ListWorkflowsForEvent` query to improve performance and select far fewer rows.
2024-10-23 12:05:16 +00:00
Sean Reilly ecb9ce1e1e rejig the query for creating multiple sticky states (#973)
* rejig the query for creating multiple sticky states

* fix: sticky strategy of soft and improve query

* fix: sort method was using indexes that didn't necessarily correspond to original indexes, leading to inconsistent behavior

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-10-17 13:29:19 +00:00
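
The sort fix in the commit above mentions indexes that no longer matched the original indexes. The commit does not include the code, so the following Go sketch is only a generic illustration of that class of bug and one way to avoid it: sort a slice of original indexes and always map comparator positions back through it. The function name `sortedOrder` and the `priorities` data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedOrder returns the original indexes of priorities, ordered by
// ascending priority. The input slice itself is not reordered.
func sortedOrder(priorities []int) []int {
	order := make([]int, len(priorities))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		// a and b are positions within order, not original indexes,
		// so map them back before reading priorities.
		return priorities[order[a]] < priorities[order[b]]
	})
	return order
}

func main() {
	fmt.Println(sortedOrder([]int{30, 10, 20})) // prints: [1 2 0]
}
```

Indexing `priorities` directly with `a` and `b` would compile and run, but it would compare the wrong elements once `order` has been partially rearranged.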
abelanger5 c86a50711b fix: don't reset input for concurrency keys on replay (#970) 2024-10-16 15:55:28 -04:00
Sean Reilly 7e526de381 fix: deadlocks on events and incorrect step run ordering query (#966)
* make it so the bulk example succeeds

* make the bulk workflows work a little harder

* add some ordering to mitigate deadlocks

* fix: link step run parents bad query, improvements to locking

* add timed mutex and telemetry

* remove for update on cancel

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-10-16 10:28:33 -04:00
Gabe Ruttner 7cd08077d5 feat: improved sdk ack (#931)
* feat: add step run event reasons

* feat: ack

* fix: remove rejected reason

* fix: merge

* fix: correct buffer

* fix: consistent message

* chore: rm todo
2024-10-15 15:52:42 +00:00
abelanger5 67a96d7166 feat(throughput): single process per queue (#956)
* feat(throughput): single process per queue

* fix data race

* fix: golint and data race on load test

* wrap up initial v2 scheduler

* fix: more debug logs and tighten channel logic/blocking sends

* improved casing on dispatcher and lease manager

* fix: data race on min id

* increase wait on load test, fix data race

* fix: trylock -> lock

* clean up queue when no longer in set

* fix: clean up cache on exit

* ensure cleanup is only called once

* address review comments
2024-10-15 11:05:19 -04:00
Sean Reilly 29721cd1f0 Feat bulk workflows (#940)
Adds support for inserting workflows in bulk via the API and an optional buffered insert on the engine.
2024-10-14 15:35:29 -04:00
Gabe Ruttner c8711f7f83 fix: id constraint (#957)
* fix: id constraint

* chore: gen
2024-10-11 18:00:12 -04:00
Gabe Ruttner 3340ec8626 fix: event keys (#951)
* feat: insert unique event keys

* fix: list query

* feat: bulk

* chore: gen
2024-10-10 08:54:52 -04:00
abelanger5 3d218302ff fix: internal queue items performance and race conditions (#943)
* fix: don't use xmin hack

* fix: assign not append

* refactor: parallel step run updates via hashes

* fix: intermittent double execution of child step runs

* fix: rollback rate limits

* fix: bulk event writes from single buffer

* expose cleanup

* fix: race conditions on failures and cancellations

* change logger defaults to warn and console
2024-10-07 11:16:53 -04:00
abelanger5 fd4ee804d3 refactor: buffered writes of step run statuses (#941)
* (wip) handle step run updates without deferred updates

* refactor: buffered writes of step run statuses

* fix: add more safety on tenant pools

* add configurable flush period, remove wait for started

* flush immediately if last flush time plus flush period is in the past

* feat: add configurable flush interval/max items
2024-10-04 15:08:21 -04:00
Sean Reilly 27736fa30f bulk insert buffering (#913)
Adds bulk inserts to event writes, and adds a generic buffer which can be used by future batch implementations.
2024-10-03 16:26:12 -04:00
abelanger5 c29984305e fix: faster processing of timeout queue items (#924) 2024-10-01 13:50:38 +00:00
abelanger5 8d49a247c3 fix: loading states for workflow runs table (#923) 2024-10-01 09:19:43 -04:00
abelanger5 117533c1b5 fix: remove more fks (#922)
* fix: remove more fks

* chore: generate
2024-09-30 16:53:38 -04:00
Gabe Ruttner 7d7e43d4e1 feat: pauseable workflows (#879)
* feat: pause workflow state

* feat: dont run paused workflows

* feat: skipped paused

* implement unpaused behavior for workflow runs

* fix: frontend

* fix: more frontend

* fix: imports

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-09-29 10:58:10 -04:00
abelanger5 6172956bbd refactor: remove foreign keys from unchanged/non-cascading parent tables (#918)
* refactor: remove fks from unchanged/non-cascading parent tables

* fix: cleanup cache for engine repository

* fix: remove streamevent
2024-09-27 14:21:45 -04:00
abelanger5 925b2654c8 feat: workflow run metrics view (#912)
* feat: add callbacks for workflow run completed

* add tenant id to resolve row

* add finishedBefore, finishedAfter to workflow runs query

* add more callbacks

* feat: tenant ids and loggers in callback

* feat: workflow run metrics frontend

* fix: frontend build
2024-09-27 07:38:15 -04:00
abelanger5 a1a10b4073 feat: dynamic rate limits (#904)
* wip: step run expressions on rate limits

* feat: dynamic rate limits

* chore: v0.47.0

* chore: address changes from PR review

* fix: improved error handling

* address pr review

* better error messages for step run cels, remove debug logs

* fix: hash

---------

Co-authored-by: gabriel ruttner <gabriel.ruttner@gmail.com>
2024-09-26 22:00:34 +00:00
abelanger5 840e590312 fix: frontend improvements (#905)
* fix: set time range properly on reload

* fix: small button to show queue counts for now
2024-09-24 19:08:41 -04:00
Gabe Ruttner f98d3277b7 fix: trunc large payloads (#903)
* fix: trunc large payloads

* lets send the stepRuns and steps with output back on the WorkflowRunGet

* fix: times

* fix: rm unsafe

* rename to GetStepRunsForJobRunsWithOutput so we know we might potentially be getting a very large result set

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-09-24 22:52:00 +00:00
Sean Reilly 5811929928 feat: bulk inserts of events (#887)
* progress commit of bulk inserts

* in_flight: Add changes to metering finish the bulk insert

* remove an attempt to override enforce limits

* merge in PR fixes

* update docs to add in an additional section in the User guide to describe pushing single events and pushing multiple events

* run lint fix

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-09-23 09:19:39 -07:00
abelanger5 baf13bd577 fix: duration int -> bigint (#902) 2024-09-23 08:30:16 -07:00
abelanger5 ad12f658da fix: have refresh timeout use timeout queue item (#898) 2024-09-23 05:41:06 -07:00
abelanger5 0204929b02 fix: concurrency key performance (#894) 2024-09-19 21:28:08 -04:00
Sean Reilly 15c50f46b5 Partial PR - need to generate SDK - Add endpoint to get the total free worker slots for a worker and the … (#857)
* Add endpoint to get the total free worker slots for a worker and the max runs

* update to use WorkerSemaphoreCount instead of checking stepRunId

* modify the query for the new table and change the interface

* bump golangci-lint make changes to name of returned data

* revert the simple example

---------

Co-authored-by: Sean Reilly <sean@hatchet.run>
2024-09-19 10:11:16 -07:00
abelanger5 d23e5d9963 feat: expression-based concurrency keys (#889)
* feat: expression-based concurrency keys

* fix: build

* fix: typos

* fix: gen

* fix: migration

* fix: remove print statements

* fix: reassignment bugs, retries on closed transport, pr review
2024-09-19 10:32:22 -04:00
abelanger5 55eb63d9a4 fix: replay without group keys and status updates (#883) 2024-09-16 16:59:34 -04:00
Gabe Ruttner 2379e3638a fix: reset on replay (#875) 2024-09-16 17:01:51 +00:00
Gabe Ruttner af9ed49f1e fix: events list view (#878)
* fix: filter by event id

* fix: run count

* feat: filter by id api

* feat: filter by Event Id

* chore: default page is runs

* feat: cancel event runs

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-09-16 16:46:31 +00:00
Gabe Ruttner c64c62f66a feat: improved workflow run details page (#821)
* wip: rip prisma

* wip

* wip

* fix: lint

* wip

* wip

* gen

* wip

* wip

* fix trigger

* hide overview

* revert db changes

* feat: wrap up frontend changes and perf

* chore: generate

* chore: frontend build

* fix: workflow transformer

* fix: avoid race conditions on simultaneous parent completions

* fix: 2025 started

* feat: toast for replay/cancel

* fix: toast

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2024-09-16 15:39:49 +00:00
abelanger5 893637cb0f fix: improve LinkStepRunParents to prevent usage of temp files (#874) 2024-09-12 17:19:58 -04:00
abelanger5 bed2cb559a fix: add back sem slots, without row contention (#868)
* fix: add back sem slots, without row contention

* fix: serialize queue step runs to prevent dirty reads

* remove serializable for now

* statement timeouts on create workflow run

* statement timeout for reassign

* proper migration + cleanup

* remove old tables and code

* fix: worker slot state

* remove last unused table from workers
2024-09-11 20:47:49 +00:00
abelanger5 f4c5cd973e feat: more efficient step run timeouts (#863) 2024-09-10 18:23:11 -04:00
abelanger5 b635c875f6 fix: race conditions on release slot (#858)
* fix: race conditions on release slot

* better engine logs for ci

* fix: improve cancellation

* better debug logs and increase timeout
2024-09-10 14:22:32 -04:00