Commit Graph

492 Commits

Author SHA1 Message Date
matt c44c70bd0c Debug: Add debug logs around put log method (#2079)
* feat: add logger to ingestor

* debug: add a bunch of debug logs

* fix: add prefix for grep

* fix: copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: panic

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-04 11:19:07 -04:00
matt 8480228d79 Fix: Allow bypassing partitioning for events lookup table (#2054)
* fix: allow olap events lt to not be partitioned manually

* chore: gen

* chore: gen
2025-07-31 18:18:49 -04:00
Mohammed Nafees cc1331c59f Use PostgreSQL advisory lock to create task table partitions instead of depending on internal tenant (#1991)
* use pg advisory lock for task table partition

* fix lint

* use a separate transaction for advisory lock

* fix lint

* use PrepareTx

* short circuit return fast if partitions already exist

---------

Co-authored-by: mrkaye97 <mrkaye97@gmail.com>
2025-07-31 18:18:15 -04:00
Mohammed Nafees e6c50ca1a0 Allow member roles to be changed by owners and admins (#2044)
* allow member roles to be changed by owners and admins

* PR comments

* chore: gen

* fix: rm changes to /next/

* chore: gen

---------

Co-authored-by: mrkaye97 <mrkaye97@gmail.com>
2025-07-30 17:42:34 -04:00
matt 392483c5d8 Fix: Weekly partition dropping (#2066)
* fix: check weekly partitions older than a week old

* fix: logic

* chore: gen
2025-07-30 16:28:08 -04:00
matt d6f8be2c0f Feat: OLAP Table for CEL Eval Failures (#2012)
* feat: add table, wire up partitioning

* feat: wire failures into the OLAP db from rabbit

* feat: bubble failures up to controller

* fix: naming

* fix: hack around enum type

* fix: typo

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: typos

* fix: migration name

* feat: log debug failure

* feat: pub message from debug endpoint to log failure

* fix: error handling

* fix: use ingestor

* fix: olap suffix

* fix: pass source through

* fix: dont log ingest failure

* fix: rm debug as enum opt

* chore: gen

* Feat: Webhooks (#1978)

* feat: migration + go gen

* feat: non unique source name

* feat: api types

* fix: rm cruft

* feat: initial api for webhooks

* feat: handle encryption of incoming keys

* fix: nil pointer errors

* fix: import

* feat: add endpoint for incoming webhooks

* fix: naming

* feat: start wiring up basic auth

* feat: wire up cel event parsing

* feat: implement authentication

* fix: hack for plain text content

* feat: add source to enum

* feat: add source name enum

* feat: db source name enum fix

* fix: use source name enums

* feat: nest sources

* feat: first pass at stripe

* fix: clean up source name passing

* fix: use unique name for webhook

* feat: populator test

* fix: null values

* fix: ordering

* fix: rm unnecessary index

* fix: validation

* feat: validation on create

* fix: lint

* fix: naming

* feat: wire triggering webhook name through to events table

* feat: cleanup + python gen + e2e test for basic auth

* feat: query to insert webhook validation errors

* refactor: auth handler

* fix: naming

* refactor: validation errors, part II

* feat: wire up writes through olap

* fix: linting, fallthrough case

* fix: validation

* feat: tests for failure cases for basic auth

* feat: expand tests

* fix: correctly return 404 out of task getter

* chore: generated stuff

* fix: rm cruft

* fix: longer sleep

* debug: print name + events to logs

* feat: limit to N

* feat: add limit env var

* debug: ci test

* fix: apply namespaces to keys

* fix: namespacing, part ii

* fix: sdk config

* fix: handle prefixing

* feat: handle partitioning logic

* chore: gen

* feat: add webhook limit

* feat: wire up limits

* fix: gen

* fix: reverse order of generic fallthrough

* fix: comment for potential unexpected behavior

* fix: add check constraints, improve error handling

* chore: gen

* chore: gen

* fix: improve naming

* feat: scaffold webhooks page

* feat: sidebar

* feat: first pass at page

* feat: improve feedback on UI

* feat: initial work on create modal

* feat: change default to basic

* fix: openapi spec discriminated union

* fix: go side

* feat: start wiring up placeholders for stripe and github

* feat: pre-populated fields for Stripe + Github

* feat: add name section

* feat: copy improvements, show URL

* feat: UI cleanup

* fix: check if tenant populator errors

* feat: add comments

* chore: gen again

* fix: default name

* fix: styling

* fix: improve stripe header processing

* feat: docs, part 1

* fix: lint

* fix: migration order

* feat: implement rate limit per-webhook

* feat: comment

* feat: clean up docs

* chore: gen

* fix: migration versions

* fix: olap naming

* fix: partitions

* chore: gen

* feat: store webhook cel eval failures properly

* fix: pk order

* fix: auth tweaks, move fetches out of populator

* fix: pgtype.Text instead of string pointer

* chore: gen

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-30 13:27:38 -04:00
abelanger5 e2af391a9b fix: remove essential pool to prevent bottleneck on heartbeats (#2060) 2025-07-29 17:06:29 -04:00
Mohammed Nafees 793df41ccb Deploy HyperDX locally via docker-compose and add traces to task controller (#2058)
* deploy Jaeger locally and add traces to task controller

* use Jaeger v2

* add SERVER_OTEL_COLLECTOR_AUTH

* fix PR comments

* fix span name
2025-07-29 16:24:38 +02:00
matt fc374cb8db hotfix: separate statements for delete-then-insert declarative filters (#2053) 2025-07-25 13:46:56 -04:00
abelanger5 c377e75f61 fix: revert lease updates (#2038) 2025-07-22 12:06:58 +02:00
matt 7295254bfa Fix: filter lookup not retaining scope information (#2036)
* fix: filter lookup not retaining scope information

* fix: copilot suggestion

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: add DISTINCT to filter query

* fix: struct deduping

* feat: test

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-22 11:59:34 +02:00
Mohammed Nafees c26ff03dc0 [hotfix] Fix duration calculation of DAGs and single tasks (#2035)
* fix duration for multi tenant

* use external ids

* fix lint
2025-07-21 20:41:23 +02:00
abelanger5 467c6197ba fix: many updates on the lease table (#2034) 2025-07-21 20:40:38 +02:00
matt 3dcd6059c8 Fix: Partition pruning for PreflightCheckTasksForReplay (#2029)
* feat: partition pruning for PreflightCheckTasksForReplay

* fix: use 1d as placeholder

* fix: use current time instead

* fix: pass inserted ats through correctly

* fix: try adding a CTE

* fix: query cleanup
2025-07-21 20:30:59 +02:00
abelanger5 27435a72d6 feat: option to disable logging (#2030) 2025-07-21 16:53:11 +02:00
matt 5bf9f97720 Fix: Validate payloads + metadata and error on illegal unicode (#2023)
* feat: add helper method to repository

* feat: 400 on event pushes with invalid payloads

* fix: pointer

* feat: add to trigger

* feat: error on bulk trigger

* feat: error on schedule

* fix: validate log lines

* feat: validate crons

* feat: fail the task

* fix: rm debug line
2025-07-20 22:44:28 -04:00
matt c202ec8359 Feat: CEL Debug Endpoint (#2010)
* feat: openapi spec + gen

* feat: scaffold cel service

* feat: impl with discriminated union

* fix: reversed

* chore: gen py

* chore: gen + add cel to hatchet client

* feat: wire up TS CEL client

* chore: versions

* feat: impl for go

* fix: error handling

* feat: python tests
2025-07-20 22:44:08 -04:00
Matt Kaye 7388c6df73 Fix: Improve UpdateDAGStatuses and UpdateTaskStatuses (#2020)
* fix: start improving query

* feat: add helper query for partition pruning

* feat: use helper query

* feat: similar optimizations for tasks query
2025-07-18 08:29:44 -04:00
Mohammed Nafees c5915a3b14 Add rate limiter around scheduler concurrency (#2021)
* add rate limiter around scheduler concurrency

* have upper limit

* loadtest should pass now
2025-07-18 08:24:57 -04:00
Matt Kaye 48734c8cb8 Fix: Multiple tenants for task & dag status updates (#2019)
* feat: add function to fetch tenants in partition

* feat: update updatedagstatuses query to take list of tenants

* feat: wire tenant id through

* feat: hack string delim to wire writes through

* fix: unnest result of first func

* feat: task updates

* fix: error handling

* fix: one more func + migration

* fix: gen

* fix: concurrent tenant prom metrics and remove tenant operations

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2025-07-17 14:59:45 -04:00
Matt Kaye 02601fa0ef Fix: Replay bugs (#2001)
* fix: dedupe tasks before replaying

* fix: two toasts

* fix: send workflow run external id through

* fix: send messages to queue immediately

* fix: clean up types

* fix: dedupe

* fix: return task ids instead of workflow ones
2025-07-16 11:42:36 -04:00
Matt Kaye 4676ae8508 Fix: Feedback on replay / cancel (#1997)
* feat: return a response from replays

* fix: return correct thing

* feat: wire up cancel + replay toasts on FE

* fix: naming

* fix: other refs

* fix: linter setup
2025-07-15 13:29:52 -04:00
Matt Kaye 0b21d74712 Fix: Remove internal replay batching for now (#1992)
* fix: remove batching, run replays serially

* proposal: do this at the replay controller level

* Revert "fix: remove batching, run replays serially"

This reverts commit 21a93bb260.

* feat: advisory lock

* fix: add prefix to lock
2025-07-15 13:10:30 -04:00
abelanger5 24b0d0c9d0 fix: panic when rate limit units are passed as nil (#1963) 2025-07-14 13:28:35 -04:00
Mohammed Nafees c86a65bb0f Add new streaming support to Go SDK (#1955)
* add Go SDK streaming support

* make docs changes for go sdk streaming

* fix git lfs warning

* streaming go example

* fix lint

* fix auto generated snip

* revert poetry lock changes

* some cleanup
2025-07-11 18:00:30 +02:00
Mohammed Nafees f247a63137 add check for cel input nil (#1977) 2025-07-10 09:44:47 -04:00
abelanger5 53020696e9 fix(go-sdk): v1 rate limit config (#1962) 2025-07-07 16:41:41 -04:00
Mohammed Nafees 33ec5fb7d8 add docs for Go SDK bulk operations (#1954) 2025-07-07 13:04:49 +02:00
abelanger5 6e820a120c feat: waterfall component (#1952)
* tmp: waterfall component

* feat: waterfall component

* address pr review comments
2025-07-04 14:47:30 -04:00
Matt Kaye 7679732b15 Fix: Skipping conditions with multiple parents (#1948)
* fix: skipping bug

* fix: move `waits` -> `conditions`

* fix: refs

* chore: ver

* feat: add skipped task to test

* feat: start implementing or groups in wait for

* feat: test of or groups on durable context

* fix: lint

* chore: gen

* fix: lint

* fix: branching hell
2025-07-03 16:50:57 -04:00
Jean-Baptiste Souvestre f08c348710 fix(scheduling): negative weights ranks were not excluded from the candidate workers pool (#1941)
Co-authored-by: jbsouvestre <jean-baptiste@ubble.ai>
2025-07-03 09:03:12 -04:00
Mohammed Nafees 2ccd434ebf Add Prometheus metric for reassigned task total (#1943)
* add reassigned total metric

* lint fix
2025-07-03 10:52:20 +02:00
Mohammed Nafees 144b8dce9e make sure to default to QUEUED for new task initial state (#1931) 2025-07-02 14:45:09 +02:00
abelanger5 3468709a23 fix: correct config pt 2 (#1938) 2025-07-01 16:56:13 -04:00
abelanger5 e18b0e8f58 fix: don't print output data in CEL exception (#1936)
* fix: don't print output data in CEL exception

* add tzdata to lite and loadtest dockerfiles too
2025-07-01 16:16:19 -04:00
Matt Kaye c805a52e38 Fix: Events query performance improvements (#1930)
* fix: split up event queries for perf

* fix: refs

* fix: event join
2025-07-01 11:58:15 -04:00
Matt Kaye 23bdbbd8a3 Feat: Tenant-in-path (#1923)
* chore: gen

* feat: hook for tenant

* feat: add tenanted routes

* fix: no need for v1 prefix

* feat: remove v1 routes

* fix: remove ui version switcher stuff

* fix: more broken redirects

* fix: start using hooks to fetch tenant

* fix: add (commented out) linting rules

* fix: sidebar

* fix: cruft comment

* fix: layout

* fix: collapsibles

* fix: more refs to v1 paths

* fix: more refs to old hooks

* fix: more refs

* fix: last few

* fix: more redirects

* fix: rm more refs to `useOutletContext`

* fix: rm tenant-as-prop

* fix: small bugs

* fix: revert unintended changes

* fix: couple more

* fix: last few

* fix: last few

* fix: oooone more

* fix: redirects

* fix: add more redirects

* fix: clean up a bunch more redirects

* fix: copy paste

* fix: more redirects

* fix: zero value bug

* hack: don't set query param on v1

* fix: lint

* fix: copy

* fix: copy

* fix: lint

* fix: rm /next redirect

* make default engine version v1

* feat: crons with timezones

* fix: handle case where tenant is in path

* fix: more hard redirects

* fix: delete v0 cancellation test

---------

Co-authored-by: Alexander Belanger <alexander@hatchet.run>
2025-07-01 11:56:54 -04:00
abelanger5 646adda2a8 fix: concurrency timeout from 5s -> 30s (#1926)
* fix: concurrency timeout from 5s -> 30s

* limits in overwrite file too
2025-07-01 08:05:59 -04:00
abelanger5 0e7cc7e063 add limits to schedule timeout logic (#1925) 2025-06-30 17:18:13 -04:00
abelanger5 1abb2a20e7 fix: hatchet-lite connection leakage and improve listen/notify performance (#1924)
* fix: hatchet-lite connection leakage and improve listen/notify performance

* fix: cancel mq listener

* remove event deps

* skip webhook test for now
2025-06-30 17:13:09 -04:00
abelanger5 b6d5a38c0f refactor: small updates to how task and dag statuses are handled (#1922)
* fix: lengths of update rows

* better fix for task and dag updates
2025-06-30 15:43:31 -04:00
abelanger5 66631764b3 fix: olap poll interval config (#1918) 2025-06-30 07:32:26 -04:00
Mohammed Nafees ef498a6235 Introduce tenant Prometheus metrics (#1875)
* introduce tenant workflow completed metric

* expose tenant prom metrics via handler

* fix workflow and worker id in metrics

* correctly add workflow metrics from workflow controller

* use olap DB to gather information for workflow completion

* fix prom metrics endpoint for tenant

* workflow name from external id

* simplify tenant registry based metrics

* add docs for prometheus metrics

* fix docs lint

* run prettier fix

* WIP metrics work

* use federate prom server URL to proxy metrics

* implement workflow duration histogram metric

* separate prom stack docker compose

* fix duration metrics calls

* move scheduler metrics to prom tenant specific file

* update docs for prom metrics

* fix lint

* use proper indices to query for durations

* reorg tenant metrics

* fix lint for doc

* update docs with promql examples and casing around prom metrics enabled

* update prom server url

* fix lint

* enabled prom metrics for v1 only from controller
2025-06-27 11:46:31 -04:00
Matt Kaye a45816c6c2 Fix: Streaming + Misc SDK Fixes (#1903)
* fix: filters contracts + version bumps

* chore: gen

* feat: implement streaming for ts

* fix: v0 sdk side by side

* fix: optional status on semaphore slots

* fix: gen script

* chore: lint + gen

* chore: gen

* fix: fmt

* fix: revert changes

* feat: handle incorrect return types

* fix: worker status not assigned

* fix: improve handling of other types of pydantic models

* fix: handle null output case

* fix: get group key

* fix: info level log for non-retry

* fix: export non retry at top level

* fix: changelog

* chore: gen

* chore: gen
2025-06-26 17:42:34 -04:00
Matt Kaye e62f7edab3 Fix: Streaming Bugs (#1913)
* fix: bug with json parsing failing

* fix: hang up on cancel and fail

* fix: pub stream events even if tenant pubs are disabled

* fix: condition

* fix: eq
2025-06-26 16:22:56 -04:00
Matt Kaye 25a98cf372 Feat: Copy workflow config from run pages (#1901)
* feat: add workflow details to more responses

* send workflow version back with task summary

* feat: add copy to run page

* feat: wire up button on other pages

* fix: copy
2025-06-25 13:26:13 -04:00
Matt Kaye eb08481483 fix: log error and skip if errors raised by CEL (#1898) 2025-06-25 09:45:21 -04:00
Mohammed Nafees f69f4d68b6 add more debugging around releaseTasks error handling (#1897) 2025-06-25 09:44:53 -04:00
Matt Kaye 33d60dfcd2 Fix: Filter list improvements (#1899)
* fix: uuid validation

* fix: improve filter filtering

* fix: inner join

* fix: bug in workflow cached prop

* chore: bump

* fix: lint

* chore: changelog

* fix: separate filter queries

* feat: improve filter filtering

* fix: queries and the like
2025-06-25 09:44:17 -04:00
Matt Kaye a887c62809 Fix: Store CreateWorkflowVersionOpts for debugging (#1890)
* feat: add json column for opts

* feat: wiring

* feat: send config through the api

* feat: FE

* fix: order

* fix: hide sched timeout

* fix: lint

* fix: return mutated opts

* fix: adv section

* fix: remove unnecessary headers

* feat: styling improvements to settings

* feat: styling, pt ii

* feat: styling, pt iii

* fix: cron
2025-06-23 16:56:22 -04:00