abelanger5
9f463e92d6
refactor: move v1 packages, remove webhook worker references ( #2749 )
* chore: move v1 packages, remove webhook worker references
* chore: move msgqueue
* fix: relative paths in sqlc.yaml
2026-01-02 11:42:40 -05:00
abelanger5
f82d3bd071
refactor: consolidate repository methods ( #2730 )
* refactor: remove v0 paths from codebase
* remove uiVersion references
* refactor: remove v0-exclusive database queries
* remove webhook test
* chore: move api token repository
* chore: move dispatcher repository to v1
* chore: move health repository to v1
* chore: remove event repository
* remove some unused repositories
* chore: move mq implementation to v1
* chore: consolidate rate limit implementations
* chore: move security check to v1 repository
* chore: move slack to v1 repository
* chore: move sns implementation to v1 repository
* clean up step repository
* chore: move tenant invite to v1 repository
* chore: move limits, workers, tenant alerts to v1 repository
* chore: move user, tenant, userSession to v1 repository
* chore: move ticker to v1 repository
* chore: move scheduled workflows to v1 repository
* chore: remove workflows
* fix: remove pointer for limits config file
* propagate cache value to api token
* propagate cache durations
2025-12-31 16:35:46 -05:00
abelanger5
dd9c36c315
refactor: remove v0 paths from codebase ( #2728 )
* refactor: remove v0 paths from codebase
* remove uiVersion references
2025-12-30 09:57:00 -05:00
abelanger5
2249ef3b79
fix: small scheduler optimizations ( #2426 )
* fix: actually increment snapshot count
* add a context with timeout to wrap replenish
2025-11-17 15:45:14 -05:00
Mohammed Nafees
cf5c5989ff
add vars to tune concurrency poller ( #2428 )
2025-10-23 11:36:12 -04:00
Mohammed Nafees
e2b1f1353e
Fix OTel span attribute naming convention ( #2409 )
* rename spans according to convention
* low cardinality
2025-10-16 18:43:40 +02:00
Mohammed Nafees
ed40a82dbb
Include tenant_id in OTel spans wherever possible ( #2382 )
2025-10-03 18:16:16 +02:00
matt
cf59a7bcd9
Feat: Worker slot Prom metrics ( #2195 )
* feat: add slots to prom metrics
* feat: available
* fix: extension instead
* fix: docs
* fix: rm unused query changes
* fix: rm unused struct
* fix: labels
* feat: improve total slots
* fix: pr feedback
* fix: docs
* Revert "fix: docs"
This reverts commit 7fe105da92.
* fix: derive total slots
2025-09-08 14:07:44 -04:00
abelanger5
2c8ea66a7a
fix: remove rate limited items from in memory buffer ( #2207 )
2025-08-27 14:51:35 -04:00
abelanger5
acf7215b3f
fix: don't query database when flush is called concurrently ( #2202 )
2025-08-26 11:00:47 -04:00
abelanger5
8463b2c4a3
limit frequency of updates to rate limits ( #2173 )
2025-08-21 12:50:22 -04:00
abelanger5
1407594902
fix: move rate limited queue items off the main queue ( #2155 )
* fix: move rate limited queue items off the main queue
* preserve FIFO behavior on queues
* fix unit tests, address pr comments
* fix: generated
* rename table
2025-08-18 11:31:21 -04:00
Mohammed Nafees
c5915a3b14
Add rate limiter around scheduler concurrency ( #2021 )
* add rate limiter around scheduler concurrency
* have upper limit
* loadtest should pass now
2025-07-18 08:24:57 -04:00
Jean-Baptiste Souvestre
f08c348710
fix(scheduling): negative weights ranks were not excluded from the candidate workers pool ( #1941 )
Co-authored-by: jbsouvestre <jean-baptiste@ubble.ai>
2025-07-03 09:03:12 -04:00
Mohammed Nafees
ef498a6235
Introduce tenant Prometheus metrics ( #1875 )
* introduce tenant workflow completed metric
* expose tenant prom metrics via handler
* fix workflow and worker id in metrics
* correctly add workflow metrics from workflow controller
* use olap DB to gather information for workflow completion
* fix prom metrics endpoint for tenant
* workflow name from external id
* simplify tenant registry based metrics
* add docs for prometheus metrics
* fix docs lint
* run prettier fix
* WIP metrics work
* use federate prom server URL to proxy metrics
* implement workflow duration histogram metric
* separate prom stack docker compose
* fix duration metrics calls
* move scheduler metrics to prom tenant specific file
* update docs for prom metrics
* fix lint
* use proper indices to query for durations
* reorg tenant metrics
* fix lint for doc
* update docs with promql examples and casing around prom metrics enabled
* update prom server url
* fix lint
* enabled prom metrics for v1 only from controller
2025-06-27 11:46:31 -04:00
abelanger5
5c5c1aa5a1
feat: more features in the load testing harness ( #1691 )
* fix: make stripped payload size configurable
* feat: more load test features
* Update cmd/hatchet-loadtest/do.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: try to fix load tests
* increase timeout, update goleak ignores
* fix: data race in scheduler with snapshot input
* fix: logger improvements
* add one more goleak ignore
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-07 21:39:30 -04:00
Gabe Ruttner
3f4424b0fc
fix: check candidate slots in bounds ( #1684 )
* fix: check candidate slots in bounds
* fix
2025-05-06 15:34:52 -04:00
abelanger5
ffbeafc204
revert: add back testing harness ( #1659 )
* re-add new testing harness
* add healthcheck port and pick random grpc port to listen on
* feat: parallel load tests and faster tests
* make parallelism = 5
* fix: lint
* add linter to pre
* fix: add back rampup fixes
* reduce matrix on PR, add matrix to pre-release step
* make load tests less likely to block
* make limit strategy group round robin
* uncomment lines
2025-05-01 15:22:30 -04:00
abelanger5
d047813fd8
fix: randomize concurrency loop ( #1644 )
2025-04-30 07:38:34 -04:00
abelanger5
5084934b40
fix: critical deadlock bug in scheduler ( #1621 )
2025-04-25 21:28:15 -04:00
abelanger5
9aead7ab68
feat: global prometheus metrics ( #1568 )
* feat: global prometheus metrics
* configure prom with env vars, clean up metrics
* add histogram and docs
* update port
2025-04-17 15:11:38 -04:00
abelanger5
aebcf0bb0c
fix: boundary conditions on 1-second rate limiters ( #1379 )
2025-03-20 21:44:08 +00:00
abelanger5
21bd707ba6
fix(v1): improved query plans for replay and task outputs, reassignment + timeout tweaks ( #1354 )
* don't call parent output task when not necessary
* help query planner by refactoring replay task
* fix: use failed task pathway for reassignments and timeouts
2025-03-17 14:10:32 -04:00
abelanger5
1f2096313d
feat: v1 engine ( #1318 )
2025-03-11 14:57:13 -04:00