* refactor(ruler): add Rule v2 read type and rename storage Rule to StorableRule
* refactor(ruler): map GettableRule to Rule before responding on v2 routes
* docs(openapi): regenerate spec with RuletypesRule on v2 rules routes
* docs(frontend): regenerate API clients with RuletypesRuleDTO
* refactor(ruler): validate uuid-v7 on delete rule handler
* refactor(ruler): add Enum() on AlertType
* refactor(ruler): convert RepeatType and RepeatOn to valuer.String with Enum()
* refactor(ruler): mark required fields on Recurrence
* refactor(ruler): mark required tag on CumulativeSchedule.Type
* refactor(ruler): rename GettablePlannedMaintenance to PlannedMaintenance
* docs: regenerate OpenAPI spec and frontend clients with tightened schema
* refactor(ruler): add PostablePlannedMaintenance input type with Validate
* refactor(ruler): rename EditPlannedMaintenance to Update and GetAll to List
* refactor(ruler): switch Create/Update to *PostablePlannedMaintenance
* refactor(ruler): convert PlannedMaintenance.Id string to ID valuer.UUID
* refactor(ruler): return *PlannedMaintenance from CreatePlannedMaintenance
* docs: regenerate OpenAPI spec and frontend clients for Postable/ID changes
* refactor(ruler): type PlannedMaintenance.Status as MaintenanceStatus enum
* refactor(ruler): type PlannedMaintenance.Kind as MaintenanceKind enum
* refactor(ruler): mark GettableRule.Id required
* refactor(ruler): mark GettableRule.State required
* refactor(ruler): make GettableRule timestamps non-pointer and users nullable
* refactor(ruler): return bare array from v2 ListRules instead of wrapped object
* docs: regenerate OpenAPI spec and frontend clients for schema pass
* refactor(ruler): define Ruler and Handler interfaces with signozruler implementation
Expand the Ruler interface with rule management and planned maintenance
methods matching rules.Manager signatures. Add Handler interface for
HTTP endpoints. Implement handler in signozruler wrapping ruler.Ruler,
and update provider to embed *rules.Manager for interface satisfaction.
* refactor(ruler): move eval_delay from query-service constants to ruler config
Replace constants.GetEvalDelay() with config.EvalDelay on ruler.Config,
defaulting to 2m. This removes the signozruler dependency on
pkg/query-service/constants.
* refactor(ruler): use time.Duration for eval_delay config
Match the convention used by all other configs in the codebase.
TextDuration is for preserving human-readable text through JSON
round-trips in user-facing rule definitions, not for internal config.
* refactor(ruler): add godoc comments and spacing to Ruler interface
* refactor(ruler): wire ruler handler through signoz.New and signozapiserver
- Add Start/Stop to Ruler interface for lifecycle management
- Add rulerCallback to signoz.New() for EE customization
- Wire ruler.Handler through Handlers, signozapiserver provider
- Register 12 routes in signozapiserver/ruler.go (7 rules, 5 downtime)
- Update cmd/community and cmd/enterprise to pass rulerCallback
- Move rules.Manager creation from server.go to signoz.New via callback
- Change APIHandler.ruleManager type from *rules.Manager to ruler.Ruler
- Remove makeRulesManager from both OSS and EE server.go
* refactor(ruler): remove old rules and downtime_schedules routes from http_handler
Remove 7 rules CRUD routes and 5 downtime_schedules routes plus their
handler methods from http_handler.go. These are now served by
signozapiserver/ruler.go via handler.New() with OpenAPIDef.
The 4 v1 history routes (stats, timeline, top_contributors,
overall_status) remain in http_handler.go as they depend on
interfaces.Reader and have v2 equivalents already in signozapiserver.
* refactor(ruler): use ProviderFactory pattern and register in factory.Registry
Replace the rulerCallback with rulerProviderFactories following the
standard ProviderFactory pattern (like auditorProviderFactories). The
ruler is now created via factory.NewProviderFromNamedMap and registered
in factory.Registry for lifecycle management. Start/Stop are no longer
called manually in server.go.
- Ruler interface embeds factory.Service (Start/Stop return error)
- signozruler.NewFactory accepts all deps including EE task funcs
- provider uses named field (not embedding) with explicit delegation
- cmd/community passes nil task funcs, cmd/enterprise passes EE funcs
- Remove NewRulerProviderFactories (replaced by callback from cmd/)
- Remove manual Start/Stop from both OSS and EE server.go
* fix(ruler): make Start block on stopC per factory.Service contract
rules.Manager.Start is non-blocking (run() just closes a channel).
Add stopC to provider so Start blocks until Stop closes it, matching
the factory.Service contract used by the Registry.
* refactor(ruler): remove unused RM() accessor from EE APIHandler
* refactor(ruler): remove RuleManager from APIHandlerOpts
Use Signoz.Ruler directly instead of passing it through opts.
* refactor(ruler): add /api/v1/rules/test and mark /api/v1/testRule as deprecated
* refactor(ruler): use binding.JSON.BindBody for downtime schedule decode
* refactor(ruler): add TODOs for raw string params on Ruler interface
Mark CreateRule, EditRule, PatchRule, TestNotification, and DeleteRule
with TODOs to accept typed params instead of raw JSON strings. Requires
changing the storage model since the manager stores raw JSON as Data.
* refactor(ruler): add TODO on MaintenanceStore to not expose store directly
* docs: regenerate OpenAPI spec and frontend API clients with ruler routes
* refactor(ruler): rename downtime_schedules tag to downtimeschedules
* refactor(ruler): add query params to ListDowntimeSchedules OpenAPIDef
Add ListPlannedMaintenanceParams struct with active/recurring fields.
Use binding.Query.BindQuery in the handler instead of raw URL parsing.
Add RequestQuery to the OpenAPIDef so params appear in the OpenAPI spec
and generated frontend client.
* refactor(ruler): add GettableTestRule response type to TestRule endpoint
Define GettableTestRule struct with AlertCount and Message fields.
Use it as the Response in TestRule OpenAPIDef so the generated frontend
client has a proper response type instead of string.
* refactor(ruler): tighten schema with oneOf unions and required fields
Surface the polymorphism in RuleThresholdData and EvaluationEnvelope via
JSONSchemaOneOf (the same pattern as QueryEnvelope), so the generated
TS types are discriminated unions with typed `spec` instead of unknown.
Also mark `alert`, `ruleType`, and `condition` required on PostableRule
so the generated TS types are non-optional for callers.
* refactor(ruler): add Enum() on EvaluationKind, ScheduleType, ThresholdKind
Surface the fixed set of accepted values for these valuer-wrapped kind
types so OpenAPI emits proper string-enum schemas and the generated TS
types become string-literal unions instead of plain string.
* refactor(ruler): mark required fields on nested rule and maintenance types
Surface fields already enforced by Validate()/UnmarshalJSON as required
in the OpenAPI schema so the generated TS types match runtime behavior.
Touches RuleCondition (compositeQuery, op, matchType), RuleThresholdData
(kind, spec), BasicRuleThreshold (name, target, op, matchType),
RollingWindow (evalWindow, frequency), CumulativeWindow (schedule,
frequency, timezone), EvaluationEnvelope (kind, spec), Schedule
(timezone), GettablePlannedMaintenance (name, schedule).
Does not mark server-populated fields (id, createdAt, updatedAt, status,
kind) on GettablePlannedMaintenance required, since the same struct is
reused for request bodies in MaintenanceStore.CreatePlannedMaintenance.
* refactor(ruler): tighten AlertCompositeQuery, QueryType, PanelType schema
Missed in the earlier tightening pass. AlertCompositeQuery.queries,
panelType, queryType are all required for a valid composite query;
QueryType and PanelType are valuer-wrapped with fixed value sets, so
expose them as enums in the OpenAPI schema.
* refactor(ruler): wrap sql.ErrNoRows as TypeNotFound in by-ID lookups
GetStoredRule and GetPlannedMaintenanceByID previously returned bun's
raw Scan error, so a missing ID leaked "sql: no rows in result set" to
the HTTP response with a 500 status. WrapNotFoundErrf converts
sql.ErrNoRows into TypeNotFound so render.Error emits 404 with a stable
`not_found` code, and passes other errors through unchanged.
* refactor(ruler): move migrated rules routes to /api/v2/rules
The 7 rules routes now live at /api/v2/rules, /api/v2/rules/{id}, and
/api/v2/rules/test — served via handler.New with render.Success and
render.Error. The legacy /api/v1/rules paths will be restored in the
query-service http handler in a follow-up so existing clients keep
receiving the SuccessResponse envelope unchanged.
Drop the /api/v1/testRule deprecated alias from signozapiserver; the
original lives on main's http_handler.go and is restored alongside the
other v1 paths.
Downtime schedule routes stay at /api/v1/downtime_schedules — single
track, no legacy restore planned.
* refactor(ruler): restore /api/v1/rules legacy handlers for back-compat
Bring the 7 rule CRUD/test handlers and their router.HandleFunc lines
back to http_handler.go so /api/v1/rules, /api/v1/rules/{id}, and
/api/v1/testRule continue to emit the legacy SuccessResponse envelope.
The v2 versions under signozapiserver are the new home for the render
envelope used by generated clients.
Delegation uses aH.ruleManager (populated from opts.Signoz.Ruler in
NewAPIHandler), so a single ruler.Ruler instance serves both paths — no
second rules.Manager is instantiated.
Downtime schedules stay single-track under signozapiserver; the 5
downtime handlers are not restored.
* docs: regenerate OpenAPI spec and frontend clients for /api/v2/rules
* refactor(ruler): return 201 Created on POST /api/v2/rules
A successful create now responds with 201 Created and the full
GettableRule body, matching REST convention for resource creation.
Regenerates the OpenAPI spec and frontend clients to reflect the new
status code.
* refactor(ruler): restore dropped sorter TODO in legacy listRules
The legacy listRules handler was copied verbatim from main during the
v1 back-compat restore, but an inner blank line and the load-bearing
`// todo(amol): need to add sorter` comment were stripped. Put them
back so the legacy block round-trips cleanly against main.
* refactor(ruler): return 201 Created on POST /api/v1/downtime_schedules
Match the REST convention already applied to POST /api/v2/rules:
successful creates respond with 201 Created. Response body remains
empty (nil); the generated frontend client surface is unchanged since
no response type was declared.
A richer "return the created resource" response body is a separate
follow-up — holding off until the ruletypes naming cleanup lands.
* fix(ruler): signal Healthy only after manager.Start closes m.block
The ruler provider didn't implement factory.Healthy, so the registry
fell back to factory.closedC and marked the service StateRunning the
instant its Start goroutine spawned — before rules.Manager.Start had
closed m.block. /api/v2/healthz therefore returned 200 while rule
evaluation was still gated, and integration tests that POSTed a rule
immediately after the readiness check saw their task goroutines stuck
on <-m.block until the next frequency tick.
Add a healthyC channel and close it inside Start only after
manager.Start returns; implement factory.Healthy so the registry and
/api/v2/healthz wait on the real readiness signal.
* fix: add the withhealthy interface
* fix(ruler): alias legacy RULES_EVAL_DELAY env var in backward-compat
The eval_delay config was moved from query-service constants (read from
RULES_EVAL_DELAY) onto ruler.Config (read via mapstructure from
SIGNOZ_RULER_EVAL__DELAY). That silently broke the legacy env var for
any existing deployment — notably the alerts integration-test fixture
which sets RULES_EVAL_DELAY=0s to let rules evaluate against just-
inserted data. The resulting default 2m delay pushed the query window
far enough back that the fixture's rate spike fell outside it, causing
8 of 24 parametrize cases in 02_basic_alert_conditions.py to fail with
"Expected N alerts to be fired but got 0 alerts".
Add RULES_EVAL_DELAY to mergeAndEnsureBackwardCompatibility alongside
the ~10 other aliased legacy env vars. Emits the standard deprecation
warning and overrides config.Ruler.EvalDelay.
* feat(instrumentation): add OTel exception semantic convention log handler
Add a loghandler.Wrapper that enriches error log records with OpenTelemetry
exception semantic convention attributes (exception.type, exception.code,
exception.message, exception.stacktrace).
- Add errors.Attr() helper for standardized error logging under "exception" key
- Add exception log handler that replaces raw error attrs with structured group
- Wire exception handler into the instrumentation SDK logger chain
- Remove LogValue() from errors.base as the handler now owns structuring
* refactor: replace "error", err with errors.Attr(err) across codebase
Migrate all slog error logging from ad-hoc "error", err key-value pairs
to the standardized errors.Attr(err) helper, enabling the exception log
handler to enrich these logs with OTel semantic convention attributes.
* refactor: enforce attr-only slog style across codebase
Change sloglint from kv-only to attr-only, requiring all slog calls to
use typed attributes (slog.String, slog.Any, etc.) instead of key-value
pairs. Convert all existing kv-style slog calls in non-excluded paths.
* refactor: tighten slog.Any to specific types and standardize error attrs
- Replace slog.Any with slog.String for string values (action, key, where_clause)
- Replace slog.Any with slog.Uint64 for uint64 values (start, end, step, etc.)
- Replace slog.Any("err", err) with errors.Attr(err) in dispatcher and segment analytics
- Replace slog.Any("error", ctx.Err()) with errors.Attr in factory registry
* fix(instrumentation): use Unwrapb message for exception.message
Use the explicit error message (m) from Unwrapb instead of
foundErr.Error(), which resolves to the inner cause's message
for wrapped errors.
* feat(errors): capture stacktrace at error creation time
Store program counters ([]uintptr) in base errors at creation time
using runtime.Callers, inspired by thanos-io/thanos/pkg/errors. The
exception log handler reads the stacktrace from the error instead of
capturing at log time, showing where the error originated.
* fix(instrumentation): apply default log wrappers uniformly in NewLogger
Move correlation, filtering, and exception wrappers into NewLogger so
all call sites (including CLI loggers in cmd/) get them automatically.
* refactor(instrumentation): remove variadic wrappers from NewLogger
NewLogger no longer accepts arbitrary wrappers. The core wrappers
(correlation, filtering, exception) are hardcoded, preventing callers
from accidentally duplicating behavior.
* refactor: migrate remaining "error", <var> to errors.Attr across legacy paths
Replace all remaining "error", <variable> key-value pairs with
errors.Attr(<variable>) in pkg/query-service/ and ee/query-service/
paths that were missed in the initial migration due to non-standard
variable names (res.Err, filterErr, apiErrorObj.Err, etc).
* refactor(instrumentation): use flat exception.* keys instead of nested group
Use flat keys (exception.type, exception.code, exception.message,
exception.stacktrace) instead of a nested slog.Group in the exception
log handler.
* fix: fixed edit and patch rule functionality
* fix: fixed edit and patch rule functionality
* fix: fixed edit and patch rule functionality
* fix: added patch rule test and rule mock store
* fix: removed schema version field
* fix: removed schema version field
* fix: added test cases for patch, create, edit
* fix: removed schema version field
* chore(linter): add more linters and deprecate zap
* chore(linter): add more linters and deprecate zap
* chore(linter): add more linters and deprecate zap
* chore(linter): add more linters and deprecate zap
* chore(savedview): refactor into module and handler
* chore(rule): move telemetry inside telemetry
* chore(apdex): refactor apdex and delete dao
* chore(dashboard): create a dashboard module
* chore(ee): get rid of the init nonesense
* chore(dashboard): fix err and apierror confusion
* chore: address comments
* feat(ruler): base setup for rules and planned maintenance tables
* feat(ruler): more changes for making ruler org aware
* feat(ruler): fix lint
* feat(ruler): update the edit planned maintenance function
* feat(ruler): local testing edits for planned maintenance
* feat(ruler): abstract store and types from rules pkg
* feat(ruler): abstract store and types from rules pkg
* feat(ruler): abstract out store and add migration
* feat(ruler): frontend changes and review comments
* feat(ruler): add back compareAndSelectConfig
* feat(ruler): changes for alertmanager matchers
* feat(ruler): addressed review comments
* feat(ruler): remove the cascade operations from rules table
* feat(ruler): update the template for alertmanager
* feat(ruler): implement the rule history changes
* feat(ruler): implement the rule history changes
* feat(ruler): implement the rule history changes
---------
Co-authored-by: Vibhu Pandey <vibhupandey28@gmail.com>