align docs with feature

import { Callout } from "nextra/components";

Batch assignment lets Hatchet buffer individual task invocations and flush them
as configurable batches for a worker to process together. This is ideal when
workloads benefit from shared setup (e.g. vectorized work with shared memory,
batched external API calls, or upstream rate limits that prefer grouped
delivery). Hatchet batches per **step** and **batch key**, then assigns a flushed
batch to a single worker.

## Configuring a batched task

To enable batching for a step, set its **batch configuration** (via workflow
definitions/manifests or SDK helpers). The current fields are:

- `batch_size` (required): the maximum number of items to flush at once. Must be
  at least `1`.
- `flush_interval_ms` (optional): the maximum time (in milliseconds) an item can
  wait before a flush is triggered. **Defaults to `1000`.**
- `batch_key` (optional): a CEL expression that groups tasks into logical
  batches. Tasks with different keys are always processed separately. **Defaults
  to `'default'`.**
- `max_runs` (optional): the maximum number of concurrently running batches per
  key. Additional batches wait for an earlier batch with the same key to finish.
  **Defaults to `100`.**

`batch_size` controls throughput; `flush_interval_ms` bounds latency for
sporadic traffic; `batch_key` controls partitioning; `max_runs` bounds batch
concurrency per key.
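
As a mental model for how these knobs interact, here is a minimal sketch of the
flush decision (an illustration of the semantics above, not Hatchet's actual
scheduler code):

```ts
// Toy model of a per-key buffer: flush when the size bound is reached, or
// when the oldest buffered item has waited longer than the flush interval.
type BatchBuffer<T> = { items: T[]; oldestEnqueuedAtMs: number };

function shouldFlush<T>(
  buf: BatchBuffer<T>,
  batchSize: number, // throughput bound
  flushIntervalMs: number, // latency bound
  nowMs: number,
): boolean {
  if (buf.items.length >= batchSize) return true;
  return (
    buf.items.length > 0 && nowMs - buf.oldestEnqueuedAtMs >= flushIntervalMs
  );
}
```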

## Supplying a batch key

The batch key expression runs with the same CEL context used by other Hatchet
features. It can reference `input`, `additionalMetadata`, `workflowRunId`, and
parent trigger data. The expression must evaluate to a **string**; Hatchet
trims whitespace. An empty/whitespace result falls back to the default key
(`'default'`).

```cel
// group by tenant and logical bucket
input.customerId + ":" + string(additionalMetadata.bucket)
```

If the expression fails to evaluate or produces an unsupported type, Hatchet
rejects task creation to keep buffering logic predictable. Use CEL tooling
locally to validate expressions before deploying.
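
For instance, CEL's `has()` macro can guard optional fields so the expression
always evaluates to a string (the field names below are illustrative):

```cel
// fall back to the customer id alone when the optional bucket is absent
has(additionalMetadata.bucket)
  ? input.customerId + ":" + string(additionalMetadata.bucket)
  : input.customerId
```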

## Worker execution model

Workers that opt into batchable tasks receive an array of inputs instead of a
single payload. SDKs expose this through dedicated helpers (for example,
`hatchet.batchTask` in the TypeScript SDK) that pass the aggregated inputs (and
per-item contexts) to your handler. Items continue to respect per-task retry
policies; batching changes how tasks are *delivered*, not how they’re retried.

```ts
const batch = hatchet.batchTask({
  name: "simple",
  batchSize: 2,
  flushInterval: 1_000, // defaults to 1000ms if omitted
  batchKey: "input.batchId", // defaults to `'default'` if omitted
  maxRuns: 100, // defaults to 100 if omitted
  fn: async (inputs, ctxs) => {
    return inputs.map((input, index) => ({
      message: `${input.Message.toLowerCase()}#${index}`,
    }));
  },
});
```

When a batch flushes, Hatchet records `batchId`, `batchSize`, `batchIndex`, and
`batchKey` on each task’s runtime metadata. This is available via the API (and
SDK contexts) so you can trace which invocations ran together.
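
A handler can also derive each item's position from its own arguments; a
minimal sketch (the `Input`/`Output` shapes are illustrative, not part of the
SDK):

```ts
type Input = { Message: string };
type Output = { message: string; position: string };

// Tags each result with its index within the flush so downstream consumers
// can correlate outputs with the recorded batch metadata.
async function handleBatch(inputs: Input[]): Promise<Output[]> {
  const flushedSize = inputs.length; // may be smaller than the configured batchSize
  return inputs.map((input, batchIndex) => ({
    message: input.Message.toLowerCase(),
    position: `${batchIndex + 1}/${flushedSize}`, // e.g. "1/2"
  }));
}
```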

## Observability and lifecycle

Hatchet emits dedicated events while tasks wait for a batch (`WAITING_FOR_BATCH`
and `BATCH_BUFFERED`) and once a flush is triggered (`BATCH_FLUSHED`). These
events surface in the run timeline on the dashboard. `BATCH_FLUSHED` includes a
reason (for example `batch_size_reached` or `interval_elapsed`).

Batch runs are reference counted inside the repository. When all tasks in a
batch complete (successfully or by cancellation), Hatchet automatically releases
the next batch.
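
Conceptually, the release behaves like a countdown latch; a toy sketch (not the
actual repository implementation):

```ts
// Toy model: when the last task in a batch settles (completes or is
// cancelled), the gate fires and the next batch for that key can start.
class BatchRefCount {
  constructor(
    private remaining: number, // number of tasks in the flushed batch
    private onRelease: () => void, // e.g. start the next buffered batch
  ) {}

  settle(): void {
    if (--this.remaining === 0) this.onRelease();
  }
}
```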

## Notes on evaluation order

Concurrency, rate limiting, sticky assignment, manual slot releases, and other
advanced features continue to work alongside batching. Hatchet evaluates task
configuration first; when tasks are eligible to be scheduled, they are buffered
and flushed per **batch key**.