Feat: Python SDK Documentation, Part I (#1567)

* feat: initial mkdocs setup

* chore: lock

* fix: config + start getting docs working

* fix: remove lots more redundant :type docs, update config more

* feat: split up clients

* feat: add pydoclint

* fix: rm defaults from docstrings

* fix: pydoclint errors

* feat: run pydoclint in ci

* fix: lint on 3.13

* debug: try explicit config path

* fix: ignore venv

* feat: index, styling

* fix: rm footer

* fix: more style tweaks

* feat: generated docs

* fix: refactor a bit

* fix: regen

* Revert "fix: regen"

This reverts commit 7f66adc77840ad96d0eafe55c8dd467f71eb50fb.

* feat: improve prompting

* feat: add docs, modify theme config to enable toc for docs

* fix: lint

* fix: lint

* feat: regenerate

* feat: bs4 for html parsing

* feat: preview correctly

* fix: exclude site subdir from all the linters

* refactor: break up script into components

* feat: remove a bunch more stuff from the html

* feat: prettier, enable toc

* fix: enable tocs in more places + sort properly

* fix: code blocks, ordering

* fix: ordering

* feat: finish up feature clients

* fix: rm unused deps

* fix: routing + property tags + sidebar

* fix: hatchet client + formatting

* fix: allow selecting single set of files

* fix: lint

* rm: cruft

* fix: naming

* fix: runs client attrs

* fix: rm cruft page

* feat: internal linking + top level description

* [Python]: Fixing some more issues (#1573)

* fix: pass priority through from the task

* fix: improve eof handling slightly

* chore: version

* fix: improve eof handling

* fix: send prio from durable

* fix: naming

* cleanup: use a variable

* chore: version

* feat: comment explaining page depth thing

* chore: bump ver

* feat: standalone docs

* fix: prompting + heading levels
This commit is contained in:
Matt Kaye
2025-04-18 15:34:07 -04:00
committed by GitHub
parent 8de5cea480
commit c8f56e0872
68 changed files with 4370 additions and 531 deletions
@@ -0,0 +1,24 @@
# Hatchet Python SDK Reference
This is the Python SDK reference, documenting methods available for interacting with Hatchet resources. Check out the [user guide](https://docs.hatchet.run/home) for an introduction to getting your first tasks running.
## The Hatchet Python Client
::: hatchet.Hatchet
options:
members:
- cron
- event
- logs
- metrics
- rate_limits
- runs
- scheduled
- workers
- workflows
- tenant_id
- namespace
- worker
- workflow
- task
- durable_task
@@ -0,0 +1,3 @@
# Cron Client
::: features.cron.CronClient
@@ -0,0 +1,3 @@
# Logs Client
::: features.logs.LogsClient
@@ -0,0 +1,3 @@
# Metrics Client
::: features.metrics.MetricsClient
@@ -0,0 +1,3 @@
# Rate Limits Client
::: features.rate_limits.RateLimitsClient
@@ -0,0 +1,22 @@
# Runs Client
::: features.runs.RunsClient
options:
members:
- get
- aio_get
- list
- aio_list
- create
- aio_create
- replay
- aio_replay
- bulk_replay
- aio_bulk_replay
- cancel
- aio_cancel
- bulk_cancel
- aio_bulk_cancel
- get_result
- aio_get_result
- get_run_ref
@@ -0,0 +1,3 @@
# Scheduled Client
::: features.scheduled.ScheduledClient
@@ -0,0 +1,3 @@
# Workers Client
::: features.workers.WorkersClient
@@ -0,0 +1,3 @@
# Workflows Client
::: features.workflows.WorkflowsClient
@@ -0,0 +1,115 @@
import argparse
import asyncio
import os
from typing import cast
from docs.generator.llm import parse_markdown
from docs.generator.paths import crawl_directory, find_child_paths
from docs.generator.shared import TMP_GEN_PATH
from docs.generator.types import Document
from docs.generator.utils import gather_max_concurrency, rm_rf
async def clean_markdown_with_openai(document: Document) -> None:
print("Generating mdx for", document.readable_source_path)
with open(document.source_path, "r", encoding="utf-8") as f:
original_md = f.read()
content = await parse_markdown(original_markdown=original_md)
if not content:
return None
with open(document.mdx_output_path, "w", encoding="utf-8") as f:
f.write(content)
def generate_sub_meta_entry(child: str) -> str:
child = child.replace("/", "")
return f"""
"{child}": {{
"title": "{child.replace("-", " ").title()}",
"theme": {{
"toc": true
}},
}},
"""
def generate_meta_js(docs: list[Document], children: set[str]) -> str:
prefix = docs[0].directory
subentries = [doc.meta_js_entry for doc in docs] + [
generate_sub_meta_entry(child.replace(prefix, "")) for child in children
]
sorted_subentries = sorted(
subentries,
key=lambda x: x.strip().split(":")[0].strip('"').lower(),
)
entries = "".join(sorted_subentries)
return f"export default {{{entries}}}"
def update_meta_js(documents: list[Document]) -> None:
meta_js_out_paths = {d.mdx_output_meta_js_path for d in documents}
for path in meta_js_out_paths:
relevant_documents = [d for d in documents if d.mdx_output_meta_js_path == path]
exemplar = relevant_documents[0]
directory = exemplar.directory
children = find_child_paths(directory, documents)
meta = generate_meta_js(relevant_documents, children)
out_path = exemplar.mdx_output_meta_js_path
with open(out_path, "w", encoding="utf-8") as f:
f.write(meta)
async def run(selections: list[str]) -> None:
rm_rf(TMP_GEN_PATH)
try:
os.system("poetry run mkdocs build")
documents = crawl_directory(TMP_GEN_PATH, selections)
await gather_max_concurrency(
*[clean_markdown_with_openai(d) for d in documents], max_concurrency=10
)
if not selections:
update_meta_js(documents)
os.chdir("../../frontend/docs")
os.system("pnpm lint:fix")
finally:
rm_rf("docs/site")
rm_rf("site")
rm_rf(TMP_GEN_PATH)
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument(
"--select",
nargs="*",
type=str,
help="Select a subset of docs to generate. Note that this will prevent the `_meta.js` file from being generated.",
)
args = parser.parse_args()
selections = cast(list[str], args.select or [])
asyncio.run(run(selections))
if __name__ == "__main__":
main()
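One behavior worth noting: with `nargs="*"`, omitting `--select` entirely leaves the attribute as `None`, which `run` then normalizes to an empty list (meaning "generate everything"). A quick sketch:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--select", nargs="*", type=str)

# No flag at all: argparse leaves the attribute as None.
print(parser.parse_args([]).select)  # None

# With the flag, every following token is collected into a list.
print(parser.parse_args(["--select", "runnables.md"]).select)  # ['runnables.md']
```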
@@ -0,0 +1,20 @@
from openai import AsyncOpenAI
from pydantic_settings import BaseSettings
from docs.generator.prompts import create_prompt_messages
class Settings(BaseSettings):
openai_api_key: str = "fake-key"
settings = Settings()
client = AsyncOpenAI(api_key=settings.openai_api_key)
async def parse_markdown(original_markdown: str) -> str | None:
response = await client.chat.completions.create(
model="gpt-4o", messages=create_prompt_messages(original_markdown)
)
return response.choices[0].message.content
@@ -0,0 +1,147 @@
import os
from typing import cast
from bs4 import BeautifulSoup, Tag
from markdownify import markdownify # type: ignore[import-untyped]
from mkdocs.config.defaults import MkDocsConfig
from mkdocs.plugins import BasePlugin
from mkdocs.structure.pages import Page
from docs.generator.shared import TMP_GEN_PATH
class MarkdownExportPlugin(BasePlugin): # type: ignore
def __init__(self) -> None:
super().__init__()
self.soup: BeautifulSoup
self.page_source_path: str
def _remove_async_tags(self) -> "MarkdownExportPlugin":
spans = self.soup.find_all("span", class_="doc doc-labels")
for span in spans:
if span.find(string="async") or (
span.text and "async" == span.get_text().strip()
):
span.decompose()
return self
def _remove_hash_links(self) -> "MarkdownExportPlugin":
links = self.soup.find_all("a", class_="headerlink")
for link in links:
href = cast(str, link["href"])
if href.startswith("#"):
link.decompose()
return self
def _remove_toc(self) -> "MarkdownExportPlugin":
tocs = self.soup.find_all("nav")
for toc in tocs:
toc.decompose()
return self
def _remove_footer(self) -> "MarkdownExportPlugin":
footer = self.soup.find("footer")
if footer and isinstance(footer, Tag):
footer.decompose()
return self
def _remove_navbar(self) -> "MarkdownExportPlugin":
navbar = self.soup.find("div", class_="navbar")
if navbar and isinstance(navbar, Tag):
navbar.decompose()
navbar_header = self.soup.find("div", class_="navbar-header")
if navbar_header and isinstance(navbar_header, Tag):
navbar_header.decompose()
navbar_collapse = self.soup.find("div", class_="navbar-collapse")
if navbar_collapse and isinstance(navbar_collapse, Tag):
navbar_collapse.decompose()
return self
def _remove_keyboard_shortcuts_modal(self) -> "MarkdownExportPlugin":
modal = self.soup.find("div", id="mkdocs_keyboard_modal")
if modal and isinstance(modal, Tag):
modal.decompose()
return self
def _remove_title(self) -> "MarkdownExportPlugin":
title = self.soup.find("h1", class_="title")
if title and isinstance(title, Tag):
title.decompose()
return self
def _remove_property_tags(self) -> "MarkdownExportPlugin":
property_tags = self.soup.find_all("code", string="property")
for tag in property_tags:
tag.decompose()
return self
def _interpolate_docs_links(self) -> "MarkdownExportPlugin":
links = self.soup.find_all("a")
page_depth = self.page_source_path.count("/")
## Using the depth + 2 here because the links are relative to the root of
## the SDK docs subdir, which sits at `/sdks/python` (two levels below the root)
dirs_up_prefix = "../" * (page_depth + 2)
for link in links:
href = link.get("href")
if not href:
continue
href = cast(str, link["href"])
if href.startswith("https://docs.hatchet.run/"):
link["href"] = href.replace("https://docs.hatchet.run/", dirs_up_prefix)
return self
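A minimal sketch (with a hypothetical function name) of the relative-prefix computation used above — one `../` per level of page depth, plus two for the `/sdks/python` subdirectory:

```python
def docs_link_prefix(page_source_path: str) -> str:
    # Depth of the page inside the generated docs, plus two extra
    # levels because the SDK docs live under `/sdks/python`.
    page_depth = page_source_path.count("/")
    return "../" * (page_depth + 2)

print(docs_link_prefix("index.md"))                 # ../../
print(docs_link_prefix("feature-clients/runs.md"))  # ../../../
```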
def _preprocess_html(self, content: str) -> str:
self.soup = BeautifulSoup(content, "html.parser")
(
self._remove_async_tags()
._remove_hash_links()
._remove_toc()
._remove_footer()
._remove_keyboard_shortcuts_modal()
._remove_navbar()
._remove_title()
._remove_property_tags()
._interpolate_docs_links()
)
return str(self.soup)
def on_post_page(
self, output_content: str, page: Page, config: MkDocsConfig
) -> str:
self.page_source_path = page.file.src_uri
content = self._preprocess_html(output_content)
md_content = markdownify(content, heading_style="ATX", wrap=False)
if not md_content:
return content
dest = os.path.splitext(page.file.dest_path)[0] + ".md"
out_path = os.path.join(TMP_GEN_PATH, dest)
os.makedirs(os.path.dirname(out_path), exist_ok=True)
with open(out_path, "w", encoding="utf-8") as f:
f.write(md_content)
return content
@@ -0,0 +1,24 @@
import os
from docs.generator.types import Document
def crawl_directory(directory: str, only_include: list[str]) -> list[Document]:
return [
d
for root, _, filenames in os.walk(directory)
for filename in filenames
if (d := Document.from_path(os.path.join(root, filename))).readable_source_path
in only_include
or not only_include
]
def find_child_paths(prefix: str, docs: list[Document]) -> set[str]:
return {
doc.directory
for doc in docs
if doc.directory.startswith(prefix)
and doc.directory != prefix
and doc.directory.count("/") == prefix.count("/") + 1
}
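`find_child_paths` keeps only directories exactly one path segment deeper than the prefix; a self-contained sketch of the same predicate on plain strings (the data here is hypothetical):

```python
def immediate_children(prefix: str, directories: set[str]) -> set[str]:
    # Same predicate as find_child_paths: a child extends the
    # prefix by exactly one path segment.
    return {
        d
        for d in directories
        if d.startswith(prefix)
        and d != prefix
        and d.count("/") == prefix.count("/") + 1
    }

dirs = {"", "/runnables", "/feature-clients", "/feature-clients/runs"}
print(sorted(immediate_children("", dirs)))  # ['/feature-clients', '/runnables']
```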
@@ -0,0 +1,36 @@
from typing import ParamSpec, TypeVar, cast
from openai.types.chat import (
ChatCompletionMessageParam,
ChatCompletionSystemMessageParam,
ChatCompletionUserMessageParam,
)
T = TypeVar("T")
P = ParamSpec("P")
R = TypeVar("R")
SYSTEM_PROMPT = """
You're an SDK documentation expert working on improving the readability of Hatchet's Python SDK documentation. You will be given
a markdown file, and your task is to fix any broken MDX so it can be used as a page on our Nextra documentation site.
In your work, follow these instructions:
1. Strip any unnecessary paragraph characters, but do not change any actual code, sentences, or content. You should keep the documentation as close to the original as possible, meaning that you should not generate new content, you should not consolidate existing content, you should not rearrange content, and so on.
2. Return only the content. You should not enclose the markdown in backticks or any other formatting.
3. You must ensure that MDX will render any tables correctly. One thing in particular to be on the lookout for is the use of the pipe `|` in type hints in the tables. For example, `int | None` is equivalent to the Python type `Optional[int]` and should render in a single column with an escaped pipe character (`\|`).
4. All code blocks should be formatted as `python`.
"""
def create_prompt_messages(
user_prompt_content: str,
) -> list[ChatCompletionMessageParam]:
return cast(
list[ChatCompletionMessageParam],
[
ChatCompletionSystemMessageParam(content=SYSTEM_PROMPT, role="system"),
ChatCompletionUserMessageParam(content=user_prompt_content, role="user"),
],
)
@@ -0,0 +1 @@
TMP_GEN_PATH = "/tmp/hatchet-python/docs/gen"
@@ -0,0 +1,67 @@
import os
import re
from pydantic import BaseModel
from docs.generator.shared import TMP_GEN_PATH
FRONTEND_DOCS_RELATIVE_PATH = "../../frontend/docs/pages/sdks/python"
MD_EXTENSION = "md"
MDX_EXTENSION = "mdx"
PY_EXTENSION = "py"
class Document(BaseModel):
source_path: str
readable_source_path: str
mdx_output_path: str
mdx_output_meta_js_path: str
is_index: bool
directory: str
basename: str
title: str = ""
meta_js_entry: str = ""
@staticmethod
def from_path(path: str) -> "Document":
# example path /tmp/hatchet-python/docs/gen/runnables.md
basename = os.path.splitext(os.path.basename(path))[0]
is_index = basename == "index"
title = (
"Introduction"
if is_index
else re.sub(
"[^0-9a-zA-Z ]+", "", basename.replace("_", " ").replace("-", " ")
).title()
)
mdx_out_path = path.replace(
TMP_GEN_PATH, "../../frontend/docs/pages/sdks/python"
)
mdx_out_dir = os.path.dirname(mdx_out_path)
return Document(
directory=os.path.dirname(path).replace(TMP_GEN_PATH, ""),
basename=basename,
title=title,
meta_js_entry=f"""
"{basename}": {{
"title": "{title}",
"theme": {{
"toc": true,
}}
}},
""",
source_path=path,
readable_source_path=path.replace(TMP_GEN_PATH, "")[1:],
mdx_output_path=mdx_out_path.replace(".md", ".mdx"),
mdx_output_meta_js_path=mdx_out_dir + "/_meta.js",
is_index=is_index,
)
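The title derivation in `from_path` can be sketched in isolation (the `page_title` helper name is hypothetical):

```python
import os
import re

def page_title(path: str) -> str:
    # Mirrors Document.from_path: the index page becomes
    # "Introduction"; other pages are title-cased from the basename.
    basename = os.path.splitext(os.path.basename(path))[0]
    if basename == "index":
        return "Introduction"
    return re.sub(
        "[^0-9a-zA-Z ]+", "", basename.replace("_", " ").replace("-", " ")
    ).title()

print(page_title("/tmp/hatchet-python/docs/gen/rate_limits.md"))  # Rate Limits
```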
@@ -0,0 +1,39 @@
import asyncio
import shutil
from typing import Coroutine, ParamSpec, TypeVar
from openai import AsyncOpenAI
from pydantic_settings import BaseSettings
T = TypeVar("T")
P = ParamSpec("P")
R = TypeVar("R")
class Settings(BaseSettings):
openai_api_key: str = "fake-key"
settings = Settings()
client = AsyncOpenAI(api_key=settings.openai_api_key)
async def gather_max_concurrency(
*tasks: Coroutine[None, None, T],
max_concurrency: int,
) -> list[T]:
"""asyncio.gather with cap on subtasks executing at once."""
sem = asyncio.Semaphore(max_concurrency)
async def task_wrapper(task: Coroutine[None, None, T]) -> T:
async with sem:
return await task
return await asyncio.gather(
*(task_wrapper(task) for task in tasks),
return_exceptions=False,
)
def rm_rf(path: str) -> None:
shutil.rmtree(path, ignore_errors=True)
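Because each coroutine is awaited inside the semaphore via `asyncio.gather`, results come back in submission order even with the cap. A quick self-contained check (the `double` coroutine is a trivial stand-in for the real OpenAI calls):

```python
import asyncio
from typing import Coroutine, TypeVar

T = TypeVar("T")

async def gather_max_concurrency(
    *tasks: Coroutine[None, None, T], max_concurrency: int
) -> list[T]:
    # Same shape as the helper above: a semaphore caps how many
    # coroutines run at once; gather preserves submission order.
    sem = asyncio.Semaphore(max_concurrency)

    async def task_wrapper(task: Coroutine[None, None, T]) -> T:
        async with sem:
            return await task

    return await asyncio.gather(*(task_wrapper(t) for t in tasks))

async def double(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x

result = asyncio.run(
    gather_max_concurrency(*(double(i) for i in range(5)), max_concurrency=2)
)
print(result)  # [0, 2, 4, 6, 8]
```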
@@ -0,0 +1,52 @@
# Runnables
`Runnables` in the Hatchet SDK are things that can be run, namely tasks and workflows. The two main types of runnables you'll encounter are:
* `Workflow`, which lets you define tasks and call the run, schedule, and related methods
* `Standalone`, which is a single task that's returned by `hatchet.task` and can be run, scheduled, etc.
## Workflow
::: runnables.workflow.Workflow
options:
members:
- task
- durable_task
- on_failure_task
- on_success_task
- run
- aio_run
- run_no_wait
- aio_run_no_wait
- run_many
- aio_run_many
- run_many_no_wait
- aio_run_many_no_wait
- schedule
- aio_schedule
- create_cron
- aio_create_cron
- create_bulk_run_item
- name
- tasks
- is_durable
## Standalone
::: runnables.standalone.Standalone
options:
members:
- run
- aio_run
- run_no_wait
- aio_run_no_wait
- run_many
- aio_run_many
- run_many_no_wait
- aio_run_many_no_wait
- schedule
- aio_schedule
- create_cron
- aio_create_cron
- create_bulk_run_item
- is_durable