Feat: Python SDK Documentation, Part I (#1567)

* feat: initial mkdocs setup

* chore: lock

* fix: config + start getting docs working

* fix: remove lots more redundant :type docs, update config more

* feat: split up clients

* feat: add pydoclint

* fix: rm defaults from docstrings

* fix: pydoclint errors

* feat: run pydoclint in ci

* fix: lint on 3.13

* debug: try explicit config path

* fix: ignore venv

* feat: index, styling

* fix: rm footer

* fix: more style tweaks

* feat: generated docs

* fix: refactor a bit

* fix: regen

* Revert "fix: regen"

This reverts commit 7f66adc77840ad96d0eafe55c8dd467f71eb50fb.

* feat: improve prompting

* feat: add docs, modify theme config to enable toc for docs

* fix: lint

* fix: lint

* feat: regenerate

* feat: bs4 for html parsing

* feat: preview correctly

* fix: exclude site subdir from all the linters

* refactor: break up script into components

* feat: remove a bunch more stuff from the html

* feat: prettier, enable toc

* fix: enable tocs in more places + sort properly

* fix: code blocks, ordering

* fix: ordering

* feat: finish up feature clients

* fix: rm unused deps

* fix: routing + property tags + sidebar

* fix: hatchet client + formatting

* fix: allow selecting single set of files

* fix: lint

* rm: cruft

* fix: naming

* fix: runs client attrs

* fix: rm cruft page

* feat: internal linking + top level description

* [Python]: Fixing some more issues (#1573)

* fix: pass priority through from the task

* fix: improve eof handling slightly

* chore: version

* fix: improve eof handling

* fix: send prio from durable

* fix: naming

* cleanup: use a variable

* chore: version

* feat: comment explaining page depth thing

* chore: bump ver

* feat: standalone docs

* fix: prompting + heading levels
This commit is contained in:
Matt Kaye
2025-04-18 15:34:07 -04:00
committed by GitHub
parent 8de5cea480
commit c8f56e0872
68 changed files with 4370 additions and 531 deletions
@@ -0,0 +1,24 @@
# Hatchet Python SDK Reference
This is the Python SDK reference, documenting methods available for interacting with Hatchet resources. Check out the [user guide](https://docs.hatchet.run/home) for an introduction to getting your first tasks running.
## The Hatchet Python Client
::: hatchet.Hatchet
options:
members:
- cron
- event
- logs
- metrics
- rate_limits
- runs
- scheduled
- workers
- workflows
- tenant_id
- namespace
- worker
- workflow
- task
- durable_task
@@ -0,0 +1,3 @@
# Cron Client
::: features.cron.CronClient
@@ -0,0 +1,3 @@
# Logs Client
::: features.logs.LogsClient
@@ -0,0 +1,3 @@
# Metrics Client
::: features.metrics.MetricsClient
@@ -0,0 +1,3 @@
# Rate Limits Client
::: features.rate_limits.RateLimitsClient
@@ -0,0 +1,22 @@
# Runs Client
::: features.runs.RunsClient
options:
members:
- get
- aio_get
- list
- aio_list
- create
- aio_create
- replay
- aio_replay
- bulk_replay
- aio_bulk_replay
- cancel
- aio_cancel
- bulk_cancel
- aio_bulk_cancel
- get_result
- aio_get_result
- get_run_ref
@@ -0,0 +1,3 @@
# Scheduled Client
::: features.scheduled.ScheduledClient
@@ -0,0 +1,3 @@
# Workers Client
::: features.workers.WorkersClient
@@ -0,0 +1,3 @@
# Workflows Client
::: features.workflows.WorkflowsClient
@@ -0,0 +1,115 @@
import argparse
import asyncio
import os
from typing import cast
from docs.generator.llm import parse_markdown
from docs.generator.paths import crawl_directory, find_child_paths
from docs.generator.shared import TMP_GEN_PATH
from docs.generator.types import Document
from docs.generator.utils import gather_max_concurrency, rm_rf
async def clean_markdown_with_openai(document: Document) -> None:
print("Generating mdx for", document.readable_source_path)
with open(document.source_path, "r", encoding="utf-8") as f:
original_md = f.read()
content = await parse_markdown(original_markdown=original_md)
if not content:
return None
with open(document.mdx_output_path, "w", encoding="utf-8") as f:
f.write(content)
def generate_sub_meta_entry(child: str) -> str:
child = child.replace("/", "")
return f"""
"{child}": {{
"title": "{child.replace("-", " ").title()}",
"theme": {{
"toc": true
}},
}},
"""
def generate_meta_js(docs: list[Document], children: set[str]) -> str:
prefix = docs[0].directory
subentries = [doc.meta_js_entry for doc in docs] + [
generate_sub_meta_entry(child.replace(prefix, "")) for child in children
]
sorted_subentries = sorted(
subentries,
key=lambda x: x.strip().split(":")[0].strip('"').lower(),
)
entries = "".join(sorted_subentries)
return f"export default {{{entries}}}"
def update_meta_js(documents: list[Document]) -> None:
meta_js_out_paths = {d.mdx_output_meta_js_path for d in documents}
for path in meta_js_out_paths:
relevant_documents = [d for d in documents if d.mdx_output_meta_js_path == path]
exemplar = relevant_documents[0]
directory = exemplar.directory
children = find_child_paths(directory, documents)
meta = generate_meta_js(relevant_documents, children)
out_path = exemplar.mdx_output_meta_js_path
with open(out_path, "w", encoding="utf-8") as f:
f.write(meta)
async def run(selections: list[str]) -> None:
rm_rf(TMP_GEN_PATH)
try:
os.system("poetry run mkdocs build")
documents = crawl_directory(TMP_GEN_PATH, selections)
await gather_max_concurrency(
*[clean_markdown_with_openai(d) for d in documents], max_concurrency=10
)
if not selections:
update_meta_js(documents)
os.chdir("../../frontend/docs")
os.system("pnpm lint:fix")
finally:
rm_rf("docs/site")
rm_rf("site")
rm_rf(TMP_GEN_PATH)
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument(
"--select",
nargs="*",
type=str,
help="Select a subset of docs to generate. Note that this will prevent the `_meta.js` file from being generated.",
)
args = parser.parse_args()
selections = cast(list[str], args.select or [])
asyncio.run(run(selections))
if __name__ == "__main__":
main()
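One behavior worth noting: with `nargs="*"`, omitting `--select` entirely leaves the attribute as `None`, which `run` then normalizes to an empty list (meaning "generate everything"). A quick sketch:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--select", nargs="*", type=str)

# No flag at all: argparse leaves the attribute as None.
print(parser.parse_args([]).select)  # None

# With the flag, every following token is collected into a list.
print(parser.parse_args(["--select", "runnables.md"]).select)  # ['runnables.md']
```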
@@ -0,0 +1,20 @@
from openai import AsyncOpenAI
from pydantic_settings import BaseSettings
from docs.generator.prompts import create_prompt_messages
class Settings(BaseSettings):
openai_api_key: str = "fake-key"
settings = Settings()
client = AsyncOpenAI(api_key=settings.openai_api_key)
async def parse_markdown(original_markdown: str) -> str | None:
response = await client.chat.completions.create(
model="gpt-4o", messages=create_prompt_messages(original_markdown)
)
return response.choices[0].message.content
@@ -0,0 +1,147 @@
import os
from typing import cast
from bs4 import BeautifulSoup, Tag
from markdownify import markdownify # type: ignore[import-untyped]
from mkdocs.config.defaults import MkDocsConfig
from mkdocs.plugins import BasePlugin
from mkdocs.structure.pages import Page
from docs.generator.shared import TMP_GEN_PATH
class MarkdownExportPlugin(BasePlugin): # type: ignore
def __init__(self) -> None:
super().__init__()
self.soup: BeautifulSoup
self.page_source_path: str
def _remove_async_tags(self) -> "MarkdownExportPlugin":
spans = self.soup.find_all("span", class_="doc doc-labels")
for span in spans:
if span.find(string="async") or (
span.text and "async" == span.get_text().strip()
):
span.decompose()
return self
def _remove_hash_links(self) -> "MarkdownExportPlugin":
links = self.soup.find_all("a", class_="headerlink")
for link in links:
href = cast(str, link["href"])
if href.startswith("#"):
link.decompose()
return self
def _remove_toc(self) -> "MarkdownExportPlugin":
tocs = self.soup.find_all("nav")
for toc in tocs:
toc.decompose()
return self
def _remove_footer(self) -> "MarkdownExportPlugin":
footer = self.soup.find("footer")
if footer and isinstance(footer, Tag):
footer.decompose()
return self
def _remove_navbar(self) -> "MarkdownExportPlugin":
navbar = self.soup.find("div", class_="navbar")
if navbar and isinstance(navbar, Tag):
navbar.decompose()
navbar_header = self.soup.find("div", class_="navbar-header")
if navbar_header and isinstance(navbar_header, Tag):
navbar_header.decompose()
navbar_collapse = self.soup.find("div", class_="navbar-collapse")
if navbar_collapse and isinstance(navbar_collapse, Tag):
navbar_collapse.decompose()
return self
def _remove_keyboard_shortcuts_modal(self) -> "MarkdownExportPlugin":
modal = self.soup.find("div", id="mkdocs_keyboard_modal")
if modal and isinstance(modal, Tag):
modal.decompose()
return self
def _remove_title(self) -> "MarkdownExportPlugin":
title = self.soup.find("h1", class_="title")
if title and isinstance(title, Tag):
title.decompose()
return self
def _remove_property_tags(self) -> "MarkdownExportPlugin":
property_tags = self.soup.find_all("code", string="property")
for tag in property_tags:
tag.decompose()
return self
def _interpolate_docs_links(self) -> "MarkdownExportPlugin":
links = self.soup.find_all("a")
page_depth = self.page_source_path.count("/")
## Using the depth + 2 here because the links are relative to the root of
## the SDK docs subdir, which sits at `/sdks/python` (two levels below the root)
dirs_up_prefix = "../" * (page_depth + 2)
for link in links:
href = link.get("href")
if not href:
continue
href = cast(str, link["href"])
if href.startswith("https://docs.hatchet.run/"):
link["href"] = href.replace("https://docs.hatchet.run/", dirs_up_prefix)
return self
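A minimal sketch (with a hypothetical function name) of the relative-prefix computation used above — one `../` per level of page depth, plus two for the `/sdks/python` subdirectory:

```python
def docs_link_prefix(page_source_path: str) -> str:
    # Depth of the page inside the generated docs, plus two extra
    # levels because the SDK docs live under `/sdks/python`.
    page_depth = page_source_path.count("/")
    return "../" * (page_depth + 2)

print(docs_link_prefix("index.md"))                 # ../../
print(docs_link_prefix("feature-clients/runs.md"))  # ../../../
```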
def _preprocess_html(self, content: str) -> str:
self.soup = BeautifulSoup(content, "html.parser")
(
self._remove_async_tags()
._remove_hash_links()
._remove_toc()
._remove_footer()
._remove_keyboard_shortcuts_modal()
._remove_navbar()
._remove_title()
._remove_property_tags()
._interpolate_docs_links()
)
return str(self.soup)
def on_post_page(
self, output_content: str, page: Page, config: MkDocsConfig
) -> str:
self.page_source_path = page.file.src_uri
content = self._preprocess_html(output_content)
md_content = markdownify(content, heading_style="ATX", wrap=False)
if not md_content:
return content
dest = os.path.splitext(page.file.dest_path)[0] + ".md"
out_path = os.path.join(TMP_GEN_PATH, dest)
os.makedirs(os.path.dirname(out_path), exist_ok=True)
with open(out_path, "w", encoding="utf-8") as f:
f.write(md_content)
return content
@@ -0,0 +1,24 @@
import os
from docs.generator.types import Document
def crawl_directory(directory: str, only_include: list[str]) -> list[Document]:
return [
d
for root, _, filenames in os.walk(directory)
for filename in filenames
if (d := Document.from_path(os.path.join(root, filename))).readable_source_path
in only_include
or not only_include
]
def find_child_paths(prefix: str, docs: list[Document]) -> set[str]:
return {
doc.directory
for doc in docs
if doc.directory.startswith(prefix)
and doc.directory != prefix
and doc.directory.count("/") == prefix.count("/") + 1
}
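`find_child_paths` keeps only directories exactly one path segment deeper than the prefix; a self-contained sketch of the same predicate on plain strings (the data here is hypothetical):

```python
def immediate_children(prefix: str, directories: set[str]) -> set[str]:
    # Same predicate as find_child_paths: a child extends the
    # prefix by exactly one path segment.
    return {
        d
        for d in directories
        if d.startswith(prefix)
        and d != prefix
        and d.count("/") == prefix.count("/") + 1
    }

dirs = {"", "/runnables", "/feature-clients", "/feature-clients/runs"}
print(sorted(immediate_children("", dirs)))  # ['/feature-clients', '/runnables']
```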
@@ -0,0 +1,36 @@
from typing import ParamSpec, TypeVar, cast
from openai.types.chat import (
ChatCompletionMessageParam,
ChatCompletionSystemMessageParam,
ChatCompletionUserMessageParam,
)
T = TypeVar("T")
P = ParamSpec("P")
R = TypeVar("R")
SYSTEM_PROMPT = """
You're an SDK documentation expert working on improving the readability of Hatchet's Python SDK documentation. You will be given
a markdown file, and your task is to fix any broken MDX so it can be used as a page on our Nextra documentation site.
In your work, follow these instructions:
1. Strip any unnecessary paragraph characters, but do not change any actual code, sentences, or content. You should keep the documentation as close to the original as possible, meaning that you should not generate new content, you should not consolidate existing content, you should not rearrange content, and so on.
2. Return only the content. You should not enclose the markdown in backticks or any other formatting.
3. You must ensure that MDX will render any tables correctly. One thing in particular to be on the lookout for is the use of the pipe `|` in type hints in the tables. For example, `int | None` is equivalent to the Python type `Optional[int]` and should render in a single column with an escaped pipe character (`\|`).
4. All code blocks should be formatted as `python`.
"""
def create_prompt_messages(
user_prompt_content: str,
) -> list[ChatCompletionMessageParam]:
return cast(
list[ChatCompletionMessageParam],
[
ChatCompletionSystemMessageParam(content=SYSTEM_PROMPT, role="system"),
ChatCompletionUserMessageParam(content=user_prompt_content, role="user"),
],
)
@@ -0,0 +1 @@
TMP_GEN_PATH = "/tmp/hatchet-python/docs/gen"
@@ -0,0 +1,67 @@
import os
import re
from pydantic import BaseModel
from docs.generator.shared import TMP_GEN_PATH
FRONTEND_DOCS_RELATIVE_PATH = "../../frontend/docs/pages/sdks/python"
MD_EXTENSION = "md"
MDX_EXTENSION = "mdx"
PY_EXTENSION = "py"
class Document(BaseModel):
source_path: str
readable_source_path: str
mdx_output_path: str
mdx_output_meta_js_path: str
is_index: bool
directory: str
basename: str
title: str = ""
meta_js_entry: str = ""
@staticmethod
def from_path(path: str) -> "Document":
# example path /tmp/hatchet-python/docs/gen/runnables.md
basename = os.path.splitext(os.path.basename(path))[0]
is_index = basename == "index"
title = (
"Introduction"
if is_index
else re.sub(
"[^0-9a-zA-Z ]+", "", basename.replace("_", " ").replace("-", " ")
).title()
)
mdx_out_path = path.replace(
TMP_GEN_PATH, "../../frontend/docs/pages/sdks/python"
)
mdx_out_dir = os.path.dirname(mdx_out_path)
return Document(
directory=os.path.dirname(path).replace(TMP_GEN_PATH, ""),
basename=basename,
title=title,
meta_js_entry=f"""
"{basename}": {{
"title": "{title}",
"theme": {{
"toc": true,
}}
}},
""",
source_path=path,
readable_source_path=path.replace(TMP_GEN_PATH, "")[1:],
mdx_output_path=mdx_out_path.replace(".md", ".mdx"),
mdx_output_meta_js_path=mdx_out_dir + "/_meta.js",
is_index=is_index,
)
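The title derivation in `from_path` can be sketched in isolation (the `page_title` helper name is hypothetical):

```python
import os
import re

def page_title(path: str) -> str:
    # Mirrors Document.from_path: the index page becomes
    # "Introduction"; other pages are title-cased from the basename.
    basename = os.path.splitext(os.path.basename(path))[0]
    if basename == "index":
        return "Introduction"
    return re.sub(
        "[^0-9a-zA-Z ]+", "", basename.replace("_", " ").replace("-", " ")
    ).title()

print(page_title("/tmp/hatchet-python/docs/gen/rate_limits.md"))  # Rate Limits
```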
@@ -0,0 +1,39 @@
import asyncio
import shutil
from typing import Coroutine, ParamSpec, TypeVar
from openai import AsyncOpenAI
from pydantic_settings import BaseSettings
T = TypeVar("T")
P = ParamSpec("P")
R = TypeVar("R")
class Settings(BaseSettings):
openai_api_key: str = "fake-key"
settings = Settings()
client = AsyncOpenAI(api_key=settings.openai_api_key)
async def gather_max_concurrency(
*tasks: Coroutine[None, None, T],
max_concurrency: int,
) -> list[T]:
"""asyncio.gather with cap on subtasks executing at once."""
sem = asyncio.Semaphore(max_concurrency)
async def task_wrapper(task: Coroutine[None, None, T]) -> T:
async with sem:
return await task
return await asyncio.gather(
*(task_wrapper(task) for task in tasks),
return_exceptions=False,
)
def rm_rf(path: str) -> None:
shutil.rmtree(path, ignore_errors=True)
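Because each coroutine is awaited inside the semaphore via `asyncio.gather`, results come back in submission order even with the cap. A quick self-contained check (the `double` coroutine is a trivial stand-in for the real OpenAI calls):

```python
import asyncio
from typing import Coroutine, TypeVar

T = TypeVar("T")

async def gather_max_concurrency(
    *tasks: Coroutine[None, None, T], max_concurrency: int
) -> list[T]:
    # Same shape as the helper above: a semaphore caps how many
    # coroutines run at once; gather preserves submission order.
    sem = asyncio.Semaphore(max_concurrency)

    async def task_wrapper(task: Coroutine[None, None, T]) -> T:
        async with sem:
            return await task

    return await asyncio.gather(*(task_wrapper(t) for t in tasks))

async def double(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x

result = asyncio.run(
    gather_max_concurrency(*(double(i) for i in range(5)), max_concurrency=2)
)
print(result)  # [0, 2, 4, 6, 8]
```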
@@ -0,0 +1,52 @@
# Runnables
`Runnables` in the Hatchet SDK are things that can be run, namely tasks and workflows. The two main types of runnables you'll encounter are:
* `Workflow`, which lets you define tasks and call the run, schedule, and related methods
* `Standalone`, which is a single task that's returned by `hatchet.task` and can be run, scheduled, etc.
## Workflow
::: runnables.workflow.Workflow
options:
members:
- task
- durable_task
- on_failure_task
- on_success_task
- run
- aio_run
- run_no_wait
- aio_run_no_wait
- run_many
- aio_run_many
- run_many_no_wait
- aio_run_many_no_wait
- schedule
- aio_schedule
- create_cron
- aio_create_cron
- create_bulk_run_item
- name
- tasks
- is_durable
## Standalone
::: runnables.standalone.Standalone
options:
members:
- run
- aio_run
- run_no_wait
- aio_run_no_wait
- run_many
- aio_run_many
- run_many_no_wait
- aio_run_many_no_wait
- schedule
- aio_schedule
- create_cron
- aio_create_cron
- create_bulk_run_item
- is_durable