diff --git a/README.md b/README.md
index fcac4a75..3da4464e 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,7 @@ With the Agent SDK, you can:
   - `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
   - `omniparser+any LLM`
   - `huggingface-local/HelloKKMe/GTA1-7B+any LLM` (using [Composed Agents](https://docs.trycua.com/docs/agent-sdk/supported-agents/composed-agents))
+  - `human/human` (using [Human-in-the-Loop](https://docs.trycua.com/docs/agent-sdk/supported-agents/human-in-the-loop))
 
 Missing a model? [Raise a feature request](https://github.com/trycua/cua/issues/new?assignees=&labels=enhancement&projects=&title=%5BAgent%5D%3A+Add+model+support+for+) or [contribute](https://github.com/trycua/cua/blob/main/CONTRIBUTING.md)!
diff --git a/docs/content/docs/agent-sdk/supported-agents/human-in-the-loop.mdx b/docs/content/docs/agent-sdk/supported-agents/human-in-the-loop.mdx
new file mode 100644
index 00000000..8d084d7e
--- /dev/null
+++ b/docs/content/docs/agent-sdk/supported-agents/human-in-the-loop.mdx
@@ -0,0 +1,66 @@
+---
+title: Human-In-The-Loop
+description: Use humans as agents for evaluation, demonstrations, and interactive control
+---
+
+The Agent SDK includes a human tool with native human-in-the-loop support, letting a person act as the agent to evaluate your environment and tools or to create demonstrations. Use it as `grounding_model+human/human`, or as `human/human` directly.
+
+## Getting Started
+
+To start the human agent tool, run:
+
+```bash
+python -m agent.human_tool
+```
+
+The UI shows pending completions. Select a completion to take control of the agent.
+
+## Usage Examples
+
+### Direct Human Agent
+
+```python
+from agent import ComputerAgent
+from agent.computer import computer
+
+agent = ComputerAgent(
+    "human/human",
+    tools=[computer]
+)
+
+async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
+    pass
+```
+
+### Composed with Grounding Model
+
+```python
+agent = ComputerAgent(
+    "huggingface-local/HelloKKMe/GTA1-7B+human/human",
+    tools=[computer]
+)
+
+async for _ in agent.run("Navigate to the settings page and enable dark mode"):
+    pass
+```
+
+## Features
+
+The human-in-the-loop interface provides:
+
+- **Interactive UI**: Web-based interface for reviewing and responding to agent requests
+- **Image Display**: Screenshots with click handlers for direct interaction
+- **Action Accordions**: Support for various computer actions (click, type, keypress, etc.)
+- **Tool Calls**: Full OpenAI-compatible tool call support
+- **Real-time Updates**: Smart polling for responsive UI updates
+
+## Use Cases
+
+- **Evaluation**: Have humans evaluate agent performance and provide ground truth responses
+- **Demonstrations**: Create training data by having humans demonstrate tasks
+- **Interactive Control**: Take manual control when automated agents need human guidance
+- **Testing**: Validate agent, tool, and environment behavior manually
+
+---
+
+For more details on the human tool implementation, see the [Human Tool Documentation](../../tools/human-tool).
diff --git a/docs/content/docs/agent-sdk/supported-agents/meta.json b/docs/content/docs/agent-sdk/supported-agents/meta.json
index 092fd051..5d50b124 100644
--- a/docs/content/docs/agent-sdk/supported-agents/meta.json
+++ b/docs/content/docs/agent-sdk/supported-agents/meta.json
@@ -4,6 +4,7 @@
   "pages": [
     "computer-use-agents",
     "grounding-models",
-    "composed-agents"
+    "composed-agents",
+    "human-in-the-loop"
   ]
 }