mirror of
https://github.com/trycua/lume.git
synced 2026-01-06 04:20:03 -06:00
Revert "refractor docs into 6 sections"
22
README.md
@@ -32,14 +32,14 @@ Computer-Use Agents (CUAs) are AI systems that can autonomously interact with co

With the [Computer SDK](#computer-sdk), you can:

- automate Windows, Linux, and macOS VMs with a consistent, [pyautogui-like API](https://cua.ai/docs/computer/commands)
- automate Windows, Linux, and macOS VMs with a consistent, [pyautogui-like API](https://cua.ai/docs/computer-sdk/commands)
- create & manage VMs [locally](https://cua.ai/docs/quickstart-devs#using-computer) or using [Cua cloud](https://www.cua.ai/)

With the [Agent SDK](#agent-sdk), you can:

- run computer-use models with a [consistent schema](https://cua.ai/docs/agent/message-format)
- benchmark on OSWorld-Verified (369 tasks), SheetBench-V2, and ScreenSpot [with a single line of code using HUD](https://cua.ai/docs/agent/integrations/hud) - see [benchmark results](#research--benchmarks) ([Notebook](https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb))
- combine UI grounding models with any LLM using [composed agents](https://cua.ai/docs/agent/supported-agents/composed-agents)
- run computer-use models with a [consistent schema](https://cua.ai/docs/agent-sdk/message-format)
- benchmark on OSWorld-Verified (369 tasks), SheetBench-V2, and ScreenSpot [with a single line of code using HUD](https://cua.ai/docs/agent-sdk/integrations/hud) - see [benchmark results](#research--benchmarks) ([Notebook](https://github.com/trycua/cua/blob/main/notebooks/eval_osworld.ipynb))
- combine UI grounding models with any LLM using [composed agents](https://cua.ai/docs/agent-sdk/supported-agents/composed-agents)
- use new UI agent models and UI grounding models from the Model Zoo below with just a model string (e.g., `ComputerAgent(model="openai/computer-use-preview")`)
- use API or local inference by changing a prefix (e.g., `openai/`, `openrouter/`, `ollama/`, `huggingface-local/`, `mlx/`, [etc.](https://docs.litellm.ai/docs/providers))

@@ -208,12 +208,12 @@ Cua uses the OpenAI Agent response format.

These are the valid model configurations for `ComputerAgent(model="...")`:

| Configuration | Description |
| ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `{computer-use-model}` | A single model to perform all computer-use tasks |
| `{grounding-model}+{any-vlm-with-tools}` | [Composed](https://cua.ai/docs/agent/supported-agents/composed-agents) with VLM for captioning and grounding LLM for element detection |
| `moondream3+{any-llm-with-tools}` | [Composed](https://cua.ai/docs/agent/supported-agents/composed-agents) with Moondream3 for captioning and UI element detection |
| `human/human` | A [human-in-the-loop](https://cua.ai/docs/agent/supported-agents/human-in-the-loop) in place of a model |
| Configuration | Description |
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `{computer-use-model}` | A single model to perform all computer-use tasks |
| `{grounding-model}+{any-vlm-with-tools}` | [Composed](https://cua.ai/docs/agent-sdk/supported-agents/composed-agents) with VLM for captioning and grounding LLM for element detection |
| `moondream3+{any-llm-with-tools}` | [Composed](https://cua.ai/docs/agent-sdk/supported-agents/composed-agents) with Moondream3 for captioning and UI element detection |
| `human/human` | A [human-in-the-loop](https://cua.ai/docs/agent-sdk/supported-agents/human-in-the-loop) in place of a model |

### Model Capabilities

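The configuration table above describes a model string as either a single model or two models joined by `+` (grounding model first, then the tool-calling model), and the feature list notes that a provider prefix such as `openai/` or `ollama/` selects the backend. The sketch below shows how such a string decomposes, assuming at most one `+` and that the provider is the text before the first `/`. `parse_agent_model`, `ModelRef` and `AgentConfig` are hypothetical names for illustration, not part of the Agent SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    provider: Optional[str]  # text before the first "/", e.g. "openai"; None if absent
    name: str                # everything after that prefix


@dataclass(frozen=True)
class AgentConfig:
    grounding: Optional[ModelRef]  # left side of "a+b"; None for a single model
    planner: ModelRef              # the model that drives the loop


def parse_model_ref(s: str) -> ModelRef:
    # partition() splits on the first "/" only, so multi-segment ids keep their tail.
    provider, sep, name = s.strip().partition("/")
    if not sep:
        return ModelRef(provider=None, name=provider)  # e.g. "moondream3"
    return ModelRef(provider=provider, name=name)


def parse_agent_model(model: str) -> AgentConfig:
    """Hypothetical parser for ComputerAgent(model=...) strings (assumes at most one '+')."""
    parts = model.split("+")
    if len(parts) == 1:
        return AgentConfig(grounding=None, planner=parse_model_ref(parts[0]))
    if len(parts) == 2:
        return AgentConfig(
            grounding=parse_model_ref(parts[0]),
            planner=parse_model_ref(parts[1]),
        )
    raise ValueError(f"expected at most one '+' in {model!r}")
```

Splitting only on the first `/` keeps multi-segment identifiers such as `huggingface-local/org/model` intact; the SDK itself may route strings differently.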
@@ -246,7 +246,7 @@ The following table shows which capabilities are supported by each model:

**Composition Examples:**

See more examples on our [composition docs](https://cua.ai/docs/agent/supported-agents/composed-agents).
See more examples on our [composition docs](https://cua.ai/docs/agent-sdk/supported-agents/composed-agents).

```python
# Use OpenAI's GPT-5 for planning with specialized grounding

@@ -63,8 +63,8 @@ agent = ComputerAgent(
)
```

- See `docs/agent/custom-tools` for authoring function tools.
- See `docs/computer/custom-computer-handlers` for building full computer interfaces.
- See `docs/agent-sdk/custom-tools` for authoring function tools.
- See `docs/agent-sdk/custom-computer-handlers` for building full computer interfaces.

## 3) Intermediate: Callbacks

@@ -125,7 +125,7 @@ Both single-task and full-dataset runs share a common set of configuration optio

HUD provides multiple benchmark datasets for realistic evaluation.

1. **[OSWorld-Verified](/agent/benchmarks/osworld-verified)** – Benchmark on 369+ real-world desktop tasks across Chrome, LibreOffice, GIMP, VS Code, etc.
1. **[OSWorld-Verified](/agent-sdk/benchmarks/osworld-verified)** – Benchmark on 369+ real-world desktop tasks across Chrome, LibreOffice, GIMP, VS Code, etc.
_Best for_: evaluating full computer-use agents in realistic environments.
_Verified variant_: fixes 300+ issues from earlier versions for reliability.

@@ -120,4 +120,4 @@ All MCP clients can configure the server using environment variables:
- `CUA_MAX_IMAGES` - Maximum images to keep in context
- `CUA_USE_HOST_COMPUTER_SERVER` - Use host system instead of VM

See the [Configuration](/mcp/configuration) page for detailed configuration options.
See the [Configuration](/docs/libraries/mcp-server/configuration) page for detailed configuration options.
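The two variables above reach the MCP server through the client's environment. Their accepted value formats are not spelled out in this hunk, so the sketch below assumes `CUA_MAX_IMAGES` is a non-negative integer and treats `1`, `true`, `yes` or `on` (case-insensitive) as enabling `CUA_USE_HOST_COMPUTER_SERVER`. `read_mcp_env` and `McpEnvSettings` are hypothetical names for a client-side pre-flight check, not the server's own parser.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed truthy spellings; the server may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class McpEnvSettings:
    max_images: Optional[int]       # None = unset, so the server's own default applies
    use_host_computer_server: bool  # True = target the host instead of a VM


def read_mcp_env(env: Mapping[str, str] = os.environ) -> McpEnvSettings:
    """Hypothetical validation of the CUA_* variables before launching the server."""
    raw_max = env.get("CUA_MAX_IMAGES", "").strip()
    max_images = int(raw_max) if raw_max else None  # int() raises ValueError on junk
    if max_images is not None and max_images < 0:
        raise ValueError("CUA_MAX_IMAGES must be a non-negative integer")
    raw_host = env.get("CUA_USE_HOST_COMPUTER_SERVER", "")
    return McpEnvSettings(
        max_images=max_images,
        use_host_computer_server=raw_host.strip().lower() in _TRUTHY,
    )
```

Calling `read_mcp_env()` with no argument reads the current process environment; passing a plain dict makes the check easy to exercise without touching real variables.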
@@ -1,8 +1,5 @@
{
"title": "MCP",
"description": "MCP server for using Cua agents and computers",
"pages": [
"index",
"installation",
"configuration",
"usage",
@@ -82,7 +82,7 @@ Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client c

<Tabs items={['Claude Desktop', 'Other MCP Clients']}>
<Tab value="Claude Desktop">
Update your Claude Desktop config (see [Installation](/mcp/installation)) to include the environment variable:
Update your Claude Desktop config (see [Installation](/docs/libraries/mcp-server/installation)) to include the environment variable:

```json
{
@@ -1,5 +1,5 @@
{
"title": "Agent",
"title": "Agent SDK",
"description": "Build computer-using agents with the Agent SDK",
"pages": [
"agent-loops",
@@ -14,6 +14,7 @@
"usage-tracking",
"telemetry",
"benchmarks",
"integrations"
"integrations",
"mcp-server"
]
}
@@ -7,7 +7,7 @@ These models support complete computer-use agent functionality through `Computer

All agent loops are compatible with any LLM provider supported by LiteLLM.

See [Running Models Locally](/agent/supported-model-providers/local-models) for how to use Hugging Face and MLX models on your own machine.
See [Running Models Locally](/agent-sdk/supported-model-providers/local-models) for how to use Hugging Face and MLX models on your own machine.

## Gemini CUA

@@ -224,7 +224,7 @@ Requests are billed in **credits**:
}
```

**Note:** Cua VLM Router is a fully managed cloud service. If you want to use your own provider API keys directly (BYOK), see the [Supported Model Providers](/agent/supported-model-providers/) page for direct provider access via the agent SDK.
**Note:** Cua VLM Router is a fully managed cloud service. If you want to use your own provider API keys directly (BYOK), see the [Supported Model Providers](/agent-sdk/supported-model-providers/) page for direct provider access via the agent SDK.

## Response Metadata

@@ -435,7 +435,7 @@ That's it! Same code structure, just different model format. Cua manages all pro

## Next Steps

- Explore [Agent Loops](/agent/agent-loops) to customize agent behavior
- Learn about [Cost Saving Callbacks](/agent/callbacks/cost-saving)
- Explore [Agent Loops](/agent-sdk/agent-loops) to customize agent behavior
- Learn about [Cost Saving Callbacks](/agent-sdk/callbacks/cost-saving)
- Try [Example Use Cases](/example-usecases/form-filling)
- Review [Supported Model Providers](/agent/supported-model-providers/) for all options
- Review [Supported Model Providers](/agent-sdk/supported-model-providers/) for all options
@@ -24,7 +24,7 @@ model="cua/google/gemini-3-flash-preview" # Gemini 3 Flash Preview (fastest and
- Cost tracking and optimization
- Fully managed infrastructure (no provider keys to manage)

[Learn more about Cua VLM Router →](/agent/supported-model-providers/cua-vlm-router)
[Learn more about Cua VLM Router →](/agent-sdk/supported-model-providers/cua-vlm-router)

---

@@ -429,5 +429,5 @@ watch -n 5 cua list
## Next Steps

- [Get started with the quickstart guide](/get-started/quickstart#cli-quickstart)
- [Learn about CUA computers](/computer/computers)
- [Explore agent automation](/agent/agent-loops)
- [Learn about CUA computers](/computer-sdk/computers)
- [Explore agent automation](/agent-sdk/agent-loops)

@@ -5,7 +5,7 @@ description: Computer commands and interface methods

This page describes the set of supported **commands** you can use to control a Cua Computer Framework directly via the Python SDK.

These commands map to the same actions available in the [Computer Server API Commands Reference](/computer/computer-server/Commands), and provide low-level, async access to system operations from your agent or automation code.
These commands map to the same actions available in the [Computer Server API Commands Reference](/computer-sdk/computer-server/Commands), and provide low-level, async access to system operations from your agent or automation code.

## Shell Actions

@@ -1,5 +1,5 @@
{
"title": "Computer",
"title": "Computer SDK",
"description": "Build computer-using agents with the Computer SDK",
"pages": [
"computers",
@@ -10,7 +10,7 @@ import { Tab, Tabs } from 'fumadocs-ui/components/tabs';

Cua can be used to automate interactions between form filling and local file systems over any operating system. Cua let's you interact with all the elements of a web page and local file systems to integrate between the two.

This preset usecase uses [Cua Computer Framework](/computer/computers) to interact with a web page and local file systems along with [Agent Loops](/agent/agent-loops) to run the agent in a loop with message history.
This preset usecase uses [Cua Computer Framework](/computer-sdk/computers) to interact with a web page and local file systems along with [Agent Loops](/agent-sdk/agent-loops) to run the agent in a loop with message history.

---

@@ -494,7 +494,7 @@ Monitor the output to see the agent's progress through each task.

## Next Steps

- Learn more about [Cua Computer Framework](/computer/computers) and [Computer Commands](/computer/commands)
- Read about [Agent Loops](/agent/agent-loops), [Tools](/agent/custom-tools), and [Supported Model Providers](/agent/supported-model-providers/)
- Experiment with different [Models and Providers](/agent/supported-model-providers/)
- Learn more about [Cua Computer Framework](/computer-sdk/computers) and [Computer Commands](/computer-sdk/commands)
- Read about [Agent Loops](/agent-sdk/agent-loops), [Tools](/agent-sdk/custom-tools), and [Supported Model Providers](/agent-sdk/supported-model-providers/)
- Experiment with different [Models and Providers](/agent-sdk/supported-model-providers/)
- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help

@@ -633,7 +633,7 @@ Ensure billing is enabled for your Google Cloud project. Visit the [Billing sect

## Next Steps

- Learn more about [OmniParser agent loops](/agent/agent-loops)
- Learn more about [OmniParser agent loops](/agent-sdk/agent-loops)
- Explore [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing)
- Read about [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding)
- Check out [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/)

@@ -12,7 +12,7 @@ After networking events, you need to export new connections from LinkedIn, X, or

**The workflow**: Kick off the script after an event and let it run overnight. Wake up to a clean CSV ready for your CRM or email tool.

This example focuses on LinkedIn but works across platforms. It uses [Cua Computer Framework](/computer/computers) to interact with web interfaces and [Agent Loops](/agent/agent-loops) to iterate through connections with conversation history.
This example focuses on LinkedIn but works across platforms. It uses [Cua Computer Framework](/computer-sdk/computers) to interact with web interfaces and [Agent Loops](/agent-sdk/agent-loops) to iterate through connections with conversation history.

### Why Cua is Perfect for This

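The workflow above ends with a clean CSV for a CRM or email tool. The sketch below shows one way to write that last step with the standard library's `csv` module. The column names and the choice to deduplicate on a normalized profile URL are assumptions for illustration, not taken from the example script; `write_contacts_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, TextIO

# Hypothetical column names; the example script's real schema may differ.
FIELDS = ["name", "title", "company", "profile_url"]


def write_contacts_csv(rows: Iterable[Dict[str, str]], out: TextIO) -> int:
    """Write contacts as CSV, skipping rows with no URL or a repeated URL; return rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    seen = set()
    written = 0
    for row in rows:
        # Normalize the URL so "/in/ada/" and "/in/ada" count as the same person.
        key = row.get("profile_url", "").strip().rstrip("/").lower()
        if not key or key in seen:
            continue
        seen.add(key)
        # Missing fields become empty cells rather than raising.
        writer.writerow({field: row.get(field, "").strip() for field in FIELDS})
        written += 1
    return written
```

`out` can be a file opened with `open("contacts.csv", "w", newline="")` or an `io.StringIO`, since `csv.DictWriter` only needs an object with a `write` method.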
@@ -475,8 +475,8 @@ This script demonstrates a practical workflow for extracting LinkedIn connection

## Next Steps

- Learn more about [Cua Computer Framework](/computer/computers) and [Computer Commands](/computer/commands)
- Read about [Agent Loops](/agent/agent-loops), [Tools](/agent/custom-tools), and [Supported Model Providers](/agent/supported-model-providers/)
- Experiment with different [Models and Providers](/agent/supported-model-providers/)
- Learn more about [Cua Computer Framework](/computer-sdk/computers) and [Computer Commands](/computer-sdk/commands)
- Read about [Agent Loops](/agent-sdk/agent-loops), [Tools](/agent-sdk/custom-tools), and [Supported Model Providers](/agent-sdk/supported-model-providers/)
- Experiment with different [Models and Providers](/agent-sdk/supported-model-providers/)
- Adapt this script for other platforms (Twitter/X, email extraction, etc.)
- Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help

@@ -616,8 +616,8 @@ If costs are higher than expected:

## Next Steps

- **Explore Custom Tools**: Learn how to create [Custom Tools](/agent/custom-tools) for application-specific actions
- **Implement Callbacks**: Add [Monitoring and Logging](/agent/callbacks) for production workflows
- **Explore Custom Tools**: Learn how to create [Custom Tools](/agent-sdk/custom-tools) for application-specific actions
- **Implement Callbacks**: Add [Monitoring and Logging](/agent-sdk/callbacks) for production workflows
- **Join community**: Get help in our [Discord](https://discord.com/invite/mVnXXpdE85)

---
@@ -626,4 +626,4 @@ If costs are higher than expected:

- [Form Filling](/example-usecases/form-filling) - Web form automation
- [Post-Event Contact Export](/example-usecases/post-event-contact-export) - Data extraction workflows
- [Custom Tools](/agent/custom-tools) - Building application-specific functions
- [Custom Tools](/agent-sdk/custom-tools) - Building application-specific functions

@@ -619,7 +619,7 @@ Install Cua Computer Framework and verify your sandbox is working by performing
</Tab>
</Tabs>

Learn more about computers in the [Cua computers documentation](/computer/computers).
Learn more about computers in the [Cua computers documentation](/computer-sdk/computers).

</Step>

@@ -772,7 +772,7 @@ While you can build your own agent loop with any LLM, Cua Agent Framework is the
- `huggingface-local/*` - Local HuggingFace models
- And many more via LiteLLM

See [Supported Models](/agent/supported-model-providers/) for the complete list.
See [Supported Models](/agent-sdk/supported-model-providers/) for the complete list.

</Tab>
</Tabs>
@@ -927,17 +927,17 @@ While you can build your own agent loop with any LLM, Cua Agent Framework is the
</Tab>
</Tabs>

Learn more about agents in [Agent Loops](/agent/agent-loops) and available models in [Supported Models](/agent/supported-model-providers/).
Learn more about agents in [Agent Loops](/agent-sdk/agent-loops) and available models in [Supported Models](/agent-sdk/supported-model-providers/).

</Step>
</Steps>

### Next Steps

- Explore [Cua Computer Framework Commands](/computer/commands) for more sandbox interactions
- Learn about [Agent Loops](/agent/agent-loops) and advanced agent configuration
- Check out [Custom Tools](/agent/custom-tools) to extend your agents
- Review [Supported Model Providers](/agent/supported-model-providers/) for more LLM options
- Explore [Cua Computer Framework Commands](/computer-sdk/commands) for more sandbox interactions
- Learn about [Agent Loops](/agent-sdk/agent-loops) and advanced agent configuration
- Check out [Custom Tools](/agent-sdk/custom-tools) to extend your agents
- Review [Supported Model Providers](/agent-sdk/supported-model-providers/) for more LLM options
- Try the [Form Filling](/example-usecases/form-filling) example use case
- Join our [Discord community](https://discord.com/invite/cua-ai) for help and discussion

@@ -1080,4 +1080,4 @@ cua sb delete my-sandbox-abc123

---

For running models locally, see [Running Models Locally](/agent/supported-model-providers/local-models). */}
For running models locally, see [Running Models Locally](/agent-sdk/supported-model-providers/local-models). */}

@@ -45,10 +45,10 @@ Check out our [tutorials](https://cua.ai/blog), [examples](https://github.com/tr

<div className="grid grid-cols-2 md:grid-cols-4 gap-2 mt-4 text-sm">
<Card icon={<Rocket className="w-4 h-4" />} href="/get-started/quickstart" title="Quickstart" />
<Card icon={<Zap className="w-4 h-4" />} href="/agent/agent-loops" title="Agent Loops" />
<Card icon={<Zap className="w-4 h-4" />} href="/agent-sdk/agent-loops" title="Agent Loops" />
<Card
icon={<BookOpen className="w-4 h-4" />}
href="/computer/computers"
href="/computer-sdk/computers"
title="Cua Computer"
/>
<Card

@@ -1,5 +0,0 @@
{
"title": "Lume",
"description": "VM management for macOS",
"pages": ["index", "installation", "prebuilt-images", "cli-reference", "http-api", "faq"]
}
@@ -1,5 +0,0 @@
{
"title": "Lumier",
"description": "Docker interface for macOS/Linux VMs",
"pages": ["index", "installation", "docker", "docker-compose", "building-lumier"]
}
3
docs/content/docs/macos-vm-cli-playbook/lume/meta.json
Normal file
@@ -0,0 +1,3 @@
{
"pages": ["installation", "prebuilt-images", "cli-reference", "http-api", "faq"]
}
3
docs/content/docs/macos-vm-cli-playbook/lumier/meta.json
Normal file
@@ -0,0 +1,3 @@
{
"pages": ["installation", "docker", "docker-compose", "building-lumier"]
}
5
docs/content/docs/macos-vm-cli-playbook/meta.json
Normal file
@@ -0,0 +1,5 @@
{
"title": "macOS VM CLI",
"description": "CLI tools for macOS virtualization",
"pages": ["lume", "lumier"]
}
@@ -10,19 +10,13 @@
"...get-started",
"---[ChefHat]Examples Cookbook---",
"...example-usecases",
"---[Bot]Agent---",
"...agent",
"---[BookCopy]Computer---",
"...computer",
"---[Terminal]Lume---",
"...lume",
"---[Terminal]Lumier---",
"...lumier",
"---[Target]Set-of-Mark---",
"...som",
"---[Plug]MCP---",
"...mcp",
"---[Bot]Agent Playbook---",
"...agent-sdk",
"---[BookCopy]Computer Playbook---",
"...computer-sdk",
"---[Terminal]Cloud CLI Playbook---",
"...cli-playbook"
"...cli-playbook",
"---[Terminal]macOS VM CLI Playbook---",
"...macos-vm-cli-playbook"
]
}

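The sidebar `meta.json` entries above mix three forms: `---[Icon]Title---` separators (the `[Icon]` part being optional), `...folder` entries that inline another folder's pages, and plain page slugs. The classifier below handles just those forms. It is written from the patterns visible in this file rather than from the fumadocs source, and `classify`, `Separator`, `ExtractFolder` and `Page` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# "---[Icon]Title---" or "---Title---"; the bracketed icon is optional.
_SEPARATOR = re.compile(r"^---(?:\[(?P<icon>[^\]]+)\])?(?P<title>.+?)---$")


@dataclass(frozen=True)
class Separator:
    title: str
    icon: Optional[str]


@dataclass(frozen=True)
class ExtractFolder:
    folder: str  # "...agent-sdk" inlines the pages of the "agent-sdk" folder


@dataclass(frozen=True)
class Page:
    slug: str


Entry = Union[Separator, ExtractFolder, Page]


def classify(entry: str) -> Entry:
    """Classify one "pages" entry using only the forms that appear in this meta.json."""
    m = _SEPARATOR.match(entry)
    if m:
        return Separator(title=m.group("title"), icon=m.group("icon"))
    if entry == "...":
        raise ValueError("a bare '...' entry is not handled by this sketch")
    if entry.startswith("..."):
        return ExtractFolder(folder=entry[3:])
    return Page(slug=entry)
```

One possible use is a lint step that checks each `ExtractFolder` entry against the folders that actually exist under `docs/content/docs`.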
@@ -1,82 +0,0 @@
---
title: Set-of-Mark
description: Set-of-Mark library for Agent
---

**Set-of-Mark (SOM)** is a visual grounding component for the Computer-Use Agent (CUA) framework powering Cua, for detecting and analyzing UI elements in screenshots. Optimized for macOS Silicon with Metal Performance Shaders (MPS), it combines YOLO-based icon detection with EasyOCR text recognition to provide comprehensive UI element analysis.

## Features

- Optimized for Apple Silicon with MPS acceleration
- Icon detection using YOLO with multi-scale processing
- Text recognition using EasyOCR (GPU-accelerated)
- Automatic hardware detection (MPS → CUDA → CPU)
- Smart detection parameters tuned for UI elements
- Detailed visualization with numbered annotations
- Performance benchmarking tools

## System Requirements

- **Recommended**: macOS with Apple Silicon
  - Uses Metal Performance Shaders (MPS)
  - Multi-scale detection enabled
  - ~0.4s average detection time
- **Supported**: Any Python 3.11+ environment
  - Falls back to CPU if no GPU available
  - Single-scale detection on CPU
  - ~1.3s average detection time

## Installation

```bash
pip install cua-som
```

Or install with the Agent SDK:

```bash
pip install "cua-agent[omni]"
```

## Quick Start

```python
from som import OmniParser
from PIL import Image

# Initialize parser
parser = OmniParser()

# Process an image
image = Image.open("screenshot.png")
result = parser.parse(
    image,
    box_threshold=0.3, # Confidence threshold
    iou_threshold=0.1, # Overlap threshold
    use_ocr=True # Enable text detection
)

# Access results
for elem in result.elements:
    if elem.type == "icon":
        print(f"Icon: confidence={elem.confidence:.3f}, bbox={elem.bbox.coordinates}")
    else: # text
        print(f"Text: '{elem.content}', confidence={elem.confidence:.3f}")
```

## Usage with Agent

Set-of-Mark is used by the Agent SDK's OmniParser loop for UI element detection. When using the OmniParser agent loop, Set-of-Mark is automatically used for grounding:

```python
from agent import ComputerAgent

# OmniParser loop uses Set-of-Mark automatically
agent = ComputerAgent(
    model="omniparser+anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)
```

See the [Agent documentation](/agent/supported-agents/grounding-models) for more details on using Set-of-Mark with agents.

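The deleted quick start passes `box_threshold` and `iou_threshold` to `parser.parse`. The page does not say how OmniParser applies them internally, so the sketch below shows the conventional reading: a confidence cut-off followed by greedy non-maximum suppression on intersection-over-union, assuming boxes are `(x1, y1, x2, y2)` corners. `iou` and `filter_detections` are hypothetical helpers, not the `som` package API.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    # Intersection rectangle; width or height clamps to 0 when the boxes are disjoint.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    box_threshold: float = 0.3,
    iou_threshold: float = 0.1,
) -> List[int]:
    """Return indices of kept boxes: confidence filter, then greedy NMS."""
    # Assumption: a score equal to the threshold is kept.
    candidates = [i for i, s in enumerate(scores) if s >= box_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)  # best first
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it overlaps every already-kept box by at most iou_threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

Under this reading, `iou_threshold=0.1` suppresses a lower-scoring box as soon as its IoU with a kept box exceeds 10%, so only nearly disjoint detections survive together.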
@@ -1,5 +0,0 @@
{
"title": "Set-of-Mark",
"description": "Set-of-Mark library for Agent",
"pages": ["index"]
}
@@ -45,10 +45,10 @@ const config = {
destination: '/get-started/quickstart',
permanent: true,
},
// Moved telemetry to agent section
// Moved telemetry to agent-sdk section
{
source: '/telemetry',
destination: '/agent/telemetry',
destination: '/agent-sdk/telemetry',
permanent: true,
},
// Removed quickstart-cli, consolidated into main quickstart
@@ -57,37 +57,6 @@ const config = {
destination: '/get-started/quickstart',
permanent: true,
},
// Documentation restructure: 6-section organization
// Redirect old agent-sdk paths to new agent paths
{
source: '/agent-sdk/:path*',
destination: '/agent/:path*',
permanent: true,
},
// Redirect old computer-sdk paths to new computer paths
{
source: '/computer-sdk/:path*',
destination: '/computer/:path*',
permanent: true,
},
// Redirect old macos-vm-cli-playbook/lume paths to new lume paths
{
source: '/macos-vm-cli-playbook/lume/:path*',
destination: '/lume/:path*',
permanent: true,
},
// Redirect old macos-vm-cli-playbook/lumier paths to new lumier paths
{
source: '/macos-vm-cli-playbook/lumier/:path*',
destination: '/lumier/:path*',
permanent: true,
},
// Redirect old agent-sdk/mcp-server paths to new mcp paths
{
source: '/agent-sdk/mcp-server/:path*',
destination: '/mcp/:path*',
permanent: true,
},
];
},
images: {

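The removed block above used `source`/`destination` pairs with a trailing `:path*` wildcard. The emulation below mirrors that prefix matching, assuming `:path*` captures zero or more trailing segments and that rules are tried in order with the first match winning. `REMOVED_RULES`, `match_path_star` and `redirect` are hypothetical names for checking old links offline, not Next.js internals.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (source prefix, destination prefix), ":path*" implied

# The five rules from the removed block, in their original order.
REMOVED_RULES: List[Rule] = [
    ("/agent-sdk", "/agent"),
    ("/computer-sdk", "/computer"),
    ("/macos-vm-cli-playbook/lume", "/lume"),
    ("/macos-vm-cli-playbook/lumier", "/lumier"),
    ("/agent-sdk/mcp-server", "/mcp"),
]


def match_path_star(prefix: str, path: str) -> Optional[str]:
    """Emulate '<prefix>/:path*': return the captured tail ('' for the bare prefix) or None."""
    if path == prefix:
        return ""
    if path.startswith(prefix + "/"):
        return path[len(prefix) + 1:]
    return None


def redirect(path: str, rules: List[Rule]) -> Optional[str]:
    # Assumption: rules are tried in order and the first match wins.
    for source, destination in rules:
        tail = match_path_star(source, path)
        if tail is not None:
            return f"{destination}/{tail}" if tail else destination
    return None
```

Under the first-match assumption, `/agent-sdk/mcp-server/usage` hits the earlier `/agent-sdk/:path*` rule and lands on `/agent/mcp-server/usage`, so the later `/agent-sdk/mcp-server/:path*` rule in that block would never fire; listing the more specific rule first avoids this.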
@@ -48,7 +48,7 @@ export function Footer() {
</li>
<li>
<a
href="/docs/agent/agent-loops"
href="/docs/agent-sdk/agent-loops"
className="text-sm text-fd-muted-foreground hover:text-fd-foreground transition-colors"
>
Agent Loops

@@ -55,11 +55,11 @@ To get set up with Lume for development, read [these instructions](Development.m

## Docs

- [Installation](https://cua.ai/docs/lume/installation)
- [Prebuilt Images](https://cua.ai/docs/lume/prebuilt-images)
- [CLI Reference](https://cua.ai/docs/lume/cli-reference)
- [HTTP API](https://cua.ai/docs/lume/http-api)
- [FAQ](https://cua.ai/docs/lume/faq)
- [Installation](https://cua.ai/docs/macos-vm-cli-playbook/lume/installation)
- [Prebuilt Images](https://cua.ai/docs/macos-vm-cli-playbook/lume/prebuilt-images)
- [CLI Reference](https://cua.ai/docs/macos-vm-cli-playbook/lume/cli-reference)
- [HTTP API](https://cua.ai/docs/macos-vm-cli-playbook/lume/http-api)
- [FAQ](https://cua.ai/docs/macos-vm-cli-playbook/lume/faq)

## Contributing

@@ -58,14 +58,14 @@ docker run -it --rm \

After running the command above, you can access your macOS VM through a web browser (e.g., http://localhost:8006).

> **Note:** With the basic setup above, your VM will be reset when you stop the container (ephemeral mode). This means any changes you make inside the macOS VM will be lost. See [the documentation](https://cua.ai/docs/lumier/docker) for how to save your VM state.
> **Note:** With the basic setup above, your VM will be reset when you stop the container (ephemeral mode). This means any changes you make inside the macOS VM will be lost. See [the documentation](https://cua.ai/docs/macos-vm-cli-playbook/lumier/docker) for how to save your VM state.

## Docs

- [Installation](https://cua.ai/docs/lumier/installation)
- [Docker](https://cua.ai/docs/lumier/docker)
- [Docker Compose](https://cua.ai/docs/lumier/docker-compose)
- [Building Lumier](https://cua.ai/docs/lumier/building-lumier)
- [Installation](https://cua.ai/docs/macos-vm-cli-playbook/lumier/installation)
- [Docker](https://cua.ai/docs/macos-vm-cli-playbook/lumier/docker)
- [Docker Compose](https://cua.ai/docs/macos-vm-cli-playbook/lumier/docker-compose)
- [Building Lumier](https://cua.ai/docs/macos-vm-cli-playbook/lumier/building-lumier)

## Credits

@@ -72,16 +72,16 @@ if __name__ == "__main__":

## Docs

- [Agent Loops](https://cua.ai/docs/agent/agent-loops)
- [Supported Agents](https://cua.ai/docs/agent/supported-agents/computer-use-agents)
- [Supported Models](https://cua.ai/docs/agent/supported-model-providers)
- [Chat History](https://cua.ai/docs/agent/chat-history)
- [Callbacks](https://cua.ai/docs/agent/callbacks)
- [Custom Tools](https://cua.ai/docs/agent/custom-tools)
- [Custom Computer Handlers](https://cua.ai/docs/computer/custom-computer-handlers)
- [Prompt Caching](https://cua.ai/docs/agent/prompt-caching)
- [Usage Tracking](https://cua.ai/docs/agent/usage-tracking)
- [Benchmarks](https://cua.ai/docs/agent/benchmarks)
- [Agent Loops](https://cua.ai/docs/agent-sdk/agent-loops)
- [Supported Agents](https://cua.ai/docs/agent-sdk/supported-agents/computer-use-agents)
- [Supported Models](https://cua.ai/docs/agent-sdk/supported-model-providers)
- [Chat History](https://cua.ai/docs/agent-sdk/chat-history)
- [Callbacks](https://cua.ai/docs/agent-sdk/callbacks)
- [Custom Tools](https://cua.ai/docs/agent-sdk/custom-tools)
- [Custom Computer Handlers](https://cua.ai/docs/computer-sdk/custom-computer-handlers)
- [Prompt Caching](https://cua.ai/docs/agent-sdk/prompt-caching)
- [Usage Tracking](https://cua.ai/docs/agent-sdk/usage-tracking)
- [Benchmarks](https://cua.ai/docs/agent-sdk/benchmarks)

## License

@@ -40,7 +40,7 @@ Refer to this notebook for a step-by-step guide on how to use the Computer-Use S

## Docs

- [Commands](https://cua.ai/docs/computer/computer-server/Commands)
- [REST-API](https://cua.ai/docs/computer/computer-server/REST-API)
- [WebSocket-API](https://cua.ai/docs/computer/computer-server/WebSocket-API)
- [Index](https://cua.ai/docs/computer/computer-server)
- [Commands](https://cua.ai/docs/computer-sdk/computer-server/Commands)
- [REST-API](https://cua.ai/docs/computer-sdk/computer-server/REST-API)
- [WebSocket-API](https://cua.ai/docs/computer-sdk/computer-server/WebSocket-API)
- [Index](https://cua.ai/docs/computer-sdk/computer-server)

@@ -68,7 +68,7 @@ Refer to this notebook for a step-by-step guide on how to use the Computer-Use I

## Docs

- [Computers](https://cua.ai/docs/computer/computers)
- [Commands](https://cua.ai/docs/computer/commands)
- [Computer UI](https://cua.ai/docs/computer/computer-ui)
- [Sandboxed Python](https://cua.ai/docs/computer/sandboxed-python)
- [Computers](https://cua.ai/docs/computer-sdk/computers)
- [Commands](https://cua.ai/docs/computer-sdk/commands)
- [Computer UI](https://cua.ai/docs/computer-sdk/computer-ui)
- [Sandboxed Python](https://cua.ai/docs/computer-sdk/sandboxed-python)

@@ -129,12 +129,12 @@ See [desktop-extension/README.md](desktop-extension/README.md) for more details.

## Documentation

- Installation: https://cua.ai/docs/mcp/installation
- Configuration: https://cua.ai/docs/mcp/configuration
- Usage: https://cua.ai/docs/mcp/usage
- Tools: https://cua.ai/docs/mcp/tools
- Client Integrations: https://cua.ai/docs/mcp/client-integrations
- LLM Integrations: https://cua.ai/docs/mcp/llm-integrations
- Installation: https://cua.ai/docs/agent-sdk/mcp-server/installation
- Configuration: https://cua.ai/docs/agent-sdk/mcp-server/configuration
- Usage: https://cua.ai/docs/agent-sdk/mcp-server/usage
- Tools: https://cua.ai/docs/agent-sdk/mcp-server/tools
- Client Integrations: https://cua.ai/docs/agent-sdk/mcp-server/client-integrations
- LLM Integrations: https://cua.ai/docs/agent-sdk/mcp-server/llm-integrations

## Troubleshooting

@@ -76,9 +76,9 @@ Refer to this example for a step-by-step guide on how to use the Computer-Use In

## Docs

- [Computers](https://cua.ai/docs/computer/computers)
- [Commands](https://cua.ai/docs/computer/commands)
- [Computer UI](https://cua.ai/docs/computer/computer-ui)
- [Computers](https://cua.ai/docs/computer-sdk/computers)
- [Commands](https://cua.ai/docs/computer-sdk/commands)
- [Computer UI](https://cua.ai/docs/computer-sdk/computer-ui)

## License
