Cua ("koo-ah") is Docker for Computer-Use Agents - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.
With the Computer SDK, you can:
- automate Windows, Linux, and macOS VMs with a consistent, pyautogui-like API
- create & manage VMs locally or using Cua cloud
With the Agent SDK, you can:
- run computer-use models with a consistent schema
- benchmark on OSWorld-Verified, SheetBench-V2, and more with a single line of code using HUD (Notebook)
- combine UI grounding models with any LLM using composed agents
- use new UI agent models and UI grounding models from the Model Zoo below with just a model string (e.g., ComputerAgent(model="openai/computer-use-preview"))
- use API or local inference by changing a prefix (e.g., openai/, openrouter/, ollama/, huggingface-local/, mlx/, etc.)
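The prefix convention in the last bullet can be illustrated with a small helper. This is only a sketch: `split_model_string` is a hypothetical name, not part of the Cua SDK, and it assumes the provider prefix is everything before the first slash.

```python
from typing import Optional, Tuple


def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Split 'prefix/model-name' at the first slash (assumed convention).

    Strings without a slash (e.g. 'moondream3') are returned with no prefix.
    """
    prefix, sep, name = model.partition("/")
    if not sep:
        return None, model
    return prefix, name


print(split_model_string("openai/computer-use-preview"))
print(split_model_string("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"))
print(split_model_string("moondream3"))
```

Only the first slash is treated as the separator, so Hugging Face style names that contain their own slash stay intact in the second element.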
Modules
| Agent | Computer | MCP Server | Computer Server |
| Lume | Lumier | SOM | Core |
Quick Start
- Clone a starter template and run the code in <1 min
- Get started with the Cua SDKs
- Get started with the Cua CLI
Agent SDK
Install the agent SDK:
pip install "cua-agent[all]"
Initialize a computer agent using a model configuration string and a computer instance:
from agent import ComputerAgent

# ComputerAgent works with any computer initialized with the Computer SDK
agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
Output format
Cua uses the OpenAI Agent response format.
Example
{
  "output": [
    {
      "role": "user",
      "content": "go to trycua on gh"
    },
    {
      "summary": [
        {
          "text": "Searching Firefox for Trycua GitHub",
          "type": "summary_text"
        }
      ],
      "type": "reasoning"
    },
    {
      "action": {
        "text": "Trycua GitHub",
        "type": "type"
      },
      "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
      "status": "completed",
      "type": "computer_call"
    },
    {
      "type": "computer_call_output",
      "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
      "output": {
        "type": "input_image",
        "image_url": "data:image/png;base64,..."
      }
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "text": "Success! The Trycua GitHub page has been opened.",
          "type": "output_text"
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 75,
    "total_tokens": 225,
    "response_cost": 0.01
  }
}
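To show how the fields in this example relate to each other, here is a rough sketch that pairs each `computer_call` with its `computer_call_output` through `call_id` and reads the `usage` totals. `summarize_response` is a hypothetical helper, not an SDK function, and the sample dict is a trimmed copy of the example above.

```python
def summarize_response(response: dict) -> dict:
    """Pair computer calls with their outputs and pull out usage totals."""
    calls = {}        # call_id -> action type
    answered = set()  # call_ids that have a computer_call_output
    for item in response.get("output", []):
        kind = item.get("type")
        if kind == "computer_call":
            calls[item["call_id"]] = item["action"]["type"]
        elif kind == "computer_call_output":
            answered.add(item["call_id"])
    usage = response.get("usage", {})
    return {
        "actions": list(calls.values()),
        "unanswered_calls": sorted(set(calls) - answered),
        "total_tokens": usage.get("total_tokens", 0),
        "response_cost": usage.get("response_cost", 0.0),
    }


sample = {
    "output": [
        {"role": "user", "content": "go to trycua on gh"},
        {
            "action": {"text": "Trycua GitHub", "type": "type"},
            "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
            "status": "completed",
            "type": "computer_call",
        },
        {
            "type": "computer_call_output",
            "call_id": "call_QI6OsYkXxl6Ww1KvyJc4LKKq",
            "output": {"type": "input_image", "image_url": "data:image/png;base64,..."},
        },
    ],
    "usage": {"prompt_tokens": 150, "completion_tokens": 75,
              "total_tokens": 225, "response_cost": 0.01},
}

print(summarize_response(sample))
```

Items without a `type` key, such as the user message, are simply skipped.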
Model Configuration
These are the valid model configurations for ComputerAgent(model="..."):
| Configuration | Description |
|---|---|
| {computer-use-model} | A single model to perform all computer-use tasks |
| {grounding-model}+{any-vlm-with-tools} | Composed with VLM for captioning and grounding LLM for element detection |
| moondream3+{any-llm-with-tools} | Composed with Moondream3 for captioning and UI element detection |
| human/human | A human-in-the-loop in place of a model |
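The shapes in this table can be told apart by looking for the `+` separator. The sketch below is an illustration of that reading, not the SDK's own parsing logic: `describe_model_config` and its return keys are hypothetical, and the composed pairings in the example are chosen only to show the syntax.

```python
def describe_model_config(model: str) -> dict:
    """Classify a model string as 'human', 'single', or 'composed'."""
    if model == "human/human":
        return {"kind": "human", "grounding": None, "model": model}
    parts = model.split("+")
    if len(parts) == 1:
        return {"kind": "single", "grounding": None, "model": model}
    if len(parts) == 2 and all(parts):
        # Per the table, the grounding model comes before the '+'.
        return {"kind": "composed", "grounding": parts[0], "model": parts[1]}
    raise ValueError(f"unrecognized model configuration: {model!r}")


for config in (
    "openai/computer-use-preview",
    "huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-sonnet-4-5",
    "moondream3+anthropic/claude-haiku-4-5",
    "human/human",
):
    print(config, "->", describe_model_config(config)["kind"])
```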
Model Capabilities
The following table shows which capabilities are supported by each model:
| Model | Computer-Use | Grounding | Tools | VLM |
|---|---|---|---|---|
| Claude Sonnet/Haiku | ✓ | ✓ | ✓ | ✓ |
| OpenAI CU Preview | ✓ | ✓ | ✓ | |
| GLM-V | ✓ | ✓ | ✓ | ✓ |
| Gemini CU Preview | ✓ | ✓ | ✓ | |
| InternVL | ✓ | ✓ | ✓ | ✓ |
| UI-TARS | ✓ | ✓ | ✓ | ✓ |
| OpenCUA | ✓ | | | |
| GTA | ✓ | | | |
| Holo | ✓ | | | |
| Moondream | ✓ | | | |
| OmniParser | ✓ | | | |
Model IDs
Examples of valid model IDs
| Model | Model IDs |
|---|---|
| Claude Sonnet/Haiku | anthropic/claude-sonnet-4-5, anthropic/claude-haiku-4-5 |
| OpenAI CU Preview | openai/computer-use-preview |
| GLM-V | openrouter/z-ai/glm-4.5v, huggingface-local/zai-org/GLM-4.5V |
| Gemini CU Preview | gemini-2.5-computer-use-preview |
| InternVL | huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...} |
| UI-TARS | huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B |
| OpenCUA | huggingface-local/xlangai/OpenCUA-{7B,32B} |
| GTA | huggingface-local/HelloKKMe/GTA1-{7B,32B,72B} |
| Holo | huggingface-local/Hcompany/Holo1.5-{3B,7B,72B} |
| Moondream | moondream3 |
| OmniParser | omniparser |
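Several rows use a brace shorthand such as `OpenCUA-{7B,32B}` to list size variants. As a sketch of how that shorthand reads, the hypothetical helper below expands it into individual model strings. It leaves a `...` entry unexpanded because the table does not say which sizes it stands for, and it does not check that any expanded ID actually exists.

```python
import re

_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_model_ids(pattern: str) -> list:
    """Expand 'name-{a,b}' shorthand into ['name-a', 'name-b']."""
    match = _BRACES.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        option = option.strip()
        if option == "...":
            continue  # elided sizes stay elided
        expanded.extend(expand_model_ids(head + option + tail))
    return expanded


print(expand_model_ids("huggingface-local/xlangai/OpenCUA-{7B,32B}"))
print(expand_model_ids("huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}"))
```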
Missing a model? Create a feature request or contribute!
Learn more in the Agent SDK documentation.
Computer SDK
Install the computer SDK:
pip install cua-computer
Initialize a computer:
from computer import Computer

computer = Computer(
    os_type="linux",  # or "macos", "windows"
    provider_type="cloud",  # or "lume", "docker", "windows_sandbox"
    name="your-sandbox-name",
    api_key="your-api-key",  # only for cloud
    # or use_host_computer_server=True for host desktop
)

try:
    await computer.run()

    # Take a screenshot
    screenshot = await computer.interface.screenshot()

    # Click and type
    await computer.interface.left_click(100, 100)
    await computer.interface.type("Hello!")
finally:
    await computer.close()
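The try/finally pattern above guarantees that `close()` runs even when an action fails. One way to reuse that guarantee is a small async context manager, sketched below. `running` and `_StubComputer` are hypothetical names; the sketch assumes only the `run()` and `close()` coroutines shown above and does not rely on any context-manager support the SDK itself may provide.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def running(computer):
    """Start the computer on entry and always close it on exit."""
    try:
        await computer.run()
        yield computer
    finally:
        await computer.close()


class _StubComputer:
    """Stand-in for Computer so the sketch runs without a sandbox."""

    def __init__(self):
        self.events = []

    async def run(self):
        self.events.append("run")

    async def close(self):
        self.events.append("close")


async def demo():
    stub = _StubComputer()
    try:
        async with running(stub) as computer:
            computer.events.append("work")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return stub.events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The simulated failure inside the `async with` block still ends with a `close` event, which mirrors the `finally` clause in the original snippet.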
Learn more in the Computer SDK documentation.
MCP Server
Install the MCP server:
pip install cua-mcp-server
Learn more in the MCP Server documentation.
Computer Server
Install the Computer Server:
pip install cua-computer-server
python -m computer_server
Learn more in the Computer Server documentation.
Lume
Install Lume:
curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
Learn more in the Lume documentation.
Lumier
Install Lumier:
docker pull trycua/lumier:latest
Learn more in the Lumier documentation.
SOM
Install SOM:
pip install cua-som
Learn more in the SOM documentation.
Resources
Community and Contributions
We welcome contributions to Cua! Please refer to our Contributing Guidelines for details.
Join our Discord community to discuss ideas, get assistance, or share your demos!
License
Cua is open-sourced under the MIT License - see the LICENSE file for details.
Portions of this project, specifically components adapted from Kasm Technologies Inc., are also licensed under the MIT License. See libs/kasm/LICENSE for details.
Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). See the OmniParser LICENSE for details.
Third-Party Licenses and Optional Components
Some optional extras for this project depend on third-party packages that are licensed under terms different from the MIT License.
- The optional "omni" extra (installed via pip install "cua-agent[omni]") installs the cua-som module, which includes ultralytics, a package licensed under the AGPL-3.0.
When you choose to install and use such optional extras, your use, modification, and distribution of those third-party components are governed by their respective licenses (e.g., AGPL-3.0 for ultralytics).
Trademarks
Apple, macOS, and Apple Silicon are trademarks of Apple Inc.
Ubuntu and Canonical are registered trademarks of Canonical Ltd.
Microsoft is a registered trademark of Microsoft Corporation.
This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., Microsoft Corporation, or Kasm Technologies.
Stargazers
Thank you to all our supporters!
Sponsors
Thank you to all our GitHub Sponsors!
