Move text from README to Cua documentation

2026-05-05 14:42:50 -05:00 · 2025-08-28 11:07:20 -04:00
parent 110e6482c1
commit e17f6106c8
10 changed files with 223 additions and 324 deletions
@@ -29,16 +29,6 @@

 ```bash
 pip install "cua-agent[all]"
-
-# or install specific providers
-pip install "cua-agent[openai]"        # OpenAI computer-use-preview support
-pip install "cua-agent[anthropic]"     # Anthropic Claude support
-pip install "cua-agent[omni]"          # Omniparser + any LLM support
-pip install "cua-agent[uitars]"        # UI-TARS
-pip install "cua-agent[uitars-mlx]"    # UI-TARS + MLX support
-pip install "cua-agent[uitars-hf]"     # UI-TARS + Huggingface support
-pip install "cua-agent[glm45v-hf]"     # GLM-4.5V + Huggingface support
-pip install "cua-agent[ui]"            # Gradio UI support
 ```

 ## Quick Start
@@ -79,303 +69,18 @@ if __name__ == "__main__":
    asyncio.run(main())
 ```

-## Supported Models
+## Docs

-### Anthropic Claude (Computer Use API)
-```python
-model="anthropic/claude-3-5-sonnet-20241022"
-model="anthropic/claude-3-7-sonnet-20250219"
-model="anthropic/claude-opus-4-20250514"
-model="anthropic/claude-sonnet-4-20250514"
-```
-
-### OpenAI Computer Use Preview
-```python
-model="openai/computer-use-preview"
-```
-
-### UI-TARS (Local or Huggingface Inference)
-```python
-model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
-model="ollama_chat/0000/ui-tars-1.5-7b"
-```
-
-### Omniparser + Any LLM
-```python
-model="omniparser+ollama_chat/mistral-small3.2"
-model="omniparser+vertex_ai/gemini-pro"
-model="omniparser+anthropic/claude-3-5-sonnet-20241022"
-model="omniparser+openai/gpt-4o"
-```
-
-## Custom Tools
-
-Define custom tools using decorated functions:
-
-```python
-from computer.helpers import sandboxed
-
-@sandboxed()
-def read_file(location: str) -> str:
-    """Read contents of a file
-    
-    Parameters
-    ----------
-    location : str
-        Path to the file to read
-        
-    Returns
-    -------
-    str
-        Contents of the file or error message
-    """
-    try:
-        with open(location, 'r') as f:
-            return f.read()
-    except Exception as e:
-        return f"Error reading file: {str(e)}"
-
-def calculate(a: int, b: int) -> int:
-    """Calculate the sum of two integers"""
-    return a + b
-
-# Use with agent
-agent = ComputerAgent(
-    model="anthropic/claude-3-5-sonnet-20241022",
-    tools=[computer, read_file, calculate]
-)
-```
-
-## Callbacks System
-
-agent provides a comprehensive callback system for extending functionality:
-
-### Built-in Callbacks
-
-```python
-from agent.callbacks import (
-    ImageRetentionCallback,
-    TrajectorySaverCallback, 
-    BudgetManagerCallback,
-    LoggingCallback
-)
-
-agent = ComputerAgent(
-    model="anthropic/claude-3-5-sonnet-20241022",
-    tools=[computer],
-    callbacks=[
-        ImageRetentionCallback(only_n_most_recent_images=3),
-        TrajectorySaverCallback(trajectory_dir="trajectories"),
-        BudgetManagerCallback(max_budget=10.0, raise_error=True),
-        LoggingCallback(level=logging.INFO)
-    ]
-)
-```
-
-### Custom Callbacks
-
-```python
-from agent.callbacks.base import AsyncCallbackHandler
-
-class CustomCallback(AsyncCallbackHandler):
-    async def on_llm_start(self, messages):
-        """Preprocess messages before LLM call"""
-        # Add custom preprocessing logic
-        return messages
-    
-    async def on_llm_end(self, messages):
-        """Postprocess messages after LLM call"""
-        # Add custom postprocessing logic
-        return messages
-    
-    async def on_usage(self, usage):
-        """Track usage information"""
-        print(f"Tokens used: {usage.total_tokens}")
-```
-
-## Budget Management
-
-Control costs with built-in budget management:
-
-```python
-# Simple budget limit
-agent = ComputerAgent(
-    model="anthropic/claude-3-5-sonnet-20241022",
-    max_trajectory_budget=5.0  # $5 limit
-)
-
-# Advanced budget configuration
-agent = ComputerAgent(
-    model="anthropic/claude-3-5-sonnet-20241022",
-    max_trajectory_budget={
-        "max_budget": 10.0,
-        "raise_error": True,  # Raise error when exceeded
-        "reset_after_each_run": False  # Persistent across runs
-    }
-)
-```
-
-## Trajectory Management
-
-Save and replay agent conversations:
-
-```python
-agent = ComputerAgent(
-    model="anthropic/claude-3-5-sonnet-20241022",
-    trajectory_dir="trajectories",  # Auto-save trajectories
-    tools=[computer]
-)
-
-# Trajectories are saved with:
-# - Complete conversation history
-# - Usage statistics and costs
-# - Timestamps and metadata
-# - Screenshots and computer actions
-```
-
-## Configuration Options
-
-### ComputerAgent Parameters
-
- `model`: Model identifier (required)
- `tools`: List of computer objects and decorated functions
- `callbacks`: List of callback handlers for extensibility
- `only_n_most_recent_images`: Limit recent images to prevent context overflow
- `verbosity`: Logging level (logging.INFO, logging.DEBUG, etc.)
- `trajectory_dir`: Directory to save conversation trajectories
- `max_retries`: Maximum API call retries (default: 3)
- `screenshot_delay`: Delay between actions and screenshots (default: 0.5s)
- `use_prompt_caching`: Enable prompt caching for supported models
- `max_trajectory_budget`: Budget limit configuration
-
-### Environment Variables
-
-```bash
-# Computer instance (cloud)
-export CUA_CONTAINER_NAME="your-container-name"
-export CUA_API_KEY="your-cua-api-key"
-
-# LLM API keys
-export ANTHROPIC_API_KEY="your-anthropic-key"
-export OPENAI_API_KEY="your-openai-key"
-```
-
-## Advanced Usage
-
-### Streaming Responses
-
-```python
-async for result in agent.run(messages, stream=True):
-    # Process streaming chunks
-    for item in result["output"]:
-        if item["type"] == "message":
-            print(item["content"][0]["text"], end="", flush=True)
-        elif item["type"] == "computer_call":
-            action = item["action"]
-            print(f"\n[Action: {action['type']}]")
-```
-
-### Interactive Chat Loop
-
-```python
-history = []
-while True:
-    user_input = input("> ")
-    if user_input.lower() in ['quit', 'exit']:
-        break
-        
-    history.append({"role": "user", "content": user_input})
-    
-    async for result in agent.run(history):
-        history += result["output"]
-        
-        # Display assistant responses
-        for item in result["output"]:
-            if item["type"] == "message":
-                print(item["content"][0]["text"])
-```
-
-### Error Handling
-
-```python
-try:
-    async for result in agent.run(messages):
-        # Process results
-        pass
-except BudgetExceededException:
-    print("Budget limit exceeded")
-except Exception as e:
-    print(f"Agent error: {e}")
-```
-
-## API Reference
-
-### ComputerAgent.run()
-
-```python
-async def run(
-    self,
-    messages: Messages,
-    stream: bool = False,
-    **kwargs
-) -> AsyncGenerator[Dict[str, Any], None]:
-    """
-    Run the agent with the given messages.
-    
-    Args:
-        messages: List of message dictionaries
-        stream: Whether to stream the response
-        **kwargs: Additional arguments
-        
-    Returns:
-        AsyncGenerator that yields response chunks
-    """
-```
-
-### Message Format
-
-```python
-messages = [
-    {
-        "role": "user",
-        "content": "Take a screenshot and describe what you see"
-    },
-    {
-        "role": "assistant", 
-        "content": "I'll take a screenshot for you."
-    }
-]
-```
-
-### Response Format
-
-```python
-{
-    "output": [
-        {
-            "type": "message",
-            "role": "assistant",
-            "content": [{"type": "output_text", "text": "I can see..."}]
-        },
-        {
-            "type": "computer_call",
-            "action": {"type": "screenshot"},
-            "call_id": "call_123"
-        },
-        {
-            "type": "computer_call_output",
-            "call_id": "call_123",
-            "output": {"image_url": "data:image/png;base64,..."}
-        }
-    ],
-    "usage": {
-        "prompt_tokens": 150,
-        "completion_tokens": 75,
-        "total_tokens": 225,
-        "response_cost": 0.01,
-    }
-}
-```
+- [Agent Loops](https://trycua.com/docs/agent-sdk/agent-loops)
+- [Supported Agents](https://trycua.com/docs/agent-sdk/supported-agents)
+- [Supported Models](https://trycua.com/docs/agent-sdk/supported-models)
+- [Chat History](https://trycua.com/docs/agent-sdk/chat-history)
+- [Callbacks](https://trycua.com/docs/agent-sdk/callbacks)
+- [Custom Tools](https://trycua.com/docs/agent-sdk/custom-tools)
+- [Custom Computer Handlers](https://trycua.com/docs/agent-sdk/custom-computer-handlers)
+- [Prompt Caching](https://trycua.com/docs/agent-sdk/prompt-caching)
+- [Usage Tracking](https://trycua.com/docs/agent-sdk/usage-tracking)
+- [Benchmarks](https://trycua.com/docs/agent-sdk/benchmarks)

 ## License