diff --git a/docs/content/docs/libraries/mcp-server/client-integrations.mdx b/docs/content/docs/libraries/mcp-server/client-integrations.mdx index 8699cda0..6a79f5b3 100644 --- a/docs/content/docs/libraries/mcp-server/client-integrations.mdx +++ b/docs/content/docs/libraries/mcp-server/client-integrations.mdx @@ -6,6 +6,67 @@ title: Client Integrations To use with Claude Desktop, add an entry to your Claude Desktop configuration (`claude_desktop_config.json`, typically found in `~/.config/claude-desktop/`): +### Package Installation Method + +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/bin/bash", + "args": ["~/.cua/start_mcp_server.sh"], + "env": { + "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-20250514", + "ANTHROPIC_API_KEY": "your-anthropic-api-key-here", + "CUA_MAX_IMAGES": "3", + "CUA_USE_HOST_COMPUTER_SERVER": "false" + } + } + } +} +``` + +### Development Method + +If you're working with the CUA source code: + +**Standard VM Mode:** +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/usr/bin/env", + "args": [ + "bash", "-lc", + "export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh" + ] + } + } +} +``` + +**Host Computer Control Mode:** +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/usr/bin/env", + "args": [ + "bash", "-lc", + "export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh" + ] + } + } +} +``` + +**Note**: Replace `/path/to/cua` with the absolute path to your CUA repository directory. + +**⚠️ Host Computer Control Setup**: When using `CUA_USE_HOST_COMPUTER_SERVER='true'`, you must also: +1. Install computer server dependencies: `python3 -m pip install uvicorn fastapi` +2. 
Install the computer server: `python3 -m pip install -e libs/python/computer-server --break-system-packages` +3. Start the computer server: `python -m computer_server --log-level debug` +4. The AI will have direct access to your desktop - use with caution! + For more information on MCP with Claude Desktop, see the [official MCP User Guide](https://modelcontextprotocol.io/quickstart/user). ## Cursor Integration @@ -15,6 +76,43 @@ To use with Cursor, add an MCP configuration file in one of these locations: - **Project-specific**: Create `.cursor/mcp.json` in your project directory - **Global**: Create `~/.cursor/mcp.json` in your home directory +Example configuration for Cursor: + +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/bin/bash", + "args": ["~/.cua/start_mcp_server.sh"], + "env": { + "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-20250514", + "ANTHROPIC_API_KEY": "your-anthropic-api-key-here" + } + } + } +} +``` + After configuration, you can simply tell Cursor's Agent to perform computer tasks by explicitly mentioning the CUA agent, such as "Use the computer control tools to open Safari." -For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol). \ No newline at end of file +For more information on MCP with Cursor, see the [official Cursor MCP documentation](https://docs.cursor.com/context/model-context-protocol). + +## Other MCP Clients + +The MCP server is compatible with any MCP-compliant client. 
The server exposes the following tools: + +- `run_cua_task` - Execute single computer tasks +- `run_multi_cua_tasks` - Execute multiple tasks (sequential or concurrent) +- `screenshot_cua` - Capture screenshots +- `get_session_stats` - Monitor session statistics +- `cleanup_session` - Manage session lifecycle + +### Configuration Options + +All MCP clients can configure the server using environment variables: + +- `CUA_MODEL_NAME` - Model to use for task execution +- `CUA_MAX_IMAGES` - Maximum images to keep in context +- `CUA_USE_HOST_COMPUTER_SERVER` - Use host system instead of VM + +See the [Configuration](/docs/libraries/mcp-server/configuration) page for detailed configuration options. \ No newline at end of file diff --git a/docs/content/docs/libraries/mcp-server/configuration.mdx b/docs/content/docs/libraries/mcp-server/configuration.mdx index e5df8293..cce1957c 100644 --- a/docs/content/docs/libraries/mcp-server/configuration.mdx +++ b/docs/content/docs/libraries/mcp-server/configuration.mdx @@ -6,5 +6,64 @@ The server is configured using environment variables (can be set in the Claude D | Variable | Description | Default | |----------|-------------|---------| -| `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-3-5-sonnet-20241022", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-3-5-sonnet-20241022 | +| `CUA_MODEL_NAME` | Model string (e.g., "anthropic/claude-sonnet-4-20250514", "anthropic/claude-3-5-sonnet-20240620", "openai/computer-use-preview", "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", "omniparser+litellm/gpt-4o", "omniparser+ollama_chat/gemma3") | anthropic/claude-sonnet-4-20250514 | +| `ANTHROPIC_API_KEY` | Your Anthropic API key (required for Anthropic models) | None | | `CUA_MAX_IMAGES` | Maximum number of images to keep in context | 3 | +| `CUA_USE_HOST_COMPUTER_SERVER` | Target your local desktop instead of a VM. 
Set to "true" to use your host system. **Warning:** AI models may perform risky actions. | false | + +## Model Configuration + +The `CUA_MODEL_NAME` environment variable supports various model providers through LiteLLM integration: + +### Supported Providers +- **Anthropic**: `anthropic/claude-sonnet-4-20250514`, `anthropic/claude-3-5-sonnet-20240620`, `anthropic/claude-3-haiku-20240307` +- **OpenAI**: `openai/computer-use-preview`, `openai/gpt-4o` +- **Local Models**: `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B` +- **Omni + LiteLLM**: `omniparser+litellm/gpt-4o`, `omniparser+litellm/claude-3-haiku` +- **Ollama**: `omniparser+ollama_chat/gemma3` + +### Example Configurations + +**Claude Desktop Configuration:** +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/bin/bash", + "args": ["~/.cua/start_mcp_server.sh"], + "env": { + "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-20250514", + "ANTHROPIC_API_KEY": "your-anthropic-api-key-here", + "CUA_MAX_IMAGES": "5", + "CUA_USE_HOST_COMPUTER_SERVER": "false" + } + } + } +} +``` + +**Local Model Configuration:** +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/bin/bash", + "args": ["~/.cua/start_mcp_server.sh"], + "env": { + "CUA_MODEL_NAME": "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", + "CUA_MAX_IMAGES": "3" + } + } + } +} +``` + +## Session Management Configuration + +The MCP server automatically manages sessions with the following defaults: +- **Max Concurrent Sessions**: 10 +- **Session Timeout**: 10 minutes of inactivity +- **Computer Pool Size**: 5 instances +- **Automatic Cleanup**: Enabled + +These settings are optimized for typical usage and don't require configuration for most users. 
diff --git a/docs/content/docs/libraries/mcp-server/index.mdx b/docs/content/docs/libraries/mcp-server/index.mdx index 87c9a342..a20b5d09 100644 --- a/docs/content/docs/libraries/mcp-server/index.mdx +++ b/docs/content/docs/libraries/mcp-server/index.mdx @@ -6,4 +6,22 @@ github: - https://github.com/trycua/cua/tree/main/libs/python/mcp-server --- -**cua-mcp-server** is a MCP server for the Computer-Use Agent (CUA), allowing you to run CUA through Claude Desktop or other MCP clients. \ No newline at end of file +**cua-mcp-server** is an MCP server for the Computer-Use Agent (CUA), allowing you to run CUA through Claude Desktop or other MCP clients. + +## Features + +- **Multi-Client Support**: Concurrent sessions with automatic resource management +- **Progress Reporting**: Real-time progress updates during task execution +- **Error Handling**: Robust error recovery with screenshot capture +- **Concurrent Execution**: Run multiple tasks in parallel for improved performance +- **Session Management**: Automatic cleanup and resource pooling +- **LiteLLM Integration**: Support for multiple model providers +- **VM Safety**: Default VM execution with optional host system control + +## Quick Start + +1. **Install**: `pip install cua-mcp-server` +2. **Configure**: Add the server to your MCP client configuration +3. **Use**: Ask Claude to perform computer tasks + +See the [Installation](/docs/libraries/mcp-server/installation) guide for detailed setup instructions. 
\ No newline at end of file diff --git a/docs/content/docs/libraries/mcp-server/installation.mdx b/docs/content/docs/libraries/mcp-server/installation.mdx index c04a4917..ce4f87a6 100644 --- a/docs/content/docs/libraries/mcp-server/installation.mdx +++ b/docs/content/docs/libraries/mcp-server/installation.mdx @@ -36,18 +36,98 @@ You can then use the script in your MCP configuration like this: "command": "/bin/bash", "args": ["~/.cua/start_mcp_server.sh"], "env": { - "CUA_MODEL_NAME": "anthropic/claude-3-5-sonnet-20241022" + "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-20250514", + "ANTHROPIC_API_KEY": "your-anthropic-api-key-here" } } } } ``` +**Important**: You must include your Anthropic API key for the MCP server to work properly. + +## Development Setup + +If you're working with the CUA source code directly (like in the CUA repository), you can use the development script instead: + +```json +{ + "mcpServers": { + "cua-agent": { + "command": "/usr/bin/env", + "args": [ + "bash", "-lc", + "export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh" + ] + } + } +} +``` + +**For host computer control** (development setup): + +1. **Install Computer Server Dependencies**: + ```bash + python3 -m pip install uvicorn fastapi + python3 -m pip install -e libs/python/computer-server --break-system-packages + ``` + +2. **Start the Computer Server**: + ```bash + cd /path/to/cua + python -m computer_server --log-level debug + ``` + This will start the computer server on `http://localhost:8000` that controls your actual desktop. + +3. 
**Configure Claude Desktop**: + ```json + { + "mcpServers": { + "cua-agent": { + "command": "/usr/bin/env", + "args": [ + "bash", "-lc", + "export CUA_MODEL_NAME='anthropic/claude-sonnet-4-20250514'; export ANTHROPIC_API_KEY='your-anthropic-api-key-here'; export CUA_USE_HOST_COMPUTER_SERVER='true'; export CUA_MAX_IMAGES='1'; /path/to/cua/libs/python/mcp-server/scripts/start_mcp_server.sh" + ] + } + } + } + ``` + +**Note**: Replace `/path/to/cua` with the absolute path to your CUA repository directory. + +**⚠️ Important**: When using host computer control (`CUA_USE_HOST_COMPUTER_SERVER='true'`), the AI will have direct access to your desktop and can perform actions like opening applications, clicking, typing, and taking screenshots. Make sure you're comfortable with this level of access. + ### Troubleshooting -If you get a `/bin/bash: ~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative. +**Common Issues:** -To see the logs: -``` +1. **"Claude's response was interrupted"** - This usually means: + - Missing API key: Add `ANTHROPIC_API_KEY` to your environment variables + - Invalid model name: Use a valid model like `anthropic/claude-sonnet-4-20250514` + - Check logs for specific error messages + +2. **"Missing Anthropic API Key"** - Add your API key to the configuration: + ```json + "env": { + "ANTHROPIC_API_KEY": "your-api-key-here" + } + ``` + +3. **"model not found"** - Use a valid model name: + - ✅ `anthropic/claude-sonnet-4-20250514` + - ✅ `anthropic/claude-3-5-sonnet-20240620` + - ❌ `anthropic/claude-3-5-sonnet-20241022` (may return "model not found"; it is an older model ID, so check your provider's current model list) + +4. **Script not found** - If you get a `/bin/bash: ~/cua/libs/python/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative. + +5. 
**Host Computer Control Issues** - If using `CUA_USE_HOST_COMPUTER_SERVER='true'`: + - **Computer Server not running**: Make sure you've started the computer server with `python -m computer_server --log-level debug` + - **Port 8000 in use**: Check if another process is using port 8000 with `lsof -i :8000` + - **Missing dependencies**: Install `uvicorn` and `fastapi` with `python3 -m pip install uvicorn fastapi` + - **Image size errors**: Use `CUA_MAX_IMAGES='1'` to reduce image context size + +**Viewing Logs:** +```bash tail -n 20 -f ~/Library/Logs/Claude/mcp*.log ``` \ No newline at end of file diff --git a/docs/content/docs/libraries/mcp-server/tools.mdx b/docs/content/docs/libraries/mcp-server/tools.mdx index edf29c0b..20e91311 100644 --- a/docs/content/docs/libraries/mcp-server/tools.mdx +++ b/docs/content/docs/libraries/mcp-server/tools.mdx @@ -6,5 +6,58 @@ title: Tools The MCP server exposes the following tools to Claude: -1. `run_cua_task` - Run a single Computer-Use Agent task with the given instruction -2. `run_multi_cua_tasks` - Run multiple tasks in sequence \ No newline at end of file +### Core Task Execution Tools + +1. **`run_cua_task`** - Run a single Computer-Use Agent task with the given instruction + - `task` (string): The task description for the agent to execute + - `session_id` (string, optional): Session ID for multi-client support. If not provided, a new session will be created + - Returns: Tuple of (combined text output, final screenshot) + +2. **`run_multi_cua_tasks`** - Run multiple tasks in sequence or concurrently + - `tasks` (list of strings): List of task descriptions to execute + - `session_id` (string, optional): Session ID for multi-client support. If not provided, a new session will be created + - `concurrent` (boolean, optional): If true, run tasks concurrently. If false, run sequentially (default) + - Returns: List of tuples (combined text output, screenshot) for each task + +### Utility Tools + +3. 
**`screenshot_cua`** - Take a screenshot of the current screen + - `session_id` (string, optional): Session ID for multi-client support. If not provided, a new session will be created + - Returns: Screenshot image + +4. **`get_session_stats`** - Get statistics about active sessions and resource usage + - Returns: Dictionary with session statistics including total sessions, active tasks, and session details + +5. **`cleanup_session`** - Cleanup a specific session and release its resources + - `session_id` (string): The session ID to cleanup + - Returns: Confirmation message + +## Session Management + +The MCP server supports multi-client sessions with automatic resource management: + +- **Session Isolation**: Each client can have its own session with isolated computer instances +- **Resource Pooling**: Computer instances are pooled for efficient resource usage +- **Automatic Cleanup**: Idle sessions are automatically cleaned up after 10 minutes +- **Concurrent Tasks**: Multiple tasks can run concurrently within the same session +- **Progress Reporting**: Real-time progress updates during task execution + +## Usage Examples + +### Basic Task Execution +``` +"Open Chrome and navigate to github.com" +"Create a folder called 'Projects' on my desktop" +``` + +### Multi-Task Execution +``` +"Run these tasks: 1) Open Finder, 2) Navigate to Documents, 3) Create a new folder called 'Work'" +``` + +### Session Management +``` +"Take a screenshot of the current screen" +"Show me the session statistics" +"Cleanup session abc123" +``` \ No newline at end of file diff --git a/docs/content/docs/libraries/mcp-server/usage.mdx b/docs/content/docs/libraries/mcp-server/usage.mdx index 19eef934..1748490a 100644 --- a/docs/content/docs/libraries/mcp-server/usage.mdx +++ b/docs/content/docs/libraries/mcp-server/usage.mdx @@ -2,7 +2,7 @@ title: Usage --- -## Usage +## Basic Usage Once configured, you can simply ask Claude to perform computer tasks: @@ -13,8 +13,140 @@ Once configured, you 
can simply ask Claude to perform computer tasks: Claude will automatically use your CUA agent to perform these tasks. -### First-time Usage Notes +## Advanced Features + +### Progress Reporting +The MCP server provides real-time progress updates during task execution: +- Task progress is reported as percentages (0-100%) +- Multi-task operations show progress for each individual task +- Progress updates are streamed to the MCP client for real-time feedback + +### Error Handling +Robust error handling ensures reliable operation: +- Failed tasks return error messages with screenshots when possible +- Session state is preserved even when individual tasks fail +- Automatic cleanup prevents resource leaks +- Detailed error logging for troubleshooting + +### Concurrent Task Execution +For improved performance, multiple tasks can run concurrently: +- Set `concurrent=true` in `run_multi_cua_tasks` for parallel execution +- Each task runs in its own context with isolated state +- Progress tracking works for both sequential and concurrent modes +- Resource pooling ensures efficient computer instance usage + +### Session Management +Multi-client support with automatic resource management: +- Each client gets isolated sessions with separate computer instances +- Sessions automatically clean up after 10 minutes of inactivity +- Resource pooling prevents resource exhaustion +- Session statistics available for monitoring + +## Target Computer Options + +By default, the MCP server runs CUA in a virtual machine for safety. However, you can also configure it to run on your local system. + +### Default: Using a VM (Recommended) + +The MCP server will automatically start and connect to a VM based on your platform. This is the safest option as AI actions are isolated from your host system. + +No additional configuration is needed - this is the default behavior. 
+ +### Option: Targeting Your Local Desktop + + + **Warning:** When targeting your local system, AI models have direct access to your desktop and may perform risky actions. Use with caution. + + +To have the MCP server control your local desktop instead of a VM: + +1. **Start the Computer Server on your host:** + +```bash +pip install cua-computer-server +python -m computer_server +``` + +2. **Configure the MCP server to use your host system:** + +Add the `CUA_USE_HOST_COMPUTER_SERVER` environment variable to your MCP client configuration: + + + + Update your Claude Desktop config (see [Installation](/docs/libraries/mcp-server/installation)) to include the environment variable: + + ```json + { + "mcpServers": { + "cua-agent": { + "command": "/bin/bash", + "args": ["~/.cua/start_mcp_server.sh"], + "env": { + "CUA_MODEL_NAME": "anthropic/claude-sonnet-4-20250514", + "CUA_USE_HOST_COMPUTER_SERVER": "true" + } + } + } + } + ``` + + + Set the environment variable in your MCP client configuration: + + ```bash + export CUA_USE_HOST_COMPUTER_SERVER=true + ``` + + Then start your MCP client as usual. + + + +3. **Restart your MCP client** (e.g., Claude Desktop) to apply the changes. + +Now Claude will control your local desktop directly when you ask it to perform computer tasks. 
+ +## Usage Examples + +### Single Task Execution +``` +"Open Safari and navigate to apple.com" +"Create a new folder on the desktop called 'My Projects'" +"Take a screenshot of the current screen" +``` + +### Multi-Task Execution (Sequential) +``` +"Run these tasks in order: 1) Open Finder, 2) Navigate to Documents folder, 3) Create a new folder called 'Work'" +``` + +### Multi-Task Execution (Concurrent) +``` +"Run these tasks simultaneously: 1) Open Chrome, 2) Open Safari, 3) Open Finder" +``` + +### Session Management +``` +"Show me the current session statistics" +"Take a screenshot using session abc123" +"Cleanup session xyz789" +``` + +### Error Recovery +``` +"Try to open a non-existent application and show me the error" +"Find all files with .tmp extension and delete them safely" +``` + +## First-time Usage Notes **API Keys**: Ensure you have valid API keys: - - Add your Anthropic API key, or other model provider API key in the Claude Desktop config (as shown above) + - Add your Anthropic API key in the Claude Desktop config (as shown above) - Or set it as an environment variable in your shell profile + - **Required**: The MCP server needs an API key to authenticate with the model provider + +**Model Selection**: Choose the appropriate model for your needs: + - **Claude Sonnet 4**: Latest model with best performance (`anthropic/claude-sonnet-4-20250514`) + - **Claude 3.5 Sonnet**: Reliable performance (`anthropic/claude-3-5-sonnet-20240620`) + - **Computer-Use Preview**: Specialized for computer tasks (`openai/computer-use-preview`) + - **Local Models**: For privacy-sensitive environments + - **Ollama**: For offline usage diff --git a/libs/python/computer-server/computer_server/handlers/macos.py b/libs/python/computer-server/computer_server/handlers/macos.py index ce341668..6a831c17 100644 --- a/libs/python/computer-server/computer_server/handlers/macos.py +++ b/libs/python/computer-server/computer_server/handlers/macos.py @@ -1287,7 +1287,15 @@ class 
MacOSAutomationHandler(BaseAutomationHandler): if not isinstance(screenshot, Image.Image): return {"success": False, "error": "Failed to capture screenshot"} + # Resize image to reduce size (max width 1920, maintain aspect ratio) + max_width = 1920 + if screenshot.width > max_width: + ratio = max_width / screenshot.width + new_height = int(screenshot.height * ratio) + screenshot = screenshot.resize((max_width, new_height), Image.Resampling.LANCZOS) + buffered = BytesIO() + # Use PNG format with optimization to reduce file size screenshot.save(buffered, format="PNG", optimize=True) buffered.seek(0) image_data = base64.b64encode(buffered.getvalue()).decode() diff --git a/libs/python/mcp-server/mcp_server/session_manager.py b/libs/python/mcp-server/mcp_server/session_manager.py index dc8d480b..a415feac 100644 --- a/libs/python/mcp-server/mcp_server/session_manager.py +++ b/libs/python/mcp-server/mcp_server/session_manager.py @@ -10,6 +10,7 @@ This module provides: import asyncio import logging +import os import time import uuid import weakref @@ -57,7 +58,14 @@ class ComputerPool: logger.debug("Creating new computer instance") from computer import Computer - computer = Computer(verbosity=logging.INFO) + # Check if we should use host computer server + use_host = os.getenv("CUA_USE_HOST_COMPUTER_SERVER", "false").lower() in ( + "true", + "1", + "yes", + ) + + computer = Computer(verbosity=logging.INFO, use_host_computer_server=use_host) await computer.run() self._in_use.add(computer) return computer
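The two code changes above (the screenshot downscaling in `macos.py` and the `CUA_USE_HOST_COMPUTER_SERVER` check in `session_manager.py`) can be exercised in isolation. The following is a minimal sketch of the same logic rather than the project's actual helpers: the names `env_flag`, `scaled_size`, and `TRUTHY_VALUES` are hypothetical and exist only for illustration. The sketch uses only the standard library, so it computes the target dimensions without calling Pillow.

```python
import os

# Accepted "on" spellings, mirroring the tuple used in the session_manager.py change.
TRUTHY_VALUES = ("true", "1", "yes")


def env_flag(name, default="false", environ=None):
    """Return True if the variable is set to a truthy spelling (case-insensitive).

    `environ` is a hypothetical parameter added here so the function can be
    tested without mutating the real process environment.
    """
    env = os.environ if environ is None else environ
    return env.get(name, default).lower() in TRUTHY_VALUES


def scaled_size(width, height, max_width=1920):
    """Compute the downscaled (width, height), preserving aspect ratio.

    Images already at or below max_width are returned unchanged, as in the
    macos.py change, which resizes only when the screenshot is wider.
    """
    if width <= max_width:
        return (width, height)
    ratio = max_width / width
    return (max_width, int(height * ratio))


if __name__ == "__main__":
    print(env_flag("CUA_USE_HOST_COMPUTER_SERVER"))
    print(scaled_size(3840, 2160))
```

Because the comparison is an exact match after lowercasing, values such as `on` or ` true ` (with surrounding whitespace) evaluate to false and the VM path is used. Truncating the scaled height with `int()` can also shift the aspect ratio by up to one pixel.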