Update README with providers

This commit is contained in:
f-trycua
2025-03-30 10:45:56 +02:00
parent a2f30a2d40
commit 7bc2ca87d4
7 changed files with 115 additions and 231 deletions


@@ -51,8 +51,38 @@ async def run_agent_example():
for i, task in enumerate(tasks):
    print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
    async for result in agent.run(task):
        print("Response ID: ", result.get("id"))

        # Print detailed usage information
        usage = result.get("usage")
        if usage:
            print("\nUsage Details:")
            print(f"  Input Tokens: {usage.get('input_tokens')}")
            if "input_tokens_details" in usage:
                print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
            print(f"  Output Tokens: {usage.get('output_tokens')}")
            if "output_tokens_details" in usage:
                print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
            print(f"  Total Tokens: {usage.get('total_tokens')}")

        print("Response Text: ", result.get("text"))

        # Print tools information
        tools = result.get("tools")
        if tools:
            print("\nTools:")
            print(tools)

        # Print reasoning and tool call outputs
        outputs = result.get("output", [])
        for output in outputs:
            output_type = output.get("type")
            if output_type == "reasoning":
                print("\nReasoning Output:")
                print(output)
            elif output_type == "computer_call":
                print("\nTool Call Output:")
                print(output)

    print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")


@@ -15,9 +15,7 @@
</h1>
</div>
**Agent** is a Computer Use (CUA) framework for running multi-app agentic workflows targeting macOS and Linux sandbox, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen). The framework integrates with Microsoft's OmniParser for enhanced UI understanding and interaction.
> While our north star is to create a 1-click experience, this preview of Agent might be still a bit rough around the edges. We appreciate your patience as we work to improve the experience.
**cua-agent** is a general Computer-Use framework for running multi-app agentic workflows targeting macOS and Linux sandboxes created with Cua, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).
### Get started with Agent
@@ -27,18 +25,92 @@
## Install
### cua-agent
```bash
pip install "cua-agent[all]"
# or install specific loop providers
pip install "cua-agent[anthropic]"
pip install "cua-agent[omni]"
pip install "cua-agent[openai]" # OpenAI Cua Loop
pip install "cua-agent[anthropic]" # Anthropic Cua Loop
pip install "cua-agent[omni]" # Cua Loop based on OmniParser
```
## Run
```python
from computer import Computer
from agent import ComputerAgent, AgentLoop, LLM, LLMProvider

async with Computer() as macos_computer:
    # Create agent with loop and provider
    agent = ComputerAgent(
        computer=macos_computer,
        loop=AgentLoop.OPENAI,
        model=LLM(provider=LLMProvider.OPENAI)
    )

    tasks = [
        "Look for a repository named trycua/cua on GitHub.",
        "Check the open issues, open the most recent one and read it.",
        "Clone the repository in users/lume/projects if it doesn't exist yet.",
        "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
        "From Cursor, open Composer if not already open.",
        "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
    ]

    for i, task in enumerate(tasks):
        print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
        async for result in agent.run(task):
            print(result)
        print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
```
Refer to this notebook for a step-by-step guide on how to use the Computer-Use Agent (CUA):
- [Agent Notebook](../../notebooks/agent_nb.ipynb) - Complete examples and workflows
## Agent Loops
The `cua-agent` package provides three agent loop variations, based on different CUA model providers and techniques:
| Agent Loop | Supported Models | Description | Set-Of-Marks |
|:-----------|:-----------------|:------------|:-------------|
| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
| `AgentLoop.OMNI` <br>(preview) | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `gpt-3.5-turbo` | Use OmniParser for element pixel-detection (SoM) and any VLMs | OmniParser |
## AgentResponse
The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.
```python
async for result in agent.run(task):
    print("Response ID: ", result.get("id"))

    # Print detailed usage information
    usage = result.get("usage")
    if usage:
        print("\nUsage Details:")
        print(f"  Input Tokens: {usage.get('input_tokens')}")
        if "input_tokens_details" in usage:
            print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
        print(f"  Output Tokens: {usage.get('output_tokens')}")
        if "output_tokens_details" in usage:
            print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
        print(f"  Total Tokens: {usage.get('total_tokens')}")

    print("Response Text: ", result.get("text"))

    # Print tools information
    tools = result.get("tools")
    if tools:
        print("\nTools:")
        print(tools)

    # Print reasoning and tool call outputs
    outputs = result.get("output", [])
    for output in outputs:
        output_type = output.get("type")
        if output_type == "reasoning":
            print("\nReasoning Output:")
            print(output)
        elif output_type == "computer_call":
            print("\nTool Call Output:")
            print(output)
```
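The fields above can also be condensed into a compact per-turn summary. The sketch below operates on a plain dict shaped like the response; the `sample` data is hypothetical, not output from a real run:

```python
def summarize_response(result: dict) -> str:
    """Build a one-line summary from an AgentResponse-style dict."""
    usage = result.get("usage") or {}
    # Count only computer_call entries among the outputs
    calls = [o for o in result.get("output", []) if o.get("type") == "computer_call"]
    return (
        f"id={result.get('id')} "
        f"total_tokens={usage.get('total_tokens')} "
        f"tool_calls={len(calls)}"
    )

sample = {
    "id": "resp_123",
    "usage": {"input_tokens": 50, "output_tokens": 20, "total_tokens": 70},
    "output": [
        {"type": "reasoning"},
        {"type": "computer_call"},
    ],
}
print(summarize_response(sample))  # id=resp_123 total_tokens=70 tool_calls=1
```

Because every loop emits the same response shape, a helper like this works unchanged across `AgentLoop.OPENAI`, `AgentLoop.ANTHROPIC`, and `AgentLoop.OMNI`.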


@@ -1,63 +0,0 @@
# Agent Package Structure
## Overview
The agent package provides a modular and extensible framework for AI-powered computer agents.
## Directory Structure
```
agent/
├── __init__.py # Package exports
├── core/ # Core functionality
│ ├── __init__.py
│ ├── computer_agent.py # Main entry point
│ └── factory.py # Provider factory
├── base/ # Base implementations
│ ├── __init__.py
│ ├── agent.py # Base agent class
│ ├── core/ # Core components
│ │ ├── callbacks.py
│ │ ├── loop.py
│ │ └── messages.py
│ └── tools/ # Tool implementations
├── providers/ # Provider implementations
│ ├── __init__.py
│ ├── anthropic/ # Anthropic provider
│ │ ├── agent.py
│ │ ├── loop.py
│ │ └── tool_manager.py
│ └── omni/ # Omni provider
│ ├── agent.py
│ ├── loop.py
│ └── tool_manager.py
└── types/ # Type definitions
├── __init__.py
├── base.py # Core types
├── messages.py # Message types
├── tools.py # Tool types
└── providers/ # Provider-specific types
├── anthropic.py
└── omni.py
```
## Key Components
### Core
- `computer_agent.py`: Main entry point for creating and using agents
- `factory.py`: Factory for creating provider-specific implementations
### Base
- `agent.py`: Base agent implementation with shared functionality
- `core/`: Core components used across providers
- `tools/`: Shared tool implementations
### Providers
Each provider follows the same structure:
- `agent.py`: Provider-specific agent implementation
- `loop.py`: Provider-specific message loop
- `tool_manager.py`: Tool management for provider
### Types
- `base.py`: Core type definitions
- `messages.py`: Message-related types
- `tools.py`: Tool-related types
- `providers/`: Provider-specific type definitions
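The provider factory in `factory.py` can be sketched in miniature as follows; the class and provider names here are illustrative stand-ins, not the package's actual API:

```python
from typing import Dict, Type

class BaseAgent:
    """Base agent with shared functionality (stand-in for base/agent.py)."""
    def run(self, task: str) -> str:
        raise NotImplementedError

class AnthropicAgent(BaseAgent):
    def run(self, task: str) -> str:
        return f"[anthropic] {task}"

class OmniAgent(BaseAgent):
    def run(self, task: str) -> str:
        return f"[omni] {task}"

# Registry mapping provider names to implementations
_PROVIDERS: Dict[str, Type[BaseAgent]] = {
    "anthropic": AnthropicAgent,
    "omni": OmniAgent,
}

def create_agent(provider: str) -> BaseAgent:
    """Factory: resolve a provider name to its agent implementation."""
    try:
        return _PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}")

agent = create_agent("omni")
print(agent.run("open browser"))  # [omni] open browser
```

The payoff of this layout is that `computer_agent.py` only needs the registry lookup; adding a provider means adding a directory with `agent.py`, `loop.py`, and `tool_manager.py`, plus one registry entry.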


@@ -1,5 +0,0 @@
"""OpenAI API client module."""
from .client import OpenAIClient
__all__ = ["OpenAIClient"]


@@ -1,137 +0,0 @@
"""OpenAI API client for Agent Response API."""
import logging
import json
import os
import httpx
from typing import Dict, List, Optional, Any, Union
logger = logging.getLogger(__name__)
class OpenAIClient:
"""Client for OpenAI's Agent Response API."""
def __init__(
self,
api_key: str,
model: str = "computer-use-preview",
base_url: str = "https://api.openai.com/v1",
max_retries: int = 3,
timeout: int = 120,
**kwargs,
):
"""Initialize OpenAI API client.
Args:
api_key: OpenAI API key
model: Model to use for completions (should always be computer-use-preview)
base_url: Base URL for API requests
max_retries: Maximum number of retries for API calls
timeout: Timeout for API calls in seconds
**kwargs: Additional arguments to pass to the httpx client
"""
self.api_key = api_key
# Always use computer-use-preview model
if model != "computer-use-preview":
logger.warning(
f"Overriding provided model '{model}' with required model 'computer-use-preview'"
)
model = "computer-use-preview"
self.model = model
self.base_url = base_url
self.max_retries = max_retries
self.timeout = timeout
# Create httpx client with auth and timeout
self.client = httpx.AsyncClient(
timeout=timeout,
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"OpenAI-Beta": "computer-use-2023-09-30", # Required beta header for computer use
},
**kwargs,
)
# Additional initialization for organization if available
openai_org = os.environ.get("OPENAI_ORG")
if openai_org:
self.client.headers["OpenAI-Organization"] = openai_org
logger.info(f"Initialized OpenAI client with model {model}")
async def create_response(
self,
input: List[Dict[str, Any]],
tools: Optional[List[Dict[str, Any]]] = None,
truncation: str = "auto",
temperature: float = 0.7,
top_p: float = 1.0,
**kwargs,
) -> Dict[str, Any]:
"""Create a response using the OpenAI Agent Response API.
Args:
input: List of messages in the conversation (must be in Agent Response API format)
tools: List of tools available to the agent
truncation: How to handle truncation (auto, truncate)
temperature: Sampling temperature
top_p: Nucleus sampling parameter
**kwargs: Additional parameters to include in the request
Returns:
Response from the API
"""
url = f"{self.base_url}/responses"
# Prepare request payload
payload = {
"model": self.model,
"input": input,
"temperature": temperature,
"top_p": top_p,
"truncation": truncation,
**kwargs,
}
# Add tools if provided
if tools:
payload["tools"] = tools
try:
logger.debug(f"Sending request to {url}")
# Make API call
response = await self.client.post(url, json=payload)
# Check for errors
try:
response.raise_for_status()
except httpx.HTTPStatusError as e:
error_detail = e.response.text
try:
# Try to parse the error as JSON for better debugging
error_json = json.loads(error_detail)
logger.error(f"HTTP error from OpenAI API: {json.dumps(error_json, indent=2)}")
except:
logger.error(f"HTTP error from OpenAI API: {error_detail}")
raise
result = response.json()
logger.debug("Received successful response")
return result
except httpx.HTTPStatusError as e:
error_detail = e.response.text if hasattr(e, "response") else str(e)
logger.error(f"HTTP error from OpenAI API: {error_detail}")
raise RuntimeError(f"OpenAI API error: {error_detail}")
except Exception as e:
logger.error(f"Error calling OpenAI API: {str(e)}")
raise RuntimeError(f"Error calling OpenAI API: {str(e)}")
async def close(self):
"""Close the httpx client."""
await self.client.aclose()
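The payload assembly inside `create_response` can be exercised on its own, without a network call. A standalone sketch (the helper name and sample values are illustrative, not part of the client's API):

```python
from typing import Any, Dict, List, Optional

def build_payload(
    model: str,
    input: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None,
    truncation: str = "auto",
    temperature: float = 0.7,
    top_p: float = 1.0,
    **kwargs,
) -> Dict[str, Any]:
    """Mirror the request body that create_response posts to /responses."""
    payload = {
        "model": model,
        "input": input,
        "temperature": temperature,
        "top_p": top_p,
        "truncation": truncation,
        **kwargs,  # extra parameters pass through unchanged
    }
    # tools is only attached when provided and non-empty
    if tools:
        payload["tools"] = tools
    return payload

p = build_payload(
    "computer-use-preview",
    [{"role": "user", "content": "open Safari"}],
    tools=[{"type": "computer-preview"}],
)
print(sorted(p))  # ['input', 'model', 'temperature', 'tools', 'top_p', 'truncation']
```

Note that `max_retries` is stored on the client but the request path above makes a single attempt; retry behavior would have to be layered on separately.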


@@ -10,7 +10,6 @@ from ...core.base import BaseLoop
from ...core.types import AgentResponse
from ...core.messages import StandardMessageManager, ImageRetentionConfig
from .api.client import OpenAIClient
from .api_handler import OpenAIAPIHandler
from .response_handler import OpenAIResponseHandler
from .tools.manager import ToolManager
@@ -109,15 +108,8 @@ class OpenAILoop(BaseLoop):
        client, tool manager, and message manager.
        """
        try:
            logger.info(f"Initializing OpenAI client with model {self.model}...")

            # Initialize client
            self.client = OpenAIClient(api_key=self.api_key, model=self.model)

            # Initialize tool manager
            await self.tool_manager.initialize()

            logger.info(f"Initialized OpenAI client with model {self.model}")
        except Exception as e:
            logger.error(f"Error initializing OpenAI client: {str(e)}")
            self.client = None
@@ -142,13 +134,8 @@ class OpenAILoop(BaseLoop):
        # Create queue for response streaming
        queue = asyncio.Queue()

        # Ensure client is initialized
        if self.client is None:
            logger.info("Initializing client...")
            await self.initialize_client()
            if self.client is None:
                raise RuntimeError("Failed to initialize client")
            logger.info("Client initialized successfully")

        # Ensure tool manager is initialized
        await self.tool_manager.initialize()

        # Start loop in background task
        loop_task = asyncio.create_task(self._run_loop(queue, messages))


@@ -15,7 +15,7 @@
</h1>
</div>
**Computer** is a Computer-Use Interface (CUI) framework powering Cua for interacting with local macOS and Linux sandboxes, PyAutoGUI-compatible, and pluggable with any AI agent systems (Cua, Langchain, CrewAI, AutoGen). Computer relies on [Lume](https://github.com/trycua/lume) for creating and managing sandbox environments.
**cua-computer** is a Computer-Use Interface (CUI) framework powering Cua for interacting with local macOS and Linux sandboxes, PyAutoGUI-compatible, and pluggable with any AI agent systems (Cua, Langchain, CrewAI, AutoGen). Computer relies on [Lume](https://github.com/trycua/lume) for creating and managing sandbox environments.
### Get started with Computer