Mirror of https://github.com/trycua/computer.git, synced 2026-01-01 11:00:31 -06:00
Update README with providers
@@ -51,8 +51,38 @@ async def run_agent_example():
    for i, task in enumerate(tasks):
        print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
        async for result in agent.run(task):
            print("Response ID: ", result.get("id"))

            # Print detailed usage information
            usage = result.get("usage")
            if usage:
                print("\nUsage Details:")
                print(f"  Input Tokens: {usage.get('input_tokens')}")
                if "input_tokens_details" in usage:
                    print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
                print(f"  Output Tokens: {usage.get('output_tokens')}")
                if "output_tokens_details" in usage:
                    print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
                print(f"  Total Tokens: {usage.get('total_tokens')}")

            print("Response Text: ", result.get("text"))

            # Print tools information
            tools = result.get("tools")
            if tools:
                print("\nTools:")
                print(tools)

            # Print reasoning and tool call outputs
            outputs = result.get("output", [])
            for output in outputs:
                output_type = output.get("type")
                if output_type == "reasoning":
                    print("\nReasoning Output:")
                    print(output)
                elif output_type == "computer_call":
                    print("\nTool Call Output:")
                    print(output)

        print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
@@ -15,9 +15,7 @@
</h1>
</div>

**cua-agent** is a general Computer-Use framework for running multi-app agentic workflows targeting macOS and Linux sandboxes created with Cua, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).

### Get started with Agent
@@ -27,18 +25,92 @@

## Install

### cua-agent

```bash
pip install "cua-agent[all]"

# or install specific loop providers
pip install "cua-agent[openai]"    # OpenAI Cua Loop
pip install "cua-agent[anthropic]" # Anthropic Cua Loop
pip install "cua-agent[omni]"      # Cua Loop based on OmniParser
```

## Run

```python
# Imports assume the packages' top-level exports; adjust to your install.
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider

async with Computer() as macos_computer:
    # Create agent with loop and provider
    agent = ComputerAgent(
        computer=macos_computer,
        loop=AgentLoop.OPENAI,
        model=LLM(provider=LLMProvider.OPENAI)
    )

    tasks = [
        "Look for a repository named trycua/cua on GitHub.",
        "Check the open issues, open the most recent one and read it.",
        "Clone the repository in users/lume/projects if it doesn't exist yet.",
        "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
        "From Cursor, open Composer if not already open.",
        "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
    ]

    for i, task in enumerate(tasks):
        print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
        async for result in agent.run(task):
            print(result)

        print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
```
Refer to these notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA):

- [Agent Notebook](../../notebooks/agent_nb.ipynb) - Complete examples and workflows
## Agent Loops

The `cua-agent` package provides three agent loop variations, based on different CUA model providers and techniques:

| Agent Loop | Supported Models | Description | Set-Of-Marks |
|:-----------|:-----------------|:------------|:-------------|
| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
| `AgentLoop.OMNI`<br>(preview) | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `gpt-3.5-turbo` | Use OmniParser for element pixel-detection (SoM) and any VLMs | OmniParser |
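Switching loops only changes the `loop` and `model` arguments when constructing the agent. Below is a minimal sketch of pairing `AgentLoop.OMNI` with an Anthropic VLM, reusing the names from the Run example above (the exact exports may differ in your install):

```python
# A minimal sketch: OmniParser-based loop with an Anthropic model.
# Assumes the same top-level exports as the Run example; adjust to your install.
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider

async with Computer() as macos_computer:
    omni_agent = ComputerAgent(
        computer=macos_computer,
        loop=AgentLoop.OMNI,  # OmniParser-based loop, works with any supported VLM
        model=LLM(provider=LLMProvider.ANTHROPIC),
    )
```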
## AgentResponse

The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.

```python
async for result in agent.run(task):
    print("Response ID: ", result.get("id"))

    # Print detailed usage information
    usage = result.get("usage")
    if usage:
        print("\nUsage Details:")
        print(f"  Input Tokens: {usage.get('input_tokens')}")
        if "input_tokens_details" in usage:
            print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
        print(f"  Output Tokens: {usage.get('output_tokens')}")
        if "output_tokens_details" in usage:
            print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
        print(f"  Total Tokens: {usage.get('total_tokens')}")

    print("Response Text: ", result.get("text"))

    # Print tools information
    tools = result.get("tools")
    if tools:
        print("\nTools:")
        print(tools)

    # Print reasoning and tool call outputs
    outputs = result.get("output", [])
    for output in outputs:
        output_type = output.get("type")
        if output_type == "reasoning":
            print("\nReasoning Output:")
            print(output)
        elif output_type == "computer_call":
            print("\nTool Call Output:")
            print(output)
```
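Because each turn reports its own `usage` block, totals across a multi-turn run can be accumulated from the same fields. A small sketch using only the fields shown above:

```python
# Accumulate token usage across all turns of a run (fields as shown above).
total_input = total_output = 0

async for result in agent.run(task):
    usage = result.get("usage") or {}
    total_input += usage.get("input_tokens") or 0
    total_output += usage.get("output_tokens") or 0

print(f"Run totals - input: {total_input}, output: {total_output}")
```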
@@ -1,63 +0,0 @@
# Agent Package Structure

## Overview
The agent package provides a modular and extensible framework for AI-powered computer agents.

## Directory Structure

```
agent/
├── __init__.py               # Package exports
├── core/                     # Core functionality
│   ├── __init__.py
│   ├── computer_agent.py     # Main entry point
│   └── factory.py            # Provider factory
├── base/                     # Base implementations
│   ├── __init__.py
│   ├── agent.py              # Base agent class
│   ├── core/                 # Core components
│   │   ├── callbacks.py
│   │   ├── loop.py
│   │   └── messages.py
│   └── tools/                # Tool implementations
├── providers/                # Provider implementations
│   ├── __init__.py
│   ├── anthropic/            # Anthropic provider
│   │   ├── agent.py
│   │   ├── loop.py
│   │   └── tool_manager.py
│   └── omni/                 # Omni provider
│       ├── agent.py
│       ├── loop.py
│       └── tool_manager.py
└── types/                    # Type definitions
    ├── __init__.py
    ├── base.py               # Core types
    ├── messages.py           # Message types
    ├── tools.py              # Tool types
    └── providers/            # Provider-specific types
        ├── anthropic.py
        └── omni.py
```

## Key Components

### Core
- `computer_agent.py`: Main entry point for creating and using agents
- `factory.py`: Factory for creating provider-specific implementations

### Base
- `agent.py`: Base agent implementation with shared functionality
- `core/`: Core components used across providers
- `tools/`: Shared tool implementations

### Providers
Each provider follows the same structure:
- `agent.py`: Provider-specific agent implementation
- `loop.py`: Provider-specific message loop
- `tool_manager.py`: Tool management for the provider

### Types
- `base.py`: Core type definitions
- `messages.py`: Message-related types
- `tools.py`: Tool-related types
- `providers/`: Provider-specific type definitions
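To illustrate the factory's role described above, here is a hypothetical sketch of how `factory.py` might dispatch to provider-specific agents. The class and function names are illustrative only, not the package's actual API:

```python
# Hypothetical sketch of a provider factory (illustrative names only).
from enum import Enum


class Provider(Enum):
    ANTHROPIC = "anthropic"
    OMNI = "omni"


def create_agent(provider: Provider, **kwargs):
    """Return a provider-specific agent implementation."""
    if provider is Provider.ANTHROPIC:
        from .providers.anthropic.agent import AnthropicAgent  # hypothetical class name
        return AnthropicAgent(**kwargs)
    if provider is Provider.OMNI:
        from .providers.omni.agent import OmniAgent  # hypothetical class name
        return OmniAgent(**kwargs)
    raise ValueError(f"Unsupported provider: {provider}")
```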
@@ -1,5 +0,0 @@
"""OpenAI API client module."""

from .client import OpenAIClient

__all__ = ["OpenAIClient"]
@@ -1,137 +0,0 @@
"""OpenAI API client for Agent Response API."""

import logging
import json
import os
import httpx
from typing import Dict, List, Optional, Any, Union

logger = logging.getLogger(__name__)


class OpenAIClient:
    """Client for OpenAI's Agent Response API."""

    def __init__(
        self,
        api_key: str,
        model: str = "computer-use-preview",
        base_url: str = "https://api.openai.com/v1",
        max_retries: int = 3,
        timeout: int = 120,
        **kwargs,
    ):
        """Initialize OpenAI API client.

        Args:
            api_key: OpenAI API key
            model: Model to use for completions (should always be computer-use-preview)
            base_url: Base URL for API requests
            max_retries: Maximum number of retries for API calls
            timeout: Timeout for API calls in seconds
            **kwargs: Additional arguments to pass to the httpx client
        """
        self.api_key = api_key

        # Always use computer-use-preview model
        if model != "computer-use-preview":
            logger.warning(
                f"Overriding provided model '{model}' with required model 'computer-use-preview'"
            )
            model = "computer-use-preview"

        self.model = model
        self.base_url = base_url
        self.max_retries = max_retries
        self.timeout = timeout

        # Create httpx client with auth and timeout
        self.client = httpx.AsyncClient(
            timeout=timeout,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
                "OpenAI-Beta": "computer-use-2023-09-30",  # Required beta header for computer use
            },
            **kwargs,
        )

        # Additional initialization for organization if available
        openai_org = os.environ.get("OPENAI_ORG")
        if openai_org:
            self.client.headers["OpenAI-Organization"] = openai_org

        logger.info(f"Initialized OpenAI client with model {model}")

    async def create_response(
        self,
        input: List[Dict[str, Any]],
        tools: Optional[List[Dict[str, Any]]] = None,
        truncation: str = "auto",
        temperature: float = 0.7,
        top_p: float = 1.0,
        **kwargs,
    ) -> Dict[str, Any]:
        """Create a response using the OpenAI Agent Response API.

        Args:
            input: List of messages in the conversation (must be in Agent Response API format)
            tools: List of tools available to the agent
            truncation: How to handle truncation (auto, truncate)
            temperature: Sampling temperature
            top_p: Nucleus sampling parameter
            **kwargs: Additional parameters to include in the request

        Returns:
            Response from the API
        """
        url = f"{self.base_url}/responses"

        # Prepare request payload
        payload = {
            "model": self.model,
            "input": input,
            "temperature": temperature,
            "top_p": top_p,
            "truncation": truncation,
            **kwargs,
        }

        # Add tools if provided
        if tools:
            payload["tools"] = tools

        try:
            logger.debug(f"Sending request to {url}")

            # Make API call
            response = await self.client.post(url, json=payload)

            # Check for errors
            try:
                response.raise_for_status()
            except httpx.HTTPStatusError as e:
                error_detail = e.response.text
                try:
                    # Try to parse the error as JSON for better debugging
                    error_json = json.loads(error_detail)
                    logger.error(f"HTTP error from OpenAI API: {json.dumps(error_json, indent=2)}")
                except json.JSONDecodeError:
                    logger.error(f"HTTP error from OpenAI API: {error_detail}")
                raise

            result = response.json()
            logger.debug("Received successful response")
            return result

        except httpx.HTTPStatusError as e:
            error_detail = e.response.text if hasattr(e, "response") else str(e)
            logger.error(f"HTTP error from OpenAI API: {error_detail}")
            raise RuntimeError(f"OpenAI API error: {error_detail}")
        except Exception as e:
            logger.error(f"Error calling OpenAI API: {str(e)}")
            raise RuntimeError(f"Error calling OpenAI API: {str(e)}")

    async def close(self):
        """Close the httpx client."""
        await self.client.aclose()
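For context, a minimal usage sketch for the deleted client above. The key and message are placeholders; the `input` shape follows the Agent Response API format named in the docstring:

```python
# Minimal usage sketch for OpenAIClient (hypothetical values).
import asyncio


async def main():
    client = OpenAIClient(api_key="sk-...")  # placeholder key
    try:
        response = await client.create_response(
            input=[{"role": "user", "content": "Open Safari."}],
        )
        print("Response ID:", response.get("id"))
    finally:
        await client.close()


asyncio.run(main())
```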
@@ -10,7 +10,6 @@ from ...core.base import BaseLoop
from ...core.types import AgentResponse
from ...core.messages import StandardMessageManager, ImageRetentionConfig

from .api_handler import OpenAIAPIHandler
from .response_handler import OpenAIResponseHandler
from .tools.manager import ToolManager
@@ -109,15 +108,8 @@ class OpenAILoop(BaseLoop):
        client, tool manager, and message manager.
        """
        try:
            logger.info(f"Initializing OpenAI client with model {self.model}...")

            # Initialize client
            self.client = OpenAIClient(api_key=self.api_key, model=self.model)

            # Initialize tool manager
            await self.tool_manager.initialize()

            logger.info(f"Initialized OpenAI client with model {self.model}")
        except Exception as e:
            logger.error(f"Error initializing OpenAI client: {str(e)}")
            self.client = None
@@ -142,13 +134,8 @@ class OpenAILoop(BaseLoop):
        # Create queue for response streaming
        queue = asyncio.Queue()

        # Ensure client is initialized
        if self.client is None:
            logger.info("Initializing client...")
            await self.initialize_client()
            if self.client is None:
                raise RuntimeError("Failed to initialize client")
            logger.info("Client initialized successfully")

        # Ensure tool manager is initialized
        await self.tool_manager.initialize()

        # Start loop in background task
        loop_task = asyncio.create_task(self._run_loop(queue, messages))
@@ -15,7 +15,7 @@
</h1>
</div>

**cua-computer** is a Computer-Use Interface (CUI) framework powering Cua for interacting with local macOS and Linux sandboxes, PyAutoGUI-compatible, and pluggable with any AI agent systems (Cua, Langchain, CrewAI, AutoGen). Computer relies on [Lume](https://github.com/trycua/lume) for creating and managing sandbox environments.

### Get started with Computer