Update docs for operator blogpost

This commit is contained in:
f-trycua
2025-04-26 18:23:33 -07:00
parent b7c96f6379
commit 6b3baf075b
4 changed files with 200 additions and 13 deletions
Binary file not shown (image, 161 KiB).

+7 -1
@@ -78,7 +78,13 @@ Refer to these notebooks for step-by-step guides on how to use the Computer-Use
## Using the Gradio UI
The agent includes a Gradio-based user interface for easy interaction. To use it:
The agent includes a Gradio-based user interface for easier interaction.
<div align="center">
<img src="../../img/agent_gradio_ui.png"/>
</div>
To use it:
```bash
# Install with Gradio support
+14 -12
@@ -34,7 +34,9 @@
"!pip install \"cua-agent[all]\"\n",
"\n",
"# Or install individual agent loops:\n",
"# !pip install cua-agent[openai]\n",
"# !pip install cua-agent[anthropic]\n",
"# !pip install cua-agent[uitars]\n",
"# !pip install cua-agent[omni]"
]
},
@@ -66,7 +68,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Computer allows you to run an agentic workflow in a virtual sandbox instances on Apple Silicon. Here's a basic example:"
"Agent allows you to run an agentic workflow in a virtual sandbox instance on Apple Silicon. Here's a basic example:"
]
},
{
@@ -79,13 +81,6 @@
"from agent import ComputerAgent, LLM, AgentLoop, LLMProvider"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to Computer, you can either use the async context manager pattern or initialize the ComputerAgent instance directly."
]
},
{
"cell_type": "code",
"execution_count": 4,
@@ -106,7 +101,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Direct initialization:"
"Similar to Computer, you can either use the async context manager pattern or initialize the ComputerAgent instance directly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by creating an agent that relies on the OpenAI API computer-use-preview model."
]
},
{
@@ -153,7 +155,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Or using the Omni Agentic Loop:"
"Or using the Omni Agent Loop:"
]
},
{
@@ -239,7 +241,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "cua313",
"language": "python",
"name": "python3"
},
@@ -253,7 +255,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
"version": "3.13.2"
}
},
"nbformat": 4,
@@ -0,0 +1,179 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Build Your Own Operator on macOS - Part 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Welcome to Part 2 of our tutorial series on building a Computer Use Automation (CUA) operator, this time using the `cua-agent` package. For the complete guide, check out our [full blog post](https://www.trycua.com/blog/build-your-own-operator-on-macos-2)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- Install the `cua-agent` package and set up the Lume daemon as described in its documentation. The `cua-computer` package used in the previous part is already installed as a dependency of `cua-agent`.\n",
"- Ensure you have an OpenAI or Anthropic API key (set as an environment variable, or you will be prompted for it below).\n",
"- This notebook uses asynchronous Python (async/await)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install the required packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"cua-agent[all]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prompt for any API keys\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import required modules"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer\n",
"from agent import ComputerAgent, LLM, AgentLoop, LLMProvider"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running a c/ua Agent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by creating an agent that relies on the OpenAI API computer-use-preview model."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"computer = Computer(verbosity=logging.INFO)\n",
"\n",
"tasks = [\n",
" \"Look for a repository named trycua/cua on GitHub.\",\n",
" \"Check the open issues, open the most recent one and read it.\",\n",
" \"Clone the repository in users/lume/projects if it doesn't exist yet.\",\n",
" \"Open the repository with an app named Cursor (on the dock, black background and white cube icon).\",\n",
" \"From Cursor, open Composer if not already open.\",\n",
" \"Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can either provide a list of tasks or a single task as a string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.OPENAI,\n",
" model=LLM(provider=LLMProvider.OPENAI),\n",
" save_trajectory=True,\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
" )\n",
"\n",
"\n",
"for i, task in enumerate(tasks):\n",
" print(f\"\\nExecuting task {i+1}/{len(tasks)}: {task}\")\n",
" async for result in agent.run(task):\n",
" # print(result)\n",
" pass\n",
"\n",
" print(f\"\\n✅ Task {i+1}/{len(tasks)} completed: {task}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For each task, the agent.run() method returns an async generator that yields results indicating the progress of the task and any reasoning or actions taken by the agent, conforming to the OpenAI Responses API format."
]
}
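,
{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "Since a single task can also be passed as a string, here is a minimal sketch that runs one more task and prints each streamed result as it arrives. The task text is illustrative, and this cell assumes the `agent` instance created above:"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# Minimal sketch: pass a single task as a string instead of a list.\n",
  "# Each yielded result follows the OpenAI Responses API format.\n",
  "async for result in agent.run(\"Take a screenshot of the current desktop.\"):\n",
  "    print(result)"
 ]
}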
],
"metadata": {
"kernelspec": {
"display_name": "cua",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}