computer/notebooks/agent_nb.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Agent\n",
"\n",
"This notebook demonstrates how to use Cua's Agent to run a workflow in a virtual sandbox on Apple Silicon Macs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip uninstall -y cua-agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"cua-agent[all]\"\n",
"\n",
"# Or install individual agent loops:\n",
"# !pip install cua-agent[openai]\n",
"# !pip install cua-agent[anthropic]\n",
"# !pip install cua-agent[uitars]\n",
"# !pip install cua-agent[omni]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If locally installed, use this instead:\n",
"import os\n",
"\n",
"os.chdir('../libs/agent')\n",
"!poetry install\n",
"!poetry build\n",
"\n",
"!pip uninstall cua-agent -y\n",
"!pip install ./dist/cua_agent-0.1.0-py3-none-any.whl --force-reinstall"
]
},
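{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check that the install worked. This is just a sketch: it assumes the distribution is named `cua-agent`, as in the pip commands above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Confirm the package is importable and report the installed version.\n",
"# Assumes the distribution name is \"cua-agent\" (see the install cells above).\n",
"from importlib.metadata import version\n",
"\n",
"print(\"cua-agent version:\", version(\"cua-agent\"))"
]
},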
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize a Computer Agent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Agent allows you to run an agentic workflow in a virtual sandbox instances on Apple Silicon. Here's a basic example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer, VMProviderType\n",
"from agent import ComputerAgent, LLM, AgentLoop, LLMProvider"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to Computer, you can either use the async context manager pattern or initialize the ComputerAgent instance directly."
]
},
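{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal sketch of both patterns. The context-manager form is shown commented out and is assumed to mirror how `Computer` is used; adjust it if the `ComputerAgent` API differs in your version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the two initialization patterns (uses the imports from the earlier cell).\n",
"import logging\n",
"\n",
"# Pattern 1: initialize the instance directly (used in the examples below)\n",
"computer = Computer(verbosity=logging.INFO)\n",
"agent = ComputerAgent(\n",
"    computer=computer,\n",
"    loop=AgentLoop.OPENAI,\n",
"    model=LLM(provider=LLMProvider.OPENAI),\n",
")\n",
"\n",
"# Pattern 2: async context manager (assumed usage, mirroring Computer)\n",
"# async with ComputerAgent(\n",
"#     computer=computer,\n",
"#     loop=AgentLoop.OPENAI,\n",
"#     model=LLM(provider=LLMProvider.OPENAI),\n",
"# ) as agent:\n",
"#     async for result in agent.run(\"Open Safari.\"):\n",
"#         pass"
]
},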
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by creating an agent that relies on the OpenAI API computer-use-preview model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"computer = Computer(verbosity=logging.INFO, provider_type=VMProviderType.LUME)\n",
"\n",
"# Create agent with Anthropic loop and provider\n",
"agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.OPENAI,\n",
" model=LLM(provider=LLMProvider.OPENAI),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
" )\n",
"\n",
"tasks = [\n",
" \"Look for a repository named trycua/cua on GitHub.\",\n",
" \"Check the open issues, open the most recent one and read it.\",\n",
" \"Clone the repository in users/lume/projects if it doesn't exist yet.\",\n",
" \"Open the repository with an app named Cursor (on the dock, black background and white cube icon).\",\n",
" \"From Cursor, open Composer if not already open.\",\n",
" \"Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.\",\n",
"]\n",
"\n",
"for i, task in enumerate(tasks):\n",
" print(f\"\\nExecuting task {i}/{len(tasks)}: {task}\")\n",
" async for result in agent.run(task):\n",
" # print(result)\n",
" pass\n",
"\n",
" print(f\"\\n✅ Task {i+1}/{len(tasks)} completed: {task}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or using the Omni Agent Loop:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"from agent import ComputerAgent, LLM, AgentLoop\n",
"\n",
"computer = Computer(verbosity=logging.INFO)\n",
"\n",
"# Create agent with Anthropic loop and provider\n",
"agent = ComputerAgent(\n",
" computer=computer,\n",
" loop=AgentLoop.OMNI,\n",
" # model=LLM(provider=LLMProvider.ANTHROPIC, name=\"claude-3-7-sonnet-20250219\"),\n",
" # model=LLM(provider=LLMProvider.OPENAI, name=\"gpt-4.5-preview\"),\n",
" model=LLM(provider=LLMProvider.OLLAMA, name=\"gemma3:12b-it-q4_K_M\"),\n",
" save_trajectory=True,\n",
" trajectory_dir=str(Path(\"trajectories\")),\n",
" only_n_most_recent_images=3,\n",
" verbosity=logging.INFO\n",
" )\n",
"\n",
"tasks = [\n",
" \"Look for a repository named trycua/cua on GitHub.\",\n",
" \"Check the open issues, open the most recent one and read it.\",\n",
" \"Clone the repository in users/lume/projects if it doesn't exist yet.\",\n",
" \"Open the repository with an app named Cursor (on the dock, black background and white cube icon).\",\n",
" \"From Cursor, open Composer if not already open.\",\n",
" \"Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.\",\n",
"]\n",
"\n",
"for i, task in enumerate(tasks):\n",
" print(f\"\\nExecuting task {i}/{len(tasks)}: {task}\")\n",
" async for result in agent.run(task):\n",
" # print(result)\n",
" pass\n",
"\n",
" print(f\"\\n✅ Task {i+1}/{len(tasks)} completed: {task}\")"
]
},
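{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the agents above were created with `save_trajectory=True` and `trajectory_dir=str(Path(\"trajectories\"))`, each run writes its trajectory under that folder. The cell below is a small sketch that lists what has been saved; the exact file layout inside each run folder depends on the agent version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List saved trajectory folders (written because save_trajectory=True above).\n",
"from pathlib import Path\n",
"\n",
"trajectory_root = Path(\"trajectories\")\n",
"if trajectory_root.exists():\n",
"    for run_dir in sorted(trajectory_root.iterdir()):\n",
"        print(run_dir)\n",
"else:\n",
"    print(\"No trajectories saved yet.\")"
]
},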
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the Gradio UI\n",
"\n",
"The agent includes a Gradio-based user interface for easy interaction. To use it:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Get API keys from environment or prompt user\n",
"anthropic_key = os.getenv(\"ANTHROPIC_API_KEY\") or input(\"Enter your Anthropic API key: \")\n",
"openai_key = os.getenv(\"OPENAI_API_KEY\") or input(\"Enter your OpenAI API key: \")\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = anthropic_key\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from agent.ui.gradio.app import create_gradio_ui\n",
"\n",
"app = create_gradio_ui()\n",
"app.launch(share=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "cua312",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}