mirror of
https://github.com/trycua/computer.git
synced 2026-02-19 12:59:34 -06:00
163 lines
4.9 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Composite Agents with Docker Container Computer\n",
    "\n",
    "This notebook walks through running a composed GUI agent: a Docker-based Computer, a grounding model served through OpenRouter, and a separate planning model.\n",
    "\n",
    "We'll use the model string:\n",
    "\n",
    "- `\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\"` (grounding + planning)\n",
    "\n",
    "The grounding model (left of the `+`) produces actionable UI coordinates; the planning model (right of the `+`) reasons about the task and drives each step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prerequisites\n",
    "\n",
    "- Docker Desktop or Docker Engine installed and running\n",
    "- An OpenRouter account and API key (https://openrouter.ai/)\n",
    "- (Optional) An OpenAI API key, needed only if you keep `openai/gpt-5-nano` as the planning model\n",
    "- A Python 3.12 environment with `cua-agent` installed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install CUA Agent (and extras as needed)\n",
    "!pip install -q \"cua-agent[all]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare a Docker Computer\n",
    "\n",
    "We'll follow the documented Docker provider flow (see `docs/content/docs/computer-sdk/computers.mdx`).\n",
    "\n",
    "If you don't have the image yet, either pull or build it locally. Run these in a terminal, not inside the notebook:\n",
    "\n",
    "```bash\n",
    "# Option 1: Pull from Docker Hub\n",
    "docker pull trycua/cua-ubuntu:latest\n",
    "\n",
    "# Option 2: Build locally (from repo root)\n",
    "cd libs/kasm\n",
    "docker build -t cua-ubuntu:latest .\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set environment keys\n",
    "\n",
    "- Get an OpenRouter API key at https://openrouter.ai/\n",
    "- If using OpenAI for planning, set your OpenAI key as well\n",
    "- If a key is not already set in your environment, the next cell prompts for it and sets it for this notebook session"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY') or input('Enter your OPENROUTER_API_KEY: ').strip()\n",
    "os.environ['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY\n",
    "\n",
    "# Optional: if planning model uses OpenAI provider\n",
    "OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or input('(Optional) Enter your OPENAI_API_KEY (press Enter to skip): ').strip()\n",
    "if OPENAI_API_KEY:\n",
    "    os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a Docker Computer and a composed agent\n",
    "\n",
    "This uses the documented Docker provider parameters: `os_type=\"linux\"`, `provider_type=\"docker\"`, plus `image` and `name`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from computer import Computer\n",
    "from agent import ComputerAgent\n",
    "\n",
    "async def main():\n",
    "    # Launch & connect to a Docker container running the Computer Server\n",
    "    async with Computer(\n",
    "        os_type='linux',\n",
    "        provider_type='docker',\n",
    "        image='trycua/cua-ubuntu:latest',\n",
    "        name='my-cua-container'\n",
    "    ) as computer:\n",
    "        agent = ComputerAgent(\n",
    "            model='openrouter/z-ai/glm-4.5v+openai/gpt-5-nano',\n",
    "            tools=[computer],\n",
    "            trajectory_dir='trajectories'  # Save agent trajectory (screenshots, API calls)\n",
    "        )\n",
    "\n",
    "        # Simple task to verify end-to-end\n",
    "        async for _ in agent.run('Open a browser and go to example.com'):\n",
    "            pass\n",
    "\n",
    "# Jupyter already runs an event loop, so await the coroutine directly;\n",
    "# asyncio.run() would raise a RuntimeError inside a notebook cell.\n",
    "await main()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notes\n",
    "\n",
    "- The grounding model (OpenRouter `z-ai/glm-4.5v`) and the planning model (OpenAI `gpt-5-nano`) can each be swapped for other providers or models.\n",
    "- If you prefer to avoid OpenAI, choose a planning model on OpenRouter and update the model string accordingly.\n",
    "- Be sure the planning model supports `vision` input and the `tools` parameter.\n",
    "- The agent emits normalized Agent Responses across providers."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}