computer/notebooks/composite_agents_docker_nb.ipynb
Dillon DuPont 84e2a27aea added notebook
2025-08-26 18:29:39 -04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Composite Agents with Docker Container Computer\n",
"\n",
"This notebook walks through running a composite GUI agent against a Docker-based Computer. The agent pairs a grounding model served through OpenRouter with a separate planning model.\n",
"\n",
"We'll use the model string:\n",
"\n",
"- `\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\"` (grounding + planning)\n",
"\n",
"The model to the left of `+` handles grounding: it generates actionable UI coordinates. The model to the right handles planning: it reasons about the task and drives each step."
]
},
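{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `grounding+planning` convention concrete, here is a minimal sketch that splits a composite model string on `+`. `split_composite_model` is a hypothetical helper written for this notebook, not part of `cua-agent`; the library does its own parsing internally, and this only illustrates which half is which."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper for illustration; not part of cua-agent\n",
"def split_composite_model(model: str) -> tuple[str, str]:\n",
"    # Everything left of the first '+' is grounding, everything right is planning\n",
"    grounding, sep, planning = model.partition('+')\n",
"    if not sep or not grounding or not planning:\n",
"        raise ValueError(f'expected <grounding>+<planning>, got {model!r}')\n",
"    return grounding, planning\n",
"\n",
"grounding, planning = split_composite_model('openrouter/z-ai/glm-4.5v+openai/gpt-5-nano')\n",
"print('grounding:', grounding)\n",
"print('planning: ', planning)"
]
},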
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"- Docker Desktop or Engine installed and running\n",
"- An OpenRouter account and API key (https://openrouter.ai/)\n",
"- An OpenAI API key if you keep `openai/gpt-5-nano` as the planning model (not needed if you switch planning to another provider)\n",
"- Python 3.12 environment with `cua-agent` installed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install CUA Agent with all extras; %pip installs into the running kernel's environment\n",
"%pip install -q \"cua-agent[all]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare a Docker Computer\n",
"\n",
"We'll follow the documented Docker provider flow (see `docs/content/docs/computer-sdk/computers.mdx`).\n",
"\n",
"If you don't have the image yet, either pull it or build it locally. Run these commands in a terminal, not inside the notebook:\n",
"\n",
"```bash\n",
"# Option 1: Pull from Docker Hub\n",
"docker pull trycua/cua-ubuntu:latest\n",
"\n",
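"# Option 2: Build locally (from repo root)\n",
"# Note: this tags the image as cua-ubuntu:latest, so pass image='cua-ubuntu:latest' to Computer\n",
"cd libs/kasm\n",
"docker build -t cua-ubuntu:latest .\n",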
"```"
]
},
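{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before launching the container, it can help to confirm that the Docker CLI is reachable and the image exists locally. The cell below is a minimal sketch using only the standard library and the `docker image inspect` command; the `IMAGE` variable is an assumption that matches the pulled tag, so change it if you built the image under a different name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"import subprocess\n",
"\n",
"IMAGE = 'trycua/cua-ubuntu:latest'  # assumed tag; adjust if you built locally\n",
"\n",
"if shutil.which('docker') is None:\n",
"    print('docker CLI not found on PATH')\n",
"else:\n",
"    # `docker image inspect` exits non-zero if the image is absent or the daemon is unreachable\n",
"    result = subprocess.run(['docker', 'image', 'inspect', IMAGE], capture_output=True)\n",
"    if result.returncode == 0:\n",
"        print(f'{IMAGE} is available locally')\n",
"    else:\n",
"        print(f'{IMAGE} not found (or Docker is not running); try: docker pull {IMAGE}')"
]
},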
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set environment keys\n",
"\n",
"- Get an OpenRouter API key at https://openrouter.ai/\n",
"- If using OpenAI for planning, set your OpenAI key as well\n",
"- The next cell reads both keys from the environment and prompts for any that are missing; the values apply to this notebook session only"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"# Reuse keys already in the environment; otherwise prompt without echoing the secret\n",
"OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY') or getpass('Enter your OPENROUTER_API_KEY: ').strip()\n",
"os.environ['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY\n",
"\n",
"# Only needed when the planning model uses the OpenAI provider\n",
"OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass('(Optional) Enter your OPENAI_API_KEY (press Enter to skip): ').strip()\n",
"if OPENAI_API_KEY:\n",
"    os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Docker Computer and a composite agent\n",
"\n",
"This uses the documented Docker provider parameters: `os_type=\"linux\"`, `provider_type=\"docker\"`, plus `image` and `name`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer\n",
"from agent import ComputerAgent\n",
"\n",
"async def main():\n",
"    # Launch & connect to a Docker container running the Computer Server\n",
"    async with Computer(\n",
"        os_type='linux',\n",
"        provider_type='docker',\n",
"        image='trycua/cua-ubuntu:latest',\n",
"        name='my-cua-container'\n",
"    ) as computer:\n",
"        agent = ComputerAgent(\n",
"            model='openrouter/z-ai/glm-4.5v+openai/gpt-5-nano',\n",
"            tools=[computer],\n",
"            trajectory_dir='trajectories'  # Save agent trajectory (screenshots, API calls)\n",
"        )\n",
"\n",
"        # Simple task to verify end-to-end; results are discarded here\n",
"        async for _ in agent.run('Open a browser and go to example.com'):\n",
"            pass\n",
"\n",
"# Jupyter already runs an event loop, so await the coroutine directly;\n",
"# in a plain script, use asyncio.run(main()) instead\n",
"await main()"
]
},
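{
"cell_type": "markdown",
"metadata": {},
"source": [
"The run loop above discards every result. As a rough sketch of how you might inspect them instead, the helper below prints any message text found in a result. The result shape (a dict with an `output` list whose `message` items hold `content` entries with a `text` field) is an assumption modeled on OpenAI-style responses, and `print_messages` is a hypothetical helper, not part of `cua-agent`; check the library documentation for the actual schema.\n",
"\n",
"To try it, name the loop variable `result` and replace `pass` with `print_messages(result)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper; the result schema below is an assumption, not a documented contract\n",
"def print_messages(result):\n",
"    # Assumed shape: {'output': [{'type': 'message', 'content': [{'text': '...'}]}, ...]}\n",
"    for item in result.get('output', []):\n",
"        if item.get('type') != 'message':\n",
"            continue  # skip non-message items such as tool calls\n",
"        for part in item.get('content', []):\n",
"            text = part.get('text') if isinstance(part, dict) else None\n",
"            if text:\n",
"                print(text)\n",
"\n",
"# Stand-in result for illustration only (not real agent output)\n",
"print_messages({'output': [{'type': 'message', 'content': [{'text': 'Opened example.com'}]}]})"
]
},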
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Notes\n",
"\n",
"- Both halves of the model string can be swapped: grounding (here OpenRouter `z-ai/glm-4.5v`) and planning (here OpenAI `gpt-5-nano`) accept other providers and models.\n",
"- To avoid OpenAI entirely, choose a planning model hosted on OpenRouter and update the model string accordingly.\n",
"- Make sure the planning model supports `vision` input and the `tools` parameter.\n",
"- Agent responses are normalized to a common format across providers."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}