Files
computer/notebooks/composite_agents_docker_nb.ipynb
2025-11-13 12:21:51 -05:00

170 lines
5.0 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Composite Agents with Docker Container Computer\n",
"\n",
"This notebook walks you through running a composed GUI agent using a Docker-based Computer and OpenRouter for the grounding model, paired with a planning model.\n",
"\n",
"We'll use the model string:\n",
"\n",
"- `\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\"` (grounding + planning)\n",
"\n",
"Grounding (left) generates actionable UI coordinates; planning (right) reasons and drives steps."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"- Docker Desktop or Engine installed and running\n",
"- An OpenRouter account and API key (https://openrouter.ai/)\n",
"- (Optional) An OpenAI API key if using `openai/gpt-5-nano` for planning\n",
"- Python 3.12 environment with `cua-agent` installed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If running outside of the monorepo:\n",
"# %pip install \"cua-agent[all]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare a Docker Computer\n",
"\n",
"We'll follow the documented Docker provider flow (see `docs/content/docs/computer-sdk/computers.mdx`).\n",
"\n",
"If you don't have the image yet, either pull or build it locally. Run these in a terminal, not inside the notebook:\n",
"\n",
"```bash\n",
"# Option 1: Pull from Docker Hub\n",
"docker pull trycua/cua-ubuntu:latest\n",
"\n",
"# Option 2: Build locally (from repo root)\n",
"cd libs/kasm\n",
"docker build -t cua-ubuntu:latest .\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set environment keys\n",
"\n",
"- Get an OpenRouter API key at https://openrouter.ai/\n",
"- If using OpenAI for planning, set your OpenAI key as well\n",
"- You can input them here to set for this notebook session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"OPENROUTER_API_KEY = (\n",
" os.getenv(\"OPENROUTER_API_KEY\") or input(\"Enter your OPENROUTER_API_KEY: \").strip()\n",
")\n",
"os.environ[\"OPENROUTER_API_KEY\"] = OPENROUTER_API_KEY\n",
"\n",
"# Optional: if planning model uses OpenAI provider\n",
"OPENAI_API_KEY = (\n",
" os.getenv(\"OPENAI_API_KEY\")\n",
" or input(\"(Optional) Enter your OPENAI_API_KEY (press Enter to skip): \").strip()\n",
")\n",
"if OPENAI_API_KEY:\n",
" os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Docker Computer and a composed agent\n",
"\n",
"This uses the documented Docker provider parameters: `os_type=\"linux\"`, `provider_type=\"docker\"`, plus `image` and `name`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"from computer import Computer\n",
"from agent import ComputerAgent\n",
"\n",
"\n",
"async def main():\n",
" # Launch & connect to a Docker container running the Computer Server\n",
" async with Computer(\n",
" os_type=\"linux\",\n",
" provider_type=\"docker\",\n",
" image=\"trycua/cua-ubuntu:latest\",\n",
" name=\"my-cua-container\",\n",
" ) as computer:\n",
" agent = ComputerAgent(\n",
" model=\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\",\n",
" tools=[computer],\n",
" trajectory_dir=\"trajectories\", # Save agent trajectory (screenshots, api calls)\n",
" )\n",
"\n",
" # Simple task to verify end-to-end\n",
" async for _ in agent.run(\"Open a browser and go to example.com\"):\n",
" pass\n",
"\n",
"\n",
"asyncio.run(main())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Notes\n",
"\n",
"- Grounding (OpenRouter `z-ai/glm-4.5v`) + Planning (OpenAI `gpt-5-nano`) can be swapped for other providers/models.\n",
"- If you prefer to avoid OpenAI, choose a planning model on OpenRouter and update the model string accordingly.\n",
"- Be sure the planning model supports `vision` input and the `tools` parameter.\n",
"- The agent emits normalized Agent Responses across providers."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}