mirror of
https://github.com/trycua/computer.git
synced 2026-01-06 05:20:02 -06:00
170 lines
5.0 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Composite Agents with Docker Container Computer\n",
"\n",
"This notebook walks you through running a composed GUI agent on a Docker-based Computer. The grounding model is served through OpenRouter and is paired with a separate planning model.\n",
"\n",
"We'll use the model string:\n",
"\n",
"- `\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\"` (grounding + planning)\n",
"\n",
"The model to the left of `+` handles grounding: it produces actionable UI coordinates (where to click or type). The model to the right handles planning: it reasons about the task and decides each next step."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"- Docker Desktop or Docker Engine, installed and running\n",
"- An OpenRouter account and API key (https://openrouter.ai/)\n",
"- An OpenAI API key, required for the `openai/gpt-5-nano` planning model used here (skip it only if you switch to a planning model that doesn't use OpenAI)\n",
"- A Python 3.12 environment with `cua-agent` installed"
]
},
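{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before going further, you can optionally confirm that the Docker CLI is installed and the Docker daemon is reachable. The next cell is a minimal sketch and is not part of the cua SDK. `docker_is_running` is a hypothetical helper that treats a successful `docker info` call as \"Docker is running\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"import subprocess\n",
"\n",
"\n",
"def docker_is_running() -> bool:\n",
"    \"\"\"Hypothetical helper: True if the docker CLI is on PATH and `docker info` succeeds.\"\"\"\n",
"    if shutil.which(\"docker\") is None:\n",
"        return False  # docker CLI not installed or not on PATH\n",
"    # `docker info` exits with a non-zero status when the daemon is not reachable\n",
"    result = subprocess.run([\"docker\", \"info\"], capture_output=True, text=True)\n",
"    return result.returncode == 0\n",
"\n",
"\n",
"print(\"Docker daemon reachable:\", docker_is_running())"
]
},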
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If running outside of the monorepo:\n",
"# %pip install \"cua-agent[all]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare a Docker Computer\n",
"\n",
"We'll follow the documented Docker provider flow (see `docs/content/docs/computer-sdk/computers.mdx`).\n",
"\n",
"If you don't have the image yet, pull it or build it locally. Run these commands in a terminal, not inside the notebook:\n",
"\n",
"```bash\n",
"# Option 1: Pull from Docker Hub\n",
"docker pull trycua/cua-ubuntu:latest\n",
"\n",
"# Option 2: Build locally (from the repo root), tagged with the same image name the code below uses\n",
"cd libs/kasm\n",
"docker build -t trycua/cua-ubuntu:latest .\n",
"```"
]
},
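{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, check from the notebook that the image is available locally. This sketch assumes you pulled or tagged the image as `trycua/cua-ubuntu:latest`, and it relies on `docker image inspect`, which exits with a non-zero status when the image is missing. `image_is_present` is a hypothetical helper name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"\n",
"IMAGE = \"trycua/cua-ubuntu:latest\"  # assumed tag; match the image you pulled or built\n",
"\n",
"\n",
"def image_is_present(image: str) -> bool:\n",
"    \"\"\"Hypothetical helper: True if `docker image inspect` finds the image locally.\"\"\"\n",
"    try:\n",
"        result = subprocess.run(\n",
"            [\"docker\", \"image\", \"inspect\", image], capture_output=True, text=True\n",
"        )\n",
"    except FileNotFoundError:\n",
"        return False  # docker CLI not installed\n",
"    return result.returncode == 0\n",
"\n",
"\n",
"print(f\"{IMAGE} present locally:\", image_is_present(IMAGE))"
]
},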
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set API keys\n",
"\n",
"- Get an OpenRouter API key at https://openrouter.ai/\n",
"- If you use OpenAI for planning, set your OpenAI API key as well\n",
"- The next cell reads each key from the environment or prompts for it. Keys entered there apply only to this notebook session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass  # hides typed keys instead of echoing them like input()\n",
"\n",
"OPENROUTER_API_KEY = (\n",
"    os.getenv(\"OPENROUTER_API_KEY\") or getpass(\"Enter your OPENROUTER_API_KEY: \").strip()\n",
")\n",
"os.environ[\"OPENROUTER_API_KEY\"] = OPENROUTER_API_KEY\n",
"\n",
"# Required when the planning model uses the OpenAI provider (as openai/gpt-5-nano does)\n",
"OPENAI_API_KEY = (\n",
"    os.getenv(\"OPENAI_API_KEY\")\n",
"    or getpass(\"Enter your OPENAI_API_KEY (press Enter to skip if your planning model doesn't use OpenAI): \").strip()\n",
")\n",
"if OPENAI_API_KEY:\n",
"    os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},
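{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model string names a provider for each half, so you can check that the matching API keys are set before launching a container. This is a minimal sketch with stated assumptions. It splits the string on `+`, treats the text before the first `/` of each half as the provider, and assumes the provider-to-variable mapping below (`OPENROUTER_API_KEY`, `OPENAI_API_KEY`). `PROVIDER_KEYS` and `missing_keys` are hypothetical names and are not part of `cua-agent`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"MODEL = \"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\"\n",
"\n",
"# Assumed mapping from provider prefix to the environment variable it reads\n",
"PROVIDER_KEYS = {\n",
"    \"openrouter\": \"OPENROUTER_API_KEY\",\n",
"    \"openai\": \"OPENAI_API_KEY\",\n",
"}\n",
"\n",
"\n",
"def missing_keys(model: str) -> list[str]:\n",
"    \"\"\"Hypothetical helper: env vars needed by the providers in `model` that are unset.\"\"\"\n",
"    missing = []\n",
"    for part in model.split(\"+\"):  # grounding model first, planning model second\n",
"        provider = part.split(\"/\", 1)[0]\n",
"        env_var = PROVIDER_KEYS.get(provider)\n",
"        if env_var and not os.environ.get(env_var) and env_var not in missing:\n",
"            missing.append(env_var)\n",
"    return missing\n",
"\n",
"\n",
"print(\"Missing API keys:\", missing_keys(MODEL) or \"none\")"
]
},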
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Docker Computer and a composed agent\n",
"\n",
"This uses the documented Docker provider parameters: `os_type=\"linux\"` and `provider_type=\"docker\"`, plus `image` (the image to run) and `name` (the container name)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer\n",
"from agent import ComputerAgent\n",
"\n",
"\n",
"async def main():\n",
"    # Launch and connect to a Docker container running the Computer Server\n",
"    async with Computer(\n",
"        os_type=\"linux\",\n",
"        provider_type=\"docker\",\n",
"        image=\"trycua/cua-ubuntu:latest\",\n",
"        name=\"my-cua-container\",\n",
"    ) as computer:\n",
"        agent = ComputerAgent(\n",
"            model=\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\",\n",
"            tools=[computer],\n",
"            trajectory_dir=\"trajectories\",  # Save the agent trajectory (screenshots, API calls)\n",
"        )\n",
"\n",
"        # Simple task to verify the setup end to end\n",
"        async for _ in agent.run(\"Open a browser and go to example.com\"):\n",
"            pass\n",
"\n",
"\n",
"# Jupyter already runs an event loop, so asyncio.run() would raise RuntimeError here;\n",
"# await the coroutine directly instead (top-level await works in IPython kernels).\n",
"await main()"
]
},
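{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above discards everything the agent yields. To watch progress, you could use a variant like the next cell, which prints the agent's text messages. It assumes that each result yielded by `agent.run` is a dict whose `output` list holds items with a `type` field, and that `message` items carry their text at `item[\"content\"][0][\"text\"]`. This follows the `cua-agent` README, but verify it against your installed version. `run_and_print` is a hypothetical name. The cell reuses the same container name, so if the previous container is still running, stop it first or change `name`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from computer import Computer\n",
"from agent import ComputerAgent\n",
"\n",
"\n",
"async def run_and_print(task: str) -> None:\n",
"    \"\"\"Hypothetical variant of main() that prints the agent's text replies.\"\"\"\n",
"    async with Computer(\n",
"        os_type=\"linux\",\n",
"        provider_type=\"docker\",\n",
"        image=\"trycua/cua-ubuntu:latest\",\n",
"        name=\"my-cua-container\",\n",
"    ) as computer:\n",
"        agent = ComputerAgent(\n",
"            model=\"openrouter/z-ai/glm-4.5v+openai/gpt-5-nano\",\n",
"            tools=[computer],\n",
"        )\n",
"        async for result in agent.run(task):\n",
"            # Assumed result shape: {\"output\": [{\"type\": ..., ...}, ...]}\n",
"            for item in result.get(\"output\", []):\n",
"                if item.get(\"type\") == \"message\":\n",
"                    print(item[\"content\"][0][\"text\"])\n",
"\n",
"\n",
"await run_and_print(\"Open a browser and go to example.com\")"
]
},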
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Notes\n",
"\n",
"- You can swap the grounding model (OpenRouter `z-ai/glm-4.5v`) and the planning model (OpenAI `gpt-5-nano`) for other providers or models.\n",
"- To avoid OpenAI entirely, choose a planning model on OpenRouter and update the model string accordingly.\n",
"- Make sure the planning model supports vision (image) input and tool calling via the `tools` parameter.\n",
"- The agent emits responses in the same normalized Agent Response format regardless of provider."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}