{
"cells": [
{
"cell_type": "markdown",
"id": "title-intro",
"metadata": {},
"source": [
"# Using Ollama with Cua (Docker Edition)\n",
"\n",
"This notebook demonstrates multiple ways to use Ollama with the Cua ComputerAgent, mirroring the structure of `notebooks/sota_hackathon.ipynb` while running the computer inside Docker.\n",
"\n",
"We'll cover three patterns:\n",
"\n",
"1. Use an all-in-one CUA model served by Ollama (e.g. `model=\"ollama/blaifa/InternVL3_5:8b\"`).\n",
"2. Use a strong CUA grounding model composed with an Ollama VLM (e.g. `model=\"openai/computer-use-preview+ollama/gemma3:4b\"`).\n",
"3. Conceptual: different ways to customize/extend your agent (link + outline only).\n"
]
},
{
"cell_type": "markdown",
"id": "prereq",
"metadata": {},
"source": [
"## 💻 Prerequisites\n",
"\n",
"The easiest way to get started is to set up the Cua development repository.\n",
"\n",
"Install [Docker](https://www.docker.com/products/docker-desktop/) and [uv](https://docs.astral.sh/uv/getting-started/installation/).\n",
"\n",
"Clone the Cua repository:\n",
"\n",
"`git clone https://github.com/trycua/cua`\n",
"\n",
"Install the project dependencies:\n",
"\n",
"`cd cua && uv sync`\n",
"\n",
"Now, you should be able to run this notebook in VS Code with the `.venv` virtual environment selected."
]
},
{
"cell_type": "markdown",
"id": "env-setup",
"metadata": {},
"source": [
"## 🔑 Environment Setup (.env)\n",
"\n",
"Create a `.env` file with your API keys. You only need keys for the providers you plan to compose with; for example, add OpenAI or Anthropic keys if composing with those providers.\n",
"\n",
"Add these entries as needed (empty values are fine if not used):\n",
"\n",
"- `OPENAI_API_KEY` (if composing with OpenAI)\n",
"- `ANTHROPIC_API_KEY` (if composing with Anthropic)\n",
"- `OLLAMA_API_BASE` (defaults to `http://localhost:11434`)\n",
"\n",
"Note: For Cua Cloud computers, you would also set `CUA_API_KEY` and `CUA_CONTAINER_NAME`, but this notebook uses Docker for the computer.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "create-env",
"metadata": {},
"outputs": [],
"source": [
"# Create a .env template if it doesn't exist\n",
"ENV_TEMPLATE = \"\"\"# Optional environment variables for composition:\n",
"OPENAI_API_KEY=\n",
"ANTHROPIC_API_KEY=\n",
"\n",
"# Ollama endpoint (default shown)\n",
"OLLAMA_API_BASE=http://localhost:11434\n",
"\"\"\"\n",
"\n",
"from pathlib import Path\n",
"\n",
"if not Path('.env').exists():\n",
"    Path('.env').write_text(ENV_TEMPLATE)\n",
"    print('A .env file was created! Fill in the empty values you need.')\n",
"else:\n",
"    print('.env already exists')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "load-env",
"metadata": {},
"outputs": [],
"source": [
"# Load .env into environment\n",
"import os\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv(dotenv_path='.env', override=True)\n",
"print('OPENAI_API_KEY set:', bool(os.getenv('OPENAI_API_KEY')))\n",
"print('ANTHROPIC_API_KEY set:', bool(os.getenv('ANTHROPIC_API_KEY')))\n",
"print('OLLAMA_API_BASE:', os.getenv('OLLAMA_API_BASE', 'http://localhost:11434'))\n"
]
},
{
"cell_type": "markdown",
"id": "ollama-docker",
"metadata": {},
"source": [
"## 🐳 Run Ollama via Docker (recommended)\n",
"\n",
"If you don't already have Ollama running locally, you can run it with Docker.\n",
"Run the following command in your terminal (outside the notebook):\n",
"\n",
"```bash\n",
"docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama \\\n",
"  ollama/ollama:latest\n",
"```\n",
"\n",
"Then pull any models you need, for example (terminal):\n",
"\n",
"```bash\n",
"docker exec -it ollama ollama pull gemma3:4b\n",
"docker exec -it ollama ollama pull blaifa/InternVL3_5:8b\n",
"```\n",
"\n",
"Make sure `OLLAMA_API_BASE` points to `http://localhost:11434`.\n"
]
},
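{
"cell_type": "markdown",
"id": "check-ollama-note",
"metadata": {},
"source": [
"Before pointing an agent at Ollama, it's worth confirming the notebook can reach it. The cell below is a minimal sanity check using only the standard library: it queries Ollama's `/api/tags` endpoint (which lists locally pulled models) at whatever `OLLAMA_API_BASE` resolves to.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "check-ollama",
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: list the models your Ollama server has pulled.\n",
"# Uses Ollama's /api/tags endpoint; adjust OLLAMA_API_BASE in .env if needed.\n",
"import json\n",
"import os\n",
"import urllib.request\n",
"\n",
"base = os.getenv('OLLAMA_API_BASE', 'http://localhost:11434')\n",
"try:\n",
"    with urllib.request.urlopen(f'{base}/api/tags', timeout=5) as resp:\n",
"        models = [m['name'] for m in json.load(resp).get('models', [])]\n",
"    print('Ollama reachable at', base, '- models:', models or 'none pulled yet')\n",
"except Exception as e:\n",
"    print('Could not reach Ollama at', base, '-', e)\n"
]
},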
{
"cell_type": "markdown",
"id": "computer-docker",
"metadata": {},
"source": [
"## 🖥️ Launch a Docker Computer\n",
"\n",
"We'll run the computer using the Cua Docker provider.\n",
"You can watch the live VNC stream at `http://localhost:8006/`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "launch-computer",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import webbrowser\n",
"\n",
"from computer import Computer, VMProviderType\n",
"\n",
"computer = Computer(\n",
"    os_type=\"linux\",\n",
"    provider_type=VMProviderType.DOCKER,\n",
"    verbosity=logging.INFO\n",
")\n",
"await computer.run()\n",
"\n",
"# Optional: open the VNC page in your browser\n",
"webbrowser.open('http://localhost:8006/', new=0, autoraise=True)\n"
]
},
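{
"cell_type": "markdown",
"id": "screenshot-note",
"metadata": {},
"source": [
"Optional: grab a screenshot to confirm the container is up before handing it to an agent. A minimal sketch, assuming the computer's interface exposes a `screenshot()` coroutine returning PNG bytes (as in the Cua computer examples); the VNC page above is an equally good check.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "screenshot-check",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: capture a screenshot from the Docker computer.\n",
"# Assumes computer.interface.screenshot() returns PNG bytes.\n",
"from pathlib import Path\n",
"\n",
"png_bytes = await computer.interface.screenshot()\n",
"Path('docker_computer.png').write_bytes(png_bytes)\n",
"print(f'Saved {len(png_bytes)} bytes to docker_computer.png')\n"
]
},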
{
"cell_type": "markdown",
"id": "section-1",
"metadata": {},
"source": [
"## 1) All-in-one CUA model via Ollama\n",
"\n",
"Some community models on Ollama are trained for computer use end-to-end.\n",
"Point the agent's model to an Ollama-served model using the `ollama/` prefix.\n",
"\n",
"Example: `model=\"ollama/blaifa/InternVL3_5:8b\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "run-allinone",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"from agent import ComputerAgent\n",
"\n",
"agent_all_in_one = ComputerAgent(\n",
"    model=\"ollama/blaifa/InternVL3_5:8b\",\n",
"    tools=[computer],\n",
"    trajectory_dir=str(Path('trajectories')),\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO,\n",
"    # instructions=\"You are a helpful assistant.\"  # Editable instructions for prompt engineering\n",
")\n",
"\n",
"print('Running all-in-one Ollama CUA model...')\n",
"async for _ in agent_all_in_one.run(\"Open the web browser and go to example.com\"):\n",
"    pass\n",
"print('✅ Done')\n"
]
},
{
"cell_type": "markdown",
"id": "section-2",
"metadata": {},
"source": [
"## 2) Compose a strong CUA UI grounding model with an Ollama VLM\n",
"\n",
"You can compose a UI grounding (element localization) model with a local Ollama VLM (reasoning + tool-use) for planning.\n",
"Use a `+ollama/<model>` suffix to compose.\n",
"\n",
"Examples:\n",
"\n",
"- `openai/computer-use-preview+ollama/gemma3:4b`\n",
"- `anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "run-composed",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from agent import ComputerAgent\n",
"\n",
"agent_composed = ComputerAgent(\n",
"    model=\"anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b\",\n",
"    tools=[computer],\n",
"    trajectory_dir='trajectories',\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO,\n",
")\n",
"\n",
"print('Running composed agent (Anthropic grounding + Ollama VLM)...')\n",
"async for _ in agent_composed.run(\"Open a text editor and type: Hello from composed model!\"):\n",
"    pass\n",
"print('✅ Done')\n"
]
},
{
"cell_type": "markdown",
"id": "section-3-conceptual",
"metadata": {},
"source": [
"## 3) Customize your agent 🛠️\n",
"\n",
"For a few customization options, see: https://cua.ai/docs/agent-sdk/customizing-computeragent\n",
"\n",
"Levels of customization you can explore (a small sketch of level 2 follows below):\n",
"\n",
"1) Simple — Prompt engineering\n",
"2) Easy — Tools\n",
"3) Intermediate — Callbacks\n",
"4) Expert — Custom agent via `register_agent` (see `libs/python/agent/agent/decorators.py` → `register_agent`)\n",
"\n",
"Or, incorporate the ComputerAgent into your own agent framework!\n"
]
},
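{
"cell_type": "markdown",
"id": "custom-tool-note",
"metadata": {},
"source": [
"As a taste of level 2 (tools), here is a minimal sketch of a custom tool. It assumes the Agent SDK accepts plain Python functions (described to the model via their signature and docstring) alongside the computer in `tools`, per the customization docs linked above; the `calculate` function is purely illustrative.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "custom-tool-sketch",
"metadata": {},
"outputs": [],
"source": [
"# A minimal custom-tool sketch (illustrative; see the customization docs).\n",
"# Assumption: ComputerAgent accepts plain Python functions in `tools`.\n",
"import logging\n",
"\n",
"from agent import ComputerAgent\n",
"\n",
"def calculate(expression: str) -> str:\n",
"    \"\"\"Evaluate a basic arithmetic expression and return the result.\"\"\"\n",
"    if not set(expression) <= set('0123456789+-*/(). '):\n",
"        return 'Error: only basic arithmetic is supported'\n",
"    return str(eval(expression))\n",
"\n",
"agent_with_tool = ComputerAgent(\n",
"    model=\"ollama/blaifa/InternVL3_5:8b\",\n",
"    tools=[computer, calculate],  # the computer plus a custom function tool\n",
"    verbosity=logging.INFO,\n",
")\n",
"print('Agent with a custom tool is ready; run it like the agents above.')\n"
]
},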
{
"cell_type": "markdown",
"id": "wrapup",
"metadata": {},
"source": [
"## ✅ Summary\n",
"\n",
"- You ran the computer in Docker via the Cua Docker provider and viewed it at `http://localhost:8006/`.\n",
"- You tried two runnable ways to leverage Ollama and reviewed a conceptual path to go further:\n",
"  - All-in-one computer-use model served by Ollama.\n",
"  - A composed agent using a strong grounding model + an Ollama VLM.\n",
"  - A link + outline for further customization paths (prompting, tools, callbacks, custom agent via `register_agent`).\n",
"\n",
"Explore more configurations and models in the Cua docs.\n"
]
}
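,
{
"cell_type": "markdown",
"id": "cleanup-note",
"metadata": {},
"source": [
"When you're done experimenting, shut the Docker computer down. A small cleanup sketch, assuming the `Computer` object exposes a `stop()` coroutine; otherwise, stop the container from your terminal with `docker stop`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cleanup",
"metadata": {},
"outputs": [],
"source": [
"# Shut down the Docker computer when finished (assumes Computer.stop() exists).\n",
"await computer.stop()\n",
"print('Computer stopped.')\n"
]
}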
],
"metadata": {
"kernelspec": {
"display_name": "cua",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}