{
"cells": [
{
"cell_type": "markdown",
"id": "title-intro",
"metadata": {},
"source": [
"# Using Ollama with Cua (Docker Edition)\n",
"\n",
"This notebook demonstrates multiple ways to use Ollama with the Cua ComputerAgent, mirroring the structure of `notebooks/sota_hackathon.ipynb` while running the computer inside Docker.\n",
"\n",
"We'll cover three patterns:\n",
"\n",
"1. Use an all-in-one CUA model served by Ollama (e.g. `model=\"ollama/blaifa/InternVL3_5:8b\"`).\n",
"2. Use a strong CUA grounding model composed with an Ollama VLM (e.g. `model=\"openai/computer-use-preview+ollama/gemma3:4b\"`).\n",
"3. Conceptual: different ways to customize/extend your agent (link + outline only).\n"
]
},
{
"cell_type": "markdown",
"id": "prereq",
"metadata": {},
"source": [
"## 💻 Prerequisites\n",
"\n",
"The easiest way to get started is to set up the Cua development repository.\n",
"\n",
"Install [Docker](https://www.docker.com/products/docker-desktop/) and [uv](https://docs.astral.sh/uv/getting-started/installation/).\n",
"\n",
"Clone the Cua repository:\n",
"\n",
"`git clone https://github.com/trycua/cua`\n",
"\n",
"Install the project dependencies:\n",
"\n",
"`cd cua && uv sync`\n",
"\n",
"Now, you should be able to run this notebook in VS Code with the `.venv` virtual environment selected."
]
},
{
"cell_type": "markdown",
"id": "env-setup",
"metadata": {},
"source": [
"## 🔑 Environment Setup (.env)\n",
"\n",
"Create a `.env` file with your API keys. You only need keys for the providers you plan to compose with; for example, add OpenAI or Anthropic keys if composing with those providers.\n",
"\n",
"Add these entries as needed (empty values are fine if not used):\n",
"\n",
"- `OPENAI_API_KEY` (if composing with OpenAI)\n",
"- `ANTHROPIC_API_KEY` (if composing with Anthropic)\n",
"- `OLLAMA_API_BASE` (defaults to `http://localhost:11434`)\n",
"\n",
"Note: For Cua Cloud computers, you would also set `CUA_API_KEY` and `CUA_CONTAINER_NAME`, but this notebook uses Docker for the computer.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "create-env",
"metadata": {},
"outputs": [],
"source": [
"# Create a .env template if it doesn't exist\n",
"ENV_TEMPLATE = \"\"\"# Optional environment variables for composition:\n",
"OPENAI_API_KEY=\n",
"ANTHROPIC_API_KEY=\n",
"\n",
"# Ollama endpoint (default shown)\n",
"OLLAMA_API_BASE=http://localhost:11434\n",
"\"\"\"\n",
"\n",
"from pathlib import Path\n",
"\n",
"if not Path('.env').exists():\n",
"    Path('.env').write_text(ENV_TEMPLATE)\n",
"    print('A .env file was created! Fill in the empty values you need.')\n",
"else:\n",
"    print('.env already exists')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "load-env",
"metadata": {},
"outputs": [],
"source": [
"# Load .env into environment\n",
"import os\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv(dotenv_path='.env', override=True)\n",
"print('OPENAI_API_KEY set:', bool(os.getenv('OPENAI_API_KEY')))\n",
"print('ANTHROPIC_API_KEY set:', bool(os.getenv('ANTHROPIC_API_KEY')))\n",
"print('OLLAMA_API_BASE:', os.getenv('OLLAMA_API_BASE', 'http://localhost:11434'))\n"
]
},
{
"cell_type": "markdown",
"id": "ollama-docker",
"metadata": {},
"source": [
"## 🐳 Run Ollama via Docker (recommended)\n",
"\n",
"If you don't already have Ollama running locally, you can run it with Docker.\n",
"Run the following command in your terminal (outside the notebook):\n",
"\n",
"```bash\n",
"docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama \\\n",
"  ollama/ollama:latest\n",
"```\n",
"\n",
"Then pull any models you need, for example (terminal):\n",
"\n",
"```bash\n",
"docker exec -it ollama ollama pull gemma3:4b\n",
"docker exec -it ollama ollama pull blaifa/InternVL3_5:8b\n",
"```\n",
"\n",
"Make sure `OLLAMA_API_BASE` points to `http://localhost:11434`.\n"
]
},
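{
"cell_type": "markdown",
"id": "check-ollama-note",
"metadata": {},
"source": [
"Before pointing an agent at Ollama, it's worth confirming the notebook can reach it. The cell below is a minimal sanity check using only the standard library: it queries Ollama's `/api/tags` endpoint (which lists locally pulled models) at whatever `OLLAMA_API_BASE` resolves to.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "check-ollama",
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: list the models your Ollama server has pulled.\n",
"# Uses Ollama's /api/tags endpoint; adjust OLLAMA_API_BASE in .env if needed.\n",
"import json\n",
"import os\n",
"import urllib.request\n",
"\n",
"base = os.getenv('OLLAMA_API_BASE', 'http://localhost:11434')\n",
"try:\n",
"    with urllib.request.urlopen(f'{base}/api/tags', timeout=5) as resp:\n",
"        models = [m['name'] for m in json.load(resp).get('models', [])]\n",
"    print('Ollama reachable at', base, '- models:', models or 'none pulled yet')\n",
"except Exception as e:\n",
"    print('Could not reach Ollama at', base, '-', e)\n"
]
},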
{
"cell_type": "markdown",
"id": "computer-docker",
"metadata": {},
"source": [
"## 🖥️ Launch a Docker Computer\n",
"\n",
"We'll run the computer using the Cua Docker provider.\n",
"You can watch the live VNC stream at `http://localhost:8006/`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "launch-computer",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"import webbrowser\n",
"\n",
"from computer import Computer, VMProviderType\n",
"\n",
"computer = Computer(\n",
"    os_type=\"linux\",\n",
"    provider_type=VMProviderType.DOCKER,\n",
"    verbosity=logging.INFO\n",
")\n",
"await computer.run()\n",
"\n",
"# Optional: open the VNC page in your browser\n",
"webbrowser.open('http://localhost:8006/', new=0, autoraise=True)\n"
]
},
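{
"cell_type": "markdown",
"id": "screenshot-note",
"metadata": {},
"source": [
"Optional: grab a screenshot to confirm the container is up before handing it to an agent. A minimal sketch, assuming the computer's interface exposes a `screenshot()` coroutine returning PNG bytes (as in the Cua computer examples); the VNC page above is an equally good check.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "screenshot-check",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: capture a screenshot from the Docker computer.\n",
"# Assumes computer.interface.screenshot() returns PNG bytes.\n",
"from pathlib import Path\n",
"\n",
"png_bytes = await computer.interface.screenshot()\n",
"Path('docker_computer.png').write_bytes(png_bytes)\n",
"print(f'Saved {len(png_bytes)} bytes to docker_computer.png')\n"
]
},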
{
"cell_type": "markdown",
"id": "section-1",
"metadata": {},
"source": [
"## 1) All-in-one CUA model via Ollama\n",
"\n",
"Some community models on Ollama are trained for computer use end-to-end.\n",
"Point the agent's model to an Ollama-served model using the `ollama/` prefix.\n",
"\n",
"Example: `model=\"ollama/blaifa/InternVL3_5:8b\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "run-allinone",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"from pathlib import Path\n",
"\n",
"from agent import ComputerAgent\n",
"\n",
"agent_all_in_one = ComputerAgent(\n",
"    model=\"ollama/blaifa/InternVL3_5:8b\",\n",
"    tools=[computer],\n",
"    trajectory_dir=str(Path('trajectories')),\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO,\n",
"    # instructions=\"You are a helpful assistant.\"  # Editable instructions for prompt engineering\n",
")\n",
"\n",
"print('Running all-in-one Ollama CUA model...')\n",
"async for _ in agent_all_in_one.run(\"Open the web browser and go to example.com\"):\n",
"    pass\n",
"print('✅ Done')\n"
]
},
{
"cell_type": "markdown",
"id": "section-2",
"metadata": {},
"source": [
"## 2) Compose a strong CUA UI grounding model with an Ollama VLM\n",
"\n",
"You can compose a UI grounding (element localization) model with a local Ollama VLM (reasoning + tool-use) for planning.\n",
"Use a `+ollama/<model>` suffix to compose.\n",
"\n",
"Examples:\n",
"\n",
"- `openai/computer-use-preview+ollama/gemma3:4b`\n",
"- `anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "run-composed",
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from agent import ComputerAgent\n",
"\n",
"agent_composed = ComputerAgent(\n",
"    model=\"anthropic/claude-3-5-sonnet-20241022+ollama/gemma3:4b\",\n",
"    tools=[computer],\n",
"    trajectory_dir='trajectories',\n",
"    only_n_most_recent_images=3,\n",
"    verbosity=logging.INFO,\n",
")\n",
"\n",
"print('Running composed agent (Anthropic grounding + Ollama VLM)...')\n",
"async for _ in agent_composed.run(\"Open a text editor and type: Hello from composed model!\"):\n",
"    pass\n",
"print('✅ Done')\n"
]
},
{
"cell_type": "markdown",
"id": "section-3-conceptual",
"metadata": {},
"source": [
"## 3) Customize your agent 🛠️\n",
"\n",
"For a few customization options, see: https://cua.ai/docs/agent-sdk/customizing-computeragent\n",
"\n",
"Levels of customization you can explore (a small sketch of level 2 follows below):\n",
"\n",
"1) Simple — Prompt engineering\n",
"2) Easy — Tools\n",
"3) Intermediate — Callbacks\n",
"4) Expert — Custom agent via `register_agent` (see `libs/python/agent/agent/decorators.py` → `register_agent`)\n",
"\n",
"Or, incorporate the ComputerAgent into your own agent framework!\n"
]
},
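{
"cell_type": "markdown",
"id": "custom-tool-note",
"metadata": {},
"source": [
"As a taste of level 2 (tools), here is a minimal sketch of a custom tool. It assumes the Agent SDK accepts plain Python functions (described to the model via their signature and docstring) alongside the computer in `tools`, per the customization docs linked above; the `calculate` function is purely illustrative.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "custom-tool-sketch",
"metadata": {},
"outputs": [],
"source": [
"# A minimal custom-tool sketch (illustrative; see the customization docs).\n",
"# Assumption: ComputerAgent accepts plain Python functions in `tools`.\n",
"import logging\n",
"\n",
"from agent import ComputerAgent\n",
"\n",
"def calculate(expression: str) -> str:\n",
"    \"\"\"Evaluate a basic arithmetic expression and return the result.\"\"\"\n",
"    if not set(expression) <= set('0123456789+-*/(). '):\n",
"        return 'Error: only basic arithmetic is supported'\n",
"    return str(eval(expression))\n",
"\n",
"agent_with_tool = ComputerAgent(\n",
"    model=\"ollama/blaifa/InternVL3_5:8b\",\n",
"    tools=[computer, calculate],  # the computer plus a custom function tool\n",
"    verbosity=logging.INFO,\n",
")\n",
"print('Agent with a custom tool is ready; run it like the agents above.')\n"
]
},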
{
"cell_type": "markdown",
"id": "wrapup",
"metadata": {},
"source": [
"## ✅ Summary\n",
"\n",
"- You ran the computer in Docker via the Cua Docker provider and viewed it at `http://localhost:8006/`.\n",
"- You tried two runnable ways to leverage Ollama and reviewed a conceptual path to go further:\n",
"  - All-in-one computer-use model served by Ollama.\n",
"  - A composed agent using a strong grounding model + an Ollama VLM.\n",
"  - A link + outline for further customization paths (prompting, tools, callbacks, custom agent via `register_agent`).\n",
"\n",
"Explore more configurations and models in the Cua docs.\n"
]
}
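,
{
"cell_type": "markdown",
"id": "cleanup-note",
"metadata": {},
"source": [
"When you're done experimenting, shut the Docker computer down. A small cleanup sketch, assuming the `Computer` object exposes a `stop()` coroutine; otherwise, stop the container from your terminal with `docker stop`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cleanup",
"metadata": {},
"outputs": [],
"source": [
"# Shut down the Docker computer when finished (assumes Computer.stop() exists).\n",
"await computer.stop()\n",
"print('Computer stopped.')\n"
]
}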
],
"metadata": {
"kernelspec": {
"display_name": "cua",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}