From 3b2722ad8ab93edfccf6d1bd3e933dedaeda5a6d Mon Sep 17 00:00:00 2001
From: Sarina Li
Date: Fri, 12 Dec 2025 14:52:19 -0500
Subject: [PATCH] document cloud vlm models

---
 README.md | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 4b46c9f1..57c0d0d0 100644
--- a/README.md
+++ b/README.md
@@ -219,21 +219,22 @@ These are the valid model configurations for `ComputerAgent(model="...")`:
 
 The following table shows which capabilities are supported by each model:
 
-| Model | Computer-Use | Grounding | Tools | VLM |
-| -------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-------: | :---: | :-: |
-| [Claude Sonnet/Haiku](https://docs.claude.com/en/docs/agents-and-tools/tool-use/computer-use-tool#how-to-implement-computer-use) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [OpenAI CU Preview](https://platform.openai.com/docs/models/computer-use-preview) | 🖥️ | 🎯 | | 👁️ |
-| [Qwen3 VL](https://huggingface.co/collections/Qwen/qwen3-vl) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [GLM-V](https://huggingface.co/THUDM/glm-4v-9b) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [Gemini CU Preview](https://ai.google.dev/gemini-api/docs/computer-use) | 🖥️ | 🎯 | | 👁️ |
-| [InternVL](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [UI-TARS](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [UI-TARS-2](https://cua.ai/dashboard/vlm-router) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [OpenCUA](https://huggingface.co/xlangai/OpenCUA-7B) | | 🎯 | | |
-| [GTA](https://huggingface.co/HelloKKMe/GTA1-7B) | | 🎯 | | |
-| [Holo](https://huggingface.co/Hcompany/Holo1.5-3B) | | 🎯 | | |
-| [Moondream](https://huggingface.co/moondream/moondream3-preview) | | 🎯 | | |
-| [OmniParser](https://github.com/microsoft/OmniParser) | | 🎯 | | |
+| Model | Computer-Use | Grounding | Tools | VLM | Cloud |
+| -------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-------: | :---: | :-: | :---: |
+| [Claude Sonnet/Haiku](https://docs.claude.com/en/docs/agents-and-tools/tool-use/computer-use-tool#how-to-implement-computer-use) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [Claude Opus](https://docs.claude.com/en/docs/agents-and-tools/tool-use/computer-use-tool#how-to-implement-computer-use) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [OpenAI CU Preview](https://platform.openai.com/docs/models/computer-use-preview) | 🖥️ | 🎯 | | 👁️ | |
+| [Qwen3 VL](https://huggingface.co/collections/Qwen/qwen3-vl) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [GLM-V](https://huggingface.co/THUDM/glm-4v-9b) | 🖥️ | 🎯 | 🛠️ | 👁️ | |
+| [Gemini CU Preview](https://ai.google.dev/gemini-api/docs/computer-use) | 🖥️ | 🎯 | | 👁️ | |
+| [InternVL](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | 🖥️ | 🎯 | 🛠️ | 👁️ | |
+| [UI-TARS](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) | 🖥️ | 🎯 | 🛠️ | 👁️ | |
+| [UI-TARS-2](https://cua.ai/dashboard/vlm-router) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [OpenCUA](https://huggingface.co/xlangai/OpenCUA-7B) | | 🎯 | | | |
+| [GTA](https://huggingface.co/HelloKKMe/GTA1-7B) | | 🎯 | | | |
+| [Holo](https://huggingface.co/Hcompany/Holo1.5-3B) | | 🎯 | | | |
+| [Moondream](https://huggingface.co/moondream/moondream3-preview) | | 🎯 | | | |
+| [OmniParser](https://github.com/microsoft/OmniParser) | | 🎯 | | | |
 
 **Legend:**
 
@@ -241,6 +242,7 @@ The following table shows which capabilities are supported by each model:
 - 🎯 **Grounding**: UI element detection and click coordinate prediction
 - 🛠️ **Tools**: Support for function calling beyond screen interaction
 - 👁️ **VLM**: Vision-language understanding
+- ☁️ **Cloud**: Available on Cua VLM cloud infrastructure
 
 **Composition Examples:**
 
@@ -381,6 +383,20 @@ Learn more in the [SOM documentation](./libs/python/som/README.md).
 
 ## 2025
 
+### December 2025
+
+- **Cloud VLM Platform**: Support for Claude Opus, Qwen3 VL 235B, and UI-TARS-2 on Cua VLM cloud infrastructure
+- **QEMU Container Support**: Native Linux and Windows container execution via QEMU virtualization
+
+### November 2025
+
+- **Generic VLM Provider**: Expanded support for custom VLM providers and model configurations
+- **NeurIPS 2025**: Coverage of computer-use agent research papers and developments ([Blog Post](https://cua.ai/blog/neurips-2025-cua-papers))
+
+### October 2025
+
+- **Agent SDK Improvements**: Enhanced model support and configuration options
+
 ### September 2025
 
 - **Hack the North Competition**: First benchmark-driven hackathon track with guaranteed YC interview prize. Winner achieved 68.3% on OSWorld-Tiny ([Blog Post](https://www.cua.ai/blog/hack-the-north))