mirror of
https://github.com/trycua/computer.git
synced 2026-01-05 04:50:08 -06:00
Merge pull request #651 from sarinali/fix/cloud-models-documented
fix(docs): document cloud models
46
README.md
@@ -219,21 +219,22 @@ These are the valid model configurations for `ComputerAgent(model="...")`:
 The following table shows which capabilities are supported by each model:

-| Model                                                                                                                            | Computer-Use | Grounding | Tools | VLM |
-| -------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-------: | :---: | :-: |
-| [Claude Sonnet/Haiku](https://docs.claude.com/en/docs/agents-and-tools/tool-use/computer-use-tool#how-to-implement-computer-use) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [OpenAI CU Preview](https://platform.openai.com/docs/models/computer-use-preview) | 🖥️ | 🎯 | | 👁️ |
-| [Qwen3 VL](https://huggingface.co/collections/Qwen/qwen3-vl) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [GLM-V](https://huggingface.co/THUDM/glm-4v-9b) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [Gemini CU Preview](https://ai.google.dev/gemini-api/docs/computer-use) | 🖥️ | 🎯 | | 👁️ |
-| [InternVL](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [UI-TARS](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [UI-TARS-2](https://cua.ai/dashboard/vlm-router) | 🖥️ | 🎯 | 🛠️ | 👁️ |
-| [OpenCUA](https://huggingface.co/xlangai/OpenCUA-7B) | | 🎯 | | |
-| [GTA](https://huggingface.co/HelloKKMe/GTA1-7B) | | 🎯 | | |
-| [Holo](https://huggingface.co/Hcompany/Holo1.5-3B) | | 🎯 | | |
-| [Moondream](https://huggingface.co/moondream/moondream3-preview) | | 🎯 | | |
-| [OmniParser](https://github.com/microsoft/OmniParser) | | 🎯 | | |
+| Model                                                                                                                            | Computer-Use | Grounding | Tools | VLM | Cloud |
+| -------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-------: | :---: | :-: | :---: |
+| [Claude Sonnet/Haiku](https://docs.claude.com/en/docs/agents-and-tools/tool-use/computer-use-tool#how-to-implement-computer-use) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [Claude Opus](https://docs.claude.com/en/docs/agents-and-tools/tool-use/computer-use-tool#how-to-implement-computer-use) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [OpenAI CU Preview](https://platform.openai.com/docs/models/computer-use-preview) | 🖥️ | 🎯 | | 👁️ | |
+| [Qwen3 VL](https://huggingface.co/collections/Qwen/qwen3-vl) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [GLM-V](https://huggingface.co/THUDM/glm-4v-9b) | 🖥️ | 🎯 | 🛠️ | 👁️ | |
+| [Gemini CU Preview](https://ai.google.dev/gemini-api/docs/computer-use) | 🖥️ | 🎯 | | 👁️ | |
+| [InternVL](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | 🖥️ | 🎯 | 🛠️ | 👁️ | |
+| [UI-TARS](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) | 🖥️ | 🎯 | 🛠️ | 👁️ | |
+| [UI-TARS-2](https://cua.ai/dashboard/vlm-router) | 🖥️ | 🎯 | 🛠️ | 👁️ | ☁️ |
+| [OpenCUA](https://huggingface.co/xlangai/OpenCUA-7B) | | 🎯 | | | |
+| [GTA](https://huggingface.co/HelloKKMe/GTA1-7B) | | 🎯 | | | |
+| [Holo](https://huggingface.co/Hcompany/Holo1.5-3B) | | 🎯 | | | |
+| [Moondream](https://huggingface.co/moondream/moondream3-preview) | | 🎯 | | | |
+| [OmniParser](https://github.com/microsoft/OmniParser) | | 🎯 | | | |

 **Legend:**
@@ -241,6 +242,7 @@ The following table shows which capabilities are supported by each model:
 - 🎯 **Grounding**: UI element detection and click coordinate prediction
 - 🛠️ **Tools**: Support for function calling beyond screen interaction
 - 👁️ **VLM**: Vision-language understanding
+- ☁️ **Cloud**: Supported on Cua VLM

 **Composition Examples:**
@@ -381,6 +383,20 @@ Learn more in the [SOM documentation](./libs/python/som/README.md).
 ## 2025

+### December 2025
+
+- **Cloud VLM Platform**: Support for Claude Opus, Qwen3 VL 235B, and UI-TARS-2 on Cua VLM cloud infrastructure
+- **QEMU Container Support**: Native Linux and Windows container execution via QEMU virtualization
+
+### November 2025
+
+- **Generic VLM Provider**: Expanded support for custom VLM providers and model configurations
+- **NeurIPS 2025**: Coverage of computer-use agent research papers and developments ([Blog Post](https://cua.ai/blog/neurips-2025-cua-papers))
+
+### October 2025
+
+- **Agent SDK Improvements**: Enhanced model support and configuration options
+
 ### September 2025

 - **Hack the North Competition**: First benchmark-driven hackathon track with guaranteed YC interview prize. Winner achieved 68.3% on OSWorld-Tiny ([Blog Post](https://www.cua.ai/blog/hack-the-north))
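
For readers who want to check the new Cloud column at a glance, the updated capability table can be transcribed into a small lookup. This is only an illustrative sketch: the dictionary name `CAPABILITIES`, the helper `models_with`, and the short capability keys (`"computer_use"`, `"cloud"`, and so on) are assumptions made up for this example and are not part of the Cua SDK; the row labels are the display names from the table, not valid `ComputerAgent(model="...")` strings.

```python
# Capability flags per model, transcribed from the updated README table.
# The keys below are hypothetical labels for the table columns, not Cua identifiers.
ALL_FIVE = {"computer_use", "grounding", "tools", "vlm", "cloud"}
FULL_LOCAL = {"computer_use", "grounding", "tools", "vlm"}
NO_TOOLS = {"computer_use", "grounding", "vlm"}
GROUNDING_ONLY = {"grounding"}

CAPABILITIES = {
    "Claude Sonnet/Haiku": set(ALL_FIVE),
    "Claude Opus": set(ALL_FIVE),
    "OpenAI CU Preview": set(NO_TOOLS),
    "Qwen3 VL": set(ALL_FIVE),
    "GLM-V": set(FULL_LOCAL),
    "Gemini CU Preview": set(NO_TOOLS),
    "InternVL": set(FULL_LOCAL),
    "UI-TARS": set(FULL_LOCAL),
    "UI-TARS-2": set(ALL_FIVE),
    "OpenCUA": set(GROUNDING_ONLY),
    "GTA": set(GROUNDING_ONLY),
    "Holo": set(GROUNDING_ONLY),
    "Moondream": set(GROUNDING_ONLY),
    "OmniParser": set(GROUNDING_ONLY),
}


def models_with(*required):
    """Return table rows (in table order) that have every requested capability."""
    needed = set(required)
    return [name for name, caps in CAPABILITIES.items() if needed <= caps]


if __name__ == "__main__":
    # Rows marked with the cloud icon in the new column.
    print(models_with("cloud"))
    # Rows that can drive a computer but have no tool-calling mark.
    print([m for m in models_with("computer_use") if "tools" not in CAPABILITIES[m]])
```

Per the table, the four rows with a Cloud mark are Claude Sonnet/Haiku, Claude Opus, Qwen3 VL, and UI-TARS-2, and all four also carry every other capability mark.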