diff --git a/README.md b/README.md
index 199e5a3a..fe1c8b03 100644
--- a/README.md
+++ b/README.md
@@ -216,6 +216,7 @@ The following table shows which capabilities are supported by each model:
 | [Gemini CU Preview](https://ai.google.dev/gemini-api/docs/computer-use) | 🖥️ | 🎯 | | 👁️ |
 | [InternVL](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | 🖥️ | 🎯 | 🛠️ | 👁️ |
 | [UI-TARS](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) | 🖥️ | 🎯 | 🛠️ | 👁️ |
+| [UI-TARS-2](https://cua.ai/dashboard/vlm-router) | 🖥️ | 🎯 | 🛠️ | 👁️ |
 | [OpenCUA](https://huggingface.co/xlangai/OpenCUA-7B) | | 🎯 | | |
 | [GTA](https://huggingface.co/HelloKKMe/GTA1-7B) | | 🎯 | | |
 | [Holo](https://huggingface.co/Hcompany/Holo1.5-3B) | | 🎯 | | |
@@ -264,6 +265,7 @@ agent = ComputerAgent(model="moondream3+openai/gpt-4o")
 | [Gemini CU Preview](https://ai.google.dev/gemini-api/docs/computer-use) | `gemini-2.5-computer-use-preview` |
 | [InternVL](https://huggingface.co/OpenGVLab/InternVL3_5-1B) | `huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}` |
 | [UI-TARS](https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B) | `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B` |
+| [UI-TARS-2](https://cua.ai/dashboard/vlm-router) | `cua/bytedance/ui-tars-2` |
 | [OpenCUA](https://huggingface.co/xlangai/OpenCUA-7B) | `huggingface-local/xlangai/OpenCUA-{7B,32B}` |
 | [GTA](https://huggingface.co/HelloKKMe/GTA1-7B) | `huggingface-local/HelloKKMe/GTA1-{7B,32B,72B}` |
 | [Holo](https://huggingface.co/Hcompany/Holo1.5-3B) | `huggingface-local/Hcompany/Holo1.5-{3B,7B,72B}` |
diff --git a/docs/content/docs/agent-sdk/supported-agents/computer-use-agents.mdx b/docs/content/docs/agent-sdk/supported-agents/computer-use-agents.mdx
index 9621e520..8ad89196 100644
--- a/docs/content/docs/agent-sdk/supported-agents/computer-use-agents.mdx
+++ b/docs/content/docs/agent-sdk/supported-agents/computer-use-agents.mdx
@@ -99,6 +99,18 @@ async for _ in agent.run("Open the settings menu and change the theme to dark mo
     pass
 ```
 
+## UI-TARS-2
+
+Next‑generation UI‑TARS via Cua Router:
+
+- `cua/bytedance/ui-tars-2`
+
+```python
+agent = ComputerAgent("cua/bytedance/ui-tars-2", tools=[computer])
+async for _ in agent.run("Open a browser and search for Python tutorials"):
+    pass
+```
+
 ---
 
 CUAs also support direct click prediction. See [Grounding Models](./grounding-models) for details on `predict_click()`.