added uitars documentation

2026-03-01 19:38:50 -06:00 · 2025-04-24 16:17:00 -04:00
parent 7d6f6cb6e4
commit 7ebab33104
1 changed files with 9 additions and 17 deletions
--- a/libs/agent/README.md
+++ b/libs/agent/README.md
@@ -50,6 +50,9 @@ async with Computer() as macos_computer:
      # or
      # loop=AgentLoop.OMNI,
      # model=LLM(provider=LLMProvider.OLLAMA, model="gemma3")
+      # or
+      # loop=AgentLoop.UITARS,
+      # model=LLM(provider=LLMProvider.OAICOMPAT, model="tgi", provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
  )

  tasks = [
@@ -124,6 +127,10 @@ The Gradio UI provides:
 - Configuration of agent parameters
 - Chat interface for interacting with the agent

+### Using UI-TARS
+
+You can use UI-TARS by first following the [deployment guide](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md). This will give you a provider URL like this: `https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1` which you can use in the gradio UI.
+
 ## Agent Loops

 The `cua-agent` package provides three agent loops variations, based on different CUA models providers and techniques:
@@ -132,6 +139,7 @@ The `cua-agent` package provides three agent loops variations, based on differen
 |:-----------|:-----------------|:------------|:-------------|
 | `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
 | `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
+| `AgentLoop.UITARS` | • `ByteDance-Seed/UI-TARS-1.5-7B` | Uses ByteDance's UI-TARS 1.5 model | Not Required |
 | `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `phi4`<br>• `phi4-mini`<br>• `gemma3`<br>• `...`<br>• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |

 ## AgentResponse
@@ -173,25 +181,9 @@ async for result in agent.run(task):
          print(output)
 ```

-### Gradio UI
-
-You can also interact with the agent using a Gradio interface.
-
-```python
-# Ensure environment variables (e.g., API keys) are loaded
-# You might need a helper function like load_dotenv_files() if using .env
-# from utils import load_dotenv_files
-# load_dotenv_files()
-
-from agent.ui.gradio.app import create_gradio_ui
-
-app = create_gradio_ui()
-app.launch(share=False)
-```
-
 **Note on Settings Persistence:**

 *   The Gradio UI automatically saves your configuration (Agent Loop, Model Choice, Custom Base URL, Save Trajectory state, Recent Images count) to a file named `.gradio_settings.json` in the project's root directory when you successfully run a task.
 *   This allows your preferences to persist between sessions.
 *   API keys entered into the custom provider field are **not** saved in this file for security reasons. Manage API keys using environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a `.env` file.
-*   It's recommended to add `.gradio_settings.json` to your `.gitignore` file.
+*   It's recommended to add `.gradio_settings.json` to your `.gitignore` file.