diff --git a/libs/agent/README.md b/libs/agent/README.md
index 81e8b8f1..399a023b 100644
--- a/libs/agent/README.md
+++ b/libs/agent/README.md
@@ -50,6 +50,9 @@ async with Computer() as macos_computer:
         # or
         # loop=AgentLoop.OMNI,
         # model=LLM(provider=LLMProvider.OLLAMA, model="gemma3")
+        # or
+        # loop=AgentLoop.UITARS,
+        # model=LLM(provider=LLMProvider.OAICOMPAT, model="tgi", provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
     )
 
     tasks = [
@@ -124,6 +127,10 @@ The Gradio UI provides:
 - Configuration of agent parameters
 - Chat interface for interacting with the agent
 
+### Using UI-TARS
+
+You can use UI-TARS by first following the [deployment guide](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md). This will give you a provider URL like this: `https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1` which you can use in the gradio UI.
+
 ## Agent Loops
 
 The `cua-agent` package provides three agent loops variations, based on different CUA models providers and techniques:
@@ -132,6 +139,7 @@ The `cua-agent` package provides three agent loops variations, based on differen
 |:-----------|:-----------------|:------------|:-------------|
 | `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
 | `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
+| `AgentLoop.UITARS` | • `ByteDance-Seed/UI-TARS-1.5-7B` | Uses ByteDance's UI-TARS 1.5 model | Not Required |
 | `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `phi4`<br>• `phi4-mini`<br>• `gemma3`<br>• `...`<br>• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |
 
 ## AgentResponse
@@ -173,25 +181,9 @@ async for result in agent.run(task):
     print(output)
 ```
 
-### Gradio UI
-
-You can also interact with the agent using a Gradio interface.
-
-```python
-# Ensure environment variables (e.g., API keys) are loaded
-# You might need a helper function like load_dotenv_files() if using .env
-# from utils import load_dotenv_files
-# load_dotenv_files()
-
-from agent.ui.gradio.app import create_gradio_ui
-
-app = create_gradio_ui()
-app.launch(share=False)
-```
-
 **Note on Settings Persistence:**
 
 * The Gradio UI automatically saves your configuration (Agent Loop, Model Choice, Custom Base URL, Save Trajectory state, Recent Images count) to a file named `.gradio_settings.json` in the project's root directory when you successfully run a task.
 * This allows your preferences to persist between sessions.
 * API keys entered into the custom provider field are **not** saved in this file for security reasons. Manage API keys using environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a `.env` file.
-* It's recommended to add `.gradio_settings.json` to your `.gitignore` file.
\ No newline at end of file
+* It's recommended to add `.gradio_settings.json` to your `.gitignore` file.
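
Review note: the settings-persistence behaviour described in the last hunk (save the configuration to `.gradio_settings.json`, never write API keys) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in `agent.ui.gradio.app`: the function names `save_settings` and `load_settings`, the `SECRET_KEYS` set, and the key names are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical names: the real UI may use a different schema and key names.
SETTINGS_FILE = ".gradio_settings.json"
SECRET_KEYS = {"provider_api_key"}


def save_settings(settings: dict, directory: str = ".") -> Path:
    """Write settings to .gradio_settings.json, dropping secret fields first."""
    safe = {k: v for k, v in settings.items() if k not in SECRET_KEYS}
    path = Path(directory) / SETTINGS_FILE
    path.write_text(json.dumps(safe, indent=2), encoding="utf-8")
    return path


def load_settings(directory: str = ".") -> dict:
    """Return saved settings, or an empty dict if the file is missing or invalid."""
    path = Path(directory) / SETTINGS_FILE
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    # Guard against a file that holds valid JSON but not an object.
    return data if isinstance(data, dict) else {}
```

Filtering secrets at write time, rather than relying on `.gitignore` alone, means a settings file that is committed by accident still contains no credentials.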