mirror of
https://github.com/trycua/computer.git
synced 2026-01-10 07:20:10 -06:00
added uitars documentation
@@ -50,6 +50,9 @@ async with Computer() as macos_computer:
# or
# loop=AgentLoop.OMNI,
# model=LLM(provider=LLMProvider.OLLAMA, model="gemma3")
# or
# loop=AgentLoop.UITARS,
# model=LLM(provider=LLMProvider.OAICOMPAT, model="tgi", provider_base_url="https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1")
)

tasks = [
@@ -124,6 +127,10 @@ The Gradio UI provides:
- Configuration of agent parameters
- Chat interface for interacting with the agent

### Using UI-TARS

To use UI-TARS, first follow the [deployment guide](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md). Deployment gives you a provider URL such as `https://**************.us-east-1.aws.endpoints.huggingface.cloud/v1`, which you can enter in the Gradio UI.
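
As a rough sketch, the provider URL could be sanity-checked before it is pasted into the Gradio UI or passed as `provider_base_url`. The helper name `normalize_provider_base_url` is hypothetical and not part of `cua-agent`. It assumes only that the endpoint is OpenAI-compatible and served under `/v1`, as in the example URL above.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_provider_base_url(url: str) -> str:
    """Validate an endpoint URL and make sure its path ends in /v1.

    Hypothetical helper for illustration; not part of cua-agent.
    Assumes an OpenAI-compatible endpoint served under /v1.
    """
    parts = urlsplit(url.strip())
    # Reject anything that is not an absolute http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # Drop trailing slashes, then append /v1 only if it is missing.
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path += "/v1"
    # Query string and fragment are discarded on purpose.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

The returned string can then be used wherever the provider URL is expected, for example as the `provider_base_url` argument shown in the commented-out `AgentLoop.UITARS` configuration.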

## Agent Loops

The `cua-agent` package provides four agent loop variations, based on different CUA model providers and techniques:
@@ -132,6 +139,7 @@ The `cua-agent` package provides three agent loops variations, based on differen
|:-----------|:-----------------|:------------|:-------------|
| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
| `AgentLoop.UITARS` | • `ByteDance-Seed/UI-TARS-1.5-7B` | Use ByteDance's UI-TARS 1.5 model | Not Required |
| `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `phi4`<br>• `phi4-mini`<br>• `gemma3`<br>• `...`<br>• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |

## AgentResponse
@@ -173,25 +181,9 @@ async for result in agent.run(task):
print(output)
```

### Gradio UI

You can also interact with the agent using a Gradio interface.

```python
# Ensure environment variables (e.g., API keys) are loaded
# You might need a helper function like load_dotenv_files() if using .env
# from utils import load_dotenv_files
# load_dotenv_files()

from agent.ui.gradio.app import create_gradio_ui

app = create_gradio_ui()
app.launch(share=False)
```

**Note on Settings Persistence:**

* The Gradio UI automatically saves your configuration (Agent Loop, Model Choice, Custom Base URL, Save Trajectory state, Recent Images count) to a file named `.gradio_settings.json` in the project's root directory each time a task runs successfully.
* This lets your preferences persist between sessions.
* API keys entered into the custom provider field are **not** saved in this file, for security reasons. Manage API keys with environment variables (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) or a `.env` file.
* Add `.gradio_settings.json` to your `.gitignore` file.
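
The persistence behaviour described in the note could be sketched roughly as follows. This is not the Gradio app's actual code. The function names `save_settings` and `load_settings`, the dictionary keys, and the `SENSITIVE_KEYS` set are assumptions made for illustration. Only the file name `.gradio_settings.json` and the rule that API keys are never written come from the note.

```python
import json
from pathlib import Path

SETTINGS_FILE = ".gradio_settings.json"  # file name taken from the note above
SENSITIVE_KEYS = {"api_key"}  # assumed key name; never written to disk


def save_settings(settings: dict, directory: str = ".") -> Path:
    """Write settings to the JSON file, dropping sensitive entries."""
    safe = {k: v for k, v in settings.items() if k not in SENSITIVE_KEYS}
    path = Path(directory) / SETTINGS_FILE
    path.write_text(json.dumps(safe, indent=2), encoding="utf-8")
    return path


def load_settings(directory: str = ".") -> dict:
    """Read saved settings, or return an empty dict if none exist yet."""
    path = Path(directory) / SETTINGS_FILE
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Filtering at write time means a key typed into the UI never reaches the file, so the file stays safe to inspect even if it is committed by accident.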