added gpt-5 + gta1 examples

Dillon DuPont
2025-08-08 18:36:01 -04:00
parent 9685833428
commit f45f6b84e9
2 changed files with 17 additions and 15 deletions

@@ -28,6 +28,8 @@ taskset = TaskSet(tasks=taskset[:10]) # limit to 10 tasks instead of all 370
# Run benchmark job
job = await run_job(
    model="openai/computer-use-preview",
    # model="anthropic/claude-3-5-sonnet-20241022",
    # model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5",
    task_or_taskset=taskset,
    job_name="test-computeragent-job",
    max_concurrent_tasks=5,

@@ -28,12 +28,26 @@ Any model that supports `predict_click()` can be used as the grounding component
Any vision-enabled LiteLLM-compatible model can be used as the thinking component:
- **Anthropic**: `anthropic/claude-3-5-sonnet-20241022`, `anthropic/claude-3-opus-20240229`
- **OpenAI**: `openai/gpt-4o`, `openai/gpt-4-vision-preview`
- **OpenAI**: `openai/gpt-5`, `openai/gpt-o3`, `openai/gpt-4o`
- **Google**: `gemini/gemini-1.5-pro`, `vertex_ai/gemini-pro-vision`
- **Local models**: Any Hugging Face vision-language model
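
The composed model strings in this change join a grounding model and a thinking model with `+`, grounding first. As a minimal sketch of how such a string decomposes (the helper `split_composed_model` is hypothetical, written only for illustration, and is not part of the library's API):

```python
def split_composed_model(model: str) -> tuple[str, str]:
    """Split a '<grounding>+<thinking>' model string into its two parts.

    Hypothetical helper for illustration; not the library's own parser.
    """
    # partition() splits on the first '+' and returns (head, sep, tail).
    grounding, sep, thinking = model.partition("+")
    if not sep or not grounding or not thinking:
        raise ValueError(f"expected '<grounding>+<thinking>', got {model!r}")
    return grounding, thinking


grounding, thinking = split_composed_model(
    "huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5"
)
print(grounding)  # huggingface-local/HelloKKMe/GTA1-7B
print(thinking)   # openai/gpt-5
```

A string without a `+`, such as `openai/computer-use-preview`, names a single model and raises `ValueError` in this sketch.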
## Usage Examples
### GTA1 + GPT-5
Use OpenAI's GPT-5 for planning with specialized grounding:
```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5",
    tools=[computer]
)
async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
    pass
```
### GTA1 + Claude 3.5 Sonnet
Combine state-of-the-art grounding with powerful reasoning:
@@ -51,20 +65,6 @@ async for _ in agent.run("Open Firefox, navigate to github.com, and search for '
# - GTA1-7B provides precise click coordinates for each UI element
```
### GTA1 + Gemini Pro
Use Google's Gemini for planning with specialized grounding:
```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+gemini/gemini-1.5-pro",
    tools=[computer]
)
async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
    pass
```
### UI-TARS + GPT-4o
Combine two different vision models for enhanced capabilities: