mirror of
https://github.com/trycua/computer.git
synced 2025-12-31 18:40:04 -06:00
added gpt-5 + gta1 examples
@@ -28,6 +28,8 @@ taskset = TaskSet(tasks=taskset[:10]) # limit to 10 tasks instead of all 370
# Run benchmark job
job = await run_job(
    model="openai/computer-use-preview",
    # model="anthropic/claude-3-5-sonnet-20241022",
    # model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5",
    task_or_taskset=taskset,
    job_name="test-computeragent-job",
    max_concurrent_tasks=5,
@@ -28,12 +28,26 @@ Any model that supports `predict_click()` can be used as the grounding component
Any vision-enabled LiteLLM-compatible model can be used as the thinking component:

- **Anthropic**: `anthropic/claude-3-5-sonnet-20241022`, `anthropic/claude-3-opus-20240229`
- **OpenAI**: `openai/gpt-4o`, `openai/gpt-4-vision-preview`
- **OpenAI**: `openai/gpt-5`, `openai/gpt-o3`, `openai/gpt-4o`
- **Google**: `gemini/gemini-1.5-pro`, `vertex_ai/gemini-pro-vision`
- **Local models**: Any Hugging Face vision-language model

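The model strings in this diff join two identifiers with `+`, for example `huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5`. As a minimal sketch, assuming the part before `+` names the grounding model and the part after it names the thinking model (which is what the examples suggest), the convention can be illustrated as follows. `split_composed_model` and `ComposedModel` are hypothetical names used only for illustration; they are not part of the library's API.

```python
from typing import NamedTuple


class ComposedModel(NamedTuple):
    """Illustrative container for the two halves of a composed model string."""
    grounding: str
    thinking: str


def split_composed_model(model: str) -> ComposedModel:
    """Hypothetical helper: split '<grounding>+<thinking>' at the first '+'.

    Assumes the grounding model comes first and that the grounding
    identifier itself contains no '+'.
    """
    grounding, sep, thinking = model.partition("+")
    if not sep or not grounding or not thinking:
        raise ValueError(f"expected '<grounding>+<thinking>', got {model!r}")
    return ComposedModel(grounding, thinking)


parts = split_composed_model("huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5")
print(parts.grounding)
print(parts.thinking)
```

Under this assumption, swapping the thinking component only changes the text after `+`, which is how the GTA1 examples in this diff differ from one another.
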
## Usage Examples

### GTA1 + GPT-5

Use OpenAI's GPT-5 for planning with specialized grounding:

```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5",
    tools=[computer]
)

async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
    pass
```

### GTA1 + Claude 3.5 Sonnet

Combine state-of-the-art grounding with powerful reasoning:

@@ -51,20 +65,6 @@ async for _ in agent.run("Open Firefox, navigate to github.com, and search for '
# - GTA1-7B provides precise click coordinates for each UI element
```

### GTA1 + Gemini Pro

Use Google's Gemini for planning with specialized grounding:

```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+gemini/gemini-1.5-pro",
    tools=[computer]
)

async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
    pass
```

### UI-TARS + GPT-4o

Combine two different vision models for enhanced capabilities: