diff --git a/README.md b/README.md index fe1c8b03..395519f1 100644 --- a/README.md +++ b/README.md @@ -242,7 +242,7 @@ agent = ComputerAgent(model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5") agent = ComputerAgent(model="omniparser+openai/gpt-4o") # Combine state-of-the-art grounding with powerful reasoning -agent = ComputerAgent(model="huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022") +agent = ComputerAgent(model="huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-sonnet-4-5-20250929") # Combine two different vision models for enhanced capabilities agent = ComputerAgent(model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B+openai/gpt-4o") diff --git a/blog/app-use.md b/blog/app-use.md index 68cf9c9b..1985713c 100644 --- a/blog/app-use.md +++ b/blog/app-use.md @@ -25,7 +25,7 @@ desktop = computer.create_desktop_from_apps(["Safari", "Notes"]) # Your agent can now only see and interact with these apps agent = ComputerAgent( - model="anthropic/claude-3-5-sonnet-20241022", + model="anthropic/claude-sonnet-4-5-20250929", tools=[desktop] ) ``` @@ -94,7 +94,7 @@ async def main(): # Initialize an agent agent = ComputerAgent( - model="anthropic/claude-3-5-sonnet-20241022", + model="anthropic/claude-sonnet-4-5-20250929", tools=[desktop] ) @@ -160,7 +160,7 @@ async def automate_iphone(): # Initialize an agent for iPhone automation agent = ComputerAgent( - model="anthropic/claude-3-5-sonnet-20241022", + model="anthropic/claude-sonnet-4-5-20250929", tools=[my_iphone] ) diff --git a/blog/build-your-own-operator-on-macos-2.md b/blog/build-your-own-operator-on-macos-2.md index 7a42d9ae..2c3d8ccb 100644 --- a/blog/build-your-own-operator-on-macos-2.md +++ b/blog/build-your-own-operator-on-macos-2.md @@ -145,9 +145,9 @@ While the core concept remains the same across all agent loops, different AI mod | Agent Loop | Supported Models | Description | Set-Of-Marks | |:-----------|:-----------------|:------------|:-------------| | `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA Preview model | Not Required | -| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`
• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use Beta Tools | Not Required | +| `AgentLoop.ANTHROPIC` | • `claude-sonnet-4-5-20250929`
• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use Beta Tools | Not Required | | `AgentLoop.UITARS` | • `ByteDance-Seed/UI-TARS-1.5-7B` | Uses ByteDance's UI-TARS 1.5 model | Not Required | -| `AgentLoop.OMNI` | • `claude-3-5-sonnet-20240620`
• `claude-3-7-sonnet-20250219`
• `gpt-4.5-preview`
• `gpt-4o`
• `gpt-4`
• `phi4`
• `phi4-mini`
• `gemma3`
• `...`
• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser | +| `AgentLoop.OMNI` | • `claude-sonnet-4-5-20250929`
• `claude-3-7-sonnet-20250219`
• `gpt-4.5-preview`
• `gpt-4o`
• `gpt-4`
• `phi4`
• `phi4-mini`
• `gemma3`
• `...`
• `Any Ollama or OpenAI-compatible model` | Use OmniParser for element pixel-detection (SoM) and any VLMs for UI Grounding and Reasoning | OmniParser |

Each loop handles the same basic pattern we implemented manually in Part 1:

@@ -191,7 +191,7 @@ The performance of different Computer-Use models varies significantly across tas

- **AgentLoop.OPENAI**: Choose when you have OpenAI Tier 3 access and need the most capable computer-use agent for web-based tasks. Uses the same [OpenAI Computer-Use Loop](https://platform.openai.com/docs/guides/tools-computer-use) as Part 1, delivering strong performance on browser-based benchmarks.

-- **AgentLoop.ANTHROPIC**: Ideal for users with Anthropic API access who need strong reasoning capabilities with computer-use abilities. Works with `claude-3-5-sonnet-20240620` and `claude-3-7-sonnet-20250219` models following [Anthropic's Computer-Use tools](https://docs.anthropic.com/en/docs/agents-and-tools/computer-use#understanding-the-multi-agent-loop).
+- **AgentLoop.ANTHROPIC**: Ideal for users with Anthropic API access who need strong reasoning capabilities with computer-use abilities. Works with `claude-sonnet-4-5-20250929` and `claude-3-7-sonnet-20250219` models following [Anthropic's Computer-Use tools](https://docs.anthropic.com/en/docs/agents-and-tools/computer-use#understanding-the-multi-agent-loop).

- **AgentLoop.UITARS**: Best for scenarios requiring stronger OS/desktop capabilities and latency-sensitive automation, as UI-TARS-1.5 leads OS-capability benchmarks. Requires running the model locally or accessing it through compatible endpoints (e.g. on Hugging Face).

diff --git a/blog/composite-agents.md b/blog/composite-agents.md
index 66af1869..2b8a7df3 100644
--- a/blog/composite-agents.md
+++ b/blog/composite-agents.md
@@ -14,12 +14,12 @@ This is the kind of problem that makes you wonder if we're building the future o

Agent framework 0.4 solves this by doing something radical: making all these different models speak the same language.

-Instead of writing separate code for each model's peculiarities, you now just pick a model with a string like `"anthropic/claude-3-5-sonnet-20241022"` or `"huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"`, and everything else Just Works™. Behind the scenes, we handle all the coordinate normalization, token parsing, and image preprocessing so you don't have to.
+Instead of writing separate code for each model's peculiarities, you now just pick a model with a string like `"anthropic/claude-sonnet-4-5-20250929"` or `"huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"`, and everything else Just Works™. Behind the scenes, we handle all the coordinate normalization, token parsing, and image preprocessing so you don't have to.

```python
# This works the same whether you're using Anthropic, OpenAI, or that new model you found on Hugging Face
agent = ComputerAgent(
-    model="anthropic/claude-3-5-sonnet-20241022", # or any other supported model
+    model="anthropic/claude-sonnet-4-5-20250929", # or any other supported model
    tools=[computer]
)
```

diff --git a/blog/computer-use-agents-for-growth-hacking.md b/blog/computer-use-agents-for-growth-hacking.md
index 6bf25ea5..c92dd60d 100644
--- a/blog/computer-use-agents-for-growth-hacking.md
+++ b/blog/computer-use-agents-for-growth-hacking.md
@@ -8,13 +8,13 @@ Growing a developer-focused product is hard. Traditional marketing doesn't work.

So we tried something different at Google DevFest Toronto: show up with backpacks full of cute cua-la keychains and see what happens.
-This is the story of how two new hires—a growth engineer and a designer/artist—guerrilla marketed their way through a major tech conference with $200 worth of merch and a post-event automation pipeline.
+This is the story of how two new hires, a growth engineer and a designer/artist, guerrilla marketed their way through a major tech conference with $200 worth of merch and a post-event automation pipeline.

## Meet the Team

**Sarina** (Growth Engineering): Built the post-event automation pipeline that extracts LinkedIn connections and generates personalized messages while you sleep.

-**Esther** (Design + Art): Hand-crafted every piece of artwork, giving life to CUA through illustrations, branding, and yes, extremely cute cua-la keychains.
+**Esther** (Design + Art): Hand-crafted every piece of artwork, giving life to Cua through illustrations, branding, and yes, extremely cute cua-la keychains.

The thesis: what if we could draw people in with irresistible physical merch, then use computer use agents to handle all the tedious follow-up work?

@@ -24,11 +24,9 @@ The thesis: what if we could draw people in with irresistible physical merch, th

Google DevFest Toronto brought together hundreds of developers and AI enthusiasts. We didn't have a booth. We didn't have demos. We showed up with backpacks full of cua-la keychains with the cua.ai logo and started handing them out.

-That's it. Pure guerrilla marketing.
+That's it. Pure guerrilla marketing: the cua-las were absurdly effective.

-The cua-las were absurdly effective.
-
-People would literally crowd around us—not because they were interested in computer use (at first), but because they wanted a cua-la. We'd pitch CUA while handing out keychains, and suddenly we had an engaged audience. No booth required.
+People would literally crowd around us, not because they were interested in computer use (at first), but because they wanted a cua-la. We'd pitch Cua while handing out keychains, and suddenly we had an engaged audience!

DevFest crowd

@@ -36,13 +34,13 @@ People would literally crowd around us—not because they were interested in com

A few people stuck the cua-las on their bags immediately. Then, throughout the event, we started getting approached:

-"Wait, are you the CUA girls?"
+"Wait, are you the Cua girls?"

-They'd seen the cua-las on someone's bag, asked about it, and tracked us down. The keychains became walking advertisements.
+They'd seen the cua-las on someone's bag, asked about it, and tracked us down! The keychains became walking advertisements.

Hack the North recognition at DevFest

-Even better: two attendees recognized CUA from Hack the North. Our previous event marketing was actually working. People remembered us.
+Even better: two attendees recognized Cua from Hack the North. Our previous event marketing was actually working. People remembered us.

## Part 2: The Automation (Try It Yourself)

@@ -64,9 +62,9 @@ Sarina had a better idea: build the automation we wish existed, then open source

LinkedIn scraping automation in action

-The agent navigates LinkedIn like a human would—click profile, extract info, navigate back, repeat. But it does it overnight while you sleep.
+The agent navigates LinkedIn like a human would: click profile, extract info, navigate back, repeat. But it does it overnight while you sleep.

-The secret sauce: **VM session persistence**. By logging into LinkedIn once through CUA's VM, the session stays alive. No captchas, no bot detection, just smooth automation.
+The secret sauce: **VM session persistence**. By logging into LinkedIn once through Cua's VM, the session stays alive. No captchas, no bot detection, just smooth automation.
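
For reference, the persistent-session pattern that last hunk describes can be sketched with the same `ComputerAgent` API used throughout this diff. This is a minimal illustration, not the pipeline from the post: the bare `Computer()` constructor, the task string, and the assumption that `agent.run()` streams results asynchronously are illustrative, so adjust to the framework's actual API.

```python
# Minimal sketch of the overnight follow-up loop described above.
# Assumptions: `Computer` and `ComputerAgent` behave as in the snippets
# shown earlier in this diff; `agent.run()` is assumed to yield results
# asynchronously. Task text and options are illustrative only.
import asyncio

from agent import ComputerAgent
from computer import Computer


async def main():
    # Reuse one long-lived VM so the LinkedIn login survives between tasks
    # (the "session persistence" described above): log in manually once,
    # then let the agent drive the same VM overnight.
    async with Computer() as computer:
        agent = ComputerAgent(
            model="anthropic/claude-sonnet-4-5-20250929",
            tools=[computer],
        )
        task = (
            "Open LinkedIn, go to my newest connections, and for each one: "
            "open the profile, note their name and role, then navigate back."
        )
        async for result in agent.run(task):
            print(result)  # stream progress while the agent works


if __name__ == "__main__":
    asyncio.run(main())
```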