mirror of
https://github.com/trycua/computer.git
synced 2026-01-05 04:50:08 -06:00
docs: expand CUA description with screenshot-VLM-action loop
Explain how computer-use agents work by capturing screenshots, feeding them to a VLM, and determining the next action in a continuous loop.
@@ -17,7 +17,7 @@ import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';

 ## What is a Computer-Use Agent?

-Computer-Use Agents (CUAs) are AI systems that can autonomously interact with computer interfaces through visual understanding and action execution.
+Computer-Use Agents (CUAs) are AI systems that can autonomously interact with computer interfaces through visual understanding and action execution. They work by capturing screenshots, feeding them to a vision-language model (VLM), and letting the model determine the next action to take - such as clicking, typing, or scrolling - in a continuous loop until the task is complete.

 ## What is a Computer-Use Sandbox?

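The loop this commit describes (capture a screenshot, pass it to a VLM, execute the action it returns, repeat until the task is complete) can be illustrated with a minimal sketch. Everything named here is hypothetical and is not the Cua API: `Action`, `run_agent`, `FakeComputer`, and `scripted_model` are stand-ins, the "screenshot" is a text encoding of the fake screen state rather than an image, and the "model" is a hard-coded policy in place of a real VLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; `kind` is "click", "type", "scroll", or "done"."""
    kind: str
    payload: dict = field(default_factory=dict)


def run_agent(
    task: str,
    capture_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Run the screenshot -> model -> action loop until "done" or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()                  # 1. observe the screen
        action = choose_action(task, screenshot, history)  # 2. ask the model
        if action.kind == "done":                          # 3. stop when finished
            break
        execute(action)                                    # 4. act, then loop again
        history.append(action)
    return history


class FakeComputer:
    """Stand-in for a sandboxed desktop: a single text field that needs focus."""

    def __init__(self) -> None:
        self.focused = False
        self.text = ""

    def screenshot(self) -> bytes:
        # A real agent would return image bytes; this encodes state as text.
        return f"focused={self.focused};text={self.text}".encode()

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.text += action.payload["text"]


def scripted_model(task: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for a VLM: picks the next action from the observed state."""
    state = screenshot.decode()
    if "focused=False" in state:
        return Action("click", {"x": 10, "y": 10})
    if "text=hello" not in state:
        return Action("type", {"text": "hello"})
    return Action("done")


if __name__ == "__main__":
    computer = FakeComputer()
    steps = run_agent("type hello", computer.screenshot, scripted_model, computer.execute)
    print([a.kind for a in steps], computer.text)
```

The `max_steps` cap is a design choice of this sketch rather than something the commit specifies: it keeps the loop from running indefinitely if the model never signals completion.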