docs: expand CUA description with screenshot-VLM-action loop

Explain how computer-use agents work by capturing screenshots, feeding
them to a VLM, and determining the next action in a continuous loop.
f-trycua
2025-12-08 21:16:57 -08:00
parent 4023b191ca
commit 7fe1b16070

@@ -17,7 +17,7 @@ import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';
## What is a Computer-Use Agent?
-Computer-Use Agents (CUAs) are AI systems that can autonomously interact with computer interfaces through visual understanding and action execution.
+Computer-Use Agents (CUAs) are AI systems that can autonomously interact with computer interfaces through visual understanding and action execution. They work by capturing screenshots, feeding them to a vision-language model (VLM), and letting the model determine the next action to take - such as clicking, typing, or scrolling - in a continuous loop until the task is complete.
## What is a Computer-Use Sandbox?
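
For readers of this change, the screenshot-VLM-action loop described in the new CUA paragraph can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual API: `Action`, `run_agent`, `capture_screenshot`, `choose_action`, `execute`, and `max_steps` are hypothetical names introduced here, and the model is replaced by a scripted stand-in so the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical action record; not a type from any real SDK."""
    kind: str                      # "click", "type", "scroll", or "done"
    payload: Optional[str] = None  # e.g. a target description or text to type


def run_agent(
    capture_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> model -> action until the model reports completion.

    All three callables are assumptions standing in for a real screen
    capture, a real VLM call, and a real input driver.
    """
    history: List[Action] = []
    for _ in range(max_steps):          # cap the loop so it cannot run forever
        screenshot = capture_screenshot()
        action = choose_action(task, screenshot, history)
        if action.kind == "done":       # the model signals the task is complete
            break
        execute(action)
        history.append(action)
    return history


def demo() -> List[str]:
    """Drive the loop with a scripted stand-in for the VLM."""
    scripted = iter([
        Action("click", "search box"),
        Action("type", "hello"),
        Action("done"),
    ])
    executed: List[str] = []
    actions = run_agent(
        capture_screenshot=lambda: b"fake-png-bytes",
        choose_action=lambda task, shot, hist: next(scripted),
        execute=lambda a: executed.append(a.kind),
        task="search for hello",
    )
    return [a.kind for a in actions]
```

Calling `demo()` returns the kinds of the actions executed before the scripted model reported completion. The `max_steps` cap is a design choice of this sketch: a model that never emits a completion signal would otherwise keep the loop running indefinitely.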