From 7fe1b1607012befa5d251031d374d3e4cfce5c83 Mon Sep 17 00:00:00 2001
From: f-trycua
Date: Mon, 8 Dec 2025 21:16:57 -0800
Subject: [PATCH] docs: expand CUA description with screenshot-VLM-action loop

Explain how computer-use agents work by capturing screenshots, feeding
them to a VLM, and determining the next action in a continuous loop.
---
 docs/content/docs/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/content/docs/index.mdx b/docs/content/docs/index.mdx
index 5da104ed..b24f1013 100644
--- a/docs/content/docs/index.mdx
+++ b/docs/content/docs/index.mdx
@@ -17,7 +17,7 @@ import { Monitor, Code, BookOpen, Zap, Bot, Boxes, Rocket } from 'lucide-react';
 
 ## What is a Computer-Use Agent?
 
-Computer-Use Agents (CUAs) are AI systems that can autonomously interact with computer interfaces through visual understanding and action execution.
+Computer-Use Agents (CUAs) are AI systems that can autonomously interact with computer interfaces through visual understanding and action execution. They work by capturing screenshots, feeding them to a vision-language model (VLM), and letting the model determine the next action to take - such as clicking, typing, or scrolling - in a continuous loop until the task is complete.
 
 ## What is a Computer-Use Sandbox?
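
Note on the added paragraph: the screenshot-VLM-action loop it describes can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: `Action`, `run_agent_loop`, the three callables, and the `max_steps` cap are assumed names, not part of the Cua SDK or of this patch. The sketch assumes the model signals completion by returning an action whose kind is `"done"`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical action record; not an actual Cua type."""
    kind: str                      # e.g. "click", "type", "scroll", or "done"
    payload: Optional[str] = None  # e.g. text to type or a target description


def run_agent_loop(
    capture_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes, List[Action]], Action],
    execute_action: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Run the capture -> decide -> act loop and return the executed actions.

    capture_screenshot: returns the current screen as image bytes.
    choose_action: stands in for the VLM call; it receives the screenshot
        and the actions taken so far, and returns the next action.
    execute_action: performs the click, keystroke, or scroll.
    max_steps: assumed safety cap so the loop ends even if the model
        never reports completion.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()            # 1. capture the screen
        action = choose_action(screenshot, history)  # 2. ask the model
        if action.kind == "done":                    # 3. stop when the task is complete
            break
        execute_action(action)                       # 4. otherwise act and repeat
        history.append(action)
    return history
```

Passing the three steps in as callables keeps the sketch independent of any particular screenshot backend or model provider. The step cap is a design choice of this sketch rather than something the paragraph states.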