diff --git a/docs/content/docs/computer-sdk/commands.mdx b/docs/content/docs/computer-sdk/commands.mdx index 30657471..d8e80493 100644 --- a/docs/content/docs/computer-sdk/commands.mdx +++ b/docs/content/docs/computer-sdk/commands.mdx @@ -202,17 +202,17 @@ Direct file and directory manipulation: ```typescript - // File existence checks + # File existence checks await computer.interface.fileExists(path); // Check if file exists await computer.interface.directoryExists(path); // Check if directory exists - // File content operations + # File content operations await computer.interface.readText(path, "utf-8"); // Read file content await computer.interface.writeText(path, content, "utf-8"); // Write file content await computer.interface.readBytes(path); // Read file content as bytes await computer.interface.writeBytes(path, content); // Write file content as bytes - // File and directory management + # File and directory management await computer.interface.deleteFile(path); // Delete file await computer.interface.createDir(path); // Create directory await computer.interface.deleteDir(path); // Delete directory @@ -243,3 +243,38 @@ Access system accessibility information: ``` + +## Delay Configuration + +Control timing between actions: + + + + ```python + # Set default delay between all actions (in seconds) + computer.interface.delay = 0.5 # 500ms delay between actions + + # Or specify delay for individual actions + await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click + await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing + await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press + ``` + + + + +## Python Virtual Environment Operations + +Manage Python environments: + + + + ```python + # Virtual environment management + await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment + await computer.venv_cmd("demo_venv", "python -c 
'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment + await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception + ``` + + + \ No newline at end of file diff --git a/docs/content/docs/computer-sdk/computer-ui.mdx b/docs/content/docs/computer-sdk/computer-ui.mdx new file mode 100644 index 00000000..22b131c0 --- /dev/null +++ b/docs/content/docs/computer-sdk/computer-ui.mdx @@ -0,0 +1,80 @@ +--- +title: Computer UI +--- + +The computer module includes a Gradio UI for creating and sharing demonstration data. We make it easy for people to build community datasets for better computer use models with an upload to Huggingface feature. + +```bash +# Install with UI support +pip install "cua-computer[ui]" +``` + + +For precise control of the computer, we recommend using VNC or Screen Sharing instead of the Computer Gradio UI. + + +### Building and Sharing Demonstrations with Huggingface + +Follow these steps to contribute your own demonstrations: + +#### 1. Set up Huggingface Access + +Set your HF_TOKEN in a .env file or in your environment variables: + +```bash +# In .env file +HF_TOKEN=your_huggingface_token +``` + +#### 2. Launch the Computer UI + +```python +# launch_ui.py +from computer.ui.gradio.app import create_gradio_ui +from dotenv import load_dotenv +load_dotenv('.env') + +app = create_gradio_ui() +app.launch(share=False) +``` + +For examples, see [Computer UI Examples](https://github.com/trycua/cua/tree/main/examples/computer_ui_examples.py) + +#### 3. Record Your Tasks + +
+View demonstration video + +
+ +Record yourself performing various computer tasks using the UI. + +#### 4. Save Your Demonstrations + +
+View demonstration video + +
+ +Save each task by picking a descriptive name and adding relevant tags (e.g., "office", "web-browsing", "coding"). + +#### 5. Record Additional Demonstrations + +Repeat steps 3 and 4 until you have a good amount of demonstrations covering different tasks and scenarios. + +#### 6. Upload to Huggingface + +
+View demonstration video + +
+ +Upload your dataset to Huggingface by: +- Naming it as `{your_username}/{dataset_name}` +- Choosing public or private visibility +- Optionally selecting specific tags to upload only tasks with certain tags + +#### Examples and Resources + +- Example Dataset: [ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset) +- Find Community Datasets: 🔍 [Browse CUA Datasets on Huggingface](https://huggingface.co/datasets?other=cua) \ No newline at end of file diff --git a/docs/content/docs/computer-sdk/meta.json b/docs/content/docs/computer-sdk/meta.json index f632538b..92e14612 100644 --- a/docs/content/docs/computer-sdk/meta.json +++ b/docs/content/docs/computer-sdk/meta.json @@ -4,6 +4,7 @@ "pages": [ "computers", "commands", + "computer-ui", "sandboxed-python" ] } diff --git a/docs/content/docs/computer-sdk/sandboxed-python.mdx b/docs/content/docs/computer-sdk/sandboxed-python.mdx index 1e7f6b78..5f1687bf 100644 --- a/docs/content/docs/computer-sdk/sandboxed-python.mdx +++ b/docs/content/docs/computer-sdk/sandboxed-python.mdx @@ -44,6 +44,32 @@ You can also install packages in the virtual environment using the `venv_install await my_computer.venv_install("myenv", ["requests"]) ``` +## Example: Interacting with macOS Applications + +You can use sandboxed functions to interact with macOS applications on a local Cua Computer (requires `os_type="darwin"`). This is particularly useful for automation tasks that involve GUI applications. 
+ +```python +# Example: Use sandboxed functions to execute code in a Cua Container +from computer.helpers import sandboxed + +await computer.venv_install("demo_venv", ["macos-pyxa"]) # Install packages in a virtual environment + +@sandboxed("demo_venv") +def greet_and_print(name): + """Get the HTML of the current Safari tab""" + import PyXA + safari = PyXA.Application("Safari") + html = safari.current_document.source() + print(f"Hello from inside the container, {name}!") + return {"greeted": name, "safari_html": html} + +# When a @sandboxed function is called, it will execute in the container +result = await greet_and_print("Cua") +# Result: {"greeted": "Cua", "safari_html": "..."} +# stdout and stderr are also captured and printed / raised +print("Result from sandboxed function:", result) +``` + ## Error Handling If the remote execution fails, the decorator will retry up to `max_retries` times. If all attempts fail, the last exception is raised locally. diff --git a/docs/content/docs/libraries/computer/index.mdx b/docs/content/docs/libraries/computer/index.mdx index 4ac698d6..6638f878 100644 --- a/docs/content/docs/libraries/computer/index.mdx +++ b/docs/content/docs/libraries/computer/index.mdx @@ -8,202 +8,16 @@ github: - https://github.com/trycua/cua/tree/main/libs/typescript/computer --- -The Computer library provides a Computer class that can be used to control and automate a container running the Computer Server. +The Computer library provides a Computer class for controlling and automating containers running the Computer Server. -## Reference +## Connecting to Computers -### Basic Usage +See the [Cua Computers](../computer-sdk/computers) documentation for how to connect to different computer types (cloud, local, or host desktop). 
-Connect to a cua cloud container: +## Computer Commands - - - ```python - from computer import Computer +See the [Commands](../computer-sdk/commands) documentation for all supported commands and interface methods (Shell, Mouse, Keyboard, File System, etc.). - computer = Computer( - os_type="linux", - provider_type="cloud", - name="your-container-name", - api_key="your-api-key" - ) +## Sandboxed Python Functions - computer = await computer.run() # Connect to a cua cloud container - ``` - - - - ```typescript - import { Computer, OSType } from '@trycua/computer'; - - const computer = new Computer({ - osType: OSType.LINUX, - name: "your-container-name", - apiKey: "your-api-key" - }); - - await computer.run(); // Connect to a cua cloud container - ``` - - - - -Connect to a cua local container: - - - - ```python - from computer import Computer - - computer = Computer( - os_type="macos" - ) - - computer = await computer.run() # Connect to the container - ``` - - - - -### Interface Actions - - - - ```python - # Shell Actions - result = await computer.interface.run_command(cmd) # Run shell command - # result.stdout, result.stderr, result.returncode - - # Mouse Actions - await computer.interface.left_click(x, y) # Left click at coordinates - await computer.interface.right_click(x, y) # Right click at coordinates - await computer.interface.double_click(x, y) # Double click at coordinates - await computer.interface.move_cursor(x, y) # Move cursor to coordinates - await computer.interface.drag_to(x, y, duration) # Drag to coordinates - await computer.interface.get_cursor_position() # Get current cursor position - await computer.interface.mouse_down(x, y, button="left") # Press and hold a mouse button - await computer.interface.mouse_up(x, y, button="left") # Release a mouse button - - # Keyboard Actions - await computer.interface.type_text("Hello") # Type text - await computer.interface.press_key("enter") # Press a single key - await computer.interface.hotkey("command", "c") # 
Press key combination - await computer.interface.key_down("command") # Press and hold a key - await computer.interface.key_up("command") # Release a key - - # Scrolling Actions - await computer.interface.scroll(x, y) # Scroll the mouse wheel - await computer.interface.scroll_down(clicks) # Scroll down - await computer.interface.scroll_up(clicks) # Scroll up - - # Screen Actions - await computer.interface.screenshot() # Take a screenshot - await computer.interface.get_screen_size() # Get screen dimensions - - # Clipboard Actions - await computer.interface.set_clipboard(text) # Set clipboard content - await computer.interface.copy_to_clipboard() # Get clipboard content - - # File System Operations - await computer.interface.file_exists(path) # Check if file exists - await computer.interface.directory_exists(path) # Check if directory exists - await computer.interface.read_text(path, encoding="utf-8") # Read file content - await computer.interface.write_text(path, content, encoding="utf-8") # Write file content - await computer.interface.read_bytes(path) # Read file content as bytes - await computer.interface.write_bytes(path, content) # Write file content as bytes - await computer.interface.delete_file(path) # Delete file - await computer.interface.create_dir(path) # Create directory - await computer.interface.delete_dir(path) # Delete directory - await computer.interface.list_dir(path) # List directory contents - - # Accessibility - await computer.interface.get_accessibility_tree() # Get accessibility tree - - # Delay Configuration - # Set default delay between all actions (in seconds) - computer.interface.delay = 0.5 # 500ms delay between actions - - # Or specify delay for individual actions - await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click - await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing - await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press - - # Python Virtual 
Environment Operations - await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment - await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(`https://httpbin.org/ip`).json())'') # Run a shell command in a virtual environment - await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception - - # Example: Use sandboxed functions to execute code in a Cua Container - from computer.helpers import sandboxed - - @sandboxed("demo_venv") - def greet_and_print(name): - """Get the HTML of the current Safari tab""" - import PyXA - safari = PyXA.Application("Safari") - html = safari.current_document.source() - print(f"Hello from inside the container, {name}!") - return {"greeted": name, "safari_html": html} - - # When a @sandboxed function is called, it will execute in the container - result = await greet_and_print("Cua") - # Result: {"greeted": "Cua", "safari_html": "..."} - # stdout and stderr are also captured and printed / raised - print("Result from sandboxed function:", result) - ``` - - - - ```typescript - // Shell Actions - const result = await computer.interface.runCommand(cmd); // Run shell command - // result.stdout, result.stderr, result.returncode - - // Mouse Actions - await computer.interface.leftClick(x, y); // Left click at coordinates - await computer.interface.rightClick(x, y); // Right click at coordinates - await computer.interface.doubleClick(x, y); // Double click at coordinates - await computer.interface.moveCursor(x, y); // Move cursor to coordinates - await computer.interface.dragTo(x, y, duration); // Drag to coordinates - await computer.interface.getCursorPosition(); // Get current cursor position - await computer.interface.mouseDown(x, y, "left"); // Press and hold a mouse button - await computer.interface.mouseUp(x, y, "left"); // Release a mouse button - - 
// Keyboard Actions - await computer.interface.typeText("Hello"); // Type text - await computer.interface.pressKey("enter"); // Press a single key - await computer.interface.hotkey("command", "c"); // Press key combination - await computer.interface.keyDown("command"); // Press and hold a key - await computer.interface.keyUp("command"); // Release a key - - // Scrolling Actions - await computer.interface.scroll(x, y); // Scroll the mouse wheel - await computer.interface.scrollDown(clicks); // Scroll down - await computer.interface.scrollUp(clicks); // Scroll up - - // Screen Actions - await computer.interface.screenshot(); // Take a screenshot - await computer.interface.getScreenSize(); // Get screen dimensions - - // Clipboard Actions - await computer.interface.setClipboard(text); // Set clipboard content - await computer.interface.copyToClipboard(); // Get clipboard content - - // File System Operations - await computer.interface.fileExists(path); // Check if file exists - await computer.interface.directoryExists(path); // Check if directory exists - await computer.interface.readText(path, "utf-8"); // Read file content - await computer.interface.writeText(path, content, "utf-8"); // Write file content - await computer.interface.readBytes(path); // Read file content as bytes - await computer.interface.writeBytes(path, content); // Write file content as bytes - await computer.interface.deleteFile(path); // Delete file - await computer.interface.createDir(path); // Create directory - await computer.interface.deleteDir(path); // Delete directory - await computer.interface.listDir(path); // List directory contents - - // Accessibility - await computer.interface.getAccessibilityTree(); // Get accessibility tree - ``` - - - +See the [Sandboxed Python](../computer-sdk/sandboxed-python) documentation for running Python functions securely in isolated environments on a remote Cua Computer. 
\ No newline at end of file diff --git a/libs/python/computer/README.md b/libs/python/computer/README.md index a75c4fe3..5d7c3c9b 100644 --- a/libs/python/computer/README.md +++ b/libs/python/computer/README.md @@ -65,80 +65,9 @@ Refer to this notebook for a step-by-step guide on how to use the Computer-Use I - [Computer-Use Interface (CUI)](https://github.com/trycua/cua/blob/main/notebooks/computer_nb.ipynb) -## Using the Gradio Computer UI - -The computer module includes a Gradio UI for creating and sharing demonstration data. We make it easy for people to build community datasets for better computer use models with an upload to Huggingface feature. - -```bash -# Install with UI support -pip install "cua-computer[ui]" -``` - -> **Note:** For precise control of the computer, we recommend using VNC or Screen Sharing instead of the Computer Gradio UI. - -### Building and Sharing Demonstrations with Huggingface - -Follow these steps to contribute your own demonstrations: - -#### 1. Set up Huggingface Access - -Set your HF_TOKEN in a .env file or in your environment variables: - -```bash -# In .env file -HF_TOKEN=your_huggingface_token -``` - -#### 2. Launch the Computer UI - -```python -# launch_ui.py -from computer.ui.gradio.app import create_gradio_ui -from dotenv import load_dotenv -load_dotenv('.env') - -app = create_gradio_ui() -app.launch(share=False) -``` - -For examples, see [Computer UI Examples](https://github.com/trycua/cua/tree/main/examples/computer_ui_examples.py) - -#### 3. Record Your Tasks - -
-View demonstration video - -
- -Record yourself performing various computer tasks using the UI. - -#### 4. Save Your Demonstrations - -
-View demonstration video - -
- -Save each task by picking a descriptive name and adding relevant tags (e.g., "office", "web-browsing", "coding"). - -#### 5. Record Additional Demonstrations - -Repeat steps 3 and 4 until you have a good amount of demonstrations covering different tasks and scenarios. - -#### 6. Upload to Huggingface - -
-View demonstration video - -
- -Upload your dataset to Huggingface by: -- Naming it as `{your_username}/{dataset_name}` -- Choosing public or private visibility -- Optionally selecting specific tags to upload only tasks with certain tags - -#### Examples and Resources - -- Example Dataset: [ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset) -- Find Community Datasets: 🔍 [Browse CUA Datasets on Huggingface](https://huggingface.co/datasets?other=cua) +## Docs +- [Computers](https://trycua.com/docs/computer-sdk/computers) +- [Commands](https://trycua.com/docs/computer-sdk/commands) +- [Computer UI](https://trycua.com/docs/computer-sdk/computer-ui) +- [Sandboxed Python](https://trycua.com/docs/computer-sdk/sandboxed-python) diff --git a/libs/typescript/computer/README.md b/libs/typescript/computer/README.md index b51713c2..2505ee63 100644 --- a/libs/typescript/computer/README.md +++ b/libs/typescript/computer/README.md @@ -1,28 +1,35 @@ -# Cua Computer TypeScript Library +
+

+
+ + + + Shows my svg + +
-The TypeScript library for C/cua Computer - a powerful computer control and automation library. + [![TypeScript](https://img.shields.io/badge/TypeScript-333333?logo=typescript&logoColor=white&labelColor=333333)](#) + [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#) + [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85) + [![NPM](https://img.shields.io/npm/v/@trycua/computer?color=333333)](https://www.npmjs.com/package/@trycua/computer) +

+
-## Overview +**@trycua/computer** is a Computer-Use Interface (CUI) framework powering Cua for interacting with local macOS and Linux sandboxes, Playwright-compatible, and pluggable with any AI agent systems (Cua, Langchain, CrewAI, AutoGen). Computer relies on [Lume](https://github.com/trycua/lume) for creating and managing sandbox environments. -This library is a TypeScript port of the Python computer library, providing the same functionality for controlling virtual machines and computer interfaces. It enables programmatic control of virtual machines through various providers and offers a consistent interface for interacting with the VM's operating system. +### Get started with Computer -## Installation - -```bash -npm install @trycua/computer -# or -pnpm add @trycua/computer -``` - -## Usage +
+ +
```typescript -import { Computer } from '@trycua/computer'; +import { Computer, OSType } from '@trycua/computer'; // Create a new computer instance const computer = new Computer({ osType: OSType.LINUX, - name: 's-linux-vm_id' + name: 's-linux-vm_id', apiKey: 'your-api-key' }); @@ -30,60 +37,47 @@ const computer = new Computer({ await computer.run(); // Get the computer interface for interaction -const interface = computer.interface; +const computerInterface = computer.interface; // Take a screenshot -const screenshot = await interface.getScreenshot(); +const screenshot = await computerInterface.getScreenshot(); +// In a Node.js environment, you might save it like this: +// import * as fs from 'fs'; +// fs.writeFileSync('screenshot.png', Buffer.from(screenshot)); // Click at coordinates -await interface.click(500, 300); +await computerInterface.click(500, 300); // Type text -await interface.typeText('Hello, world!'); +await computerInterface.typeText('Hello, world!'); // Stop the computer await computer.stop(); ``` -## Architecture +## Install -The library is organized into the following structure: - -### Core Components - -- **Computer Factory**: A factory object that creates appropriate computer instances -- **BaseComputer**: Abstract base class with shared functionality for all computer types -- **Types**: Type definitions for configuration options and shared interfaces - -### Provider Implementations - -- **Computer**: Implementation for cloud-based VMs - -## Development - -- Install dependencies: +To install the Computer-Use Interface (CUI): ```bash -pnpm install +npm install @trycua/computer +# or +pnpm add @trycua/computer ``` -- Run the unit tests: +The `@trycua/computer` package provides the TypeScript library for interacting with computer interfaces. 
-```bash -pnpm test -``` +## Run -- Build the library: +Refer to this example for a step-by-step guide on how to use the Computer-Use Interface (CUI): -```bash -pnpm build -``` +- [Computer-Use Interface (CUI)](https://github.com/trycua/cua/tree/main/examples/computer-example-ts) -- Type checking: +## Docs -```bash -pnpm typecheck -``` +- [Computers](https://trycua.com/docs/computer-sdk/computers) +- [Commands](https://trycua.com/docs/computer-sdk/commands) +- [Computer UI](https://trycua.com/docs/computer-sdk/computer-ui) ## License