Quickstart (for Developers)
Get up and running with cua in 5 simple steps.
Introduction
cua combines a Computer (the interface) with an Agent (the AI) to automate desktop apps: the Computer handles clicks, typing, and screenshots, while the Agent supplies the intelligence.
Set Up Your Computer Environment
Choose how you want to run your cua computer. Cloud containers are recommended for the easiest setup:
Option 1: cua Cloud Container (recommended) — the easiest and safest way to get started

- Go to trycua.com/signin
- Navigate to Dashboard > Containers > Create Instance
- Create a Medium, Ubuntu 22 container
- Note your container name and API key

Your cloud container will be automatically configured and ready to use (see the note below for one way to handle the credentials).
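A simple way to keep the container name and API key out of your scripts is to read them from environment variables. The variable names below (CUA_CONTAINER_NAME, CUA_API_KEY) are just a convention for this sketch, not something cua requires:

```python
import os

# Hypothetical variable names -- use whatever convention your project prefers.
container_name = os.getenv("CUA_CONTAINER_NAME")
api_key = os.getenv("CUA_API_KEY")
```

You can then pass container_name and api_key to Computer(...) in the examples further down instead of hard-coding them.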
Option 2: Local macOS VM (via the lume CLI)

- Install the lume CLI:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
```

- Start a local cua container:

```bash
lume run macos-sequoia-cua:latest
```
Option 3: Windows Sandbox (requires Windows 10 Pro/Enterprise or Windows 11)

- Enable Windows Sandbox
- Install the pywinsandbox dependency:

```bash
pip install -U git+git://github.com/karkason/pywinsandbox.git
```

Windows Sandbox will be configured automatically when you run the CLI.
Option 4: Docker container

- Install Docker Desktop or Docker Engine
- Pull the cua Ubuntu container:

```bash
docker pull --platform=linux/amd64 trycua/cua-ubuntu:latest
```
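Connecting to a local Docker container from Python then works much like the cloud example below, just with a different provider. Note that the "docker" provider string and the image parameter in this sketch are assumptions, not verified values; check the Computer API reference for the exact names your version expects:

```python
from computer import Computer

# Sketch only -- the provider string and image parameter are assumptions.
computer = Computer(
    os_type="linux",
    provider_type="docker",            # assumed provider value
    image="trycua/cua-ubuntu:latest",  # hypothetical parameter
)
```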
Install cua
Python:

```bash
pip install "cua-agent[all]" cua-computer

# or install specific providers
pip install "cua-agent[openai]"      # OpenAI computer-use-preview support
pip install "cua-agent[anthropic]"   # Anthropic Claude support
pip install "cua-agent[omni]"        # Omniparser + any LLM support
pip install "cua-agent[uitars]"      # UI-TARS
pip install "cua-agent[uitars-mlx]"  # UI-TARS + MLX support
pip install "cua-agent[uitars-hf]"   # UI-TARS + Hugging Face support
pip install "cua-agent[glm45v-hf]"   # GLM-4.5V + Hugging Face support
pip install "cua-agent[ui]"          # Gradio UI support
```

TypeScript/JavaScript:

```bash
npm install @trycua/computer
```
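To confirm the Python install worked, check that the two entry points used in the examples below import cleanly (a quick sanity check, not an official step):

```python
# Quick sanity check that both cua packages are importable.
from agent import ComputerAgent   # from cua-agent
from computer import Computer     # from cua-computer

print("cua-agent and cua-computer are ready to use")
```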
Using Computer
Python:

```python
import asyncio

from computer import Computer

async def main():
    async with Computer(
        os_type="linux",
        provider_type="cloud",
        name="your-container-name",
        api_key="your-api-key",
    ) as computer:
        # Take a screenshot
        screenshot = await computer.interface.screenshot()

        # Click and type
        await computer.interface.left_click(100, 100)
        await computer.interface.type("Hello!")

asyncio.run(main())
```
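Inside the same `async with` block you can also persist the screenshot for inspection. This sketch assumes `screenshot()` returns the image as raw PNG bytes:

```python
# Sketch: save the screenshot to disk for inspection.
# Assumes computer.interface.screenshot() returns raw PNG bytes.
png_bytes = await computer.interface.screenshot()
with open("screenshot.png", "wb") as f:
    f.write(png_bytes)
```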
TypeScript:

```typescript
import { Computer, OSType } from '@trycua/computer';

// Top-level await requires an ES module (e.g. "type": "module" in package.json).
const computer = new Computer({
  osType: OSType.LINUX,
  name: "your-container-name",
  apiKey: "your-api-key"
});

await computer.run();

try {
  // Take a screenshot
  const screenshot = await computer.interface.screenshot();

  // Click and type
  await computer.interface.leftClick(100, 100);
  await computer.interface.typeText("Hello!");
} finally {
  await computer.close();
}
```
Using Agent
```python
from agent import ComputerAgent

agent = ComputerAgent(
    model="anthropic/claude-3-5-sonnet-20241022",
    tools=[computer],          # the Computer instance from the previous section
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
```
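A complete end-to-end script that combines the Computer and Agent snippets above looks roughly like this (the container name, API key, and model string are placeholders to replace with your own):

```python
import asyncio

from agent import ComputerAgent
from computer import Computer

async def main():
    async with Computer(
        os_type="linux",
        provider_type="cloud",
        name="your-container-name",
        api_key="your-api-key",
    ) as computer:
        agent = ComputerAgent(
            model="anthropic/claude-3-5-sonnet-20241022",
            tools=[computer],
            max_trajectory_budget=5.0,
        )

        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

        # Stream results as the agent works through the task.
        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])

asyncio.run(main())
```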
Next Steps
- Learn about trajectory tracking and callbacks
- Join our Discord community for support