agent-computer-use

Computer use CLI for AI Agents.

macOS · Windows · Linux

agent-computer-use lets you control desktop apps from the terminal: click buttons, type into fields, and read what's on screen, all from a single CLI.

Built for AI agents. An agent can snapshot the screen, decide what to click, and act while you sit back and watch.

npm install -g agent-cu

How it works

agent-computer-use reads the accessibility tree — the same structure screen readers use. It sees every button, text field, and menu item in any app. You point, it acts.

1. Snapshot

Capture every interactive element. Each gets a ref.
$ agent-cu snapshot -a Calculator -i -c
[@e1] button "All Clear"   [@e5] button "7"
[@e8] button "Multiply"    [@e11] button "6"
[@e20] button "Equals"
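Snapshot output in this shape is straightforward for an agent to parse. A minimal Python sketch (the regex and the `parse_snapshot` helper are my own, not part of agent-cu):

```python
import re

# Matches snapshot entries like: [@e5] button "7"
REF_PATTERN = re.compile(r'\[(@e\d+)\]\s+(\w+)\s+"([^"]*)"')

def parse_snapshot(output: str) -> dict:
    """Map each ref to its (role, label) pair."""
    refs = {}
    for match in REF_PATTERN.finditer(output):
        ref, role, label = match.groups()
        refs[ref] = (role, label)
    return refs

snapshot = '[@e1] button "All Clear"   [@e5] button "7"'
print(parse_snapshot(snapshot))
# {'@e1': ('button', 'All Clear'), '@e5': ('button', '7')}
```

An agent can then look up a ref by label ("7", "Equals") instead of hard-coding element IDs.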
2. Act

Use refs to click, type, or read.
$ agent-cu click @e5 && agent-cu click @e8 && agent-cu click @e11 && agent-cu click @e20
$ agent-cu text -a Calculator
42
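Since the tool is built for agents, the same chain is easy to drive from a script. A minimal Python sketch using only the subcommands shown above; the `build_command` and `run` wrappers are my own, and the refs assume the Calculator snapshot from step 1:

```python
import shutil
import subprocess

def build_command(*args):
    """Assemble an agent-cu invocation as an argv list."""
    return ["agent-cu", *args]

def run(*args) -> str:
    """Invoke the CLI and return its trimmed stdout."""
    result = subprocess.run(build_command(*args), capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

# Only run the demo when agent-cu is actually on PATH.
if shutil.which("agent-cu"):
    # 7 x 6 = on Calculator, using refs from a prior snapshot.
    for ref in ["@e5", "@e8", "@e11", "@e20"]:
        run("click", ref)
    print(run("text", "-a", "Calculator"))
```

`check=True` makes a failed click raise immediately, which is usually what an agent loop wants so it can re-snapshot and retry.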
3. Re-snapshot

UI changed? Snapshot again for fresh refs.
$ agent-cu snapshot -a Calculator -i -c

What can you do with it?

Anything you'd do by clicking around:

Open Maps and search for the Colosseum
Send a Slack message to a teammate
Fill out a form in a browser
Multiply numbers in Calculator
Read the price of a flight from a booking site
Scrape data from a desktop app into a spreadsheet
Click through a setup wizard automatically
Automate a multi-step workflow with a YAML file
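For illustration only, a multi-step workflow file might look something like the sketch below. This YAML schema is invented for the example; agent-cu's real workflow format may differ, so check the project's documentation.

```yaml
# Hypothetical workflow file -- not agent-cu's actual schema.
# Declarative version of the snapshot -> act -> read loop above.
app: Calculator
steps:
  - snapshot: {}       # refresh refs
  - click: "@e5"       # 7
  - click: "@e8"       # multiply
  - click: "@e11"      # 6
  - click: "@e20"      # equals
  - read: text         # capture the result
```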