agent-computer-use

Computer use CLI for AI Agents.

macOS · Windows · Linux

agent-computer-use lets you control desktop apps from the terminal: click buttons, type into fields, and read what's on screen, all from a single CLI.

Built for AI agents. An agent can snapshot the screen, decide what to click, and act while you sit back and watch.

npm install -g agent-cu

How it works

agent-computer-use reads the accessibility tree — the same structure screen readers use. It sees every button, text field, and menu item in any app. You point, it acts.

1. Snapshot

Capture every interactive element. Each gets a ref.
$ agent-cu snapshot -a Calculator -i -c
[@e1] button "All Clear"   [@e5] button "7"
[@e8] button "Multiply"    [@e11] button "6"
[@e20] button "Equals"
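Snapshot output in this shape is straightforward for an agent to parse. A minimal Python sketch (the regex and the `parse_snapshot` helper are my own, not part of agent-cu):

```python
import re

# Matches snapshot entries like: [@e5] button "7"
REF_PATTERN = re.compile(r'\[(@e\d+)\]\s+(\w+)\s+"([^"]*)"')

def parse_snapshot(output: str) -> dict:
    """Map each ref to its (role, label) pair."""
    refs = {}
    for match in REF_PATTERN.finditer(output):
        ref, role, label = match.groups()
        refs[ref] = (role, label)
    return refs

snapshot = '[@e1] button "All Clear"   [@e5] button "7"'
print(parse_snapshot(snapshot))
# {'@e1': ('button', 'All Clear'), '@e5': ('button', '7')}
```

An agent can then look up a ref by label ("7", "Equals") instead of hard-coding element IDs.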
2. Act

Use refs to click, type, or read.
$ agent-cu click @e5 && agent-cu click @e8 && agent-cu click @e11 && agent-cu click @e20
$ agent-cu text -a Calculator
42
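Since the tool is built for agents, the same chain is easy to drive from a script. A minimal Python sketch using only the subcommands shown above; the `build_command` and `run` wrappers are my own, and the refs assume the Calculator snapshot from step 1:

```python
import shutil
import subprocess

def build_command(*args):
    """Assemble an agent-cu invocation as an argv list."""
    return ["agent-cu", *args]

def run(*args) -> str:
    """Invoke the CLI and return its trimmed stdout."""
    result = subprocess.run(build_command(*args), capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

# Only run the demo when agent-cu is actually on PATH.
if shutil.which("agent-cu"):
    # 7 x 6 = on Calculator, using refs from a prior snapshot.
    for ref in ["@e5", "@e8", "@e11", "@e20"]:
        run("click", ref)
    print(run("text", "-a", "Calculator"))
```

`check=True` makes a failed click raise immediately, which is usually what an agent loop wants so it can re-snapshot and retry.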
3. Re-snapshot

UI changed? Snapshot again for fresh refs.
$ agent-cu snapshot -a Calculator -i -c

What can you do with it?

Anything you'd do by clicking around:

Open Maps and search for the Colosseum
Send a Slack message to a teammate
Fill out a form in a browser
Multiply numbers in Calculator
Read the price of a flight from a booking site
Scrape data from a desktop app into a spreadsheet
Click through a setup wizard automatically
Automate a multi-step workflow with a YAML file
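For illustration only, a multi-step workflow file might look something like the sketch below. This YAML schema is invented for the example; agent-cu's real workflow format may differ, so check the project's documentation.

```yaml
# Hypothetical workflow file -- not agent-cu's actual schema.
# Declarative version of the snapshot -> act -> read loop above.
app: Calculator
steps:
  - snapshot: {}       # refresh refs
  - click: "@e5"       # 7
  - click: "@e8"       # multiply
  - click: "@e11"      # 6
  - click: "@e20"      # equals
  - read: text         # capture the result
```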