Auto-Use Computer Use drives your OS and browser, scours the web, and writes your code. One agent, end to end.
🤖 Computer Use Framework for macOS & Windows
Let AI drive your computer. Autouse AI Computer Use now combines the macOS and Windows builds in a single repository. Control your entire OS with natural language: browser automation, coding tasks, and file management, all powered by vision-language models.
- … grep, glob, view, and shell out, but can never write. Spawned in parallel, each in its own isolated session scratchpad.
- osascript execution with automatic TCC permission-dialog handling so first-run consent prompts don't stall the agent.

Auto Use is not one model in a loop; it is a hierarchy of agents that can spawn more agents.
                      ┌─────────────────────┐
                      │    Parent Agent     │
                      │  (GUI · Web · OS)   │
                      └──────────┬──────────┘
                                 │ spawns
                      ┌──────────▼──────────┐
                      │      CLI Agent      │
                      │  (coding · shell)   │
                      └──────────┬──────────┘
                                 │ spawns ∞
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
    ┌─────────┐             ┌─────────┐             ┌─────────┐
    │ Minion  │             │ Minion  │             │ Minion  │
    │  scout  │      …      │  scout  │      …      │  scout  │
    └─────────┘             └─────────┘             └─────────┘
    read-only: shell · view · grep · glob · scratchpad · exit
… shell, view, grep, glob, scratchpad, exit. No writes, no AppleScript, no GUI: just fast, isolated reconnaissance. Multiple minions run in parallel; the CLI Agent waits on all of them and merges results.

Each Minion runs in its own session-isolated scratchpad at cli_minion/{session_id}/, with results polled from a per-run JSON file. Live progress streams to the frontend through minion_start / minion_line / minion_end events.
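The fan-out described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: `run_minion`, `scout_in_parallel`, the `result.json` filename, and the event dictionaries are hypothetical, and the scout work is stubbed out. The sketch also reads each result file once after the worker finishes, where the README describes polling.

```python
import json
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_minion(task, root, emit):
    """Run one read-only scout in its own scratchpad and write a result file."""
    session_id = uuid.uuid4().hex
    scratchpad = Path(root) / "cli_minion" / session_id  # one directory per session
    scratchpad.mkdir(parents=True)

    emit({"event": "minion_start", "session_id": session_id, "task": task})
    # Stub standing in for the real read-only work (shell / view / grep / glob).
    finding = f"stub finding for: {task}"
    emit({"event": "minion_line", "session_id": session_id, "line": finding})

    result_file = scratchpad / "result.json"  # hypothetical per-run file name
    result_file.write_text(json.dumps({"task": task, "finding": finding}))
    emit({"event": "minion_end", "session_id": session_id})
    return result_file


def scout_in_parallel(tasks, root, emit):
    """Spawn one minion per task, wait for all of them, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # pool.map blocks until every minion is done and preserves task order.
        result_files = list(pool.map(lambda t: run_minion(t, root, emit), tasks))
    return [json.loads(path.read_text()) for path in result_files]


events = []
with tempfile.TemporaryDirectory() as root:
    merged = scout_in_parallel(
        ["where is _validate_token defined?", "who calls _validate_token?"],
        root,
        events.append,
    )
print(len(merged), "results,", len(events), "events")
```

Giving every minion a fresh `session_id` directory means parallel scouts never share files, which is what makes merging safe without locks.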
Invocation examples:
# Direct CLI Agent
python cli.py
# CLI Agent with a specific task
python -m Auto_Use.macOS_use.agent.coder --task "refactor the auth module"
# Single Minion for a quick read-only question
python -m Auto_Use.macOS_use.agent.minions --task "where is _validate_token defined and who calls it?"
Most "AI controls your screen" projects pick one approach. Auto Use uses three, in sequence, on every loop iteration:
1. AXUIElementCreateSystemWide from PyObjC's ApplicationServices walks the foreground window into a structured tree of buttons, text fields, and links, each with coordinates, label, and role.
2. Quartz.CGWindowListCreateImage captures the display; PIL overlays numbered magenta bounding boxes onto every interactive element discovered in step 1. Retina / HiDPI scaling is handled automatically.

Why hybrid? Pure-vision agents hallucinate coordinates and miss off-screen state. Pure-accessibility agents miss everything that lives in a canvas or video. Auto Use shows the model both, refreshed after every action, so it always reasons over current ground truth.
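To make the numbering and scaling step concrete, here is a small pure-Python sketch of the geometry only. It assumes accessibility frames arrive in screen points and the screenshot is in pixels, so boxes are multiplied by a display scale factor (2.0 is typical for Retina). The `Element` type, `number_elements`, and `click_point` are hypothetical names, and the real project may handle scaling differently; no PyObjC or PIL calls are made here.

```python
from dataclasses import dataclass


@dataclass
class Element:
    role: str
    label: str
    x: float  # top-left corner, in screen points
    y: float
    width: float
    height: float


def number_elements(elements, scale):
    """Assign each element a 1-based id and a pixel-space box for the overlay."""
    annotated = []
    for index, el in enumerate(elements, start=1):
        box = (
            round(el.x * scale),
            round(el.y * scale),
            round((el.x + el.width) * scale),
            round((el.y + el.height) * scale),
        )
        annotated.append({"id": index, "role": el.role, "label": el.label, "box": box})
    return annotated


def click_point(annotated, element_id, scale):
    """Map an element id back to the center of its box, in screen points."""
    left, top, right, bottom = next(
        a["box"] for a in annotated if a["id"] == element_id
    )
    return ((left + right) / 2 / scale, (top + bottom) / 2 / scale)


tree = [
    Element("AXButton", "Save", 100, 40, 80, 24),
    Element("AXTextField", "Search", 200, 40, 160, 24),
]
boxes = number_elements(tree, scale=2.0)
print(boxes[0]["box"])
print(click_point(boxes, 2, scale=2.0))
```

With this layout the model only has to name an element id; the click coordinates come from the accessibility tree, so a model that names id 2 clicks at (280.0, 52.0) in points instead of guessing pixel positions.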
The AppleScript pathway is a first-class action type, not a bash escape hatch.
- … tell application "..." blocks via osascript.
- _click_automation_allow_button() watches for macOS "Allow / Don't Allow" consent prompts mid-run and clicks through them so first-time automation doesn't deadlock.
- … open for inactive apps, strips redundant activate / launch directives for already-running apps so foregrounding doesn't fight the user's focus.

Shell access is wrapped in a Sandbox that the model can't escape by accident.
- … subprocess.Popen inside a designated Desktop workspace.
- … /system, /usr/sbin, /private/var.
- … Enter name:, etc.) don't hang the agent.
- … input_text to drive the dialog.
- … {status, output, error, agent_location} so the agent always knows where it is in the filesystem.

Live web search is built into the agent loop, so there is no manual copy-paste between browser and chat. Backed by Perplexity Sonar (or any Perplexity / OpenAI / Anthropic web-search-capable model when selected), the agent issues queries, reads results, and folds the findings straight into its next action.
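The sandbox contract listed above (a fixed workspace, protected system paths, no hanging on prompts, and a {status, output, error, agent_location} result) might look roughly like the sketch below. Several things here are assumptions rather than the project's behavior: the class layout, the status strings, treating the three paths as a deny list, the naive substring check, and closing stdin plus a timeout as the way to avoid hanging on prompts. It uses `subprocess.run`, which is built on `subprocess.Popen`.

```python
import subprocess
import tempfile
from pathlib import Path

# Paths named in the README; treating them as a deny list is an assumption.
BLOCKED_PREFIXES = ("/system", "/usr/sbin", "/private/var")


class Sandbox:
    def __init__(self, workspace):
        self.workspace = Path(workspace).resolve()
        self.cwd = self.workspace  # the agent starts inside the workspace

    def _result(self, status, output="", error=""):
        # Every call reports where the agent currently is in the filesystem.
        return {"status": status, "output": output, "error": error,
                "agent_location": str(self.cwd)}

    def run(self, command, timeout=10):
        # Naive check: refuse any command that mentions a protected path.
        lowered = command.lower()
        if any(prefix in lowered for prefix in BLOCKED_PREFIXES):
            return self._result("blocked", error="command references a protected path")
        try:
            proc = subprocess.run(
                command, shell=True, cwd=self.cwd,
                stdin=subprocess.DEVNULL,  # prompts read EOF instead of waiting
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return self._result("timeout", error=f"no exit after {timeout}s")
        status = "ok" if proc.returncode == 0 else "error"
        return self._result(status, proc.stdout, proc.stderr)


with tempfile.TemporaryDirectory() as workspace:
    box = Sandbox(workspace)
    ok = box.run("echo hello")
    blocked = box.run("ls /usr/sbin")
    expected_location = str(Path(workspace).resolve())
print(ok["status"], blocked["status"])
```

A substring check is easy to bypass and can also produce false positives, so a real sandbox would resolve paths before comparing them; the point of the sketch is the uniform result shape.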
Just describe what you want — Auto Use picks the right tool for the job.
"Uninstall VLC media player"
"Create a Python Flask API with user authentication"
"Why is AMD stock price going up?"
"Check disk space and clean up temp files"
"Send an iMessage to John saying I'll be 10 minutes late"
| Category | Examples |
|---|---|
| Browser | Fill forms, extract data, navigate sites, download files |
| Productivity | Create documents, manage spreadsheets, organize files |
| Development | Write code, debug errors, run tests, manage git |
| System | Install software, configure settings, manage processes |
| Research | Search web, compile information, generate reports |
Auto Use supports 6 LLM providers.
💡 Recommended for most users: We strongly encourage installing the latest binary build from our official website for a fully seamless installation and the complete UI experience — no manual setup required.
The steps below are for developers who want to run Auto Use directly from source.
On macOS:
Run the setup script
bash MacOS_setup.sh
Add your API key(s)
Copy the example env file and fill in your keys:
cp .env.example .env
Then open .env and add the API key for whichever provider(s) you want to use.
Run Auto Use — pick your experience:
python app.py # 🖼️ Full desktop UI (recommended) — webview frontend with live agent streams, minion progress, model switcher
python main.py # 💻 Terminal-only experience — same agents, no GUI, fully usable over SSH
On Windows:
Run the setup script
windows_setup.bat
Add your API key(s)
Copy the example env file and fill in your keys:
copy .env.example .env
Then open .env and add the API key for whichever provider(s) you want to use.
Run Auto Use — pick your experience:
python app.py :: 🖼️ Full desktop UI (recommended) — webview frontend with live agent streams, minion progress, model switcher
python main.py :: 💻 Terminal-only experience — same agents, no GUI, fully usable over SSH
Note: On Windows, Python 3.13.3 is the preferred version for best compatibility.
| Feature | Auto Use | Others |
|---|---|---|
| Multi-agent system | ✅ | ❌ |
| Domain knowledge injection | ✅ | ❌ |
| Multi-provider LLM support | ✅ | Limited |
| Vision-based automation | ✅ | ✅ |
| Coding agent | ✅ | ❌ |
| Read-only sub-agent scouts (Minion) | ✅ | ❌ |
| Unlimited sub-agent spawning | ✅ | ❌ |
| Hybrid Accessibility + Vision GUI control | ✅ | ❌ |
| Native AppleScript with auto-consent | ✅ | ❌ |
| Web search integration | ✅ | ❌ |
| Secure sandbox | ✅ | ❌ |
This repository supports both macOS and Windows — the two platform builds live side-by-side in the same repo:
- Auto_Use/macOS_use
- Auto_Use/windows_use

Ashish Yadav, founder of Autouse AI
Licensed under the Apache License 2.0 — see LICENSE and NOTICE.
If you use, fork, reference, or derive from this project, you must:
- … NOTICE file.
- Yadav, Ashish. Autouse AI — Computer Use. Autouse AI, 2026. https://github.com/auto-use