mac-use-mcp
MCP Server · Free
Zero-dependency macOS desktop automation for AI agents. Screenshot, mouse, keyboard, clipboard, and window control via MCP. 18 tools, macOS 13+, one command: npx mac-use-mcp.
Capabilities (17 decomposed)
macos screenshot capture with mcp protocol binding
Medium confidence: Captures full-screen or region-specific screenshots on macOS and returns image data via the MCP tool interface. Uses native macOS APIs (likely screencapture or CGImage) to grab pixel data, encodes it as base64 or a file path, and exposes it through a standardized MCP tool schema so AI agents can request visual context without subprocess overhead.
Exposes native macOS screenshot capability directly through the MCP protocol without spawning subprocesses, enabling low-latency visual context injection into agent decision loops; MCP's standardized tool schema provides seamless multi-provider LLM compatibility.
Faster and simpler than Selenium/Playwright screenshot methods because it bypasses browser-specific APIs and uses direct OS-level graphics capture, with the in-process MCP binding avoiding subprocess startup and output-parsing overhead.
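Since the server's published tool schema is not shown here, the request shape is easiest to see as a sketch. A hypothetical MCP `tools/call` payload for a region capture; the tool name `screenshot` and its parameter names are assumptions, not the server's documented schema (only the JSON-RPC envelope follows the MCP specification):

```typescript
// Build a JSON-RPC request an MCP client would send for a region screenshot.
// NOTE: tool name "screenshot" and the x/y/width/height argument names are
// illustrative assumptions about mac-use-mcp's schema.
interface ToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildScreenshotCall(
  id: number,
  region?: { x: number; y: number; width: number; height: number },
): ToolCall {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // omit region arguments entirely for a full-screen capture
    params: { name: "screenshot", arguments: region ? { ...region } : {} },
  };
}

const call = buildScreenshotCall(1, { x: 0, y: 0, width: 800, height: 600 });
```

The server's response would carry the image as base64 content or a file path, which the client forwards to the model as visual context.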
mouse movement and click control via mcp
Medium confidence: Provides absolute and relative mouse positioning, click (left/right/middle), double-click, and drag operations through the MCP tool interface. Translates agent commands into native macOS event injection (likely using CGEvent APIs) with coordinate mapping and optional velocity/acceleration curves for smooth automation.
Integrates mouse control directly into MCP tool schema with coordinate-based targeting, allowing agents to chain screenshot analysis → coordinate extraction → click execution in a single agent loop without external tool dependencies or subprocess management
More direct than PyAutoGUI or xdotool because it uses native macOS CGEvent APIs with MCP protocol binding, eliminating subprocess overhead and enabling real-time feedback loops between vision analysis and mouse actions
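One practical detail when chaining screenshot analysis into click execution: if the vision model sees a resized image, its coordinates must be mapped back to screen points before the click call. A minimal agent-side sketch; the image and screen dimensions below are illustrative, not queried from any API:

```typescript
// Map a coordinate found in a (possibly downscaled) screenshot back to a
// screen point, so a subsequent click tool call lands on the right pixel.
function toScreenPoint(
  imgX: number, imgY: number,
  imgWidth: number, imgHeight: number,
  screenWidth: number, screenHeight: number,
): { x: number; y: number } {
  return {
    x: Math.round((imgX / imgWidth) * screenWidth),
    y: Math.round((imgY / imgHeight) * screenHeight),
  };
}

// A click target found at (536, 300) in a 1072x1072 image of a 1512x982 screen:
const pt = toScreenPoint(536, 300, 1072, 1072, 1512, 982);
```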
multi-monitor and display management via mcp
Medium confidence: Queries display configuration (monitor count, resolution, position, color profile), retrieves screen bounds for multi-monitor setups, and enables agents to target screenshots or mouse operations to specific displays. Uses macOS display APIs (CGDisplay) to enumerate and query display properties.
Provides multi-monitor awareness through MCP by querying macOS display APIs (CGDisplay), enabling agents to target screenshots and mouse operations to specific displays and adapt to variable display configurations without hardcoded coordinates
More flexible than single-display automation because it queries actual display configuration at runtime, enabling agents to work correctly across different monitor setups without manual coordinate adjustments
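A sketch of how an agent might use enumerated display bounds to decide which monitor contains a target point before issuing a display-specific screenshot or click. The record shape and the example layout (one built-in plus one external display) are assumptions:

```typescript
// Find the display whose global bounds contain a point, so operations can
// be targeted per-display instead of using hardcoded coordinates.
interface Display { id: number; x: number; y: number; width: number; height: number }

function displayForPoint(displays: Display[], x: number, y: number): Display | undefined {
  return displays.find(
    d => x >= d.x && x < d.x + d.width && y >= d.y && y < d.y + d.height,
  );
}

// Hypothetical two-monitor layout: external display sits to the right.
const displays: Display[] = [
  { id: 1, x: 0,    y: 0, width: 1512, height: 982 },
  { id: 2, x: 1512, y: 0, width: 2560, height: 1440 },
];
const hit = displayForPoint(displays, 1600, 500);
```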
system preferences and settings access via mcp
Medium confidence: Reads system preferences and settings (display brightness, volume, keyboard repeat rate, accessibility settings) through MCP tools using macOS preferences APIs (NSUserDefaults, System Preferences). Enables agents to query and adapt to system configuration without direct file system access.
Exposes macOS system preferences through MCP tools using NSUserDefaults APIs, enabling agents to query system configuration and accessibility settings to adapt automation behavior without direct file system access or AppleScript
More reliable than AppleScript preference queries because it uses native macOS preference APIs with structured output, enabling agents to detect accessibility settings and system configuration to ensure automation compatibility
audio playback and system sound control via mcp
Medium confidence: Plays audio files or system sounds through MCP tools, controls volume, and manages audio output devices. Uses native macOS audio APIs (AVAudioPlayer, AudioToolbox) to handle audio playback without subprocess calls, enabling agents to provide audio feedback or trigger sound-based workflows.
Integrates audio playback and volume control directly into MCP tools using native macOS audio APIs (AVAudioPlayer), enabling agents to provide audio feedback without subprocess calls or external audio tools
More direct than shell-based audio playback because it uses native macOS audio APIs with structured output, enabling agents to control volume and select audio devices without parsing command output
sleep/wake and power management via mcp
Medium confidence: Controls system sleep/wake state, retrieves power status (battery level, charging state, time remaining), and manages power-related settings through MCP tools. Uses macOS power management APIs (IOKit, NSWorkspace) to query and control power state without privileged subprocess calls.
Exposes macOS power management APIs through MCP tools, enabling agents to query battery status and prevent system sleep during long-running workflows without privileged subprocess calls or AppleScript
More reliable than shell-based power management because it uses native macOS power APIs (IOKit) with structured output, enabling agents to make power-aware decisions and prevent sleep without parsing command output
notification and alert delivery via mcp
Medium confidence: Sends system notifications and alerts to the user through macOS Notification Center using native notification APIs (NSUserNotification, UNUserNotificationCenter). Enables agents to notify users of automation progress, errors, or completion without blocking the automation workflow.
Integrates macOS notification center directly into MCP tools using native notification APIs, enabling agents to send system notifications without subprocess calls or external notification services
More native than third-party notification services because it uses macOS notification center with system integration, enabling notifications to appear in notification center and lock screen without external dependencies
file system operations and finder integration via mcp
Medium confidence: Performs file system operations (create, delete, move, copy, list) and integrates with Finder through MCP tools. Uses native macOS file APIs (FileManager, NSWorkspace) to manipulate files and reveal them in Finder without shell commands, enabling agents to manage files as part of automation workflows.
Integrates file system operations and Finder reveal actions directly into MCP tools using native macOS FileManager and NSWorkspace APIs, enabling agents to manage files and reveal them in Finder without shell commands
More integrated than shell-based file operations because it uses native macOS file APIs with structured output and Finder integration, enabling agents to manage files and reveal them in Finder without parsing command output
mcp protocol server with zero-dependency deployment
Medium confidence: Provides a complete MCP server implementation that exposes all desktop automation capabilities through the Model Context Protocol without external dependencies. Runs as a standalone Node.js process that can be invoked with a single command (npx mac-use-mcp), automatically handling MCP protocol negotiation, tool schema generation, and client communication.
Provides a complete, zero-dependency MCP server implementation that exposes 18 desktop automation tools through the standard MCP protocol, deployable with a single command (npx mac-use-mcp) without npm install or dependency management
Simpler than building custom MCP servers because it provides pre-built tool implementations and protocol handling, enabling developers to integrate desktop automation into AI agents without implementing MCP protocol or tool schemas from scratch
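Client-side setup amounts to one config entry. The snippet below follows the `mcpServers` convention used by common MCP clients such as Claude Desktop; the server key `mac-use` is an arbitrary label, and no flags beyond the package name are assumed:

```json
{
  "mcpServers": {
    "mac-use": {
      "command": "npx",
      "args": ["mac-use-mcp"]
    }
  }
}
```

On first run, npx fetches the package; since the server is zero-dependency, there is no separate install step.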
keyboard input and hotkey simulation via mcp
Medium confidence: Sends individual keystrokes, key combinations (Cmd+C, Shift+Tab), and text input sequences through MCP tools using native macOS event injection. Supports modifier keys (Command, Option, Control, Shift), special keys (Return, Escape, Tab), and text typing with configurable delay between characters to simulate human input speed.
Combines individual keystroke injection with modifier key support and text typing in a single MCP tool interface, allowing agents to handle both programmatic shortcuts (Cmd+S) and natural text input without separate tool calls or complex key sequencing logic
Simpler than xdotool or AppleScript keyboard automation because it provides a unified MCP interface with built-in modifier key handling, reducing agent prompt complexity and eliminating the need for external scripting languages
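Agents often express shortcuts as compact strings; before a keystroke tool call, those need splitting into modifiers and a key. A small agent-side sketch; the "cmd+shift+s" spelling and the modifiers/key field names are assumptions, not the server's schema:

```typescript
// Split an agent-supplied hotkey string into the modifier list and final
// key a keystroke tool would consume. Purely illustrative field names.
function parseHotkey(spec: string): { modifiers: string[]; key: string } {
  const parts = spec.toLowerCase().split("+");
  return {
    modifiers: parts.slice(0, -1),   // everything before the last "+"
    key: parts[parts.length - 1],    // the key itself
  };
}

const save = parseHotkey("cmd+shift+s");
```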
clipboard read/write with format preservation via mcp
Medium confidence: Reads and writes clipboard content through MCP tools, supporting plain text, rich text (RTF), HTML, and image data. Uses native macOS pasteboard APIs (NSPasteboard) to handle multiple clipboard formats simultaneously, enabling agents to exchange data with applications via copy-paste operations without file I/O.
Exposes macOS pasteboard API through MCP with multi-format support (text, HTML, RTF, images), allowing agents to leverage native copy-paste workflows without file I/O or application-specific APIs, with automatic format detection on read operations
More flexible than simple text clipboard tools because it preserves formatting and supports multiple data types, enabling agents to work with rich content from design tools, browsers, and office applications without format conversion
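When a multi-format read returns several representations of the same clipboard entry, the agent picks the richest one it can use. A sketch of that selection; the MIME-style format identifiers are illustrative stand-ins for whatever the server reports:

```typescript
// Choose the richest available clipboard representation by priority.
// Format identifiers are assumed, not mac-use-mcp's documented values.
const PRIORITY = ["image/png", "text/html", "text/rtf", "text/plain"];

function pickFormat(available: string[]): string | undefined {
  return PRIORITY.find(f => available.includes(f));
}

// An entry copied from a browser typically carries both HTML and plain text:
const chosen = pickFormat(["text/plain", "text/html"]);
```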
window management and focus control via mcp
Medium confidence: Lists open windows, retrieves window properties (title, position, size, app name), focuses specific windows, and performs window operations (minimize, maximize, close) through MCP tools. Uses the macOS Accessibility API (AXUIElement) to query the window hierarchy and manage focus without subprocess calls.
Provides unified window enumeration and control through MCP by querying macOS Accessibility API (AXUIElement), enabling agents to discover and manage windows without parsing window manager output or using AppleScript, with direct focus control for multi-window workflows
More reliable than AppleScript window management because it uses native Accessibility APIs with structured data output, enabling agents to reliably identify windows by multiple attributes (title, app, PID) and chain window operations with screenshot context
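Because the enumeration returns structured records rather than text to parse, target selection becomes a simple filter. A sketch of picking a window by app name and title substring; the record shape is an assumption about what the server returns:

```typescript
// Pick a target window from an enumerated list by app name and a title
// substring, rather than guessing from a screenshot alone.
interface WindowInfo { title: string; app: string; pid: number }

function findWindow(
  windows: WindowInfo[], app: string, titlePart: string,
): WindowInfo | undefined {
  return windows.find(w => w.app === app && w.title.includes(titlePart));
}

// Hypothetical enumeration result:
const windows: WindowInfo[] = [
  { title: "Inbox — Mail", app: "Mail", pid: 312 },
  { title: "report.pdf", app: "Preview", pid: 418 },
];
const target = findWindow(windows, "Preview", "report");
```

The matched record's PID or title can then be passed to a focus or screenshot call.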
application launch and process control via mcp
Medium confidence: Launches applications by bundle identifier or path, retrieves the running process list with metadata (PID, memory, CPU), and terminates processes through MCP tools. Uses native macOS process APIs (NSWorkspace, Process Manager) to manage application lifecycle without shell subprocess calls.
Integrates macOS application launch and process management directly into MCP tools using NSWorkspace APIs, enabling agents to discover, launch, and manage applications without shell commands or AppleScript, with structured process metadata for intelligent app selection
More reliable than shell-based app launching because it uses native macOS APIs with structured output, enabling agents to verify app launch success and retrieve process metadata for window targeting without parsing command output
system event monitoring and notification via mcp
Medium confidence: Monitors system events (application focus changes, window creation/destruction, clipboard changes) and sends notifications to MCP clients through server-initiated messages or polling endpoints. Uses macOS event stream APIs (CGEventTap, NSWorkspaceNotification) to detect state changes and trigger agent actions reactively.
Exposes macOS system event streams through MCP protocol, enabling agents to react to focus changes, window events, and clipboard updates without polling, using native event APIs (CGEventTap, NSWorkspaceNotification) for low-latency event delivery
More efficient than polling-based monitoring because it uses native macOS event streams with server-initiated notifications, reducing agent latency and CPU overhead compared to repeated screenshot/window list queries
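On the client side, server-initiated events arrive as JSON-RPC notification messages to be routed rather than polled for. A sketch of a minimal dispatcher; the notification method name `mac-use/focusChanged` and its payload are assumptions:

```typescript
// Route incoming server-initiated notification messages to handlers.
// Method name and payload shape are illustrative assumptions.
type Handler = (params: unknown) => void;

const handlers = new Map<string, Handler>();
let focusEvents = 0;

handlers.set("mac-use/focusChanged", () => { focusEvents += 1; });

function dispatch(msg: { method: string; params?: unknown }): boolean {
  const h = handlers.get(msg.method);
  if (!h) return false;  // unknown notification: ignore rather than fail
  h(msg.params);
  return true;
}

const handled = dispatch({ method: "mac-use/focusChanged", params: { app: "Safari" } });
```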
text selection and clipboard-based content extraction via mcp
Medium confidence: Selects text in focused applications using keyboard shortcuts (Cmd+A, Shift+Arrow keys) or mouse-based selection, copies it to the clipboard, and retrieves the selected content through MCP. Chains keyboard/mouse operations with a clipboard read to extract text without OCR or direct text API access.
Chains keyboard/mouse selection operations with clipboard read to extract text from applications without direct text APIs, enabling agents to extract content from legacy apps or web pages by automating user-level copy-paste workflows
More universal than application-specific text APIs because it works with any application supporting standard text selection, enabling agents to extract content from web browsers, PDFs, and legacy applications without app-specific integrations
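The copy-paste extraction chain is just an ordered sequence of tool calls the agent must preserve: select, copy, then read the clipboard. A sketch of that plan; the tool names `keystroke` and `clipboard_read` and their arguments are assumptions:

```typescript
// The select-all -> copy -> read chain as an ordered tool-call plan.
// Tool and argument names are illustrative, not the server's schema.
interface Step { tool: string; args: Record<string, unknown> }

function extractionPlan(): Step[] {
  return [
    { tool: "keystroke", args: { key: "a", modifiers: ["cmd"] } },  // select all
    { tool: "keystroke", args: { key: "c", modifiers: ["cmd"] } },  // copy
    { tool: "clipboard_read", args: { format: "text/plain" } },     // retrieve
  ];
}

const plan = extractionPlan();
```

A short delay between the copy and the clipboard read is often needed in practice, since the pasteboard update is asynchronous.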
drag-and-drop file operations via mcp
Medium confidence: Performs drag-and-drop operations between applications by simulating mouse down, movement, and release events with file path payloads. Enables agents to move/copy files between Finder windows or drag files into applications (e.g., uploading to web forms) without file system APIs or application-specific drag handlers.
Simulates drag-and-drop operations through coordinated mouse events with file path payloads, enabling agents to automate file operations between Finder and applications without file system APIs or application-specific handlers
More flexible than direct file system operations because it works with applications that only support drag-and-drop input, enabling agents to automate file uploads and transfers in web applications and legacy desktop apps
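Since the server takes explicit start/end coordinates with no built-in path interpolation (see Known Limitations below), an agent wanting smoother motion can generate intermediate points itself. A linear interpolation sketch; the step count and coordinates are illustrative:

```typescript
// Generate intermediate points for a drag gesture by linear interpolation
// between explicit start and end coordinates.
function dragPath(
  x1: number, y1: number, x2: number, y2: number, steps: number,
): Array<[number, number]> {
  const pts: Array<[number, number]> = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    pts.push([Math.round(x1 + (x2 - x1) * t), Math.round(y1 + (y2 - y1) * t)]);
  }
  return pts;
}

// Five waypoints (including endpoints) from (100,100) to (300,200):
const path = dragPath(100, 100, 300, 200, 4);
```

Each waypoint would then be sent as a mouse-move call between the mouse-down and mouse-up events.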
screen region ocr and text recognition via mcp
Medium confidence: Performs optical character recognition (OCR) on screenshot regions to extract text from images, UI elements, or documents. Integrates with the macOS Vision framework or third-party OCR services to convert image regions into machine-readable text, enabling agents to read text from non-selectable UI elements or scanned documents.
Integrates OCR directly into MCP tools for screenshot regions, enabling agents to extract text from non-selectable UI elements and images without external OCR services, using native macOS Vision framework or pluggable OCR backends
More integrated than separate OCR tools because it operates on screenshot regions directly, enabling agents to chain screenshot capture → OCR → decision-making in a single automation loop without intermediate file I/O
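OCR results usually arrive with per-line confidence scores; filtering low-confidence noise before handing text to the model keeps the decision loop clean. A sketch of that filtering step, where the result shape (text plus confidence) is an assumption about the OCR output:

```typescript
// Keep only confident OCR lines before passing text to the model.
// The { text, confidence } result shape is an illustrative assumption.
interface OcrLine { text: string; confidence: number }

function confidentText(lines: OcrLine[], minConf = 0.8): string {
  return lines
    .filter(l => l.confidence >= minConf)
    .map(l => l.text)
    .join("\n");
}

const out = confidentText([
  { text: "Save changes?", confidence: 0.97 },
  { text: "l1li|", confidence: 0.42 },  // noise from an icon, discarded
]);
```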
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with mac-use-mcp, ranked by overlap. Discovered automatically through the match graph.
- @atomicbotai/computer-use-mcp: MCP server exposing desktop computer-use as an MCP tool
- @github/computer-use-mcp: Computer Use MCP Server
- just-every/mcp-screenshot-website-fast: High-quality screenshot capture optimized for Claude Vision API. Automatically tiles full pages into 1072x1072 chunks (1.15 megapixels) with configurable viewports and wait strategies for dynamic content.
- Screeny: Privacy-first macOS MCP server that provides visual context for AI agents through window screenshots
- url-to-image-mcp: MCP server
Best For
- ✓ AI agents performing desktop automation workflows
- ✓ developers building macOS-native AI assistants
- ✓ teams integrating vision-based UI testing with LLM agents
- ✓ AI agents automating GUI workflows on macOS
- ✓ developers building no-code automation tools powered by LLMs
- ✓ QA automation teams using AI agents for cross-application testing
- ✓ AI agents operating in multi-monitor environments requiring display-aware automation
- ✓ developers building automation that must work across different display configurations
Known Limitations
- ⚠ Screenshot capture may include sensitive data (passwords, PII) — no built-in redaction or filtering
- ⚠ Region-based captures require precise coordinate specification; no automatic window detection
- ⚠ Performance degrades with high-frequency capture loops (>10 screenshots/second) due to I/O overhead
- ⚠ No built-in collision detection — agent must verify target coordinates are valid before clicking
- ⚠ Click timing is synchronous; rapid click sequences may be rate-limited by the macOS event queue
- ⚠ Drag operations require explicit start/end coordinates; no gesture recognition or path interpolation
Alternatives to mac-use-mcp
- Supabase MCP: Search the Supabase docs for up-to-date guidance and troubleshoot errors quickly; manage organizations, projects, databases, and Edge Functions (migrations, SQL, logs, advisors, keys, and type generation); create and manage development branches to iterate safely and confirm costs.
- Tavily MCP: AI-optimized web search and content extraction.
- Firecrawl MCP: Scrape websites and extract structured data.