mac-use-mcp
MCP Server · Free
Zero-dependency macOS desktop automation for AI agents. Screenshot, mouse, keyboard, clipboard, and window control via MCP. 18 tools, macOS 13+, one command: npx mac-use-mcp.
Capabilities (17 decomposed)
macos screenshot capture with mcp protocol binding
Medium confidence: Captures full-screen or region-specific screenshots on macOS and returns image data via the MCP tool interface. Uses native macOS APIs (likely screencapture or CGImage) to grab pixel data, encodes it as base64 or a file path, and exposes it through a standardized MCP tool schema so AI agents can request visual context without subprocess overhead.
Exposes native macOS screenshot capability directly through the MCP protocol without spawning subprocesses, enabling low-latency visual context injection into agent decision loops; MCP's standardized tool schema provides seamless multi-provider LLM compatibility.
Faster and simpler than Selenium/Playwright screenshot methods because it bypasses browser-specific APIs and uses direct OS-level graphics capture, with the in-process MCP binding avoiding subprocess startup and output-parsing overhead.
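Since the server's published tool schema is not shown here, the request shape is easiest to see as a sketch. A hypothetical MCP `tools/call` payload for a region capture; the tool name `screenshot` and its parameter names are assumptions, not the server's documented schema (only the JSON-RPC envelope follows the MCP specification):

```typescript
// Build a JSON-RPC request an MCP client would send for a region screenshot.
// NOTE: tool name "screenshot" and the x/y/width/height argument names are
// illustrative assumptions about mac-use-mcp's schema.
interface ToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildScreenshotCall(
  id: number,
  region?: { x: number; y: number; width: number; height: number },
): ToolCall {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    // omit region arguments entirely for a full-screen capture
    params: { name: "screenshot", arguments: region ? { ...region } : {} },
  };
}

const call = buildScreenshotCall(1, { x: 0, y: 0, width: 800, height: 600 });
```

The server's response would carry the image as base64 content or a file path, which the client forwards to the model as visual context.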
mouse movement and click control via mcp
Medium confidence: Provides absolute and relative mouse positioning, click (left/right/middle), double-click, and drag operations through the MCP tool interface. Translates agent commands into native macOS event injection (likely using CGEvent APIs) with coordinate mapping and optional velocity/acceleration curves for smooth automation.
Integrates mouse control directly into MCP tool schema with coordinate-based targeting, allowing agents to chain screenshot analysis → coordinate extraction → click execution in a single agent loop without external tool dependencies or subprocess management
More direct than PyAutoGUI or xdotool because it uses native macOS CGEvent APIs with MCP protocol binding, eliminating subprocess overhead and enabling real-time feedback loops between vision analysis and mouse actions
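One practical detail when chaining screenshot analysis into click execution: if the vision model sees a resized image, its coordinates must be mapped back to screen points before the click call. A minimal agent-side sketch; the image and screen dimensions below are illustrative, not queried from any API:

```typescript
// Map a coordinate found in a (possibly downscaled) screenshot back to a
// screen point, so a subsequent click tool call lands on the right pixel.
function toScreenPoint(
  imgX: number, imgY: number,
  imgWidth: number, imgHeight: number,
  screenWidth: number, screenHeight: number,
): { x: number; y: number } {
  return {
    x: Math.round((imgX / imgWidth) * screenWidth),
    y: Math.round((imgY / imgHeight) * screenHeight),
  };
}

// A click target found at (536, 300) in a 1072x1072 image of a 1512x982 screen:
const pt = toScreenPoint(536, 300, 1072, 1072, 1512, 982);
```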
multi-monitor and display management via mcp
Medium confidence: Queries display configuration (monitor count, resolution, position, color profile), retrieves screen bounds for multi-monitor setups, and enables agents to target screenshots or mouse operations to specific displays. Uses macOS display APIs (CGDisplay) to enumerate and query display properties.
Provides multi-monitor awareness through MCP by querying macOS display APIs (CGDisplay), enabling agents to target screenshots and mouse operations to specific displays and adapt to variable display configurations without hardcoded coordinates
More flexible than single-display automation because it queries actual display configuration at runtime, enabling agents to work correctly across different monitor setups without manual coordinate adjustments
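A sketch of how an agent might use enumerated display bounds to decide which monitor contains a target point before issuing a display-specific screenshot or click. The record shape and the example layout (one built-in plus one external display) are assumptions:

```typescript
// Find the display whose global bounds contain a point, so operations can
// be targeted per-display instead of using hardcoded coordinates.
interface Display { id: number; x: number; y: number; width: number; height: number }

function displayForPoint(displays: Display[], x: number, y: number): Display | undefined {
  return displays.find(
    d => x >= d.x && x < d.x + d.width && y >= d.y && y < d.y + d.height,
  );
}

// Hypothetical two-monitor layout: external display sits to the right.
const displays: Display[] = [
  { id: 1, x: 0,    y: 0, width: 1512, height: 982 },
  { id: 2, x: 1512, y: 0, width: 2560, height: 1440 },
];
const hit = displayForPoint(displays, 1600, 500);
```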
system preferences and settings access via mcp
Medium confidence: Reads system preferences and settings (display brightness, volume, keyboard repeat rate, accessibility settings) through MCP tools using macOS preferences APIs (NSUserDefaults, System Preferences). Enables agents to query and adapt to system configuration without direct file system access.
Exposes macOS system preferences through MCP tools using NSUserDefaults APIs, enabling agents to query system configuration and accessibility settings to adapt automation behavior without direct file system access or AppleScript
More reliable than AppleScript preference queries because it uses native macOS preference APIs with structured output, enabling agents to detect accessibility settings and system configuration to ensure automation compatibility
audio playback and system sound control via mcp
Medium confidence: Plays audio files or system sounds through MCP tools, controls volume, and manages audio output devices. Uses native macOS audio APIs (AVAudioPlayer, AudioToolbox) to handle audio playback without subprocess calls, enabling agents to provide audio feedback or trigger sound-based workflows.
Integrates audio playback and volume control directly into MCP tools using native macOS audio APIs (AVAudioPlayer), enabling agents to provide audio feedback without subprocess calls or external audio tools
More direct than shell-based audio playback because it uses native macOS audio APIs with structured output, enabling agents to control volume and select audio devices without parsing command output
sleep/wake and power management via mcp
Medium confidence: Controls system sleep/wake state, retrieves power status (battery level, charging state, time remaining), and manages power-related settings through MCP tools. Uses macOS power management APIs (IOKit, NSWorkspace) to query and control power state without privileged subprocess calls.
Exposes macOS power management APIs through MCP tools, enabling agents to query battery status and prevent system sleep during long-running workflows without privileged subprocess calls or AppleScript
More reliable than shell-based power management because it uses native macOS power APIs (IOKit) with structured output, enabling agents to make power-aware decisions and prevent sleep without parsing command output
notification and alert delivery via mcp
Medium confidence: Sends system notifications and alerts to the user through macOS Notification Center using native notification APIs (NSUserNotification, UNUserNotificationCenter). Enables agents to notify users of automation progress, errors, or completion without blocking the automation workflow.
Integrates macOS notification center directly into MCP tools using native notification APIs, enabling agents to send system notifications without subprocess calls or external notification services
More native than third-party notification services because it uses macOS notification center with system integration, enabling notifications to appear in notification center and lock screen without external dependencies
file system operations and finder integration via mcp
Medium confidence: Performs file system operations (create, delete, move, copy, list) and integrates with Finder through MCP tools. Uses native macOS file APIs (FileManager, NSWorkspace) to manipulate files and reveal them in Finder without shell commands, enabling agents to manage files as part of automation workflows.
Integrates file system operations and Finder reveal actions directly into MCP tools using native macOS FileManager and NSWorkspace APIs, enabling agents to manage files and reveal them in Finder without shell commands
More integrated than shell-based file operations because it uses native macOS file APIs with structured output and Finder integration, enabling agents to manage files and reveal them in Finder without parsing command output
mcp protocol server with zero-dependency deployment
Medium confidence: Provides a complete MCP server implementation that exposes all desktop automation capabilities through the Model Context Protocol without external dependencies. Runs as a standalone Node.js process that can be invoked with a single command (npx mac-use-mcp), automatically handling MCP protocol negotiation, tool schema generation, and client communication.
Provides a complete, zero-dependency MCP server implementation that exposes 18 desktop automation tools through the standard MCP protocol, deployable with a single command (npx mac-use-mcp) without npm install or dependency management
Simpler than building custom MCP servers because it provides pre-built tool implementations and protocol handling, enabling developers to integrate desktop automation into AI agents without implementing MCP protocol or tool schemas from scratch
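Client-side setup amounts to one config entry. The snippet below follows the `mcpServers` convention used by common MCP clients such as Claude Desktop; the server key `mac-use` is an arbitrary label, and no flags beyond the package name are assumed:

```json
{
  "mcpServers": {
    "mac-use": {
      "command": "npx",
      "args": ["mac-use-mcp"]
    }
  }
}
```

On first run, npx fetches the package; since the server is zero-dependency, there is no separate install step.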
keyboard input and hotkey simulation via mcp
Medium confidence: Sends individual keystrokes, key combinations (Cmd+C, Shift+Tab), and text input sequences through MCP tools using native macOS event injection. Supports modifier keys (Command, Option, Control, Shift), special keys (Return, Escape, Tab), and text typing with configurable delay between characters to simulate human input speed.
Combines individual keystroke injection with modifier key support and text typing in a single MCP tool interface, allowing agents to handle both programmatic shortcuts (Cmd+S) and natural text input without separate tool calls or complex key sequencing logic
Simpler than xdotool or AppleScript keyboard automation because it provides a unified MCP interface with built-in modifier key handling, reducing agent prompt complexity and eliminating the need for external scripting languages
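Agents often express shortcuts as compact strings; before a keystroke tool call, those need splitting into modifiers and a key. A small agent-side sketch; the "cmd+shift+s" spelling and the modifiers/key field names are assumptions, not the server's schema:

```typescript
// Split an agent-supplied hotkey string into the modifier list and final
// key a keystroke tool would consume. Purely illustrative field names.
function parseHotkey(spec: string): { modifiers: string[]; key: string } {
  const parts = spec.toLowerCase().split("+");
  return {
    modifiers: parts.slice(0, -1),   // everything before the last "+"
    key: parts[parts.length - 1],    // the key itself
  };
}

const save = parseHotkey("cmd+shift+s");
```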
clipboard read/write with format preservation via mcp
Medium confidence: Reads and writes clipboard content through MCP tools, supporting plain text, rich text (RTF), HTML, and image data. Uses native macOS pasteboard APIs (NSPasteboard) to handle multiple clipboard formats simultaneously, enabling agents to exchange data with applications via copy-paste operations without file I/O.
Exposes macOS pasteboard API through MCP with multi-format support (text, HTML, RTF, images), allowing agents to leverage native copy-paste workflows without file I/O or application-specific APIs, with automatic format detection on read operations
More flexible than simple text clipboard tools because it preserves formatting and supports multiple data types, enabling agents to work with rich content from design tools, browsers, and office applications without format conversion
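When a multi-format read returns several representations of the same clipboard entry, the agent picks the richest one it can use. A sketch of that selection; the MIME-style format identifiers are illustrative stand-ins for whatever the server reports:

```typescript
// Choose the richest available clipboard representation by priority.
// Format identifiers are assumed, not mac-use-mcp's documented values.
const PRIORITY = ["image/png", "text/html", "text/rtf", "text/plain"];

function pickFormat(available: string[]): string | undefined {
  return PRIORITY.find(f => available.includes(f));
}

// An entry copied from a browser typically carries both HTML and plain text:
const chosen = pickFormat(["text/plain", "text/html"]);
```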
window management and focus control via mcp
Medium confidence: Lists open windows, retrieves window properties (title, position, size, app name), focuses specific windows, and performs window operations (minimize, maximize, close) through MCP tools. Uses the macOS Accessibility API (AXUIElement) to query the window hierarchy and manage focus without subprocess calls.
Provides unified window enumeration and control through MCP by querying macOS Accessibility API (AXUIElement), enabling agents to discover and manage windows without parsing window manager output or using AppleScript, with direct focus control for multi-window workflows
More reliable than AppleScript window management because it uses native Accessibility APIs with structured data output, enabling agents to reliably identify windows by multiple attributes (title, app, PID) and chain window operations with screenshot context
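Because the enumeration returns structured records rather than text to parse, target selection becomes a simple filter. A sketch of picking a window by app name and title substring; the record shape is an assumption about what the server returns:

```typescript
// Pick a target window from an enumerated list by app name and a title
// substring, rather than guessing from a screenshot alone.
interface WindowInfo { title: string; app: string; pid: number }

function findWindow(
  windows: WindowInfo[], app: string, titlePart: string,
): WindowInfo | undefined {
  return windows.find(w => w.app === app && w.title.includes(titlePart));
}

// Hypothetical enumeration result:
const windows: WindowInfo[] = [
  { title: "Inbox — Mail", app: "Mail", pid: 312 },
  { title: "report.pdf", app: "Preview", pid: 418 },
];
const target = findWindow(windows, "Preview", "report");
```

The matched record's PID or title can then be passed to a focus or screenshot call.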
application launch and process control via mcp
Medium confidence: Launches applications by bundle identifier or path, retrieves the running process list with metadata (PID, memory, CPU), and terminates processes through MCP tools. Uses native macOS process APIs (NSWorkspace, Process Manager) to manage application lifecycle without shell subprocess calls.
Integrates macOS application launch and process management directly into MCP tools using NSWorkspace APIs, enabling agents to discover, launch, and manage applications without shell commands or AppleScript, with structured process metadata for intelligent app selection
More reliable than shell-based app launching because it uses native macOS APIs with structured output, enabling agents to verify app launch success and retrieve process metadata for window targeting without parsing command output
system event monitoring and notification via mcp
Medium confidence: Monitors system events (application focus changes, window creation/destruction, clipboard changes) and sends notifications to MCP clients through server-initiated messages or polling endpoints. Uses macOS event stream APIs (CGEventTap, NSWorkspaceNotification) to detect state changes and trigger agent actions reactively.
Exposes macOS system event streams through MCP protocol, enabling agents to react to focus changes, window events, and clipboard updates without polling, using native event APIs (CGEventTap, NSWorkspaceNotification) for low-latency event delivery
More efficient than polling-based monitoring because it uses native macOS event streams with server-initiated notifications, reducing agent latency and CPU overhead compared to repeated screenshot/window list queries
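On the client side, server-initiated events arrive as JSON-RPC notification messages to be routed rather than polled for. A sketch of a minimal dispatcher; the notification method name `mac-use/focusChanged` and its payload are assumptions:

```typescript
// Route incoming server-initiated notification messages to handlers.
// Method name and payload shape are illustrative assumptions.
type Handler = (params: unknown) => void;

const handlers = new Map<string, Handler>();
let focusEvents = 0;

handlers.set("mac-use/focusChanged", () => { focusEvents += 1; });

function dispatch(msg: { method: string; params?: unknown }): boolean {
  const h = handlers.get(msg.method);
  if (!h) return false;  // unknown notification: ignore rather than fail
  h(msg.params);
  return true;
}

const handled = dispatch({ method: "mac-use/focusChanged", params: { app: "Safari" } });
```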
text selection and clipboard-based content extraction via mcp
Medium confidence: Selects text in focused applications using keyboard shortcuts (Cmd+A, Shift+Arrow keys) or mouse-based selection, copies it to the clipboard, and retrieves the selected content through MCP. Chains keyboard/mouse operations with a clipboard read to extract text without OCR or direct text API access.
Chains keyboard/mouse selection operations with clipboard read to extract text from applications without direct text APIs, enabling agents to extract content from legacy apps or web pages by automating user-level copy-paste workflows
More universal than application-specific text APIs because it works with any application supporting standard text selection, enabling agents to extract content from web browsers, PDFs, and legacy applications without app-specific integrations
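The copy-paste extraction chain is just an ordered sequence of tool calls the agent must preserve: select, copy, then read the clipboard. A sketch of that plan; the tool names `keystroke` and `clipboard_read` and their arguments are assumptions:

```typescript
// The select-all -> copy -> read chain as an ordered tool-call plan.
// Tool and argument names are illustrative, not the server's schema.
interface Step { tool: string; args: Record<string, unknown> }

function extractionPlan(): Step[] {
  return [
    { tool: "keystroke", args: { key: "a", modifiers: ["cmd"] } },  // select all
    { tool: "keystroke", args: { key: "c", modifiers: ["cmd"] } },  // copy
    { tool: "clipboard_read", args: { format: "text/plain" } },     // retrieve
  ];
}

const plan = extractionPlan();
```

A short delay between the copy and the clipboard read is often needed in practice, since the pasteboard update is asynchronous.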
drag-and-drop file operations via mcp
Medium confidence: Performs drag-and-drop operations between applications by simulating mouse down, movement, and release events with file path payloads. Enables agents to move/copy files between Finder windows or drag files into applications (e.g., uploading to web forms) without file system APIs or application-specific drag handlers.
Simulates drag-and-drop operations through coordinated mouse events with file path payloads, enabling agents to automate file operations between Finder and applications without file system APIs or application-specific handlers
More flexible than direct file system operations because it works with applications that only support drag-and-drop input, enabling agents to automate file uploads and transfers in web applications and legacy desktop apps
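Since the server takes explicit start/end coordinates with no built-in path interpolation (see Known Limitations below), an agent wanting smoother motion can generate intermediate points itself. A linear interpolation sketch; the step count and coordinates are illustrative:

```typescript
// Generate intermediate points for a drag gesture by linear interpolation
// between explicit start and end coordinates.
function dragPath(
  x1: number, y1: number, x2: number, y2: number, steps: number,
): Array<[number, number]> {
  const pts: Array<[number, number]> = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    pts.push([Math.round(x1 + (x2 - x1) * t), Math.round(y1 + (y2 - y1) * t)]);
  }
  return pts;
}

// Five waypoints (including endpoints) from (100,100) to (300,200):
const path = dragPath(100, 100, 300, 200, 4);
```

Each waypoint would then be sent as a mouse-move call between the mouse-down and mouse-up events.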
screen region ocr and text recognition via mcp
Medium confidence: Performs optical character recognition (OCR) on screenshot regions to extract text from images, UI elements, or documents. Integrates with the macOS Vision framework or third-party OCR services to convert image regions into machine-readable text, enabling agents to read text from non-selectable UI elements or scanned documents.
Integrates OCR directly into MCP tools for screenshot regions, enabling agents to extract text from non-selectable UI elements and images without external OCR services, using native macOS Vision framework or pluggable OCR backends
More integrated than separate OCR tools because it operates on screenshot regions directly, enabling agents to chain screenshot capture → OCR → decision-making in a single automation loop without intermediate file I/O
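OCR results usually arrive with per-line confidence scores; filtering low-confidence noise before handing text to the model keeps the decision loop clean. A sketch of that filtering step, where the result shape (text plus confidence) is an assumption about the OCR output:

```typescript
// Keep only confident OCR lines before passing text to the model.
// The { text, confidence } result shape is an illustrative assumption.
interface OcrLine { text: string; confidence: number }

function confidentText(lines: OcrLine[], minConf = 0.8): string {
  return lines
    .filter(l => l.confidence >= minConf)
    .map(l => l.text)
    .join("\n");
}

const out = confidentText([
  { text: "Save changes?", confidence: 0.97 },
  { text: "l1li|", confidence: 0.42 },  // noise from an icon, discarded
]);
```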
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with mac-use-mcp, ranked by overlap. Discovered automatically through the match graph.
- @atomicbotai/computer-use-mcp: MCP server exposing desktop computer-use as an MCP tool
- @github/computer-use-mcp: Computer Use MCP Server
- just-every/mcp-screenshot-website-fast: High-quality screenshot capture optimized for Claude Vision API. Automatically tiles full pages into 1072x1072 chunks (1.15 megapixels) with configurable viewports and wait strategies for dynamic content.
- Screeny: Privacy-first macOS MCP server that provides visual context for AI agents through window screenshots
- url-to-image-mcp: MCP server
Best For
- ✓ AI agents performing desktop automation workflows
- ✓ developers building macOS-native AI assistants
- ✓ teams integrating vision-based UI testing with LLM agents
- ✓ AI agents automating GUI workflows on macOS
- ✓ developers building no-code automation tools powered by LLMs
- ✓ QA automation teams using AI agents for cross-application testing
- ✓ AI agents operating in multi-monitor environments requiring display-aware automation
- ✓ developers building automation that must work across different display configurations
Known Limitations
- ⚠ Screenshot capture may include sensitive data (passwords, PII) — no built-in redaction or filtering
- ⚠ Region-based captures require precise coordinate specification; no automatic window detection
- ⚠ Performance degrades with high-frequency capture loops (>10 screenshots/second) due to I/O overhead
- ⚠ No built-in collision detection — agent must verify target coordinates are valid before clicking
- ⚠ Click timing is synchronous; rapid click sequences may be rate-limited by the macOS event queue
- ⚠ Drag operations require explicit start/end coordinates; no gesture recognition or path interpolation
Alternatives to mac-use-mcp
- Supabase MCP: Search the Supabase docs for up-to-date guidance and troubleshoot errors quickly; manage organizations, projects, databases, and Edge Functions (migrations, SQL, logs, advisors, keys, and type generation); create and manage development branches to iterate safely and confirm costs.
- Tavily MCP: AI-optimized web search and content extraction.
- Firecrawl MCP: Scrape websites and extract structured data.