@github/computer-use-mcp
MCP Server · Free
Computer Use MCP Server
Capabilities (11 decomposed)
desktop-screenshot-capture-and-analysis
Medium confidence
Captures full-screen or region-specific screenshots from the host desktop and returns pixel-perfect image data in base64 format, enabling AI agents to visually perceive and analyze the current UI state. Integrates with native OS screenshot APIs (macOS/Linux/Windows) through Node.js bindings, providing sub-100ms capture latency for real-time visual feedback loops in agent decision-making.
Implements native OS-level screenshot capture through MCP protocol, allowing LLM agents to directly perceive desktop state without requiring separate screenshot tools or browser automation libraries; uses base64 encoding for seamless integration with vision-capable LLMs
Provides lower latency and higher fidelity desktop perception than browser-only solutions like Playwright, and integrates natively into MCP agent workflows without requiring separate tool orchestration
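As a rough illustration of the base64 hand-off described above, the sketch below wraps raw image bytes in an MCP-style image content item and estimates how much base64 inflates the payload. The helper names `to_mcp_image_content` and `base64_length` are hypothetical and not taken from this package; the 8-byte PNG signature stands in for a real capture.

```python
import base64


def to_mcp_image_content(png_bytes: bytes) -> dict:
    """Wrap raw image bytes as an MCP-style image content item (hypothetical helper)."""
    return {
        "type": "image",
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }


def base64_length(raw_size: int) -> int:
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


# Stand-in for real screenshot bytes: the 8-byte PNG file signature.
fake_png = b"\x89PNG\r\n\x1a\n"
content = to_mcp_image_content(fake_png)

# A 3,000,000-byte image grows by one third once base64-encoded.
encoded_size = base64_length(3_000_000)
```

Because base64 emits four characters for every three input bytes, image size is worth budgeting for before a screenshot is placed in an LLM context.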
mouse-cursor-movement-and-clicking
Medium confidence
Enables precise mouse cursor positioning and click operations (single-click, double-click, right-click) at specified screen coordinates, translating high-level agent intents into low-level input events. Uses native OS input APIs (xdotool on Linux, CGEvent on macOS, SendInput on Windows) to simulate human-like mouse interactions, with configurable timing and movement curves to avoid detection as automated input.
Abstracts OS-specific input APIs (xdotool, CGEvent, SendInput) behind a unified MCP interface, allowing agents to perform mouse interactions without knowledge of the underlying platform; includes configurable movement curves and timing to simulate human-like interaction patterns
Provides cross-platform mouse automation in a single MCP tool without requiring separate platform-specific libraries, and integrates directly into agent decision loops unlike standalone automation frameworks
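The "movement curves" mentioned above are not specified in this listing. One plausible reading, sketched here purely as an assumption, is an eased interpolation between the current and target positions. Both `ease_in_out` and `cursor_path` are hypothetical names, and the smoothstep curve is a stand-in rather than the server's actual curve.

```python
def ease_in_out(t: float) -> float:
    """Smoothstep easing: maps 0 -> 0 and 1 -> 1, moving slowly near both ends."""
    return t * t * (3.0 - 2.0 * t)


def cursor_path(start, end, steps=10):
    """Return steps + 1 integer points from start to end along an eased straight line."""
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        s = ease_in_out(i / steps)
        points.append((round(x0 + (x1 - x0) * s), round(y0 + (y1 - y0) * s)))
    return points


# Intermediate positions an input backend could replay with short delays.
path = cursor_path((100, 100), (500, 300), steps=8)
```

Each point would then be handed to the platform input API in turn, with a small delay between moves.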
operation-logging-and-audit-trail
Medium confidence
Maintains a detailed audit trail of all operations performed by agents, including operation type, parameters, timestamp, and result. Logs are stored locally and can be retrieved through the MCP interface for debugging, compliance, or workflow analysis. Implements structured logging with configurable verbosity levels and optional sensitive data redaction for security-sensitive operations.
Provides structured operation logging with configurable verbosity and sensitive data redaction, maintaining an audit trail of all agent operations for compliance and debugging
Integrates audit logging directly into MCP server with sensitive data redaction, whereas most automation frameworks require external logging infrastructure
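A minimal sketch of what a structured, redacted audit entry could look like, assuming JSON lines as the log format. The field names, the `SENSITIVE_KEYS` set, and the `audit_entry` helper are all hypothetical; the listing does not document the real log schema.

```python
import json
import time

# Hypothetical set of parameter names whose values should never be logged.
SENSITIVE_KEYS = {"text", "password", "clipboard"}


def redact(params: dict) -> dict:
    """Replace the values of sensitive parameters with a fixed marker."""
    return {k: ("[REDACTED]" if k in SENSITIVE_KEYS else v) for k, v in params.items()}


def audit_entry(operation: str, params: dict, result: str, timestamp=None) -> str:
    """Build one JSON log line holding operation type, parameters, timestamp, and result."""
    entry = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "operation": operation,
        "params": redact(params),
        "result": result,
    }
    return json.dumps(entry, sort_keys=True)


line = audit_entry("type_text", {"text": "hunter2", "delay_ms": 40}, "ok", timestamp=0)
```

Redacting before serialization means the sensitive value never reaches the log file, not even at the highest verbosity level.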
keyboard-input-simulation-with-hotkey-support
Medium confidence
Simulates keyboard input including text typing, individual key presses, and multi-key hotkey combinations (Ctrl+C, Cmd+Z, etc.) at the OS level. Implements key event queuing with configurable inter-key delays to simulate human typing speed, and supports modifier key combinations for application shortcuts. Routes through native OS keyboard APIs to ensure compatibility with applications that validate input source.
Provides unified keyboard input abstraction across Windows/macOS/Linux with support for both text typing and hotkey combinations, including configurable inter-key delays to simulate human typing patterns and avoid input detection systems
Combines text input and hotkey simulation in a single MCP tool with human-like timing, whereas most automation frameworks require separate libraries for keyboard vs hotkey handling
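To make the hotkey and inter-key delay ideas concrete, the sketch below expands a combination such as `Ctrl+Shift+T` into ordered key-down and key-up events and builds a simple typing schedule. The modifier set and the function names are assumptions made for illustration; the real tool's input format is not documented in this listing.

```python
# Hypothetical set of recognized modifier names.
MODIFIERS = {"ctrl", "cmd", "alt", "shift"}


def parse_hotkey(combo: str):
    """Split 'Ctrl+Shift+T' into (['ctrl', 'shift'], 't')."""
    parts = [p.strip().lower() for p in combo.split("+")]
    modifiers, key = parts[:-1], parts[-1]
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown or not key:
        raise ValueError(f"unsupported hotkey: {combo!r}")
    return modifiers, key


def key_events(combo: str):
    """Press modifiers in order, press the key, then release everything in reverse."""
    modifiers, key = parse_hotkey(combo)
    down = [("down", m) for m in modifiers] + [("down", key)]
    up = [("up", key)] + [("up", m) for m in reversed(modifiers)]
    return down + up


def typing_schedule(text: str, delay_ms: int = 50):
    """Pair each character with the offset in milliseconds at which to send it."""
    return [(i * delay_ms, ch) for i, ch in enumerate(text)]


events = key_events("Ctrl+Shift+T")
schedule = typing_schedule("hi", delay_ms=40)
```

Releasing keys in reverse order mirrors how a person lets go of a shortcut and avoids leaving a modifier stuck down.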
mcp-protocol-server-implementation
Medium confidence
Implements a complete MCP (Model Context Protocol) server that exposes computer-use capabilities as standardized MCP resources and tools, enabling any MCP-compatible client (Claude, custom agents, etc.) to discover and invoke desktop automation functions. Uses JSON-RPC 2.0 transport over stdio or network sockets, with automatic capability advertisement through MCP's resource and tool schemas.
Implements a full MCP server that standardizes computer-use capabilities as discoverable MCP tools and resources, allowing any MCP-compatible client to access desktop automation without custom integration code; uses JSON-RPC 2.0 for reliable request/response handling
Provides a standards-based integration point for desktop automation that works with any MCP client (Claude, custom agents, etc.), whereas point-to-point integrations require reimplementation for each client
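For orientation, this is roughly what a JSON-RPC 2.0 `tools/call` request looks like when framed for the MCP stdio transport, which carries one JSON message per line. The tool name `screenshot` and its `region` argument are placeholders, since the listing does not give the server's actual tool names or schemas.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def frame(message: dict) -> str:
    """Serialize for stdio: compact JSON on a single line, terminated by a newline."""
    return json.dumps(message, separators=(",", ":")) + "\n"


# "screenshot" and "region" are hypothetical names used only for illustration.
request = tools_call_request(
    "screenshot", {"region": {"x": 0, "y": 0, "width": 800, "height": 600}}
)
wire = frame(request)
```

A client would normally send `tools/list` first to discover the real tool names and input schemas before issuing calls like this one.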
multi-monitor-and-virtual-display-support
Medium confidence
Detects and handles multiple physical monitors and virtual display configurations, allowing agents to capture screenshots and perform interactions across the entire display landscape. Maintains a coordinate system that maps logical screen positions to physical monitor positions, enabling agents to work with multi-monitor setups without explicit monitor selection. Automatically detects display topology changes and updates coordinate mappings.
Automatically detects and maps multi-monitor topologies, allowing agents to work with global screen coordinates without explicit monitor selection; maintains coordinate system consistency across display topology changes
Provides transparent multi-monitor support without requiring agents to understand display topology, whereas most automation tools require explicit monitor selection or coordinate offset calculation
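The mapping from global coordinates to a specific monitor can be pictured as a rectangle lookup. The sketch below assumes each monitor is described by an origin and a size in one shared virtual-desktop space; the `Monitor` class, the `locate` function, and the example layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Monitor:
    name: str
    x: int       # left edge in global (virtual desktop) coordinates
    y: int       # top edge in global coordinates
    width: int
    height: int

    def contains(self, gx: int, gy: int) -> bool:
        return (self.x <= gx < self.x + self.width
                and self.y <= gy < self.y + self.height)


def locate(monitors: List[Monitor], gx: int, gy: int) -> Optional[Tuple[str, int, int]]:
    """Map a global point to (monitor name, local x, local y), or None if off-screen."""
    for m in monitors:
        if m.contains(gx, gy):
            return m.name, gx - m.x, gy - m.y
    return None


# Example layout: a secondary display placed to the left of the primary one.
layout = [
    Monitor("primary", 0, 0, 1920, 1080),
    Monitor("left", -1280, 0, 1280, 1024),
]
hit = locate(layout, -1, 10)
```

Monitors placed to the left of or above the primary display have negative global coordinates, which is why the lookup subtracts the monitor origin rather than assuming that every origin is (0, 0).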
application-window-enumeration-and-focus-control
Medium confidence
Enumerates open application windows on the desktop and provides window focus control, allowing agents to switch between applications and ensure keyboard/mouse input targets the correct window. Returns window metadata including title, process ID, window bounds, and focus state. Implements platform-specific window management (wmctrl on Linux, NSWindow API on macOS, Windows API on Windows) with a unified interface.
Provides unified window enumeration and focus control across Windows/macOS/Linux, abstracting platform-specific window manager APIs (wmctrl, NSWindow, Windows API) behind a single interface
Combines window enumeration and focus control in a single MCP tool, whereas most automation frameworks require separate window management libraries or platform-specific code
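On the agent side, the returned window metadata might be consumed along the following lines. The `WindowInfo` shape mirrors the fields named above (title, process ID, bounds, focus state), but the exact field names and the `find_window` helper are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WindowInfo:
    title: str
    pid: int
    bounds: Tuple[int, int, int, int]  # (x, y, width, height)
    focused: bool


def find_window(windows: List[WindowInfo], title_fragment: str) -> Optional[WindowInfo]:
    """Return the first window whose title contains the fragment, ignoring case."""
    needle = title_fragment.lower()
    for w in windows:
        if needle in w.title.lower():
            return w
    return None


# Hypothetical enumeration result.
windows = [
    WindowInfo("Terminal", 4101, (0, 0, 800, 600), True),
    WindowInfo("report.txt - Editor", 4242, (820, 0, 1000, 900), False),
]
target = find_window(windows, "editor")
must_focus_first = target is not None and not target.focused
```

Checking the focus state before sending keystrokes helps avoid typing into whichever window happens to be in front.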
clipboard-read-write-operations
Medium confidence
Provides read and write access to the system clipboard, enabling agents to exchange text data with applications through copy/paste operations. Implements platform-specific clipboard APIs (xclip on Linux, NSPasteboard on macOS, Windows Clipboard API) with support for both text and rich text formats. Allows agents to retrieve clipboard contents for verification or use the clipboard as a data exchange mechanism.
Provides unified clipboard read/write access across Windows/macOS/Linux, abstracting platform-specific clipboard APIs and enabling clipboard-based data exchange in agent workflows
Integrates clipboard operations directly into MCP tool interface, allowing agents to use copy/paste as a data exchange mechanism without requiring separate clipboard management libraries
system-information-and-environment-detection
Medium confidence
Detects and reports system information including OS type/version, available displays, installed applications, and environment variables, enabling agents to adapt behavior based on system capabilities and configuration. Queries OS-level APIs to gather hardware information (CPU, memory, display resolution) and software environment (installed packages, PATH, environment variables). Provides this metadata to agents for capability negotiation and conditional execution.
Provides unified system information and environment detection across Windows/macOS/Linux, enabling agents to query OS capabilities and adapt behavior without platform-specific code
Integrates system information gathering into MCP interface, allowing agents to discover capabilities at runtime rather than requiring pre-configuration
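A small sketch of the kind of environment snapshot described above, using only the Python standard library. The dictionary keys and the `system_snapshot` helper are illustrative; they do not reflect the server's actual response format, and display or installed-application discovery is left out because it needs platform-specific APIs.

```python
import os
import platform


def system_snapshot() -> dict:
    """Collect basic OS and environment facts an agent could branch on."""
    return {
        "os": platform.system(),            # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "path_entries": [p for p in os.environ.get("PATH", "").split(os.pathsep) if p],
    }


def paste_modifier(snapshot: dict) -> str:
    """Pick the conventional shortcut modifier for the detected OS."""
    return "cmd" if snapshot["os"] == "Darwin" else "ctrl"


snapshot = system_snapshot()
modifier = paste_modifier(snapshot)
```

Conditional execution then becomes a matter of branching on the snapshot, as `paste_modifier` does for the copy/paste modifier key.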
error-recovery-and-state-validation
Medium confidence
Implements error handling and recovery mechanisms for failed operations, including retry logic with exponential backoff, state validation after operations, and detailed error reporting. Validates that operations succeeded by comparing expected state (e.g., window focus, clipboard contents) with actual state, and provides detailed error messages including OS error codes and recovery suggestions. Enables agents to detect and recover from transient failures without explicit error handling logic.
Implements automatic retry logic with state validation for desktop automation operations, detecting transient failures and recovering without explicit agent error handling; provides detailed error diagnostics including OS error codes
Provides built-in resilience and error recovery for desktop automation, whereas most frameworks require agents to implement their own retry and error handling logic
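Retry with exponential backoff plus post-operation validation can be sketched as follows. This is a generic pattern, not the server's code: `retry_with_validation`, its parameters, and the simulated flaky focus operation are all assumptions made for illustration.

```python
import time


def retry_with_validation(operation, validate, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Run operation until validate(result) is true, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            result = operation()
            if validate(result):
                return result
            last_error = RuntimeError(f"validation failed on attempt {attempt + 1}")
        except Exception as exc:  # treat any failure as transient in this sketch
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Simulated operation: focusing a window fails twice, then succeeds.
calls = {"n": 0}


def flaky_focus():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("window not ready")
    return "editor"


delays = []  # record the requested waits instead of actually sleeping
focused = retry_with_validation(flaky_focus, lambda w: w == "editor", sleep=delays.append)
```

Injecting the `sleep` function keeps the example fast and makes the backoff sequence easy to inspect.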
performance-monitoring-and-operation-timing
Medium confidence
Tracks performance metrics for each operation including execution time, latency, and resource usage, enabling agents and developers to identify bottlenecks and optimize workflows. Records timing information for screenshot capture, input operations, and window management, and exposes metrics through the MCP interface. Implements low-overhead instrumentation that doesn't significantly impact operation latency.
Provides built-in performance monitoring for desktop automation operations with low-overhead instrumentation, exposing timing and resource metrics through MCP interface for workflow optimization
Integrates performance monitoring directly into MCP server, allowing agents to track operation performance without external profiling tools
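Low-overhead timing of this kind is commonly built around a monotonic clock and a context manager. The `OperationTimer` class below is a hypothetical sketch of that approach and says nothing about how the server itself collects or exposes metrics.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class OperationTimer:
    """Collect per-operation wall-clock durations, in seconds."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the operation raised.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self) -> dict:
        return {
            name: {"count": len(v), "mean_s": sum(v) / len(v), "max_s": max(v)}
            for name, v in self.samples.items()
        }


timer = OperationTimer()
for _ in range(3):
    with timer.measure("screenshot"):
        sum(range(1000))  # stand-in for a real capture call
stats = timer.summary()
```

The overhead per measurement is two clock reads and one list append, which stays small next to a screenshot or input call.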
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with @github/computer-use-mcp, ranked by overlap. Discovered automatically through the match graph.
mcp.run
A hosted registry and control plane to install & run secure + portable MCP Servers.
Retool
Maximize productivity with intuitive drag-and-drop, versatile integrations, and rapid...
@ag-ui/mcp-apps-middleware
MCP Apps middleware for AG-UI that enables UI-enabled tools from MCP (Model Context Protocol) servers.
touchdesigner-mcp-server
MCP server for TouchDesigner
Rewind
Capture, transcribe, summarize digital interactions; enhance memory,...
cordon-cli
The security gateway for AI agents — firewall, auditor, and remote control for MCP tool calls
Best For
- ✓AI agent developers building desktop automation workflows
- ✓Teams implementing visual RPA (Robotic Process Automation) solutions
- ✓Developers creating cross-platform UI testing frameworks with LLM perception
- ✓Desktop automation engineers building cross-platform RPA solutions
- ✓AI agent developers creating interactive workflow orchestrators
- ✓QA automation teams implementing visual regression testing with agent-driven interactions
- ✓Enterprise automation systems requiring audit trails for compliance
- ✓Debugging complex automation workflows by examining operation history
Known Limitations
- ⚠Screenshot capture is blocking — high-frequency polling (>10 Hz) may degrade performance
- ⚠No built-in image compression — full screenshots can be 2-5MB uncompressed, increasing token usage in LLM context
- ⚠Region-based capture requires precise pixel coordinates; no automatic UI element detection
- ⚠Wayland display server support on Linux may be limited depending on compositor implementation
- ⚠No built-in coordinate mapping — agent must translate visual element positions from screenshots to screen coordinates
- ⚠Click timing is not synchronized with application event loops — rapid clicks may be missed if application is processing
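Since coordinate mapping is left to the agent, a translation step along these lines is usually needed between what the model sees in a screenshot and where the click is sent. The sketch assumes the screenshot may cover only a region of the screen and may be scaled (for example on a 2x HiDPI display); `screenshot_to_screen` and its parameters are hypothetical.

```python
def screenshot_to_screen(px, py, region_origin=(0, 0), scale=1.0):
    """Convert a pixel position inside a screenshot to global screen coordinates.

    region_origin: top-left corner of the captured region, in screen coordinates.
    scale: screenshot pixels per screen point (2.0 on a typical 2x HiDPI display).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    ox, oy = region_origin
    return ox + round(px / scale), oy + round(py / scale)


# A button seen at pixel (400, 300) in a 2x capture of a region starting at (100, 50).
click_at = screenshot_to_screen(400, 300, region_origin=(100, 50), scale=2.0)
```

Skipping the scale factor is a common cause of clicks that land at twice the intended offset on HiDPI displays.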
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Computer Use MCP Server