Mouse and Keyboard Control for UI Interaction

1

Open Interpreter (Agent, 61/100)

Natural language computer interface — runs local code to accomplish tasks, like local Code Interpreter.

Unique: Abstracts platform-specific input libraries (pyautogui, pynput) behind a unified Computer API, enabling the same code to work across Windows, macOS, and Linux without modification

vs others: More portable than platform-specific scripts and more flexible than record-and-playback tools, but less reliable than API-based automation due to coordinate fragility
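The unified-API idea described for Open Interpreter can be sketched as a thin facade over interchangeable backends. This is a minimal illustration under stated assumptions, not Open Interpreter's actual code: `InputBackend`, `RecordingBackend`, and `Computer` are hypothetical names, and a real backend would delegate to a library such as pyautogui or pynput instead of recording calls.

```python
from typing import Protocol


class InputBackend(Protocol):
    """Hypothetical interface each platform-specific backend would implement."""

    def move(self, x: int, y: int) -> None: ...
    def click(self, x: int, y: int) -> None: ...
    def write(self, text: str) -> None: ...


class RecordingBackend:
    """Stand-in backend that records calls instead of driving real input."""

    def __init__(self) -> None:
        self.events: list[tuple] = []

    def move(self, x: int, y: int) -> None:
        self.events.append(("move", x, y))

    def click(self, x: int, y: int) -> None:
        self.events.append(("click", x, y))

    def write(self, text: str) -> None:
        self.events.append(("write", text))


class Computer:
    """Unified facade (hypothetical): callers never touch the backend directly."""

    def __init__(self, backend: InputBackend) -> None:
        self._backend = backend

    def click_and_type(self, x: int, y: int, text: str) -> None:
        # Same call sequence regardless of which backend is plugged in.
        self._backend.move(x, y)
        self._backend.click(x, y)
        self._backend.write(text)


backend = RecordingBackend()
Computer(backend).click_and_type(100, 200, "hello")
```

Because the caller depends only on the facade, swapping the backend per operating system would not change the calling code. The coordinates passed in remain as fragile as the entry notes.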

2

Claude Opus 4 (Model, 56/100)

via “computer-use-tool-for-ui-automation”

Anthropic's most intelligent model, best-in-class for coding and agentic tasks.

Unique: Provides a general-purpose computer use tool that enables the model to interact with any UI, not just specific applications or APIs. This is architecturally different from specialized automation tools because it's application-agnostic and works with any UI that can be captured and controlled.

vs others: More general-purpose than competitors who focus on specific applications (e.g., Zapier for SaaS), and more capable than API-based automation because it can interact with legacy systems and web-only tools that don't have APIs.

3

mcp-playwright (MCP Server, 53/100)

via “keyboard-and-mouse-event-simulation”

Playwright Model Context Protocol Server - Tool to automate Browsers and APIs in Claude Desktop, Cline, Cursor IDE and More 🔌

Unique: Exposes Playwright's type(), press(), hover(), and drag() APIs as separate MCP tools with modifier key support, enabling LLMs to simulate complex keyboard and mouse interactions without understanding Playwright's event API or timing semantics

vs others: More flexible than click-only automation because it supports keyboard shortcuts, special characters, and drag-and-drop, enabling agents to interact with complex UIs that require multi-key combinations or gesture-based interactions

4

Windows-MCP (MCP Server, 49/100)

via “synthetic input simulation with multi-modal action support”

MCP Server for Computer Use in Windows

Unique: Implements multi-modal input through UI Automation APIs with intelligent fallbacks: uses clipboard for large text payloads to avoid character-by-character typing delays, supports both element-based and coordinate-based targeting, and handles keyboard shortcuts through native Windows input event generation.

vs others: More reliable than pyautogui or keyboard libraries because it integrates with Windows UI Automation framework for element-aware targeting, and faster than character-by-character typing for large text blocks through clipboard optimization.
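The clipboard fallback for large text can be illustrated with a small dispatch function. This is a sketch only: `enter_text`, `PASTE_THRESHOLD`, and the 200-character cutoff are assumptions made for illustration and are not taken from Windows-MCP. The `type_char` and `paste` callables stand in for real key-event and clipboard operations.

```python
from typing import Callable

PASTE_THRESHOLD = 200  # assumed cutoff in characters; not taken from Windows-MCP


def enter_text(
    text: str,
    type_char: Callable[[str], None],
    paste: Callable[[str], None],
    threshold: int = PASTE_THRESHOLD,
) -> str:
    """Send text via the cheaper path and return which path was used."""
    if len(text) > threshold:
        paste(text)  # a real tool would write the clipboard, then send a paste shortcut
        return "paste"
    for ch in text:
        type_char(ch)  # one synthetic key event per character
    return "type"


typed: list[str] = []
pasted: list[str] = []
short_path = enter_text("hi", typed.append, pasted.append)
long_path = enter_text("x" * 500, typed.append, pasted.append)
```

Injecting the two callables keeps the decision logic testable without touching the real clipboard or keyboard.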

5

@executeautomation/playwright-mcp-server (MCP Server, 48/100)

via “user-interaction-simulation”

Model Context Protocol servers for Playwright

Unique: Wraps Playwright's action APIs with automatic element waiting and focus management, allowing LLMs to issue high-level interaction commands ('fill form field X with value Y') without managing low-level event sequencing, element visibility checks, or focus state

vs others: Provides atomic interaction primitives (click, type, select) as separate MCP tools with built-in element waiting and error handling, reducing the complexity of multi-step interaction workflows compared to frameworks requiring manual event orchestration

6

@github/computer-use-mcp (MCP Server, 45/100)

via “mouse-cursor-movement-and-clicking”

Computer Use MCP Server

Unique: Abstracts OS-specific input APIs (xdotool, CGEvent, SendInput) behind a unified MCP interface, allowing agents to perform mouse interactions without knowledge of the underlying platform; includes configurable movement curves and timing to simulate human-like interaction patterns

vs others: Provides cross-platform mouse automation in a single MCP tool without requiring separate platform-specific libraries, and integrates directly into agent decision loops unlike standalone automation frameworks
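A "movement curve" of the kind mentioned for this server can be approximated by easing the cursor between two points rather than jumping. The sketch below uses smoothstep easing as an assumed curve; the function name `eased_path` and its parameters are hypothetical, and the server's actual curve and timing options may differ.

```python
def eased_path(
    start: tuple[int, int], end: tuple[int, int], steps: int = 20
) -> list[tuple[int, int]]:
    """Return intermediate cursor positions from start to end, inclusive.

    Uses smoothstep easing (slow start, fast middle, slow stop).
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x0, y0 = start
    x1, y1 = end
    points = []
    for i in range(steps + 1):
        t = i / steps                  # linear progress in [0, 1]
        s = t * t * (3 - 2 * t)        # smoothstep-eased progress
        points.append((round(x0 + (x1 - x0) * s), round(y0 + (y1 - y0) * s)))
    return points


path = eased_path((0, 0), (100, 50), steps=10)
```

A caller would move the cursor to each point in turn with a short delay between steps; the delay schedule is a separate tuning choice not shown here.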

7

Agent-desktop – Native desktop automation CLI for AI agents (CLI Tool, 42/100)

via “keyboard-and-mouse-input-simulation”

I've been building computer-use tools for a while, and I quietly launched this about a month ago (122 stars on GitHub). I figured it was worth sharing here. Over the last few months, a lot of computer-use agents have come out: Codex, Claude Code, CUA, and others. Most of them seem to work roughly li

Unique: Injects input events directly into the OS input queue rather than sending events to specific application windows — ensures compatibility with any application regardless of how it handles input, but requires careful timing and state management

vs others: More universal than application-specific input APIs because it works at the OS level, but requires more careful timing and state management than higher-level automation frameworks that provide built-in synchronization

8

@github/computer-use-mcp (MCP Server, 41/100)

via “mouse control with absolute positioning”

Computer Use MCP Server

Unique: Exposes mouse control as discrete MCP tools (move, click) with absolute coordinate parameters, allowing agents to compose clicks with screenshot analysis in a tight perception-action loop. No gesture or drag abstractions — forces explicit coordinate calculation.

vs others: More granular than high-level UI automation frameworks (Selenium, Playwright) because it operates at raw input level; more flexible for non-web UIs but requires agent to handle coordinate math
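The "coordinate math" left to the agent typically includes mapping a point found in a (possibly downscaled) screenshot back to absolute screen pixels. The sketch below shows one way to do that mapping; `screenshot_to_screen` is a hypothetical helper, and it assumes the screenshot covers the whole screen with no offset or letterboxing.

```python
def screenshot_to_screen(
    point: tuple[int, int],
    shot_size: tuple[int, int],
    screen_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a pixel in the screenshot to an absolute screen coordinate."""
    px, py = point
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= px < shot_w and 0 <= py < shot_h):
        raise ValueError("point lies outside the screenshot")
    # Scale each axis independently and clamp to the last valid pixel.
    x = min(screen_w - 1, int(px * screen_w / shot_w))
    y = min(screen_h - 1, int(py * screen_h / shot_h))
    return x, y


# A point at the centre of a 1280x720 screenshot of a 2560x1440 display.
target = screenshot_to_screen((640, 360), (1280, 720), (2560, 1440))
```

The resulting pair is what an agent would pass to the move and click tools as absolute coordinates.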

9

mac-use-mcp (MCP Server, 38/100)

via “mouse movement and click control via mcp”

Zero-dependency macOS desktop automation for AI agents. Screenshot, mouse, keyboard, clipboard, and window control via MCP. 18 tools, macOS 13+, one command: npx mac-use-mcp.

Unique: Integrates mouse control directly into MCP tool schema with coordinate-based targeting, allowing agents to chain screenshot analysis → coordinate extraction → click execution in a single agent loop without external tool dependencies or subprocess management

vs others: More direct than PyAutoGUI or xdotool because it uses native macOS CGEvent APIs with MCP protocol binding, eliminating subprocess overhead and enabling real-time feedback loops between vision analysis and mouse actions

10

Safari MCP (MCP Server, 37/100)

via “interactive element manipulation (click, type, scroll)”

Native Safari browser automation for AI agents — 80 tools via AppleScript, zero Chrome overhead, keeps logins, runs silently. macOS only.

Unique: Uses AppleScript event simulation for native input handling rather than synthetic DOM events, providing more realistic user interaction that triggers native browser handlers. Includes pre-interaction visibility validation to prevent silent failures.

vs others: More reliable than synthetic DOM events because it uses native OS-level input; better error detection than Puppeteer because it validates element visibility before interaction; less flexible than low-level WebDriver but more user-friendly for typical form automation.

11

Peekaboo (MCP Server, 35/100)

via “deterministic ui interaction via accessibility actions and synthetic input”

A macOS-only MCP server that enables AI agents to capture screenshots of applications or the entire system.

Unique: Dual-path interaction architecture that uses native accessibility actions (AXPress, AXSetValue) as primary path for reliability, with automatic fallback to synthetic CGEvent input for inaccessible elements; includes interaction queue serialization and exponential backoff retry logic to handle transient failures and race conditions

vs others: More reliable than pure coordinate-based automation (e.g., pyautogui) because it uses semantic element references that survive layout changes; faster than pure vision-based interaction because it avoids repeated vision model calls for each action
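The dual-path design with retry can be sketched as "try the accessibility action, fall back to synthetic input, back off, repeat". This is an illustrative sketch rather than Peekaboo's implementation: `interact`, its parameters, and the use of `RuntimeError` to signal failure are all assumptions, and the two callables stand in for an AXPress-style action and a CGEvent-style click.

```python
import time
from typing import Callable


def interact(
    primary: Callable[[], None],
    fallback: Callable[[], None],
    retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the accessibility path, then synthetic input, retrying with backoff.

    Returns the name of the path that succeeded.
    """
    for attempt in range(retries):
        for name, action in (("accessibility", primary), ("synthetic", fallback)):
            try:
                action()
                return name
            except RuntimeError:
                continue  # this path failed; try the next one
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.1, 0.2, ...
    raise RuntimeError("interaction failed on both paths")


delays: list[float] = []
calls = {"synthetic": 0}


def ax_press() -> None:
    raise RuntimeError("element exposes no accessibility action")


def synthetic_click() -> None:
    calls["synthetic"] += 1
    if calls["synthetic"] < 2:
        raise RuntimeError("transient failure")


used = interact(ax_press, synthetic_click, sleep=delays.append)
```

Passing `sleep` as a parameter lets the example record the backoff delays instead of actually waiting.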

12

playwright-mcp (MCP Server, 33/100)

via “keyboard-and-mouse-input-simulation”

MCP server: playwright-mcp

Unique: Exposes Playwright's low-level keyboard and mouse APIs as MCP tools, enabling agents to simulate complex user interactions beyond simple element clicks. Supports modifier key combinations and arbitrary key sequences.

vs others: More flexible than element-based interaction because it supports coordinate-based clicking and keyboard shortcuts. More reliable than simulating keyboard input via JavaScript because it uses native browser input events.

13

browser-devtools-mcp (MCP Server, 33/100)

via “user-interaction-simulation”

MCP Server for Browser Dev Tools

Unique: Combines CDP Input domain (for low-level event injection) with element targeting via selectors, providing agents with high-level interaction primitives (click element by selector) without requiring coordinate calculation or JavaScript event handling

vs others: More reliable than JavaScript-based click simulation because it uses CDP's native input injection, which properly triggers browser event handlers and respects z-index/visibility rules

14

@iflow-mcp/puppeteer-mcp-server (MCP Server, 33/100)

via “user-interaction-simulation”

Experimental MCP server for browser automation using Puppeteer (inspired by @modelcontextprotocol/server-puppeteer)

Unique: Abstracts Puppeteer's input APIs into declarative MCP tools, allowing LLMs to specify interactions at a high level (click button, type text) without managing low-level event handling or timing concerns.

vs others: More reliable than raw JavaScript injection for form filling because it uses Puppeteer's native input simulation, which properly triggers browser event handlers and respects form validation logic.

15

@executeautomation/playwright-mcp-server (MCP Server, 32/100)

via “keyboard-and-mouse-event-simulation”

Model Context Protocol servers for Playwright

Unique: Exposes Playwright's keyboard and mouse APIs as discrete MCP tools with modifier key support and drag-and-drop coordination, enabling Claude to simulate complex user interactions without JavaScript event construction

vs others: More reliable than raw JavaScript event dispatch because Playwright's keyboard/mouse APIs account for browser-specific event ordering and timing; more flexible than Selenium because it supports drag-and-drop natively

16

gmod-mcp (MCP Server, 30/100)

via “game-window-interaction-and-control”

MCP tool for Garry's Mod: RCON, Lua execution, window screenshot/control, and SFTP file management

Unique: Wraps OS-level input simulation (SendInput, etc.) as MCP tools, enabling LLM agents to control the game window without custom input handling; integrates with screenshot capture for closed-loop automation

vs others: More direct than scripting game mods for client-side automation; enables AI agents to interact with the game UI and client without modifying game code

17

playwright (Framework, 29/100)

via “keyboard and mouse input simulation with timing control”

A high-level API to automate web browsers

Unique: Simulates input through native browser event APIs rather than DOM manipulation, ensuring event handlers and form validation logic execute as they would for real user input, with configurable timing to test debouncing and throttling logic

vs others: More realistic than direct DOM manipulation because it triggers native event handlers, and more flexible than WebDriver input because it supports arbitrary key combinations and timing control

18

User Prompt MCP (MCP Server, 29/100)

via “cursor ide ui integration for user input collection”

An MCP server for Cursor that enables requesting user input during the generation process.

Unique: Leverages Cursor's native MCP UI capabilities to render input prompts directly in the IDE rather than spawning separate windows or requiring custom UI implementation, creating a seamless integrated experience.

vs others: Provides better UX than tools requiring external input windows or CLI prompts, and simpler implementation than tools building custom UI frameworks.

19

@atomicbotai/computer-use-mcp (MCP Server, 28/100)

via “mouse-control-with-coordinate-targeting”

MCP server exposing desktop computer-use as an MCP tool

Unique: Exposes raw coordinate-based mouse control through MCP protocol, allowing clients to implement their own coordinate detection strategies (vision models, OCR, element detection) rather than bundling a specific vision system, enabling flexibility in how coordinates are determined.

vs others: More flexible than vision-integrated automation tools because it decouples coordinate detection from mouse control, allowing clients to use any vision model or coordinate source while maintaining a simple, stateless MCP interface.

20

Windows Control (Repository, 27/100)

via “programmatic mouse control with pixel-level positioning”

Programmatic control over Windows system operations including mouse, keyboard, window management, and screen capture using nut.js.

Unique: Uses nut.js's abstraction over Windows native input APIs (SendInput) rather than simulating raw hardware events, enabling reliable cross-application mouse control that respects Windows input queuing and cursor acceleration

vs others: More reliable than raw Win32 SendInput calls because nut.js handles platform-specific quirks; faster than image-recognition-based automation because it uses direct coordinate targeting rather than screen analysis
