mobile-mcp
MCP Server · Free
Model Context Protocol Server for Mobile Automation and Scraping (iOS, Android, Emulators, Simulators and Real Devices)
Capabilities: 13 decomposed
unified-cross-platform-device-abstraction
Medium confidence
Provides a single Robot interface abstraction layer that normalizes interactions across Android (physical devices and AVD emulators), iOS (physical devices via USB), and iOS Simulators (via xcrun simctl). The architecture uses platform-specific manager implementations (AndroidRobot, IosRobot, SimctlManager) that all conform to a common Device API contract, so agents do not need to understand platform-specific tool invocation patterns. Device resolution is request-scoped and stateless: each tool call resolves the target device parameter through getRobotFromDevice() to the appropriate platform manager.
Uses a request-scoped, stateless Robot interface pattern that dynamically resolves platform managers at invocation time rather than maintaining persistent device connections, enabling horizontal scaling and multi-device orchestration without session management overhead. The common Device API contract ensures all platform implementations (ADB-based Android, WebDriverAgent-based iOS, simctl-based simulators) expose identical method signatures.
Unlike Appium (which requires running an Appium server plus a separately installed driver for each platform) or Detox (which targets React Native apps), mobile-mcp provides platform-agnostic automation through a single MCP interface that works with physical devices, emulators, and simulators without configuration changes.
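A minimal TypeScript sketch of request-scoped device resolution, assuming a simplified contract. The source names getRobotFromDevice() and the platform classes, but the `Robot` and `DeviceManager` shapes, `resolveRobotForDevice`, and the fake managers below are hypothetical, not mobile-mcp's actual code.

```typescript
// Hypothetical common contract; mobile-mcp's real Robot interface may differ.
interface Robot {
  readonly platform: "android" | "ios" | "simulator";
  tap(x: number, y: number): string;
}

// Hypothetical per-platform manager: lists device IDs and builds a Robot for one.
interface DeviceManager {
  listDeviceIds(): string[];
  createRobot(deviceId: string): Robot;
}

// Stateless resolution: every call re-queries the managers instead of caching
// a connection, so nothing has to be cleaned up between tool calls.
function resolveRobotForDevice(deviceId: string, managers: DeviceManager[]): Robot {
  for (const manager of managers) {
    if (manager.listDeviceIds().includes(deviceId)) {
      return manager.createRobot(deviceId);
    }
  }
  throw new Error(`Device "${deviceId}" not found`);
}

// Fake managers standing in for ADB- and simctl-backed implementations.
function fakeManager(platform: Robot["platform"], ids: string[]): DeviceManager {
  return {
    listDeviceIds: () => ids,
    createRobot: (deviceId) => ({
      platform,
      tap: (x, y) => `${platform}:${deviceId} tap ${x},${y}`,
    }),
  };
}

const managersForDemo = [
  fakeManager("android", ["emulator-5554"]),
  fakeManager("simulator", ["A1B2C3D4-0000-1111-2222-333344445555"]),
];

console.log(resolveRobotForDevice("emulator-5554", managersForDemo).tap(100, 200));
// prints "android:emulator-5554 tap 100,200"
```

Because nothing is cached, a device that disconnects between calls simply fails resolution on the next call; the cost is one discovery pass per invocation, as listed under Known Limitations.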
accessibility-tree-based-ui-element-detection
Medium confidence
Extracts and parses native accessibility trees from both Android (via ADB accessibility service) and iOS (via WebDriverAgent accessibility API) to enable deterministic, coordinate-free UI interaction. The system builds a hierarchical representation of UI elements with semantic labels, roles, and bounds, allowing agents to locate and interact with elements by accessibility properties rather than fragile pixel coordinates. It falls back to screenshot-based coordinate tapping only when accessibility data is unavailable, providing a two-tier interaction strategy that prioritizes semantic stability.
Implements a two-tier interaction strategy that prioritizes native accessibility trees (Android AccessibilityService, iOS WebDriverAgent accessibility API) as the primary interaction mechanism, with screenshot-based coordinate fallback only when semantic data is unavailable. This approach provides deterministic, layout-resilient automation that survives UI changes without requiring coordinate recalibration.
Compared with image-based automation (such as Appium with image recognition), it locates elements through semantic accessibility metadata, so it needs no ML-based visual matching and identifies elements deterministically whenever accessibility labels are present and unique.
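The two-tier strategy can be sketched as a lookup that prefers a uniquely labelled accessibility element and otherwise falls back to raw coordinates. The flattened `UiElement` shape and `resolveTapTarget` are hypothetical simplifications; real Android and iOS trees carry more fields and nesting.

```typescript
// Hypothetical flattened accessibility node; real trees carry more fields.
interface UiElement {
  label: string;
  role: string;
  rect: { x: number; y: number; width: number; height: number };
}

type TapTarget =
  | { kind: "element"; x: number; y: number; label: string }
  | { kind: "coordinates"; x: number; y: number };

// Tier 1: find a uniquely labelled element and tap the center of its bounds.
// Tier 2: fall back to caller-supplied screen coordinates.
function resolveTapTarget(
  elements: UiElement[],
  label: string,
  fallback: { x: number; y: number },
): TapTarget {
  const matches = elements.filter((el) => el.label === label);
  if (matches.length === 1) {
    const { x, y, width, height } = matches[0].rect;
    return {
      kind: "element",
      label,
      x: Math.round(x + width / 2),
      y: Math.round(y + height / 2),
    };
  }
  // No match, or an ambiguous one: the semantic lookup cannot decide, so use pixels.
  return { kind: "coordinates", ...fallback };
}

const sampleTree: UiElement[] = [
  { label: "Sign in", role: "button", rect: { x: 40, y: 600, width: 300, height: 48 } },
  { label: "Email", role: "textfield", rect: { x: 40, y: 400, width: 300, height: 44 } },
];

console.log(resolveTapTarget(sampleTree, "Sign in", { x: 0, y: 0 }));
// element target at (190, 624)
```

Requiring exactly one match is what keeps the semantic tier deterministic: duplicate labels are treated like missing ones rather than guessed at.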
webdriveragent-session-management
Medium confidence
Manages WebDriverAgent session lifecycle for iOS devices (both physical and simulators), including session creation, teardown, and error recovery. The WebDriverAgent client (src/webdriveragent.ts) handles HTTP communication with WebDriverAgent endpoints, session initialization with app bundle IDs, and timeout management. The system maintains session state per device and automatically re-establishes sessions on failure. Session management is abstracted from agents: they invoke Robot interface methods without understanding WebDriverAgent protocol details. The implementation handles both localhost communication (simulators) and USB tunnel communication (physical devices) transparently.
Abstracts WebDriverAgent session lifecycle (creation, teardown, error recovery) behind the Robot interface, allowing agents to invoke iOS automation without understanding WebDriverAgent protocol or session management details. Handles both localhost (simulator) and USB tunnel (physical device) communication transparently.
Simpler than managing WebDriverAgent sessions directly (no protocol knowledge required) while providing automatic recovery on timeout, making it suitable for LLM agents that need straightforward iOS automation without WebDriverAgent expertise.
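To make the session lifecycle concrete, here is a sketch of the HTTP requests involved, built as plain objects so the sending step stays with the caller. It assumes WebDriverAgent's default port 8100, W3C-style `POST /session` with a `bundleId` capability, and `DELETE /session/{id}`; these are assumptions about WDA's HTTP API, not code taken from src/webdriveragent.ts.

```typescript
// Shape of an HTTP call to WebDriverAgent; sending it is left to the caller.
interface WdaRequest {
  method: "POST" | "DELETE";
  url: string;
  body?: unknown;
}

// Assumption: WDA listens on port 8100 by default. Simulators reach it via
// localhost; physical devices via a port forwarded over USB.
function wdaBaseUrl(port = 8100): string {
  return `http://localhost:${port}`;
}

// W3C-style new-session request; WDA reads the app's bundleId from capabilities.
function newSessionRequest(bundleId: string, port?: number): WdaRequest {
  return {
    method: "POST",
    url: `${wdaBaseUrl(port)}/session`,
    body: { capabilities: { alwaysMatch: { bundleId } } },
  };
}

function deleteSessionRequest(sessionId: string, port?: number): WdaRequest {
  return { method: "DELETE", url: `${wdaBaseUrl(port)}/session/${sessionId}` };
}

// Depending on the WDA version, the session ID may sit under `value` or at the
// top level of the response; accept both and fail loudly otherwise.
function extractSessionId(response: unknown): string {
  if (typeof response === "object" && response !== null) {
    const r = response as { sessionId?: unknown; value?: { sessionId?: unknown } };
    const id = r.value?.sessionId ?? r.sessionId;
    if (typeof id === "string" && id.length > 0) return id;
  }
  throw new Error("WebDriverAgent response did not contain a session ID");
}
```

A recovery path would call `extractSessionId` on a fresh `newSessionRequest` response whenever a command fails with an expired session.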
image-processing-and-screenshot-analysis
Medium confidence
Provides image processing utilities for screenshot analysis, including screenshot capture, image format conversion, and visual element detection support. The system captures screenshots from devices through platform-specific mechanisms (ADB screencap for Android, WebDriverAgent screenshot API for iOS) and processes them through image utilities for format conversion and metadata extraction. The implementation supports PNG and JPEG formats and provides hooks for visual element detection (though advanced CV/ML-based detection is not built in). Screenshots are used as a fallback when accessibility tree data is unavailable and for visual validation workflows.
Integrates screenshot capture as a secondary interaction tier with image processing utilities, providing visual fallback when accessibility trees are unavailable while maintaining performance for well-instrumented apps. Screenshot processing is platform-agnostic, supporting both Android (ADB screencap) and iOS (WebDriverAgent) capture mechanisms.
Provides pragmatic screenshot support for fallback scenarios without requiring external image processing libraries, though it lacks advanced CV/ML capabilities for visual element detection compared to specialized visual automation tools.
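Metadata extraction of this kind needs no external library. The sketch below is a hypothetical illustration, not mobile-mcp's image utilities: it detects PNG and JPEG from their standard signature bytes and reads a PNG's dimensions from the IHDR chunk.

```typescript
type ImageFormat = "png" | "jpeg" | "unknown";

// Standard 8-byte PNG signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Identify a screenshot's format from its leading "magic" bytes.
function detectImageFormat(bytes: Uint8Array): ImageFormat {
  if (bytes.length >= 8 && PNG_SIGNATURE.every((b, i) => bytes[i] === b)) return "png";
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return "jpeg";
  return "unknown";
}

// A PNG's first chunk is IHDR: after the 8-byte signature come a 4-byte length
// and the 4-byte "IHDR" tag, so width and height are big-endian uint32 values
// at byte offsets 16 and 20.
function pngDimensions(bytes: Uint8Array): { width: number; height: number } {
  if (detectImageFormat(bytes) !== "png" || bytes.length < 24) {
    throw new Error("Not a PNG with a complete IHDR chunk");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```

Reading only the header keeps the check cheap even for full-resolution device screenshots.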
app-lifecycle-management
Medium confidence
Provides app installation, launch, termination, and state management capabilities across Android and iOS platforms. On Android, app lifecycle is managed through ADB commands (adb install, adb shell am start, adb shell am force-stop). On iOS, app lifecycle is managed through go-ios (for physical devices) and simctl (for simulators). The system supports app installation from APK/IPA files, launching apps with intent/URL parameters, and force-stopping/terminating apps. App state is managed per device, allowing agents to control app lifecycle as part of automation workflows.
Provides cross-platform app lifecycle management through platform-specific mechanisms (ADB for Android, go-ios/simctl for iOS) abstracted behind a common Robot interface, allowing agents to manage app installation and launch without platform-specific knowledge.
Simpler than app-specific testing frameworks (Espresso, XCUITest) for basic app lifecycle management, making it suitable for agents that need straightforward app installation and launch without framework overhead.
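The Android and simulator lifecycle commands named above can be expressed as argument vectors scoped to one device. This is a sketch of plausible invocations, not the exact commands mobile-mcp issues; the `androidApp` and `simulatorApp` helpers are hypothetical, go-ios commands for physical iPhones are omitted, and note that `simctl install` takes a `.app` bundle rather than an IPA.

```typescript
type Argv = string[];

// Android: ADB commands scoped to one device serial with `-s`.
const androidApp = {
  // `-r` reinstalls over an existing copy while keeping its data.
  install: (serial: string, apkPath: string): Argv => ["adb", "-s", serial, "install", "-r", apkPath],
  // Launch an explicit component, e.g. "com.example.app/.MainActivity".
  launch: (serial: string, pkg: string, activity: string): Argv =>
    ["adb", "-s", serial, "shell", "am", "start", "-n", `${pkg}/${activity}`],
  // Open a URL (or deep link) through a VIEW intent.
  openUrl: (serial: string, url: string): Argv =>
    ["adb", "-s", serial, "shell", "am", "start", "-a", "android.intent.action.VIEW", "-d", url],
  forceStop: (serial: string, pkg: string): Argv => ["adb", "-s", serial, "shell", "am", "force-stop", pkg],
};

// iOS Simulator: xcrun simctl subcommands take a simulator UDID (or "booted").
const simulatorApp = {
  install: (udid: string, appPath: string): Argv => ["xcrun", "simctl", "install", udid, appPath],
  launch: (udid: string, bundleId: string): Argv => ["xcrun", "simctl", "launch", udid, bundleId],
  openUrl: (udid: string, url: string): Argv => ["xcrun", "simctl", "openurl", udid, url],
  terminate: (udid: string, bundleId: string): Argv => ["xcrun", "simctl", "terminate", udid, bundleId],
};
```

Keeping commands as argv arrays means they can be passed to Node's `child_process.execFile(argv[0], argv.slice(1), ...)` without shell quoting issues.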
screenshot-and-coordinate-based-interaction
Medium confidence
Captures full-screen screenshots from the device and enables coordinate-based interaction (tap, swipe, drag) when accessibility tree data is unavailable or insufficient. The system processes screenshots through image processing utilities to extract visual information, then maps agent-specified coordinates or visual regions to device touch events. This provides a fallback mechanism for apps with poor accessibility implementation or for visual-based automation scenarios where semantic interaction is not viable.
Implements screenshot capture as a secondary interaction tier that activates only when accessibility tree data is unavailable, reducing screenshot overhead for well-instrumented apps while maintaining fallback capability for legacy or third-party apps. Screenshot processing is integrated with the common Device API, allowing agents to seamlessly switch between semantic and coordinate-based interaction.
Provides a pragmatic hybrid approach compared to pure accessibility-based tools (which fail on inaccessible apps) or pure image-based tools (which are slow and fragile) — using accessibility as primary with screenshot fallback ensures broad app compatibility while maintaining performance for well-instrumented applications.
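Mapping a point chosen on a screenshot to a touch event requires a scale step, because screenshot pixels and touch coordinates do not always share units: on iOS, screenshots are in pixels while WebDriverAgent taps use points, whereas on Android both are typically pixels. A minimal sketch, with `screenshotToTouch` as a hypothetical helper:

```typescript
interface PixelSize { width: number; height: number; }

// Scale a screenshot coordinate into the device's touch coordinate space and
// clamp it onto the screen so an edge pick never produces an out-of-range tap.
function screenshotToTouch(
  px: number,
  py: number,
  shot: PixelSize,
  touchSpace: PixelSize,
): { x: number; y: number } {
  const sx = touchSpace.width / shot.width;
  const sy = touchSpace.height / shot.height;
  const clamp = (v: number, max: number) => Math.min(Math.max(v, 0), max - 1);
  return {
    x: clamp(Math.round(px * sx), touchSpace.width),
    y: clamp(Math.round(py * sy), touchSpace.height),
  };
}

// Example: a 1170x2532 pixel screenshot of a 3x screen measuring 390x844 points.
console.log(screenshotToTouch(585, 1266, { width: 1170, height: 2532 }, { width: 390, height: 844 }));
// { x: 195, y: 422 }
```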
android-adb-device-automation
Medium confidence
Implements an AndroidRobot class that wraps Android Debug Bridge (ADB) for controlling physical Android devices and AVD emulators. The implementation handles ADB command execution, device state management, accessibility service integration for UI tree extraction, and gesture simulation (tap, swipe, long-press) through ADB input events. Device discovery and management are handled by AndroidDeviceManager, which enumerates connected devices via 'adb devices' and maintains device-specific state. The architecture abstracts ADB complexity behind the common Robot interface, allowing agents to control Android devices without direct ADB knowledge.
Wraps ADB command execution within a stateless Robot interface that handles device discovery, accessibility service integration, and gesture simulation without requiring agents to understand ADB protocol details. AndroidDeviceManager provides automatic device enumeration and resolution, eliminating manual device serial number management.
Simpler than Appium for basic Android automation (no server setup required, works with standard ADB) while providing accessibility tree extraction comparable to Espresso, making it ideal for LLM agents that need straightforward device control without framework overhead.
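AndroidDeviceManager's enumeration step amounts to parsing the text that `adb devices` prints. The parser below is a hypothetical sketch of that step, not the manager's actual code. Device rows have the form `<serial><TAB><state>`, while the header and any daemon start-up notices contain no tab, which makes them easy to skip.

```typescript
interface AndroidDevice {
  serial: string;
  state: string; // e.g. "device", "offline", "unauthorized"
  kind: "emulator" | "physical";
}

// Parse `adb devices` output, e.g.:
//   List of devices attached
//   emulator-5554<TAB>device
//   R58M123ABC<TAB>unauthorized
function parseAdbDevices(output: string): AndroidDevice[] {
  return output
    .split(/\r?\n/)
    .filter((line) => line.includes("\t")) // headers and daemon notices have no tab
    .map((line): AndroidDevice => {
      const [serial, state] = line.trim().split("\t");
      return { serial, state, kind: serial.startsWith("emulator-") ? "emulator" : "physical" };
    });
}

// Only devices in the "device" state accept commands; "unauthorized" ones
// still need the USB debugging prompt accepted on the handset.
function usableAndroidDevices(output: string): AndroidDevice[] {
  return parseAdbDevices(output).filter((d) => d.state === "device");
}
```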
ios-physical-device-automation-via-go-ios
Medium confidence
Implements an IosRobot class that controls iOS physical devices (iPhone, iPad) connected via USB, using the go-ios tool for device communication and WebDriverAgent for UI automation. The architecture uses go-ios for low-level device operations (device discovery, app installation, log streaming) and WebDriverAgent (a native iOS testing framework) for UI interaction and accessibility tree extraction. Device management is handled by IosManager, which discovers connected iOS devices via go-ios and maintains WebDriverAgent session state. The implementation abstracts the complexity of USB tunneling, WebDriverAgent session management, and iOS-specific constraints behind the common Robot interface.
Combines go-ios for device-level operations with WebDriverAgent for UI automation, providing a lightweight alternative to Xcode-dependent tools. The architecture handles WebDriverAgent session lifecycle (creation, teardown, error recovery) transparently, allowing agents to treat iOS physical devices as simple automation targets without understanding WebDriverAgent protocol details.
Lighter than XCUITest-based approaches (no Xcode required) while providing comparable UI automation capabilities through WebDriverAgent, making it accessible to non-iOS developers and LLM agents that need straightforward iOS device control.
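When several iPhones are attached, each device's WebDriverAgent needs its own local endpoint for the USB tunnel. The sketch below shows one hypothetical way to assign stable local ports per UDID, assuming WDA listens on 8100 on each device; the forwarding itself (done by go-ios) is deliberately not shown, and mobile-mcp may allocate ports differently.

```typescript
// Hypothetical helper: give each connected iOS device its own local port so
// several WebDriverAgent instances can be forwarded over USB at the same time.
function allocateWdaPorts(udids: string[], basePort = 8100): Map<string, number> {
  const ports = new Map<string, number>();
  // De-duplicate and sort so the same set of devices always gets the same ports.
  Array.from(new Set(udids))
    .sort()
    .forEach((udid, i) => ports.set(udid, basePort + i));
  return ports;
}

// Resolve the local base URL through which a device's WDA is reachable.
function wdaUrlForDevice(ports: Map<string, number>, udid: string): string {
  const port = ports.get(udid);
  if (port === undefined) throw new Error(`No forwarded WDA port for device ${udid}`);
  return `http://localhost:${port}`;
}
```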
ios-simulator-automation-via-simctl
Medium confidence
Implements SimctlManager and a simulator-specific Robot implementation that controls iOS Simulators using xcrun simctl (Xcode's simulator control tool) for device management and WebDriverAgent for UI automation. The architecture uses simctl to enumerate running simulators, launch/terminate simulator instances, and manage simulator state, while WebDriverAgent provides UI interaction and accessibility tree extraction. Unlike physical iOS devices, simulators run locally and communicate with WebDriverAgent over localhost, eliminating USB tunnel complexity. The implementation abstracts simctl command execution and WebDriverAgent session management behind the common Robot interface.
Leverages xcrun simctl for simulator lifecycle management (launch, terminate, state queries) combined with local WebDriverAgent communication over localhost, eliminating USB tunnel overhead and enabling rapid simulator startup/teardown cycles. SimctlManager provides automatic simulator discovery and enumeration, allowing agents to target simulators by name or UDID without manual configuration.
Faster than physical device automation (no USB latency) and simpler than managing multiple Xcode projects, making it ideal for CI/CD pipelines and development-time automation where simulator coverage is sufficient.
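Simulator discovery can work from the JSON that `xcrun simctl list devices --json` prints, which groups devices under runtime identifiers. This sketch is hypothetical (not SimctlManager's code) and models only a subset of the fields; it keeps booted simulators and resolves a target by UDID first, then by name.

```typescript
// Subset of the JSON printed by `xcrun simctl list devices --json`.
interface SimctlDevice { udid: string; name: string; state: string; isAvailable?: boolean; }
interface SimctlList { devices: Record<string, SimctlDevice[]>; }

interface Simulator extends SimctlDevice { runtime: string; }

// Flatten the runtime-keyed map and keep booted simulators only.
function bootedSimulators(json: string): Simulator[] {
  const parsed = JSON.parse(json) as SimctlList;
  return Object.entries(parsed.devices).flatMap(([runtime, entries]) =>
    entries.filter((d) => d.state === "Booted").map((d) => ({ ...d, runtime })),
  );
}

// Let agents target a simulator by UDID or, failing that, by its display name.
function findSimulator(sims: Simulator[], nameOrUdid: string): Simulator | undefined {
  return sims.find((s) => s.udid === nameOrUdid) ?? sims.find((s) => s.name === nameOrUdid);
}
```

Matching UDIDs before names matters because several booted simulators can share a display name across runtimes.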
multi-device-orchestration-and-discovery
Medium confidence
Provides device discovery and management across all supported platforms through platform-specific managers (AndroidDeviceManager, IosManager, SimctlManager) that enumerate connected or running devices and resolve device identifiers to Robot instances. Rather than keeping a persistent registry, the system queries the available devices on demand, enabling agents to list devices, filter by platform and type, and resolve device parameters at invocation time. Device resolution is request-scoped and stateless, allowing horizontal scaling and multi-device orchestration without persistent device state management. The architecture supports mixed-platform automation (Android and iOS in the same workflow) through unified device resolution.
Implements request-scoped, stateless device resolution that dynamically discovers and resolves devices at invocation time rather than maintaining persistent device registries. This enables horizontal scaling and multi-device orchestration without session management overhead, though it trades latency (re-discovery per invocation) for simplicity and scalability.
Unlike device farm solutions (like BrowserStack or Sauce Labs) that manage device state server-side, mobile-mcp's stateless approach enables local multi-device automation without external dependencies, though it requires agents to manage device selection logic.
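A sketch of on-demand, mixed-platform discovery with filtering, assuming each platform manager can be reduced to a function that returns its current devices. The `DeviceInfo` shape and `listDevices` are hypothetical, not mobile-mcp's API.

```typescript
type Platform = "android" | "ios";
type DeviceType = "real" | "emulator" | "simulator";

interface DeviceInfo { id: string; name: string; platform: Platform; type: DeviceType; }

// Hypothetical discovery source: each platform manager contributes a list.
type Discover = () => DeviceInfo[];

// Stateless: discovery runs on every query, so a device unplugged between calls
// disappears from the next result instead of leaving a stale registry entry.
function listDevices(
  sources: Discover[],
  filter: Partial<Pick<DeviceInfo, "platform" | "type">> = {},
): DeviceInfo[] {
  return sources
    .flatMap((discover) => discover())
    .filter(
      (d) =>
        (filter.platform === undefined || d.platform === filter.platform) &&
        (filter.type === undefined || d.type === filter.type),
    );
}
```

Device selection logic (which device to use for which step) stays with the agent, matching the trade-off described above.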
gesture-simulation-and-input-event-handling
Medium confidence
Provides cross-platform gesture simulation (tap, swipe, long-press, drag, multi-touch) by translating high-level gesture specifications into platform-specific input events. On Android, gestures are implemented via ADB input event commands (sendevent for low-level events or input tap/swipe for high-level commands). On iOS, gestures are implemented through WebDriverAgent's gesture API. The system supports both coordinate-based gestures (when accessibility data is unavailable) and element-based gestures (when the accessibility tree provides element bounds). Gesture parameters (duration, velocity, pressure) are normalized across platforms to provide consistent behavior.
Normalizes gesture specifications across Android (ADB input events) and iOS (WebDriverAgent gesture API) through a common gesture interface, allowing agents to specify gestures once and execute them on any platform. Supports both coordinate-based (for inaccessible apps) and element-based (for accessible apps) gesture targeting, providing flexibility for different app types.
Simpler than platform-specific gesture APIs (Espresso, XCUITest) while providing cross-platform consistency, making it suitable for LLM agents that need straightforward gesture simulation without learning platform-specific gesture syntax.
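The normalization idea can be sketched as one gesture description compiled two ways: an `adb shell input swipe` command for Android and a W3C WebDriver actions payload for iOS. This assumes WDA accepts W3C pointer actions (worth checking per WDA version) and uses the common trick of emulating a long press as a zero-distance swipe; all helper names are hypothetical.

```typescript
interface Point { x: number; y: number; }
type SwipeDirection = "up" | "down" | "left" | "right";

// Normalize a direction-based swipe into start and end points.
function swipeTarget(from: Point, direction: SwipeDirection, distance: number): Point {
  switch (direction) {
    case "up": return { x: from.x, y: from.y - distance };
    case "down": return { x: from.x, y: from.y + distance };
    case "left": return { x: from.x - distance, y: from.y };
    case "right": return { x: from.x + distance, y: from.y };
  }
}

// Android: `input swipe x1 y1 x2 y2 [durationMs]`.
function androidSwipeArgs(serial: string, from: Point, to: Point, durationMs: number): string[] {
  return ["adb", "-s", serial, "shell", "input", "swipe",
    String(from.x), String(from.y), String(to.x), String(to.y), String(durationMs)];
}

// Long press emulated as a swipe that starts and ends at the same point.
function androidLongPressArgs(serial: string, at: Point, durationMs = 800): string[] {
  return androidSwipeArgs(serial, at, at, durationMs);
}

// iOS: W3C WebDriver actions with a single touch pointer.
function w3cSwipeActions(from: Point, to: Point, durationMs: number) {
  return {
    actions: [{
      type: "pointer",
      id: "finger1",
      parameters: { pointerType: "touch" },
      actions: [
        { type: "pointerMove", duration: 0, x: from.x, y: from.y },
        { type: "pointerDown", button: 0 },
        { type: "pointerMove", duration: durationMs, x: to.x, y: to.y },
        { type: "pointerUp", button: 0 },
      ],
    }],
  };
}
```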
mcp-protocol-server-implementation
Medium confidence
Implements a Model Context Protocol (MCP) server that exposes mobile automation capabilities as MCP tools, enabling LLM clients and AI agents to invoke mobile automation through the standardized MCP protocol. The server (src/server.ts) registers tools for device discovery, UI interaction, screenshot capture, and gesture simulation, mapping MCP tool schemas to Robot interface methods. The implementation supports multiple MCP transport modes (stdio, SSE) and handles tool invocation, parameter validation, and error reporting through the MCP protocol. The server is stateless and request-scoped, allowing multiple concurrent clients to orchestrate different devices without session conflicts.
Implements a stateless MCP server that maps the Robot interface to MCP tools, enabling LLM clients to invoke mobile automation through standardized protocol without understanding platform-specific details. The server supports multiple transport modes (stdio, SSE) and handles concurrent client connections without persistent session state.
Provides LLM-native integration through MCP protocol (vs. REST APIs or custom client libraries), enabling seamless integration with Claude, ChatGPT, and other MCP-compatible LLM clients without custom adapter code.
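The tool layer reduces to a registry that validates arguments, dispatches to a handler, and returns results in the shape of an MCP tools/call response (a `content` list plus `isError` on failure). The sketch below is dependency-free and hypothetical; it is not src/server.ts, and a real server would normally use the official MCP TypeScript SDK with JSON Schema input definitions.

```typescript
// Result shape modeled on MCP's tools/call response.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface ToolDef {
  description: string;
  required: string[]; // required string arguments (stand-in for a JSON Schema)
  handler: (args: Record<string, string>) => string;
}

const toolRegistry = new Map<string, ToolDef>();

function registerTool(toolName: string, def: ToolDef): void {
  toolRegistry.set(toolName, def);
}

function errorResult(message: string): ToolResult {
  return { content: [{ type: "text", text: message }], isError: true };
}

// Validate arguments, dispatch, and turn exceptions into isError results instead
// of crashing the server, so the client (and the LLM) can see what went wrong.
function callTool(toolName: string, args: Record<string, unknown>): ToolResult {
  const def = toolRegistry.get(toolName);
  if (!def) return errorResult(`Unknown tool: ${toolName}`);
  for (const key of def.required) {
    if (typeof args[key] !== "string") return errorResult(`Missing string argument: ${key}`);
  }
  try {
    return { content: [{ type: "text", text: def.handler(args as Record<string, string>) }] };
  } catch (e) {
    return errorResult(e instanceof Error ? e.message : String(e));
  }
}

// Hypothetical demo tool; the name and behaviour are illustrative only.
registerTool("list_devices_demo", {
  description: "List devices (demo)",
  required: [],
  handler: () => "emulator-5554",
});
```

Reporting failures as `isError` results, rather than protocol-level errors, lets the model read the message and choose another action.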
error-handling-and-device-state-recovery
Medium confidence
Implements a cross-platform error handling strategy that catches platform-specific errors (ADB connection failures, WebDriverAgent session timeouts, simctl command failures) and translates them into standardized error responses through the MCP protocol. The system includes device state recovery mechanisms such as automatic WebDriverAgent session re-establishment on iOS, ADB reconnection on Android, and simulator state validation. Error handling is integrated into each platform manager (AndroidRobot, IosRobot, SimctlManager) and propagated through the Robot interface to the MCP server layer, providing consistent error reporting to agents.
Implements platform-specific error handling (ADB reconnection, WebDriverAgent session re-establishment, simctl state validation) that translates into standardized MCP error responses, providing agents with consistent error semantics across platforms while maintaining platform-specific recovery strategies.
More robust than simple error propagation by including automatic recovery mechanisms (WebDriverAgent reconnection, ADB reconnection) that handle transient failures without agent intervention, though less sophisticated than dedicated device farm solutions with centralized health monitoring.
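A sketch of the translate-then-recover pattern: raw platform messages are classified into a few codes, and an operation is retried once after a recovery step only when the code suggests recovery can help (for example, an expired WebDriverAgent session). The codes, the regular expressions, and `withRecovery` are hypothetical; real ADB, WDA, and simctl messages vary by version.

```typescript
type MobileErrorCode = "DEVICE_NOT_FOUND" | "SESSION_EXPIRED" | "COMMAND_FAILED";

// Hypothetical mapping from raw platform error text to a small set of codes.
function classifyError(message: string): MobileErrorCode {
  if (/device .*not found|no devices\/emulators found/i.test(message)) return "DEVICE_NOT_FOUND";
  if (/invalid session id|session .*(not found|does not exist)/i.test(message)) return "SESSION_EXPIRED";
  return "COMMAND_FAILED";
}

// Retry once after a recovery step (e.g. re-creating a WDA session) when the
// error is one that recovery can plausibly fix; otherwise rethrow with a code.
async function withRecovery<T>(op: () => Promise<T>, recover: () => Promise<void>): Promise<T> {
  try {
    return await op();
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    const code = classifyError(message);
    if (code !== "SESSION_EXPIRED") throw new Error(`${code}: ${message}`);
    await recover();
    return await op();
  }
}
```

Retrying only once keeps a persistent failure from looping; the second error propagates to the MCP layer unchanged.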
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with mobile-mcp, ranked by overlap. Discovered automatically through the match graph.
MobileAgent
Mobile-Agent: The Powerful GUI Agent Family
UI-TARS-desktop
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
lamda
The most powerful Android RPA agent framework, next generation mobile automation.
Browser MCP
(by UI-TARS) A fast, lightweight MCP server that gives LLMs browser automation via Puppeteer's structured accessibility data, with an optional vision mode for complex visual understanding and flexible, cross-platform configuration.
Parsagon
Create browser automations with natural...
Best For
- ✓AI agents and LLM-based automation frameworks targeting multi-platform mobile testing
- ✓teams building cross-platform mobile test automation without platform expertise
- ✓developers migrating from platform-specific tools to unified MCP-based orchestration
- ✓QA automation teams building resilient mobile test suites
- ✓AI agents performing complex multi-step mobile workflows requiring stable element references
- ✓developers testing accessibility compliance while automating user flows
- ✓iOS automation workflows where WebDriverAgent session management should be transparent
- ✓agents automating multiple iOS devices that require independent session management
Known Limitations
- ⚠Abstraction adds latency per tool invocation due to device resolution and platform manager dispatch
- ⚠Platform-specific capabilities that don't map to common API (e.g., Android-only accessibility services) are not exposed
- ⚠Stateless design means no persistent session state across multiple tool calls — each invocation re-resolves the device
- ⚠Accessibility tree parsing requires apps to properly implement accessibility labels — apps with poor accessibility metadata will fall back to coordinate-based interaction
- ⚠Android accessibility service requires the device to have accessibility services enabled and the app to expose accessibility events
- ⚠iOS physical devices require WebDriverAgent tunnel setup and may have latency in accessibility tree extraction over USB
Repository Details
Last commit: Apr 13, 2026