Capability: Accessibility Tree Based Browser Element Targeting
18 artifacts provide this capability.
via “element interaction via accessibility-aware selectors”
Automate browsers and run web tests via Playwright MCP.
Unique: Uses accessibility-tree semantics to generate robust element selectors that survive DOM refactoring, unlike brittle CSS/XPath selectors, and validates element state before interaction to prevent silent failures.
vs others: More robust than pixel-based clicking (screenshot + vision) because it uses semantic element properties that don't change with styling; more reliable than CSS selectors because it references accessibility roles that persist across DOM restructuring.
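A minimal sketch of role-and-name targeting with a pre-interaction state check, written against a toy in-memory accessibility tree. The AXNode shape and the helpers find_by_role and assert_actionable are hypothetical illustrations, not Playwright MCP's API.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class AXNode:
    """Hypothetical accessibility-tree node: role, accessible name, and state flags."""
    role: str
    name: str = ""
    disabled: bool = False
    hidden: bool = False
    children: list["AXNode"] = field(default_factory=list)


def walk(node: AXNode) -> Iterator[AXNode]:
    """Depth-first traversal of the accessibility tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_by_role(root: AXNode, role: str, name: str) -> Optional[AXNode]:
    """Resolve a target by (role, accessible name) instead of a CSS/XPath path."""
    return next((n for n in walk(root) if n.role == role and n.name == name), None)


def assert_actionable(node: Optional[AXNode]) -> AXNode:
    """Check state before interacting, so a missing, hidden or disabled target fails loudly."""
    if node is None:
        raise LookupError("no element with that role and name")
    if node.hidden or node.disabled:
        state = "hidden" if node.hidden else "disabled"
        raise RuntimeError(f"{node.role} '{node.name}' is {state}")
    return node


# Hand-written example tree, not captured from a real page.
page = AXNode("WebArea", "Checkout", children=[
    AXNode("form", children=[
        AXNode("textbox", "Email"),
        AXNode("button", "Place order", disabled=True),
    ]),
])
email = assert_actionable(find_by_role(page, "textbox", "Email"))
print(email.name)  # Email
```

Because the lookup keys on role and accessible name, wrapping the form in extra containers or renaming CSS classes does not change the result; a disabled "Place order" button raises instead of silently absorbing a click.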
via “adaptive element relocation and dynamic selector resolution”
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Unique: Implements automatic selector relocation using structural DOM analysis and fallback matching strategies, enabling selectors to survive DOM mutations without manual updates; most competitors require static selectors or manual maintenance when the HTML changes.
vs others: More resilient than Selenium's static selectors because it adapts to DOM changes automatically, and more maintainable than regex-based extraction because it understands HTML structure semantically.
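Relocation by similarity can be sketched as: save a fingerprint of the element when it is first found, then, when the original selector stops matching, score every element in the new DOM against that fingerprint and accept the best one above a threshold. This assumes a simplified Element record; the source does not describe Scrapling's actual fingerprint fields, so the weights (0.3/0.3/0.2/0.2) and the 0.6 threshold are arbitrary illustrations.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Element:
    """Hypothetical simplified DOM element used for illustration."""
    tag: str
    text: str = ""
    attrs: dict[str, str] = field(default_factory=dict)
    path: tuple[str, ...] = ()  # ancestor tags, root first


def similarity(saved: Element, candidate: Element) -> float:
    """Blend tag, text, attribute and ancestry similarity into a 0..1 score."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    text_score = SequenceMatcher(None, saved.text, candidate.text).ratio()
    saved_pairs, cand_pairs = set(saved.attrs.items()), set(candidate.attrs.items())
    union = saved_pairs | cand_pairs
    attr_score = len(saved_pairs & cand_pairs) / len(union) if union else 1.0
    path_score = SequenceMatcher(None, saved.path, candidate.path).ratio()
    return 0.3 * tag_score + 0.3 * text_score + 0.2 * attr_score + 0.2 * path_score


def relocate(saved: Element, dom: list[Element], threshold: float = 0.6) -> Optional[Element]:
    """Return the most similar element in the new DOM, or None if nothing is close enough."""
    best = max(dom, key=lambda el: similarity(saved, el), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


saved = Element("button", "Add to cart", {"id": "buy", "class": "btn-buy"},
                ("html", "body", "div", "form"))
# After a refactor the id changed and the form moved under <main><section>.
new_dom = [
    Element("a", "Help", {"class": "link"}, ("html", "body", "footer")),
    Element("button", "Add to cart", {"id": "buy-now", "class": "btn-buy"},
            ("html", "body", "main", "section", "form")),
]
match = relocate(saved, new_dom)
print(match.attrs["id"] if match else None)  # buy-now
```

The threshold is what keeps relocation from "succeeding" on an unrelated element when the target has truly been removed.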
via “accessibility snapshot capture and dom state extraction”
Chrome DevTools for coding agents
Unique: Leverages Chrome DevTools Protocol's accessibility domain to extract semantic trees rather than parsing raw HTML or screenshots, providing structured element metadata (roles, labels, coordinates) optimized for LLM reasoning without visual processing overhead.
vs others: Provides semantic accessibility information (vs Puppeteer's raw DOM queries or Playwright's visual locators), enabling agents to reason about page structure without screenshots or visual analysis, reducing token consumption and improving reasoning accuracy.
via “accessibility-snapshot-extraction-with-aria-semantics”
Chrome DevTools for coding agents
Unique: Uses Chrome DevTools Protocol accessibility tree queries (not DOM parsing) to extract semantic structure with ARIA attributes, producing LLM-optimized hierarchical JSON that preserves parent-child relationships and element roles without visual rendering overhead. Specifically designed for agents that need to interact with complex widgets (comboboxes, trees, tabs) by understanding their semantic roles.
vs others: Extracts semantic structure via CDP accessibility tree (vs parsing raw HTML or screenshots), providing accurate ARIA semantics and role information that enables agents to interact with complex widgets, whereas visual screenshot analysis requires OCR and cannot reliably detect ARIA state changes.
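A minimal sketch of turning CDP's flat accessibility node list into the kind of nested, LLM-friendly JSON described above. It assumes the node shape returned by CDP's Accessibility.getFullAXTree (nodeId, ignored, role and name as {type, value} objects, childIds); fetching the nodes over CDP is not shown, and the sample nodes are hand-written.

```python
import json
from typing import Any


def nest_ax_nodes(nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn a flat AXNode list into nested {role, name, children} dicts.

    Ignored nodes are dropped and their children are promoted to the
    nearest non-ignored ancestor, so parent-child relationships survive.
    """
    by_id = {n["nodeId"]: n for n in nodes}
    child_ids = {cid for n in nodes for cid in n.get("childIds", [])}

    def build(node_id: str) -> list[dict[str, Any]]:
        node = by_id.get(node_id)
        if node is None:
            return []
        children = [c for cid in node.get("childIds", []) for c in build(cid)]
        if node.get("ignored", False):
            return children  # promote children of ignored nodes
        out: dict[str, Any] = {
            "role": node.get("role", {}).get("value", ""),
            "name": node.get("name", {}).get("value", ""),
        }
        if children:
            out["children"] = children
        return [out]

    roots = [n["nodeId"] for n in nodes if n["nodeId"] not in child_ids]
    return [tree for root in roots for tree in build(root)]


sample = [
    {"nodeId": "1", "ignored": False, "role": {"type": "role", "value": "RootWebArea"},
     "name": {"type": "computedString", "value": "Demo"}, "childIds": ["2"]},
    {"nodeId": "2", "ignored": True, "childIds": ["3"]},
    {"nodeId": "3", "ignored": False, "role": {"type": "role", "value": "combobox"},
     "name": {"type": "computedString", "value": "Country"}},
]
print(json.dumps(nest_ax_nodes(sample), indent=2))
```

Collapsing ignored wrapper nodes is what keeps the output compact: the combobox ends up as a direct child of the page rather than buried under layout-only nodes.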
via “accessibility-tree-based-ui-element-detection”
Model Context Protocol Server for Mobile Automation and Scraping (iOS, Android, Emulators, Simulators and Real Devices)
Unique: Implements a two-tier interaction strategy that prioritizes native accessibility trees (Android AccessibilityService, iOS WebDriverAgent accessibility API) as the primary interaction mechanism, with screenshot-based coordinate fallback only when semantic data is unavailable. This approach provides deterministic, layout-resilient automation that survives UI changes without requiring coordinate recalibration.
vs others: Outperforms image-based automation tools (such as Appium with image recognition) by using semantic accessibility metadata for element location, eliminating the need for ML-based visual matching and providing deterministic element identification when accessibility labels are present.
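The two-tier strategy can be sketched as a lookup that consults the native accessibility tree first and only calls a screenshot-based locator when no labelled element matches. UINode, resolve_tap_point and the screenshot_locator callback are hypothetical names; a real screenshot locator (vision model, template matching) is out of scope and is stubbed with a lambda.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UINode:
    """Hypothetical native accessibility node (e.g. from an Android or iOS dump)."""
    label: str
    bounds: tuple[int, int, int, int]  # left, top, right, bottom in screen pixels


def center(bounds: tuple[int, int, int, int]) -> tuple[int, int]:
    """Integer centre of a bounding box."""
    left, top, right, bottom = bounds
    return ((left + right) // 2, (top + bottom) // 2)


def resolve_tap_point(
    label: str,
    tree: list[UINode],
    screenshot_locator: Callable[[str], Optional[tuple[int, int]]],
) -> tuple[tuple[int, int], str]:
    """Tier 1: semantic lookup by accessibility label. Tier 2: screenshot fallback."""
    for node in tree:
        if node.label == label:
            return center(node.bounds), "accessibility"
    point = screenshot_locator(label)
    if point is None:
        raise LookupError(f"'{label}' not found in accessibility tree or screenshot")
    return point, "screenshot"


tree = [UINode("Username", (40, 300, 1040, 420)), UINode("Sign in", (100, 1800, 980, 1940))]
point, source = resolve_tap_point("Sign in", tree, screenshot_locator=lambda _label: None)
print(point, source)  # (540, 1870) accessibility
```

Returning the source tier alongside the point lets the caller log how often the non-deterministic fallback was actually needed.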
via “accessibility-tree-based page state capture”
Playwright MCP server
Unique: Uses Playwright's native accessibility tree API to generate structured page snapshots, avoiding screenshot-based vision model dependency. This is fundamentally different from Claude's web browsing (which uses screenshots) or Selenium-based approaches that require custom DOM traversal logic.
vs others: Provides deterministic, text-based page understanding 10-100x faster than vision models while maintaining full semantic accuracy for interactive elements.
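A text snapshot for an LLM can be as simple as an indented outline of roles and names, with a short reference id per node so the model can answer "click e4" instead of composing a selector. This sketch is loosely modeled on YAML-style ARIA snapshots; the exact line format and the ref ids here are illustrative assumptions, not Playwright's output format.

```python
from itertools import count
from typing import Any, Iterator


def snapshot_lines(node: dict[str, Any], refs: Iterator[int], depth: int = 0) -> list[str]:
    """Render a {role, name, children} tree as indented '- role "name" [ref=eN]' lines."""
    line = "  " * depth + f"- {node['role']}"
    if node.get("name"):
        line += f' "{node["name"]}"'
    line += f" [ref=e{next(refs)}]"
    lines = [line]
    for child in node.get("children", []):
        lines.extend(snapshot_lines(child, refs, depth + 1))
    return lines


tree = {"role": "main", "children": [
    {"role": "heading", "name": "Sign in"},
    {"role": "textbox", "name": "Email"},
    {"role": "button", "name": "Continue"},
]}
lines = snapshot_lines(tree, count(1))
print("\n".join(lines))
```

Sharing one counter across the recursion gives every node a unique ref in document order, which the agent host can map back to a concrete element.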
via “ui element selection and interaction via accessibility tree parsing”
The most powerful Android RPA agent framework, next generation mobile automation.
Unique: Combines UIAutomator2 accessibility tree parsing with direct ADB input event injection, allowing element selection via semantic properties (text, resource-id) while maintaining pixel-perfect interaction accuracy. Caches hierarchy snapshots to reduce query latency and supports both absolute coordinates and relative positioning within element bounds.
vs others: More reliable than Appium for local Android devices because it uses native UIAutomator2 without HTTP overhead; more flexible than image-based automation (OCR) because it works with dynamic content and doesn't require visual training data.
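Selecting by semantic properties and then tapping at a relative position inside the element's bounds can be sketched against a uiautomator-style XML dump, whose nodes carry text, resource-id and bounds="[left,top][right,bottom]" attributes. The dump below is hand-written, find_node and tap_point are hypothetical helpers, and lru_cache stands in for whatever hierarchy cache the framework actually uses.

```python
import re
import xml.etree.ElementTree as ET
from functools import lru_cache
from typing import Optional

BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


@lru_cache(maxsize=8)
def parse_hierarchy(dump_xml: str) -> ET.Element:
    """Parse a hierarchy dump once; repeated queries on the same dump reuse the cached tree."""
    return ET.fromstring(dump_xml)


def find_node(dump_xml: str, *, resource_id: str = "", text: str = "") -> Optional[ET.Element]:
    """Select a node by semantic properties (resource-id and/or text) rather than pixels."""
    for node in parse_hierarchy(dump_xml).iter("node"):
        if resource_id and node.get("resource-id") != resource_id:
            continue
        if text and node.get("text") != text:
            continue
        if resource_id or text:
            return node
    return None


def tap_point(node: ET.Element, rx: float = 0.5, ry: float = 0.5) -> tuple[int, int]:
    """Convert the node's bounds to an absolute point at a relative position inside it."""
    match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
    if match is None:
        raise ValueError("node has no parsable bounds")
    left, top, right, bottom = map(int, match.groups())
    return (round(left + rx * (right - left)), round(top + ry * (bottom - top)))


DUMP = (
    '<hierarchy rotation="0">'
    '<node class="android.widget.FrameLayout" text="" resource-id="" bounds="[0,0][1080,2340]">'
    '<node class="android.widget.Button" text="Login" resource-id="com.example:id/login"'
    ' bounds="[100,1800][980,1940]"/>'
    "</node></hierarchy>"
)
btn = find_node(DUMP, resource_id="com.example:id/login")
print(tap_point(btn), tap_point(btn, 0.1, 0.5))  # (540, 1870) (188, 1870)
```

Relative positioning matters for wide controls such as sliders or split buttons, where the centre is not always the right place to touch.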
via “windows ui element tree extraction and state capture”
MCP Server for Computer Use in Windows
Unique: Uses Windows native UI Automation COM APIs instead of computer vision or pixel-based detection, providing reliable element identification across all Windows applications without ML model dependencies. Implements dual-mode capture: standard UI tree for desktop apps and filtered DOM mode for browsers that strips browser UI chrome.
vs others: More reliable than vision-based automation (PyAutoGUI, Selenium screenshot analysis) because it accesses the actual UI element hierarchy rather than inferring from pixels, and works with any LLM without requiring vision capabilities.
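The dual-mode capture described above can be sketched as a filter over a UI Automation-style tree: desktop mode returns the whole window, while browser mode keeps only the page's document subtree so toolbars and tabs do not reach the model. The dict node shape and the "Document" control-type test are assumptions for illustration, not the server's actual code.

```python
from typing import Any, Iterator, Optional

Node = dict[str, Any]  # hypothetical UIA-like node: {"control_type", "name", "children"}


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of the UI element tree."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def capture(window: Node, browser_mode: bool) -> Node:
    """Desktop mode: whole tree. Browser mode: only the first Document subtree, if any."""
    if not browser_mode:
        return window
    document: Optional[Node] = next(
        (n for n in walk(window) if n.get("control_type") == "Document"), None
    )
    return document if document is not None else window


browser = {"control_type": "Window", "name": "Example - Browser", "children": [
    {"control_type": "ToolBar", "name": "Address and search bar", "children": []},
    {"control_type": "TabItem", "name": "Example"},
    {"control_type": "Document", "name": "Example", "children": [
        {"control_type": "Button", "name": "Buy now"},
    ]},
]}
page_only = capture(browser, browser_mode=True)
print([c["name"] for c in page_only["children"]])  # ['Buy now']
```

Falling back to the full window when no document node exists keeps the same call usable for non-browser applications.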
via “dom-element-selection-and-querying”
Model Context Protocol servers for Playwright
Unique: Exposes Playwright's locator API as MCP tools with rich metadata responses (bounding box, visibility, attributes), enabling LLMs to make informed decisions about element interaction without trial-and-error clicking, and supports both CSS and XPath with automatic selector validation.
vs others: Returns structured element metadata (visibility, enabled state, bounding box) in a single query, reducing the number of round-trips compared with frameworks that require separate queries for element existence, visibility, and interaction readiness.
via “ui element selection and interaction via accessibility hierarchy inspection”
The most powerful Android RPA agent framework, next generation mobile automation.
Unique: Leverages Android's native Accessibility API and UIAutomator2 framework for robust element selection instead of image recognition or coordinate-based clicking, enabling selector-based automation that survives UI layout changes.
vs others: More reliable than image-based automation (Appium with OpenCV) because it uses semantic element attributes; more maintainable than coordinate-based scripts because selectors adapt to layout changes.
via “dom-element-interaction-with-selector-based-targeting”
Your browser is the API. CLI + MCP server for AI agents to control Chrome with your login state.
Unique: Uses CDP protocol for direct DOM interaction with built-in element visibility waits and multi-element batch operations. Integrates with the authenticated browser context to interact with pages as the logged-in user.
vs others: More reliable than Playwright/Selenium for authenticated pages because it uses the real browser session; built-in waits reduce flakiness compared with raw CDP usage.
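A built-in visibility wait is essentially a polling loop with a deadline. In this sketch the probe is a hypothetical stand-in; a real one would check visibility over CDP (for example via Runtime.evaluate or DOM.getBoxModel), which is not shown here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 5.0, interval: float = 0.1) -> T:
    """Poll `probe` until it returns something other than None, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element not ready within {timeout} s")
        time.sleep(interval)


polls = {"n": 0}


def fake_visible_element() -> Optional[str]:
    """Hypothetical stand-in for a CDP visibility check; 'appears' on the third poll."""
    polls["n"] += 1
    return "#submit" if polls["n"] >= 3 else None


found = wait_for(fake_visible_element, timeout=1.0, interval=0.01)
print(found, polls["n"])  # #submit 3
```

Using time.monotonic rather than wall-clock time keeps the deadline correct even if the system clock is adjusted mid-wait.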
via “dom-aware-element-selection-with-multi-strategy-matching”
🌐Web Agent Protocol (WAP) - Record and replay user interactions in the browser with MCP support
Unique: Implements an intelligent fallback chain with selector-strategy caching: it learns which selector type works for each element and reuses it, reducing retry overhead on subsequent interactions.
vs others: More resilient than single-strategy selectors (pure CSS or XPath) because it adapts to DOM changes, but more performant than brute-force fuzzy matching because it caches successful strategies.
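A fallback chain with per-element strategy caching can be sketched as: try strategies in a fixed order, remember the one that succeeded for each element key, and try that one first next time. The source does not describe WAP's real strategies or cache, so CachingLocator, the three strategies and the fake DOM are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical: a strategy maps an element description to a handle, or None on a miss.
Strategy = Callable[[dict[str, str]], Optional[str]]


class CachingLocator:
    """Try selector strategies in order, remembering which one last worked for each element."""

    def __init__(self, strategies: dict[str, Strategy]) -> None:
        self.strategies = strategies          # dict order is the default fallback order
        self.preferred: dict[str, str] = {}   # element key -> name of last successful strategy
        self.attempts = 0                     # strategy calls made, to show the saving

    def locate(self, key: str, desc: dict[str, str]) -> Optional[str]:
        order = list(self.strategies)
        cached = self.preferred.get(key)
        if cached is not None:
            order.remove(cached)
            order.insert(0, cached)           # try the remembered strategy first
        for name in order:
            self.attempts += 1
            handle = self.strategies[name](desc)
            if handle is not None:
                self.preferred[key] = name
                return handle
        self.preferred.pop(key, None)         # nothing matched: forget the stale preference
        return None


# Fake page in which the element's id changed, so only the text strategy still matches.
DOM = {"text:Save": "<button#save-v2>"}


def by_id(desc: dict[str, str]) -> Optional[str]:
    return DOM.get(f"id:{desc.get('id')}")


def by_css(desc: dict[str, str]) -> Optional[str]:
    return DOM.get(f"css:{desc.get('css')}")


def by_text(desc: dict[str, str]) -> Optional[str]:
    return DOM.get(f"text:{desc.get('text')}")


loc = CachingLocator({"id": by_id, "css": by_css, "text": by_text})
desc = {"id": "save", "css": "form > button.primary", "text": "Save"}
first = loc.locate("save-button", desc)   # tries id, css, then text
second = loc.locate("save-button", desc)  # tries the cached text strategy first
print(first, loc.attempts)  # <button#save-v2> 4
```

The first lookup costs three strategy calls and the second only one, which is the retry saving the entry describes.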
via “window-and-element-discovery-via-accessibility-tree”
I've been building computer-use tools for a while, and I quietly launched this about a month ago (122 Stars on GH). I figured it was worth sharing here. Over the last few months, a lot of computer-use agents have come out: Codex, Claude Code, CUA, and others. Most of them seem to work roughly li…
Unique: Exposes raw accessibility tree structure as queryable data rather than requiring agents to know exact element IDs or coordinates, enabling semantic element discovery based on the accessibility metadata (roles, labels, states) that applications provide for assistive technology.
vs others: More reliable than image-based UI automation (no OCR errors) and more flexible than coordinate-based clicking because it uses semantic accessibility metadata that persists across UI theme changes and layout adjustments.
via “dom-based element selection and targeting”
Hey HN, Claude Code is pretty agentic now. It writes scripts, calls APIs, uses CLIs. But when something requires actually clicking through a website, it stops and asks me to do it. Problem is, I'm often unfamiliar with these platforms myself. "Go to App Store Connect and generate a P8 key"…
Unique: Exposes DOM element metadata as structured data through MCP, allowing Claude to reason about page structure programmatically rather than relying solely on visual screenshots or trial-and-error clicking.
vs others: More reliable than coordinate-based clicking because it targets semantic elements rather than pixel positions, making automation resistant to layout changes or responsive design variations.
via “accessibility hierarchy inspection and ui element querying”
A popular MCP server that enables AI agents to scaffold, build, run and test iOS, macOS, visionOS and watchOS apps on simulators and wired and wireless devices. It has powerful UI-automation capabilities like controlling the simulator, capturing run-time logs, as well as taking screenshots and…
Unique: Exposes XCTest's accessibility tree inspection as MCP tools, providing AI agents with structured UI element data for programmatic interaction; this enables accessibility-based UI automation without guessing screen coordinates.
vs others: More reliable than coordinate-based UI automation because it uses accessibility attributes, and enables AI agents to interact with dynamic UIs that change layout or position.
via “accessibility tree-based browser element targeting”
(by UI-TARS) A fast, lightweight MCP server that empowers LLMs with browser automation via Puppeteer’s structured accessibility data, featuring an optional vision mode for complex visual understanding and flexible, cross-platform configuration.
Unique: Uses Puppeteer's native accessibility tree extraction rather than screenshot-based vision or regex DOM parsing, providing semantic-aware element identification that preserves ARIA relationships and computed accessibility properties in a structured format suitable for LLM reasoning.
vs others: Faster and cheaper than vision-based browser agents (no VLM calls) while more reliable than regex/CSS selector approaches on dynamic or complex UIs, as it leverages browser-native accessibility APIs that understand semantic intent.
via “accessibility testing with aria and role inspection”
A high-level API to automate web browsers
Unique: Exposes the browser's accessibility tree (ARIA roles, labels, descriptions) natively through the page API, enabling accessibility assertions without external tools or axe-core integration.
vs others: More integrated than external accessibility tools because it uses the browser's native accessibility tree, and more flexible than manual ARIA inspection because it supports programmatic assertions.
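A programmatic accessibility assertion can run directly over a serialized snapshot. This sketch assumes the snapshot has been converted to a dict with role, name and children keys, the shape of the tree returned by Puppeteer's page.accessibility.snapshot(); the NAMED_ROLES set and the unnamed_controls check are hypothetical examples, not an axe-core rule.

```python
from typing import Any, Iterator

# Roles that need an accessible name to be usable with assistive technology (illustrative subset).
NAMED_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def iter_nodes(node: dict[str, Any], path: str = "") -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (role path, node) pairs in depth-first order."""
    here = f"{path}/{node.get('role', '?')}"
    yield here, node
    for child in node.get("children", []):
        yield from iter_nodes(child, here)


def unnamed_controls(snapshot: dict[str, Any]) -> list[str]:
    """Return role paths of interactive nodes whose accessible name is missing or blank."""
    return [
        path
        for path, node in iter_nodes(snapshot)
        if node.get("role") in NAMED_ROLES and not node.get("name", "").strip()
    ]


snapshot = {"role": "RootWebArea", "name": "Settings", "children": [
    {"role": "button", "name": "Save"},
    {"role": "button", "name": ""},
    {"role": "textbox", "name": "Display name"},
]}
problems = unnamed_controls(snapshot)
print(problems)  # ['/RootWebArea/button']
```

In a test suite the final step would simply be an assertion that the returned list is empty.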
via “intelligent-element-targeting-and-interaction”
Notte is the fastest, most reliable Browser Using Agents framework
Unique: Likely implements a multi-strategy targeting approach: (1) semantic matching using ARIA roles and labels, (2) visual matching using screenshot analysis, (3) fuzzy matching for text-based element descriptions, (4) coordinate-based targeting as fallback. May use a scoring system to rank candidate elements and select the most confident match.
vs others: More resilient than selector-based automation (Selenium, Playwright) because it doesn't break when HTML changes, and more practical than pure vision-based approaches because it leverages semantic HTML to reduce false positives and improve targeting accuracy.
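The entry itself only says Notte "likely" ranks candidates by a confidence score, so the following is purely an illustration of that idea, not Notte's code: combine an exact semantic (role) match with fuzzy text similarity, pick the best candidate, and return None below a confidence floor so the caller can fall back to another strategy such as coordinates. Visual matching is omitted, and the weights (0.4/0.6) and 0.5 floor are arbitrary assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical element candidate with semantic info and screen coordinates."""
    role: str
    label: str
    x: int
    y: int


def score(cand: Candidate, want_role: str, description: str) -> float:
    """Weighted blend: exact role match (semantic) plus fuzzy label similarity (text)."""
    role_score = 1.0 if cand.role == want_role else 0.0
    text_score = SequenceMatcher(None, cand.label.lower(), description.lower()).ratio()
    return 0.4 * role_score + 0.6 * text_score


def pick_target(cands: list[Candidate], want_role: str, description: str,
                min_confidence: float = 0.5) -> Optional[Candidate]:
    """Rank candidates and return the most confident one, or None to trigger a fallback."""
    ranked = sorted(cands, key=lambda c: score(c, want_role, description), reverse=True)
    if ranked and score(ranked[0], want_role, description) >= min_confidence:
        return ranked[0]
    return None


buttons = [
    Candidate("link", "Sign up", 40, 20),
    Candidate("button", "Sign in", 140, 20),
    Candidate("button", "Cancel", 240, 20),
]
target = pick_target(buttons, "button", "sign in")
print(target)  # Candidate(role='button', label='Sign in', x=140, y=20)
```

Requiring the role to agree is what keeps the similarly worded "Sign up" link from winning on text similarity alone.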
© 2026 Unfragile. The platform for software for agents.