What can mcp-playwright do?

stateful-browser-automation-via-mcp, dom-interaction-via-playwright-selectors, form-interaction-and-select-dropdown-handling, page-context-and-frame-switching, response-validation-and-assertion-tools, screenshot-and-pdf-export-with-viewport-control, page-content-extraction-and-screenshot-capture, browser-navigation-and-history-control, rest-api-testing-with-request-context, browser-console-monitoring-and-logging, action-recording-and-codegen-session-management, mcp-protocol-tool-dispatch-and-request-handling, element-wait-and-visibility-polling, keyboard-and-mouse-event-simulation

mcp-playwright

MCP Server · Free

Playwright Model Context Protocol Server - Tool to automate Browsers and APIs in Claude Desktop, Cline, Cursor IDE and More 🔌

Open Source

14 capabilities

Capabilities: 14 decomposed

stateful-browser-automation-via-mcp

Medium confidence

Launches and maintains a single persistent Playwright browser instance (Chromium, Firefox, or WebKit) across multiple MCP tool invocations, with automatic page context management and error recovery. The server implements a global browser state pattern where the browser instance persists until explicitly closed, enabling multi-step workflows where each tool call operates on the same page context without re-initialization overhead.
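
The global browser state pattern described above can be sketched as a lazily initialized shared instance. This is a minimal illustration, not the server's actual code: `BrowserHandle` and `BrowserManager` are hypothetical names, and the "launch" is simulated with a counter instead of a real Playwright call.

```typescript
// Hypothetical stand-in for a Playwright Browser handle; not the server's real type.
interface BrowserHandle {
  readonly id: number;
  closed: boolean;
}

class BrowserManager {
  private browser: BrowserHandle | undefined;
  private launches = 0;

  // Return the live instance, launching only when none exists.
  ensureBrowser(): BrowserHandle {
    if (this.browser === undefined || this.browser.closed) {
      this.launches += 1; // a real server would launch a browser here
      this.browser = { id: this.launches, closed: false };
    }
    return this.browser;
  }

  // Explicit close; the next ensureBrowser() call relaunches.
  close(): void {
    if (this.browser !== undefined) {
      this.browser.closed = true;
      this.browser = undefined;
    }
  }

  // Number of launches so far, exposed for inspection.
  launchCount(): number {
    return this.launches;
  }
}
```

Every tool call would go through `ensureBrowser()`, so repeated calls reuse one instance and a relaunch happens only after an explicit `close()`.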

Solves for

I want to automate a multi-step web workflow (login → navigate → extract data) without reinitializing the browser between steps

I need my AI agent to maintain browser state across multiple tool calls so it can handle complex user interactions

I want to reduce latency by keeping a single browser instance alive rather than spawning new instances per action

Best for

AI agents and LLMs (Claude, Copilot) automating multi-step web workflows

Teams building browser automation agents that need persistent session state

Developers integrating Playwright automation into MCP-compatible IDEs (Claude Desktop, Cline, Cursor)

Requires

Node.js 18+

Playwright 1.40+ (installed as dependency)

MCP SDK for Node.js

Limitations

Single global browser instance means concurrent requests from multiple clients will serialize or conflict — no multi-browser isolation per client

Browser state is in-memory only — no persistence across server restarts, requiring external state management for long-lived sessions

Page context is shared across all tool invocations, so one tool's navigation can affect subsequent tools' page state unexpectedly

What makes it unique

Implements MCP protocol binding for Playwright with a global browser singleton pattern, allowing LLMs to invoke 27 browser tools against a persistent page context without managing browser lifecycle — the server handles all browser state internally via BrowserToolBase inheritance and requestHandler.ts dispatch logic

vs alternatives

Simpler than Selenium Grid or Puppeteer clusters for LLM integration because it abstracts browser lifecycle entirely behind MCP tools, eliminating the need for agents to manage WebDriver sessions or connection pooling

dom-interaction-via-playwright-selectors

Medium confidence

Provides 8+ DOM interaction tools (click, fill, hover, drag, select, type, focus, blur) that use Playwright's selector engine to locate and manipulate elements. Each tool accepts CSS selectors, XPath, or Playwright's built-in locator strategies (role-based, text-based), validates element visibility and interactability before action, and returns detailed error messages if elements are not found or disabled.

Solves for

I need to click a button, fill a form field, or hover over an element using natural selectors without writing complex XPath

I want my AI agent to interact with dynamic or shadow-DOM elements that CSS selectors alone can't reach

I need validation that an element is actually visible and clickable before attempting interaction, with clear error feedback

Best for

LLM agents automating web forms, e-commerce checkouts, and user workflows

QA automation engineers generating test code from recorded interactions

Non-technical users recording browser actions and converting them to executable scripts

Requires

Active Playwright page context (from browser-automation capability)

Valid CSS selector, XPath, or Playwright locator string

Element must be in DOM and not hidden by CSS (display:none, visibility:hidden)

Limitations

Selector brittleness — if DOM structure changes, selectors may fail; no built-in selector repair or fuzzy matching

Shadow DOM and iframe traversal requires explicit frame context switching; no automatic cross-frame selector resolution

Drag-and-drop operations are limited to Playwright's drag-and-drop APIs (such as dragTo() and dragAndDrop()); complex gesture sequences (multi-touch, pinch) are not supported

What makes it unique

Wraps Playwright's locator engine with MCP tool contracts, enabling LLMs to use role-based and text-based selectors (e.g., 'button with text Submit') instead of brittle CSS selectors, with built-in visibility and interactability validation via Playwright's isVisible() and isEnabled() checks before action execution

vs alternatives

More robust than raw Selenium WebDriver for LLM use because Playwright's locator strategies (role, text, label) are more resilient to DOM changes, and the MCP abstraction eliminates the need for agents to manage WebDriver waits or exception handling

form-interaction-and-select-dropdown-handling

Medium confidence

Provides playwright_fill, playwright_select, and playwright_check tools that handle form input, dropdown selection, and checkbox/radio button toggling. The tools use Playwright's fill() for text inputs, selectOption() for <select> elements, and check()/uncheck() for checkboxes and radio buttons. Each tool validates element type before interaction and returns success/error status.

Solves for

I need to fill text input fields, select options from dropdowns, and toggle checkboxes as part of form automation

I want to handle both standard HTML forms and custom form components that use JavaScript for state management

I need to validate that form fields are in the correct state before submitting

Best for

LLM agents automating web forms, surveys, and checkout flows

QA engineers testing form validation and submission workflows

Web scraping agents that need to fill forms to access gated content

Requires

Active Playwright page context

Valid element selector for form field

For fill: text value to enter

Limitations

Select tool works only with standard HTML <select> elements — custom dropdown components (built with divs, React, Vue) require click-based interaction

Fill tool clears the field before typing — no support for appending text or partial field updates

Check/uncheck tools assume standard HTML checkbox/radio elements — custom toggle components may not work

What makes it unique

Provides separate MCP tools for fill, select, and check operations, each with element-type validation and error handling, enabling LLMs to interact with standard HTML forms without understanding the differences between input types or managing Playwright's type-specific APIs

vs alternatives

More robust than generic click-and-type automation because it uses Playwright's type-specific APIs (selectOption for dropdowns, check for checkboxes) which handle browser quirks and validation, reducing flakiness compared to simulating clicks and keyboard input

page-context-and-frame-switching

Medium confidence

Provides playwright_switch_frame and playwright_get_frames tools that manage frame and iframe context switching. The tools use Playwright's frame() API to select frames by name, URL, or index, and return frame information (name, URL, parent frame). Enables automation of pages with iframes, nested frames, and cross-origin frames (if allowed by CORS).

Solves for

I need to interact with elements inside iframes or nested frames without losing context

I want to switch between multiple frames on the same page and perform actions in each frame

I need to detect and list all frames on a page to understand its structure

Best for

LLM agents automating pages with iframes (e.g., payment gateways, embedded widgets, third-party content)

QA engineers testing multi-frame applications and cross-frame interactions

Web scraping agents that need to extract content from iframes

Requires

Active Playwright page context

Frame name, URL, or index to switch to

Frame must be same-origin (CORS-compliant) for content access

Limitations

Cross-origin iframes are not accessible due to browser security restrictions — no way to interact with frames from different domains

Frame selection by URL or name is fragile — if frame attributes change, selectors may fail

No automatic frame detection or traversal — caller must know frame names or indices to switch

What makes it unique

Exposes Playwright's frame() API as MCP tools for frame switching and enumeration, enabling LLMs to navigate iframe hierarchies without understanding Playwright's frame context model or managing frame references across tool invocations

vs alternatives

More explicit than Selenium's frame switching because it provides frame enumeration (get_frames) and returns frame metadata (name, URL), allowing agents to discover frames dynamically rather than hardcoding frame selectors

response-validation-and-assertion-tools

Medium confidence

Provides expect_response and assert_response tools that validate HTTP responses from API calls or page navigation. The tools check response status codes, headers, body content (JSON schema, text patterns), and return validation results (pass/fail) with detailed error messages. Useful for verifying API contracts and detecting unexpected responses during automation.
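
The checks described above (status code, headers, body content) can be sketched as a pure function that collects mismatches rather than throwing. This is a minimal sketch under assumptions: `HttpResponseLike`, `Expectation`, and `validateResponse` are hypothetical names, and the body check is a plain substring match rather than JSON schema validation.

```typescript
// Hypothetical shapes for illustration; the server's real response type may differ.
interface HttpResponseLike {
  status: number;
  headers: Record<string, string>;
  body: string;
}

interface Expectation {
  status?: number;
  headers?: Record<string, string>;
  bodyIncludes?: string;
}

interface ValidationResult {
  pass: boolean;
  errors: string[];
}

function validateResponse(res: HttpResponseLike, expected: Expectation): ValidationResult {
  const errors: string[] = [];
  if (expected.status !== undefined && res.status !== expected.status) {
    errors.push(`status: expected ${expected.status}, got ${res.status}`);
  }
  // Header names are compared case-insensitively.
  const actual = new Map<string, string>();
  for (const [name, value] of Object.entries(res.headers)) {
    actual.set(name.toLowerCase(), value);
  }
  for (const [name, value] of Object.entries(expected.headers ?? {})) {
    const got = actual.get(name.toLowerCase());
    if (got !== value) {
      errors.push(`header ${name}: expected ${value}, got ${got ?? "(missing)"}`);
    }
  }
  if (expected.bodyIncludes !== undefined && !res.body.includes(expected.bodyIncludes)) {
    errors.push(`body does not include "${expected.bodyIncludes}"`);
  }
  // Failures are reported, not thrown, so the caller decides what to do next.
  return { pass: errors.length === 0, errors };
}
```

Returning a result object instead of throwing leaves the retry-or-abort decision with the calling agent.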

Solves for

I need to verify that an API response has the expected status code and JSON structure before proceeding

I want to assert that a page navigation returned a 200 status, not a 404 or 500 error

I need to validate response headers (Content-Type, Set-Cookie) to ensure the server is behaving correctly

Best for

QA engineers testing API contracts and response validation

LLM agents that need to detect API errors and decide whether to retry or abort

Full-stack testing frameworks that validate both API and UI behavior

Requires

Recent HTTP response from API call or page navigation

Expected status code, headers, or body content to validate against

Limitations

JSON schema validation is basic — no support for complex schemas or custom validation rules

Response body validation is string-based — no support for binary content or large payloads

Assertion failures are reported but don't stop execution — caller must check assertion results and decide next action

What makes it unique

Provides dedicated assertion tools (expect_response, assert_response) that validate HTTP responses with structured error reporting, enabling LLMs to verify API contracts and detect errors without writing custom validation logic or parsing response objects

vs alternatives

More integrated than generic assertion libraries because it works directly with MCP tool responses and provides structured validation results that agents can reason about, rather than requiring agents to parse response objects and write custom validation code

screenshot-and-pdf-export-with-viewport-control

Medium confidence

Provides playwright_screenshot and playwright_save_as_pdf tools that capture page visuals in PNG or PDF format with optional viewport and full-page rendering. The tools accept options for full-page capture, viewport dimensions, clip regions, and quality settings. Screenshots are returned as base64-encoded PNG, and PDFs are returned as binary files. Useful for visual testing, documentation, and evidence collection.

Solves for

I need to capture a screenshot of the current page state for debugging or documentation

I want to export a page as PDF for archival or sharing with non-technical stakeholders

I need to capture specific regions of a page (clip) without capturing the entire viewport

Best for

QA engineers collecting visual evidence for test reports

LLM agents that need visual feedback to understand page state

Documentation and tutorial generation tools that need page screenshots

Requires

Active Playwright page context

Page must be fully loaded (no automatic wait-for-load logic)

Limitations

Screenshots capture only the rendered page — dynamic content loaded via infinite scroll is not captured unless scrolled into view

PDF generation is a full-page render — very long pages may produce large files; no built-in pagination or section-based splitting

Base64 encoding of large screenshots increases token usage in LLM contexts — no streaming or chunked image transfer
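
To make the base64 overhead concrete: padded base64 emits 4 characters for every 3 input bytes, so an encoded image is roughly a third larger than the raw file. The helper below is an illustrative calculation only; `base64Length` is a hypothetical name, and the number of tokens those characters consume depends on the client's tokenizer, which is not modeled here.

```typescript
// Length in characters of the padded base64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}
```

For example, a 300,000-byte PNG becomes 400,000 characters of base64 text in the tool response.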

What makes it unique

Exposes Playwright's screenshot() and pdf() APIs as MCP tools with base64 encoding for easy transport over STDIO, enabling LLMs to capture visual evidence without managing file I/O or image encoding, and returning images directly in tool responses for agent reasoning

vs alternatives

More convenient than raw Playwright screenshots because it returns base64-encoded images directly in MCP tool responses, allowing LLMs to reason about visual content without requiring separate file handling or image transport mechanisms

page-content-extraction-and-screenshot-capture

Medium confidence

Extracts visible text, HTML structure, and accessibility tree from the current page via playwright_get_visible_text and playwright_get_page_content tools, and captures full-page or viewport screenshots as PNG/PDF via playwright_screenshot and playwright_save_as_pdf. The extraction logic uses Playwright's textContent() and innerHTML() APIs with optional filtering to return only visible, non-hidden elements.

Solves for

I need to read all visible text from a page to understand its current state and make decisions about next actions

I want to capture a screenshot of the page for debugging, documentation, or visual verification

I need to extract the DOM structure or accessibility tree to understand page layout and element relationships

Best for

LLM agents that need visual or textual feedback to decide next steps in a workflow

Test automation engineers generating test evidence (screenshots, PDFs) for reports

Accessibility auditing tools that need to analyze page structure and ARIA attributes

Requires

Active Playwright page context

Page must be fully loaded (no automatic wait-for-load logic; caller must ensure page is ready)

For PDF: page must fit within reasonable memory bounds (very large pages may timeout)

Limitations

Text extraction returns only visible text — hidden elements (display:none, aria-hidden) are excluded, so agents cannot see off-screen or collapsed content

Screenshots capture only the current viewport or full page height; dynamic content loaded via infinite scroll is not captured unless scrolled into view first

PDF generation is a full-page render — very long pages may produce large files; no built-in pagination or section-based PDF splitting

What makes it unique

Combines Playwright's textContent(), innerHTML(), and accessibility tree APIs into MCP tools that return structured data (text, HTML, ARIA tree) alongside visual captures (PNG, PDF), enabling LLMs to reason about page state using both textual and visual information without requiring separate vision models

vs alternatives

More comprehensive than Puppeteer's screenshot-only approach because it extracts both visual (PNG/PDF) and semantic (text, HTML, accessibility tree) representations, allowing agents to understand page structure without vision model overhead

browser-navigation-and-history-control

Medium confidence

Provides playwright_navigate, playwright_go_back, playwright_go_forward, and playwright_reload tools that control page navigation using Playwright's page.goto(), page.goBack(), page.goForward(), and page.reload() APIs. Each tool accepts URLs, handles redirects and timeouts, and returns navigation status (success, timeout, network error) with optional wait-for-load-state configuration (load, domcontentloaded, networkidle).

Solves for

I need to navigate to a URL and wait for the page to fully load before proceeding with interactions

I want to go back or forward in browser history to revisit previous pages without re-entering URLs

I need to reload the current page to refresh data or recover from stale state

Best for

LLM agents automating multi-page workflows (search → results → detail → checkout)

Web scraping agents that need to navigate between pages and extract data

Testing frameworks that need to simulate user navigation patterns

Requires

Active Playwright page context

Valid URL (for navigate tool) or existing navigation history (for back/forward tools)

Network connectivity to target URL

Limitations

No automatic wait-for-element logic — caller must use separate playwright_wait_for_selector tool if waiting for specific elements after navigation

Redirect chains are followed automatically, but final URL may differ from requested URL; no built-in redirect tracking or history inspection

History navigation (back/forward) fails silently if no history exists — no error returned, just stays on current page

What makes it unique

Wraps Playwright's navigation APIs with MCP tool contracts that expose wait-until strategies (load, domcontentloaded, networkidle) as tool parameters, allowing LLMs to specify load-state expectations without understanding Playwright internals, and returns structured navigation status (success/timeout/error) for agent decision-making

vs alternatives

More flexible than Selenium's WebDriver.get() because Playwright's wait-until strategies (networkidle) detect when dynamic content has finished loading, not just when DOM is ready, reducing flaky waits in AJAX-heavy applications

rest-api-testing-with-request-context

Medium confidence

Provides 5 HTTP method tools (playwright_api_get, playwright_api_post, playwright_api_put, playwright_api_patch, playwright_api_delete) that create APIRequestContext instances on-demand to execute REST requests. Each tool accepts URL, headers, body, authentication (Bearer token, Basic auth), query parameters, and returns response status, headers, and body (JSON or text) with optional response validation via expect_response and assert_response tools.
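
As a rough sketch of how the inputs listed above (URL, headers, Bearer token, query parameters) might be normalized before a request is sent, consider the helper below. The argument shape and the names `ApiCallArgs` and `prepareRequest` are assumptions made for illustration; the real tool schemas may differ, and Basic auth and request bodies are left out.

```typescript
// Hypothetical argument shape; the real tool schemas may differ.
interface ApiCallArgs {
  url: string;
  token?: string; // Bearer token, if the endpoint needs one
  query?: Record<string, string>;
  headers?: Record<string, string>;
}

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
}

function prepareRequest(args: ApiCallArgs): PreparedRequest {
  // Copy caller headers so the input object is never mutated.
  const headers: Record<string, string> = { ...(args.headers ?? {}) };
  if (args.token !== undefined) {
    headers["Authorization"] = `Bearer ${args.token}`;
  }
  // Percent-encode each query key and value.
  const pairs = Object.entries(args.query ?? {}).map(
    ([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`,
  );
  const separator = args.url.includes("?") ? "&" : "?";
  const url = pairs.length > 0 ? args.url + separator + pairs.join("&") : args.url;
  return { url, headers };
}
```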

Solves for

I need to test REST API endpoints directly without going through the browser UI

I want to set up API state (create test data, authenticate) before browser automation begins

I need to validate API responses (status code, JSON schema, header values) as part of my automation workflow

Best for

QA engineers testing full-stack workflows (API setup → browser interaction → API verification)

LLM agents that need to interact with both REST APIs and web UIs in the same workflow

API testing frameworks that want to leverage Playwright's request context for cookie/session management

Requires

Valid HTTP URL

API key or authentication credentials (if endpoint requires auth)

Network connectivity to API endpoint

Limitations

No built-in request retry logic or exponential backoff — failed requests fail immediately without retry

Response body is returned as raw text or JSON; no automatic schema validation or type coercion

Cookie and session management is automatic but opaque — no direct access to cookie jar or session state inspection

What makes it unique

Leverages Playwright's APIRequestContext to share cookies and session state between API calls and browser automation, enabling seamless workflows where API authentication tokens can be used in subsequent browser requests without manual cookie management, implemented via ApiToolBase inheritance pattern

vs alternatives

More integrated than separate curl/axios tools because it shares browser cookies and session context automatically, eliminating the need for agents to manually extract and pass authentication tokens between API and browser layers

browser-console-monitoring-and-logging

Medium confidence

Captures browser console messages (log, warn, error, debug) via playwright_console_logs tool using Playwright's page.on('console') event listener. The ConsoleLogsTool registers a message handler that buffers console messages in memory and returns them as structured objects with message type, text content, and optional stack traces. Useful for debugging JavaScript errors and monitoring application behavior during automation.
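
The buffering behaviour described above can be sketched without a browser: a handler appends each message to an in-memory list, and a read method returns the entries, optionally filtered by type. `ConsoleBuffer` and `ConsoleEntry` are hypothetical names, not the server's ConsoleLogsTool; in a real setup the handler would be attached through page.on('console').

```typescript
// Hypothetical structured entry; field names are illustrative.
// Playwright reports warnings with the type string "warning".
interface ConsoleEntry {
  type: "log" | "warning" | "error" | "debug";
  text: string;
}

class ConsoleBuffer {
  private entries: ConsoleEntry[] = [];

  // Handler to attach to a console event source.
  readonly handler = (type: ConsoleEntry["type"], text: string): void => {
    this.entries.push({ type, text });
  };

  // Return a copy of the buffered entries, optionally filtered by type.
  read(type?: ConsoleEntry["type"]): ConsoleEntry[] {
    return type === undefined
      ? [...this.entries]
      : this.entries.filter((entry) => entry.type === type);
  }

  // Drop everything buffered so far; nothing is pruned automatically.
  clear(): void {
    this.entries = [];
  }
}
```

Because nothing is pruned automatically, a caller that reads logs periodically would also call `clear()` to keep memory bounded.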

Solves for

I need to see JavaScript errors and warnings that occur during automation to debug page behavior

I want to capture application logs (console.log statements) to verify that the page is behaving as expected

I need to detect when JavaScript errors occur and fail the automation workflow if critical errors are logged

Best for

QA engineers debugging flaky automation tests by inspecting console errors

LLM agents that need to detect JavaScript errors and decide whether to retry or abort

Full-stack testing frameworks that want to correlate browser console logs with API responses

Requires

Active Playwright page context

Console logging must be enabled in the browser (default behavior)

Limitations

Console messages are buffered in memory only — no persistence across page reloads or browser restarts

Large volumes of console messages (>10k per page) may cause memory bloat; no automatic message pruning or rotation

Stack traces are captured only if the browser includes them in the console message — source maps are not resolved

What makes it unique

Implements a ConsoleLogsTool that registers Playwright's page.on('console') event listener to capture all console messages in a structured format, enabling LLMs to inspect JavaScript errors and application logs without requiring separate DevTools protocol connections or browser extensions

vs alternatives

Simpler than DevTools protocol inspection because it exposes console messages directly as MCP tool output, allowing agents to reason about errors without parsing DevTools JSON or managing separate protocol connections

action-recording-and-codegen-session-management

Medium confidence

Provides start_codegen_session, end_codegen_session, get_codegen_session, and clear_codegen_session tools that manage ActionRecorder instances. When a codegen session is active, all browser tool invocations (click, fill, navigate, etc.) are recorded as action objects. When end_codegen_session is called, the PlaywrightGenerator converts recorded actions into executable Playwright test code (JavaScript or TypeScript) that can be saved and run independently.
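
The record-then-generate flow described above can be sketched as a log of action objects that is rendered to Playwright test source when the session ends. This is an assumption-laden illustration: `ActionLog` and `RecordedAction` are hypothetical names, only three action kinds are modeled, and the emitted code is not claimed to match the output of the server's PlaywrightGenerator.

```typescript
// Hypothetical action shape; the server's recorded format may differ.
type RecordedAction =
  | { tool: "navigate"; url: string }
  | { tool: "click"; selector: string }
  | { tool: "fill"; selector: string; value: string };

class ActionLog {
  private recording = false;
  private actions: RecordedAction[] = [];

  start(): void {
    this.recording = true;
    this.actions = [];
  }

  // Called from each tool invocation; ignored when no session is active.
  record(action: RecordedAction): void {
    if (this.recording) this.actions.push(action);
  }

  // Stop recording and emit a Playwright test as source text.
  end(testName: string): string {
    this.recording = false;
    const body = this.actions.map((action) => {
      switch (action.tool) {
        case "navigate":
          return `  await page.goto(${JSON.stringify(action.url)});`;
        case "click":
          return `  await page.click(${JSON.stringify(action.selector)});`;
        case "fill":
          return `  await page.fill(${JSON.stringify(action.selector)}, ${JSON.stringify(action.value)});`;
      }
    });
    return [
      `import { test } from '@playwright/test';`,
      ``,
      `test(${JSON.stringify(testName)}, async ({ page }) => {`,
      ...body,
      `});`,
    ].join("\n");
  }
}
```

Using `JSON.stringify` for every argument keeps quotes and special characters in URLs, selectors, and values properly escaped in the generated source.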

Solves for

I want to record a series of browser interactions and automatically generate test code that I can run later

I need to convert manual browser actions into executable automation scripts without writing code from scratch

I want to generate test code that other team members can understand and maintain

Best for

Non-technical QA engineers who want to record tests without writing code

Test automation engineers who want to generate boilerplate test code and then refine it

LLM agents that need to generate executable test scripts from recorded interactions

Requires

Active Playwright page context

Browser tools must be invoked while codegen session is active (start_codegen_session called first)

Limitations

Generated code is basic boilerplate — no assertions, error handling, or page object patterns; requires manual refinement

Recording captures only tool invocations, not the reasoning or intent behind actions — generated code lacks comments explaining why actions were taken

Complex interactions (multi-step drag-and-drop, keyboard shortcuts, file uploads) may not record accurately or may generate incorrect code

What makes it unique

Implements ActionRecorder that intercepts all browser tool invocations during a codegen session and converts them to Playwright test code via PlaywrightGenerator, enabling LLMs to generate executable test scripts from recorded interactions without requiring users to understand Playwright API syntax

vs alternatives

More integrated than Playwright Inspector because it generates code automatically from MCP tool invocations without requiring users to interact with a separate UI, and it works within LLM agent workflows where recording and code generation happen programmatically

mcp-protocol-tool-dispatch-and-request-handling

Medium confidence

Implements the Model Context Protocol (MCP) server that exposes 36 tools (27 browser, 5 API, 4 codegen) as MCP tools. The server receives tool invocation requests via STDIO transport, routes them through requestHandler.ts and toolHandler.ts, executes the appropriate tool via the tool's execute(args, context) method, and returns ToolResponse objects with result or error. Each tool is registered with a JSON schema describing its parameters and return types, enabling MCP clients (Claude Desktop, Cline, Cursor) to discover and invoke tools with type safety.
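
The routing step described above can be sketched as a name-to-handler registry with a single dispatch function that turns unknown tools and thrown errors into error results. The types and names here (`ToolResult`, `ToolHandler`, `registry`, `dispatch`, and the `echo` and `boom` tools) are hypothetical and simplified; they do not mirror requestHandler.ts or the MCP SDK's result shape.

```typescript
// Hypothetical types for illustration; names do not mirror the server's source.
interface ToolResult {
  isError: boolean;
  text: string;
}

type ToolHandler = (args: Record<string, unknown>) => ToolResult;

const registry = new Map<string, ToolHandler>();

// Two toy tools: one succeeds, one always throws.
registry.set("echo", (args) => ({ isError: false, text: String(args["message"] ?? "") }));
registry.set("boom", () => {
  throw new Error("page closed");
});

// Route a call by name; unknown tools and thrown errors become error results.
function dispatch(name: string, args: Record<string, unknown>): ToolResult {
  const handler = registry.get(name);
  if (handler === undefined) {
    return { isError: true, text: `Unknown tool: ${name}` };
  }
  try {
    return handler(args);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { isError: true, text: `Tool ${name} failed: ${message}` };
  }
}
```

Catching errors at the dispatch boundary means a failing tool produces a structured error for the client instead of crashing the server process.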

Solves for

I want to integrate Playwright automation into my LLM agent (Claude, Copilot) without writing custom tool bindings

I need my IDE (Cursor, Cline) to have access to browser automation tools that work seamlessly with AI code generation

I want to expose browser and API testing capabilities to multiple MCP clients without duplicating tool implementations

Best for

LLM application developers building agents that need browser automation (Claude Desktop, Cline, Cursor)

Teams standardizing on MCP for tool integration across multiple AI clients

Developers building custom MCP clients that need Playwright automation capabilities

Requires

Node.js 18+

MCP SDK for Node.js (@modelcontextprotocol/sdk)

MCP client that supports STDIO transport (Claude Desktop, Cline, Cursor, or custom client)

Limitations

STDIO transport is synchronous — tool invocations block until completion; no async/await support for long-running operations

Tool schema is static — no dynamic tool registration or runtime schema updates; adding new tools requires server restart

Error handling is tool-level only — no built-in retry logic, circuit breakers, or graceful degradation if tools fail

What makes it unique

Implements a complete MCP server that wraps Playwright tools with MCP protocol contracts, enabling seamless integration with Claude Desktop, Cline, and Cursor without requiring users to write custom tool bindings or manage Playwright lifecycle — the server handles all MCP protocol details and tool dispatch internally

vs alternatives

More standardized than custom Playwright integrations because it uses the MCP protocol, allowing the same tool set to work across multiple AI clients (Claude, Copilot, custom agents) without reimplementation, and it provides automatic tool discovery and schema validation

element-wait-and-visibility-polling

Medium confidence

Provides playwright_wait_for_selector and playwright_wait_for_navigation tools that poll for element visibility or navigation completion using Playwright's waitForSelector() and waitForNavigation() APIs. The tools accept CSS selectors, XPath, or Playwright locators, and timeout values (default 30s), and return success/timeout status. Useful for handling dynamic content loading, AJAX requests, and asynchronous page updates.
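
The wait-with-timeout idea can be illustrated with a generic polling loop. This is a sketch of the general technique, not Playwright's internal implementation of waitForSelector(): `pollUntil` is a hypothetical helper, and the 50 ms interval is an arbitrary choice; only the 30-second default timeout comes from the description above.

```typescript
// Poll `check` until it returns true or `timeoutMs` elapses.
// Resolves to true on success and false on timeout.
async function pollUntil(
  check: () => boolean,
  timeoutMs = 30000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (check()) return true;
    if (Date.now() >= deadline) return false;
    // Sleep before the next attempt instead of busy-waiting.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Unlike a fixed sleep, the loop returns as soon as the condition holds, so fast pages do not pay the full timeout.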

Solves for

I need to wait for a dynamically-loaded element to appear before interacting with it

I want to wait for a page navigation to complete after clicking a link, without hardcoding sleep delays

I need to handle pages with slow network or JavaScript that loads content asynchronously

Best for

LLM agents automating AJAX-heavy or single-page applications (SPAs) with dynamic content

QA engineers writing robust tests that don't rely on hardcoded sleep() delays

Web scraping agents that need to wait for JavaScript-rendered content before extraction

Requires

Active Playwright page context

Valid CSS selector, XPath, or Playwright locator string

Timeout value in milliseconds (default 30000)

Limitations

Timeout is fixed per tool invocation — no adaptive timeout based on page load time or network conditions

Polling is element-level only — no support for waiting for multiple elements or complex conditions (e.g., 'wait for element A OR element B')

No visibility threshold configuration — element is considered visible if any part is in viewport; no support for 'wait for 80% visible'

What makes it unique

Wraps Playwright's waitForSelector() and waitForNavigation() APIs as MCP tools with configurable timeout and state parameters, enabling LLMs to wait for dynamic content without understanding Playwright's async/await patterns or managing Promise resolution

vs alternatives

More reliable than hardcoded sleep() delays because it polls for actual element visibility or navigation completion, reducing flakiness in AJAX-heavy applications, and it integrates timeout and state configuration as tool parameters for agent control

keyboard-and-mouse-event-simulation

Medium confidence

Provides playwright_type, playwright_press, playwright_hover, and playwright_drag tools that simulate keyboard and mouse events using Playwright's type(), press(), hover(), and drag-and-drop (dragTo() / dragAndDrop()) APIs. The tools accept element selectors, keyboard input strings, modifier keys (Shift, Ctrl, Alt, Meta), and drag source/target coordinates. Useful for complex interactions like keyboard shortcuts, form filling with special characters, and drag-and-drop operations.

Solves for

I need to type text with special characters or keyboard shortcuts (Ctrl+A, Shift+Tab) that click-based interaction can't handle

I want to simulate mouse hover effects that trigger tooltips or dropdown menus

I need to perform drag-and-drop operations on elements like kanban boards or file uploads

Best for

LLM agents automating complex form interactions with keyboard shortcuts and special input

QA engineers testing keyboard accessibility and focus management

Web scraping agents that need to interact with drag-and-drop interfaces

Requires

Active Playwright page context

Valid element selector for keyboard/mouse target

For drag: source and target element selectors or coordinates

Limitations

Keyboard input is character-by-character — no support for complex input methods (IME, voice input, paste from clipboard)

Modifier key combinations are limited to Shift, Ctrl, Alt, Meta — no support for custom key sequences or macros

Drag-and-drop is limited to Playwright's drag-and-drop APIs (such as dragTo() and dragAndDrop()); multi-touch gestures, pinch, and swipe are not supported

What makes it unique

Exposes Playwright's type(), press(), hover(), and drag-and-drop (dragTo() / dragAndDrop()) APIs as separate MCP tools with modifier key support, enabling LLMs to simulate complex keyboard and mouse interactions without understanding Playwright's event API or timing semantics

vs alternatives

More flexible than click-only automation because it supports keyboard shortcuts, special characters, and drag-and-drop, enabling agents to interact with complex UIs that require multi-key combinations or gesture-based interactions

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts (sharing capabilities)

Artifacts that share capabilities with mcp-playwright, ranked by overlap. Discovered automatically through the match graph.

MCP Server · 40

@executeautomation/playwright-mcp-server

Model Context Protocol servers for Playwright

element-selection-and-interaction, form-filling-and-input-automation

2 shared capabilities

MCP Server · 42

@executeautomation/playwright-mcp-server

Model Context Protocol servers for Playwright

user-interaction-simulation, dom-element-selection-and-querying

2 shared capabilities

MCP Server · 40

playwright-mcp

Playwright MCP server

interactive element interaction and form automation

1 shared capability

MCP Server · 24

onestep-puppeteer-mcp-server

Experimental MCP server for browser automation using Puppeteer (inspired by @modelcontextprotocol/server-puppeteer)

dom-element-interaction-and-selection

1 shared capability

MCP Server · 38

bb-browser

Your browser is the API. CLI + MCP server for AI agents to control Chrome with your login state.

dom-element-interaction-with-selector-based-targeting

1 shared capability

MCP Server27

@hisma/server-puppeteer

Fork and update (v0.6.5) of the original @modelcontextprotocol/server-puppeteer MCP server for browser automation using Puppeteer.

dom-element-interaction-and-manipulation

1 shared capability

Best For

✓AI agents and LLMs (Claude, Copilot) automating multi-step web workflows
✓Teams building browser automation agents that need persistent session state
✓Developers integrating Playwright automation into MCP-compatible IDEs (Claude Desktop, Cline, Cursor)
✓LLM agents automating web forms, e-commerce checkouts, and user workflows
✓QA automation engineers generating test code from recorded interactions
✓Non-technical users recording browser actions and converting them to executable scripts
✓LLM agents automating web forms, surveys, and checkout flows
✓QA engineers testing form validation and submission workflows

Known Limitations

⚠Single global browser instance means concurrent requests from multiple clients will serialize or conflict — no multi-browser isolation per client
⚠Browser state is in-memory only — no persistence across server restarts, requiring external state management for long-lived sessions
⚠Page context is shared across all tool invocations, so one tool's navigation can affect subsequent tools' page state unexpectedly
⚠Selector brittleness — if DOM structure changes, selectors may fail; no built-in selector repair or fuzzy matching
⚠Shadow DOM and iframe traversal requires explicit frame context switching; no automatic cross-frame selector resolution
⚠Drag-and-drop operations are limited to Playwright's built-in drag-and-drop support; complex gesture sequences (multi-touch, pinch) are not supported
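Because every tool call shares one browser and one page, a client that issues calls from several concurrent tasks may want to serialize them itself. The following is a minimal sketch of a promise-chain queue. The class `SerialQueue` is hypothetical client-side code, not part of mcp-playwright, and it only orders calls within one process; it does not add per-client isolation.

```typescript
// Hypothetical helper: runs async tasks strictly one at a time,
// in the order they were submitted.
class SerialQueue {
  // Tail of the chain; each new task waits for it to settle.
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failed task
    // does not block the tasks queued behind it.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

A caller would wrap each tool invocation, for example `queue.run(() => callTool("playwright_navigate", args))`, where `callTool` stands in for whatever client function sends the MCP request.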

Requirements

Node.js 18+
Playwright 1.40+ (installed as dependency)
MCP SDK for Node.js
STDIO transport available (Claude Desktop, Cline, Cursor IDE, or custom MCP client)
Active Playwright page context (from browser-automation capability)
Valid CSS selector, XPath, or Playwright locator string
Element must be in DOM and not hidden by CSS (display:none, visibility:hidden)

Input / Output

Accepts: browser-engine-type (chromium|firefox|webkit), launch-options (headless boolean, viewport dimensions), selector (CSS|XPath|playwright-locator), interaction-type (click|fill|hover|drag|select|type|focus|blur), value (for fill/type operations), options (modifiers like Shift, Ctrl for keyboard events), value (for fill tool: text to enter), option (for select tool: option value or label), force (boolean: force interaction even if element is disabled), frame-selector (name|url|index), frame-value (frame name, URL pattern, or numeric index), response-object (from API call or navigation), expected-status (HTTP status code), expected-headers (key-value pairs), expected-body (JSON object or text pattern), full-page (boolean: capture entire page or viewport only), clip (object: x, y, width, height for region capture), quality (for PNG: 0-100, default 100), omit-background (boolean: transparent background for PNG), extraction-type (visible-text|page-content|accessibility-tree), screenshot-options (full-page boolean, viewport-only boolean, format: png|pdf), filter-options (optional: exclude-selectors, include-only-selectors), url (for navigate tool), wait-until (load|domcontentloaded|networkidle|commit), timeout-ms (milliseconds, default 30000), url (HTTP/HTTPS endpoint), headers (key-value pairs), body (JSON object or form data), auth (Bearer token, Basic auth, or custom header), query-params (URL query string parameters), timeout-ms (milliseconds), clear-buffer (boolean, optional: clear previous messages before returning), session-name (optional: identifier for the recording session), language (javascript|typescript, default: javascript), tool-name (string, e.g., 'playwright_click'), tool-arguments (JSON object matching tool schema), mcp-request (JSON-RPC 2.0 request with method 'tools/call'), state (for wait-for-selector: visible|hidden|attached|detached), text (for type tool), key (for press tool: Enter, Tab, Escape, etc.), modifiers (array of Shift|Ctrl|Alt|Meta), source-selector (for drag source), target-selector (for drag target)

Produces: browser-handle (internal reference), page-context (active page object), status-confirmation (success/error), interaction-result (success|element-not-found|not-visible|not-enabled), error-message (detailed reason for failure), element-state (before/after screenshots or DOM snapshot), interaction-status (success|element-not-found|invalid-element-type|not-visible), error-message (if interaction failed), frame-info (name, URL, parent frame), frames-list (array of all frames on page), switch-status (success|frame-not-found), validation-result (pass|fail), error-message (if validation failed), actual-vs-expected (comparison of actual and expected values), image (PNG as base64 string), pdf (PDF as binary or base64), file-size (bytes), text (plain text string), html (DOM structure as HTML string), accessibility-tree (ARIA roles and labels), image (PNG base64 or binary), pdf (PDF binary), navigation-status (success|timeout|network-error|navigation-aborted), final-url (actual URL after redirects), page-title (document.title), status-code (HTTP status if available), status-code (HTTP status), headers (response headers as key-value pairs), body (response body as JSON or text), ok (boolean, true if status 200-299), console-messages (array of objects with type, text, location, args), message-type (log|warn|error|debug), timestamp (when message was logged), generated-code (Playwright test script as string), action-list (array of recorded actions with selectors and values), code-format (javascript or typescript), tool-response (JSON object with content array, isError boolean), result (tool-specific output, e.g., screenshot base64, text content), error (error message and stack trace if tool fails), wait-status (success|timeout|element-not-found), elapsed-time (milliseconds waited), final-state (element visibility state when wait completed), interaction-status (success|element-not-found|not-visible)

UnfragileRank

Adoption: 32% (30% weight)

Quality: 38% (25% weight)

Ecosystem: 50% (25% weight)

Match Graph: 10% (15% weight)

Freshness: 75% (5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
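The page lists the component scores and weights but does not state how they are combined. As a worked illustration only, and assuming the rank is a plain weighted sum of the listed percentages (an assumption, not something this page confirms), the arithmetic would look like this:

```typescript
interface RankComponent {
  name: string;
  scorePct: number; // component score as listed, in percent
  weight: number;   // component weight as a fraction of 1
}

// Values copied from the UnfragileRank breakdown for mcp-playwright.
const components: RankComponent[] = [
  { name: "Adoption", scorePct: 32, weight: 0.3 },
  { name: "Quality", scorePct: 38, weight: 0.25 },
  { name: "Ecosystem", scorePct: 50, weight: 0.25 },
  { name: "Match Graph", scorePct: 10, weight: 0.15 },
  { name: "Freshness", scorePct: 75, weight: 0.05 },
];

// Assumed formula: sum of score * weight over all components.
function weightedScore(parts: RankComponent[]): number {
  return parts.reduce((sum, p) => sum + p.scorePct * p.weight, 0);
}

const totalWeight = components.reduce((sum, p) => sum + p.weight, 0);
const rank = weightedScore(components);
// 9.6 + 9.5 + 12.5 + 1.5 + 3.75, i.e. about 36.85 out of 100
```

The weights sum to 1, so under this assumption the result stays on the same 0 to 100 scale as the component scores.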

Type: MCP Server

14 capabilities

Repository Details

Stars: 5,456

Forks: 496

Language: TypeScript

License: MIT

Last commit: Dec 13, 2025

About

Playwright Model Context Protocol Server - Tool to automate Browsers and APIs in Claude Desktop, Cline, Cursor IDE and More 🔌

Alternatives to mcp-playwright

IntelliCode · 50 · Extension

AI-assisted development

GitHub Copilot Chat · 53 · Extension

AI chat features powered by Copilot

GitHub Copilot · 52 · Extension

Your AI pair programmer

Claude Code for VS Code · 52 · Extension

Claude Code for VS Code: Harness the power of Claude Code without leaving your IDE

Data Sources

github awesomemcp registry

Capabilities (14 decomposed)

stateful-browser-automation-via-mcp

Medium confidence

Best for

AI agents and LLMs (Claude, Copilot) automating multi-step web workflows

Teams building browser automation agents that need persistent session state

Developers integrating Playwright automation into MCP-compatible IDEs (Claude Desktop, Cline, Cursor)

Requires

Node.js 18+

Playwright 1.40+ (installed as dependency)

MCP SDK for Node.js

Limitations

Single global browser instance means concurrent requests from multiple clients will serialize or conflict — no multi-browser isolation per client

Browser state is in-memory only — no persistence across server restarts, requiring external state management for long-lived sessions

Page context is shared across all tool invocations, so one tool's navigation can affect subsequent tools' page state unexpectedly

dom-interaction-via-playwright-selectors

Medium confidence

Best for

LLM agents automating web forms, e-commerce checkouts, and user workflows

QA automation engineers generating test code from recorded interactions

Non-technical users recording browser actions and converting them to executable scripts

Requires

Active Playwright page context (from browser-automation capability)

Valid CSS selector, XPath, or Playwright locator string

Element must be in DOM and not hidden by CSS (display:none, visibility:hidden)

Limitations

Selector brittleness — if DOM structure changes, selectors may fail; no built-in selector repair or fuzzy matching

Shadow DOM and iframe traversal requires explicit frame context switching; no automatic cross-frame selector resolution

Drag-and-drop operations are limited to Playwright's built-in drag-and-drop support; complex gesture sequences (multi-touch, pinch) are not supported

form-interaction-and-select-dropdown-handling

Medium confidence

Best for

LLM agents automating web forms, surveys, and checkout flows

QA engineers testing form validation and submission workflows

Web scraping agents that need to fill forms to access gated content

Requires

Active Playwright page context

Valid element selector for form field

For fill: text value to enter

Limitations

Select tool works only with standard HTML <select> elements — custom dropdown components (built with divs, React, Vue) require click-based interaction

Fill tool clears the field before typing — no support for appending text or partial field updates

Check/uncheck tools assume standard HTML checkbox/radio elements — custom toggle components may not work

page-context-and-frame-switching

Medium confidence

Best for

LLM agents automating pages with iframes (e.g., payment gateways, embedded widgets, third-party content)

QA engineers testing multi-frame applications and cross-frame interactions

Web scraping agents that need to extract content from iframes

Requires

Active Playwright page context

Frame name, URL, or index to switch to

Frame must be same-origin (CORS-compliant) for content access

Limitations

Cross-origin iframes are not accessible due to browser security restrictions — no way to interact with frames from different domains

Frame selection by URL or name is fragile — if frame attributes change, selectors may fail

No automatic frame detection or traversal — caller must know frame names or indices to switch

response-validation-and-assertion-tools

Medium confidence

Best for

QA engineers testing API contracts and response validation

LLM agents that need to detect API errors and decide whether to retry or abort

Full-stack testing frameworks that validate both API and UI behavior

Requires

Recent HTTP response from API call or page navigation

Expected status code, headers, or body content to validate against

Limitations

JSON schema validation is basic — no support for complex schemas or custom validation rules

Response body validation is string-based — no support for binary content or large payloads

Assertion failures are reported but don't stop execution — caller must check assertion results and decide next action

screenshot-and-pdf-export-with-viewport-control

Medium confidence

Best for

QA engineers collecting visual evidence for test reports

LLM agents that need visual feedback to understand page state

Documentation and tutorial generation tools that need page screenshots

Requires

Active Playwright page context

Page must be fully loaded (no automatic wait-for-load logic)

Limitations

Screenshots capture only the rendered page — dynamic content loaded via infinite scroll is not captured unless scrolled into view

PDF generation is a full-page render — very long pages may produce large files; no built-in pagination or section-based splitting

Base64 encoding of large screenshots increases token usage in LLM contexts — no streaming or chunked image transfer
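To put a number on the base64 overhead mentioned above: standard padded base64 emits 4 output characters for every 3 input bytes, so the encoded image is roughly one third larger than the raw file. A small sketch of that arithmetic follows; the function name `base64Length` is hypothetical, and how a given LLM tokenizes those characters is outside what this computes.

```typescript
// Length in characters of the padded base64 encoding of `byteCount` bytes:
// every group of up to 3 bytes becomes exactly 4 characters.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Relative growth of the payload compared with the raw bytes.
function base64Overhead(byteCount: number): number {
  return byteCount === 0 ? 0 : base64Length(byteCount) / byteCount - 1;
}
```

Under this formula a 300,000-byte PNG becomes 400,000 base64 characters, which is one reason viewport-only or clipped captures can be preferable when the image is returned inline.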

page-content-extraction-and-screenshot-capture

Medium confidence

Best for

LLM agents that need visual or textual feedback to decide next steps in a workflow

Test automation engineers generating test evidence (screenshots, PDFs) for reports

Accessibility auditing tools that need to analyze page structure and ARIA attributes

Requires

Active Playwright page context

Page must be fully loaded (no automatic wait-for-load logic; caller must ensure page is ready)

For PDF: page must fit within reasonable memory bounds (very large pages may timeout)

Limitations

Text extraction returns only visible text — hidden elements (display:none, aria-hidden) are excluded, so agents cannot see off-screen or collapsed content

Screenshots capture only the current viewport or full page height; dynamic content loaded via infinite scroll is not captured unless scrolled into view first

PDF generation is a full-page render — very long pages may produce large files; no built-in pagination or section-based PDF splitting

browser-navigation-and-history-control

Medium confidence

Best for

LLM agents automating multi-page workflows (search → results → detail → checkout)

Web scraping agents that need to navigate between pages and extract data

Testing frameworks that need to simulate user navigation patterns

Requires

Active Playwright page context

Valid URL (for navigate tool) or existing navigation history (for back/forward tools)

Network connectivity to target URL

Limitations

No automatic wait-for-element logic — caller must use separate playwright_wait_for_selector tool if waiting for specific elements after navigation

Redirect chains are followed automatically, but final URL may differ from requested URL; no built-in redirect tracking or history inspection

History navigation (back/forward) fails silently if no history exists — no error returned, just stays on current page
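Since a back navigation with no history reportedly returns no error, a caller working directly with a Playwright page could detect the no-op by comparing the URL before and after. The following is a minimal sketch under that assumption. `NavigablePage` is a hypothetical structural type covering only the two `Page` methods used (`url()` and `goBack()`), and `goBackChecked` is not an mcp-playwright tool.

```typescript
// Hypothetical minimal view of a Playwright Page: only the methods used here.
interface NavigablePage {
  url(): string;
  goBack(): Promise<unknown>;
}

interface BackResult {
  moved: boolean; // true if the URL changed after going back
  url: string;    // URL the page is on after the attempt
}

// Attempt a back navigation and report whether the page actually moved.
async function goBackChecked(page: NavigablePage): Promise<BackResult> {
  const before = page.url();
  await page.goBack();
  const after = page.url();
  return { moved: after !== before, url: after };
}
```

An agent using only the MCP tools could apply the same idea by reading the current URL before and after the back tool call.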

rest-api-testing-with-request-context

Medium confidence

Best for

QA engineers testing full-stack workflows (API setup → browser interaction → API verification)

LLM agents that need to interact with both REST APIs and web UIs in the same workflow

API testing frameworks that want to leverage Playwright's request context for cookie/session management

Requires

Valid HTTP URL

API key or authentication credentials (if endpoint requires auth)

Network connectivity to API endpoint

Limitations

No built-in request retry logic or exponential backoff — failed requests fail immediately without retry

Response body is returned as raw text or JSON; no automatic schema validation or type coercion

Cookie and session management is automatic but opaque — no direct access to cookie jar or session state inspection
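Because failed requests are not retried, callers that need resilience have to add it themselves. The following is a minimal sketch of retry with exponential backoff around any async request function. `withRetry` and `RetryOptions` are hypothetical names, not part of mcp-playwright, and the `sleep` option exists only so the delay can be replaced in tests.

```typescript
interface RetryOptions {
  retries: number;      // number of retries after the first attempt
  baseDelayMs: number;  // delay before the first retry; doubles each time
  sleep?: (ms: number) => Promise<void>; // injectable for testing
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `attempt`, retrying on rejection with delays of
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
async function withRetry<T>(
  attempt: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let i = 0; i <= opts.retries; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < opts.retries) {
        await sleep(opts.baseDelayMs * 2 ** i);
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

This sketch retries on any rejection. A real client would likely restrict retries to idempotent methods and to transient failures such as timeouts or 5xx responses.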

browser-console-monitoring-and-logging

Medium confidence

Best for

QA engineers debugging flaky automation tests by inspecting console errors

LLM agents that need to detect JavaScript errors and decide whether to retry or abort

Full-stack testing frameworks that want to correlate browser console logs with API responses

Requires

Active Playwright page context

Console logging must be enabled in the browser (default behavior)

Limitations

Console messages are buffered in memory only — no persistence across page reloads or browser restarts

Large volumes of console messages (>10k per page) may cause memory bloat; no automatic message pruning or rotation

Stack traces are captured only if the browser includes them in the console message — source maps are not resolved
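The memory growth described above is the usual consequence of an unbounded in-memory buffer. One common mitigation is a fixed-capacity buffer that drops the oldest entry when full. The sketch below shows that technique in isolation. `BoundedLog` and `ConsoleEntry` are hypothetical names, and nothing on this page indicates that mcp-playwright works this way.

```typescript
interface ConsoleEntry {
  type: string; // e.g. "log", "warn", "error", "debug"
  text: string;
}

// Hypothetical fixed-capacity log: keeps only the most recent entries.
class BoundedLog {
  private readonly capacity: number;
  private entries: ConsoleEntry[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Append an entry, evicting the oldest one if the buffer is full.
  push(entry: ConsoleEntry): void {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }

  // Return a copy so callers cannot mutate the internal buffer.
  snapshot(): ConsoleEntry[] {
    return [...this.entries];
  }
}
```

With Playwright itself, such a buffer could be fed from the page's `console` event, for example by pushing `{ type: msg.type(), text: msg.text() }` for each message.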

action-recording-and-codegen-session-management

Medium confidence

Best for

Non-technical QA engineers who want to record tests without writing code

Test automation engineers who want to generate boilerplate test code and then refine it

LLM agents that need to generate executable test scripts from recorded interactions

Requires

Active Playwright page context

Browser tools must be invoked while codegen session is active (start_codegen_session called first)

Limitations

Generated code is basic boilerplate — no assertions, error handling, or page object patterns; requires manual refinement

Recording captures only tool invocations, not the reasoning or intent behind actions — generated code lacks comments explaining why actions were taken

Complex interactions (multi-step drag-and-drop, keyboard shortcuts, file uploads) may not record accurately or may generate incorrect code

mcp-protocol-tool-dispatch-and-request-handling

Medium confidence

Best for

LLM application developers building agents that need browser automation (Claude Desktop, Cline, Cursor)

Teams standardizing on MCP for tool integration across multiple AI clients

Developers building custom MCP clients that need Playwright automation capabilities

Requires

Node.js 18+

MCP SDK for Node.js (@modelcontextprotocol/sdk)

MCP client that supports STDIO transport (Claude Desktop, Cline, Cursor, or custom client)

Limitations

STDIO transport is synchronous — tool invocations block until completion; no async/await support for long-running operations

Tool schema is static — no dynamic tool registration or runtime schema updates; adding new tools requires server restart

Error handling is tool-level only — no built-in retry logic, circuit breakers, or graceful degradation if tools fail
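For readers building a custom client, the tool dispatch described here rests on MCP's JSON-RPC 2.0 `tools/call` request. The following is a minimal sketch of constructing one. The tool name `playwright_click` appears in this listing; the `selector` argument and its value are assumptions for illustration, and `buildToolCall` is a hypothetical helper rather than part of any SDK.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

let nextId = 1;

// Hypothetical helper: wrap a tool name and arguments in a tools/call request.
function buildToolCall(
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Assumed argument shape: a single `selector` string.
const request = buildToolCall("playwright_click", { selector: "#submit" });
const wire = JSON.stringify(request);
```

Over the STDIO transport, a message like `wire` would be written to the server's stdin as a single line of JSON. In practice the MCP SDK handles this framing, so hand-built requests are mainly useful for debugging.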

element-wait-and-visibility-polling

Medium confidence

Best for

LLM agents automating AJAX-heavy or single-page applications (SPAs) with dynamic content

QA engineers writing robust tests that don't rely on hardcoded sleep() delays

Web scraping agents that need to wait for JavaScript-rendered content before extraction

Requires

Active Playwright page context

Valid CSS selector, XPath, or Playwright locator string

Timeout value in milliseconds (default 30000)

Limitations

Timeout is fixed per tool invocation — no adaptive timeout based on page load time or network conditions

Polling is element-level only — no support for waiting for multiple elements or complex conditions (e.g., 'wait for element A OR element B')

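No visibility threshold configuration: under Playwright's definition, an element counts as visible when it has a non-empty bounding box and is not hidden with visibility:hidden, regardless of how much of it is inside the viewport; no support for 'wait for 80% visible'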

keyboard-and-mouse-event-simulation

Medium confidence

Best for

LLM agents automating complex form interactions with keyboard shortcuts and special input

QA engineers testing keyboard accessibility and focus management

Web scraping agents that need to interact with drag-and-drop interfaces

Requires

Active Playwright page context

Valid element selector for keyboard/mouse target

For drag: source and target element selectors or coordinates

Limitations

Keyboard input is character-by-character — no support for complex input methods (IME, voice input, paste from clipboard)

Modifier key combinations are limited to Shift, Ctrl, Alt, Meta — no support for custom key sequences or macros

Drag-and-drop is limited to Playwright's built-in drag-and-drop support; multi-touch gestures, pinch, and swipe are not supported
