Paper Search

MCP Server · Free

Search and download academic papers from arXiv, PubMed, bioRxiv, medRxiv, Google Scholar, Semantic Scholar, and IACR. Fetch PDFs and extract full text to accelerate literature reviews. Get consistent metadata for easier filtering, citation, and analysis.

Open Source

7 capabilities

Best for: multi-source academic paper search with unified query interface, pdf download and retrieval with source-specific handling, full-text extraction and normalization from pdfs
Type: MCP Server · Free
Score: 52/100
Best alternative: Parallel
Agent-compatible: Yes — MCP protocol

Capabilities (7 decomposed)

multi-source academic paper search with unified query interface

Medium confidence

Executes search queries across seven distinct academic repositories (arXiv, PubMed, bioRxiv, medRxiv, Google Scholar, Semantic Scholar, IACR) through a single MCP tool endpoint. Abstracts away source-specific API differences and query syntax variations, routing requests to appropriate backends and aggregating results into a consistent schema for downstream processing.
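The routing-and-aggregation idea described above can be sketched as a small dispatcher. This is a minimal illustration, not the server's actual code: the `Paper` fields, the stub backends, and the `search_all` name are all assumptions, and the stubs return canned data instead of calling real repository APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Paper:
    """Hypothetical unified result schema shared by every backend."""
    title: str
    authors: List[str]
    source: str
    url: str = ""


# Stub backends (assumptions): real ones would call each repository's API.
def search_arxiv_stub(query: str) -> List[Paper]:
    return [Paper(f"arXiv result for {query}", ["A. Author"], "arxiv")]


def search_pubmed_stub(query: str) -> List[Paper]:
    return [Paper(f"PubMed result for {query}", ["B. Author"], "pubmed")]


BACKENDS: Dict[str, Callable[[str], List[Paper]]] = {
    "arxiv": search_arxiv_stub,
    "pubmed": search_pubmed_stub,
}


def search_all(
    query: str, sources: Optional[List[str]] = None
) -> Tuple[List[Paper], Dict[str, str]]:
    """Route one query to each requested backend and merge the results.

    A failure in one source is recorded and does not abort the others.
    """
    results: List[Paper] = []
    errors: Dict[str, str] = {}
    for name in sources or list(BACKENDS):
        backend = BACKENDS.get(name)
        if backend is None:
            errors[name] = "unknown source"
            continue
        try:
            results.extend(backend(query))
        except Exception as exc:  # keep going if one source is down
            errors[name] = str(exc)
    return results, errors


papers, errors = search_all("zero-knowledge proofs", ["arxiv", "pubmed", "nope"])
```

Returning errors alongside results, instead of raising, lets a caller use partial results when one repository is throttled.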

Solves for

search for papers across multiple academic databases without switching between interfaces

find papers on a topic and get consistent metadata regardless of source

discover papers from specialized repositories like IACR for cryptography research

build literature review tools that query multiple sources programmatically

Best for

researchers building automated literature discovery systems

LLM agents that need to autonomously search academic literature

teams building knowledge synthesis tools across multiple domains

Requires

MCP client implementation (e.g., Claude Desktop, custom MCP host)

network access to arXiv, PubMed, bioRxiv, medRxiv, Google Scholar, Semantic Scholar, and IACR APIs

API keys for sources that require authentication (Semantic Scholar API key recommended for higher rate limits)

Limitations

rate limits vary by source — some APIs (Google Scholar) have aggressive throttling requiring backoff strategies

search result quality and relevance ranking differ significantly between sources; no cross-source result normalization

some sources (Google Scholar) may require proxy rotation or session management to avoid blocking

What makes it unique

Implements a unified search abstraction layer that handles source-specific API quirks (arXiv's OAI-PMH protocol, PubMed's E-utilities, Google Scholar's anti-bot measures) within a single MCP tool, eliminating the need for clients to manage multiple search SDK integrations

vs alternatives

Broader source coverage (7 repositories) than single-source tools like arxiv-cli, and MCP integration enables direct use in Claude and other LLM agents without custom wrapper code

pdf download and retrieval with source-specific handling

Medium confidence

Fetches full-text PDFs from academic repositories using source-aware download strategies. Handles authentication, redirects, and format variations across sources (arXiv direct downloads, PubMed Central's FTP structure, bioRxiv/medRxiv preprint servers). Implements fallback chains when primary sources are unavailable, attempting alternative mirrors or formats.
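A fallback chain like the one described above can be sketched as follows. This is an illustration under stated assumptions: `download_with_fallback` and `http_fetch` are hypothetical names, the candidate URLs are supplied by the caller, and the only validity check is the `%PDF-` signature at the start of the response.

```python
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Default fetcher: a plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> Tuple[Optional[str], Optional[bytes], List[Tuple[str, str]]]:
    """Try each candidate URL in order and return the first real PDF.

    Returns (url_used, pdf_bytes, failures). Both url_used and pdf_bytes
    are None when every candidate fails.
    """
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            data = fetch(url)
        except (OSError, ValueError) as exc:  # network errors, malformed URLs
            failures.append((url, str(exc)))
            continue
        # Landing pages and login walls return HTML, not a PDF.
        if data.startswith(b"%PDF-"):
            return url, data, failures
        failures.append((url, "response is not a PDF"))
    return None, None, failures
```

Passing `fetch` as a parameter keeps the chain testable offline and leaves room for source-specific fetchers that add headers or follow repository-specific redirects.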

Solves for

download full PDF of a paper identified through search results

retrieve papers programmatically without manual browser navigation

build offline literature databases by batch-downloading papers

handle cases where PDFs are behind paywalls by checking open-access mirrors

Best for

automated literature review pipelines that need full-text access

researchers building local paper repositories

LLM agents that need to fetch and analyze full paper content

Requires

MCP client with file I/O capabilities

network access to PDF hosting servers

sufficient disk space for storing downloaded PDFs

Limitations

paywall-protected papers cannot be downloaded unless open-access version exists in indexed sources

PDF download success rates vary by source — some repositories have stricter access controls

no OCR fallback for image-based PDFs; text extraction depends on PDF being digitally encoded

What makes it unique

Implements source-specific download handlers that understand repository-specific access patterns (arXiv's versioning system, PubMed Central's hierarchical structure, preprint server conventions) rather than generic HTTP fetching, enabling reliable downloads across heterogeneous sources

vs alternatives

More robust than generic PDF downloaders because it handles source-specific authentication and redirect patterns; broader than single-source tools like arxiv-downloader by supporting 7 repositories with fallback chains

full-text extraction and normalization from pdfs

Medium confidence

Extracts and parses text content from downloaded PDFs into structured, normalized formats. Applies heuristics to identify paper sections (abstract, introduction, methods, results, discussion), handles multi-column layouts, and removes boilerplate (headers, footers, page numbers). Outputs clean text suitable for downstream NLP analysis, embedding generation, or LLM consumption.
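The section-detection and boilerplate-removal heuristics described above might look roughly like this. It is a deliberately small sketch that assumes the PDF has already been converted to plain text, one line per text line. The heading list and the page-number pattern are illustrative assumptions, not the tool's actual rules.

```python
import re
from typing import Dict, List

# Headings such as "Abstract", "1. Introduction" or "3 Methods" on their own line.
HEADING = re.compile(
    r"^\s*(?:\d+\.?\s+)?"
    r"(abstract|introduction|methods|results|discussion|references)\s*$",
    re.IGNORECASE,
)
# Lines that hold only a page number, e.g. "7" or "Page 7".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)


def split_sections(text: str) -> Dict[str, str]:
    """Group lines under the most recent recognized heading."""
    sections: Dict[str, List[str]] = {"preamble": []}
    current = "preamble"
    for line in text.splitlines():
        if PAGE_NUMBER.match(line):
            continue  # drop page-number boilerplate
        match = HEADING.match(line)
        if match:
            current = match.group(1).lower()
            sections.setdefault(current, [])
            continue
        sections[current].append(line.strip())
    # Join each section's lines and omit sections with no content.
    return {
        name: " ".join(part for part in lines if part)
        for name, lines in sections.items()
        if any(lines)
    }
```

A heuristic this simple will miss headings that share a line with body text or use non-standard names, which is consistent with the layout limitations listed for this capability.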

Solves for

extract full paper text for semantic analysis or summarization

feed paper content into LLM for question-answering or synthesis

build embeddings from paper text for similarity search

parse structured sections (abstract, methods) for targeted analysis

Best for

LLM agents that need to reason over full paper content

teams building semantic search systems over academic literature

researchers analyzing paper structure or methodology patterns

Requires

PDF file (local or from download capability)

PDF parsing library (e.g., pdfplumber, PyPDF2, or similar)

sufficient memory for large PDFs (100+ page papers)

Limitations

extraction quality degrades on scanned/image-based PDFs without OCR support

section detection heuristics may fail on non-standard paper layouts (e.g., some conference proceedings)

mathematical equations and tables are extracted as text, losing semantic structure

What makes it unique

Applies domain-specific heuristics for academic paper structure (section detection, boilerplate removal) rather than generic PDF-to-text conversion, producing cleaner input for downstream NLP tasks and LLM consumption

vs alternatives

More specialized than generic PDF extractors like pdfplumber because it understands academic paper conventions; produces structured section output vs plain text, enabling targeted analysis of methodology or results

consistent metadata normalization across heterogeneous sources

Medium confidence

Transforms source-specific metadata schemas (arXiv's XML structure, PubMed's MEDLINE format, Google Scholar's HTML scraping results) into a unified JSON schema. Normalizes author names, dates, identifiers (DOI, PMID, arXiv ID), and subject classifications. Handles missing fields gracefully with fallbacks and confidence scores, enabling consistent filtering and citation generation.
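As a rough illustration of the normalization step described above, the sketch below maps two differently shaped records onto one schema. The per-source field names in `FIELD_MAPS` and the output layout are assumptions made for the example. Real arXiv and PubMed payloads are richer and would need dedicated parsing.

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical mapping: unified field -> field name in each source's record.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "arxiv": {"title": "title", "authors": "authors",
              "doi": "doi", "publication_date": "published"},
    "pubmed": {"title": "ArticleTitle", "authors": "AuthorList",
               "doi": "ELocationID", "publication_date": "PubDate"},
}


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Strip resolver prefixes and lowercase (DOIs are case-insensitive)."""
    if not raw:
        return None
    doi = re.sub(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", "",
                 raw.strip(), flags=re.IGNORECASE)
    return doi.lower() or None


def normalize_author(name: str) -> str:
    """Turn 'Last, First' into 'First Last' and collapse whitespace."""
    name = " ".join(name.split())
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}".strip()
    return name


def normalize(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Map a source-specific record onto the unified schema; gaps become None."""
    fields = FIELD_MAPS[source]
    title = record.get(fields["title"])
    authors: List[str] = record.get(fields["authors"]) or []
    return {
        "title": " ".join(title.split()) if title else None,
        "authors": [normalize_author(a) for a in authors],
        "publication_date": record.get(fields["publication_date"]),
        "identifiers": {"doi": normalize_doi(record.get(fields["doi"]))},
        "source": source,
    }
```

Normalizing the DOI to one canonical form is what makes identifier-based deduplication across sources possible.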

Solves for

filter search results by consistent criteria (year, author, subject) regardless of source

generate citations in standard formats (BibTeX, APA) from heterogeneous sources

deduplicate papers found across multiple sources using normalized identifiers

build structured datasets of papers with consistent field types

Best for

teams building citation management tools

researchers aggregating papers from multiple sources into a database

LLM agents that need to reason over paper metadata consistently

Requires

raw metadata from search or download operations

mapping configuration for source-specific field extraction

Limitations

some sources provide incomplete metadata — normalized output may have null fields

author name normalization is heuristic-based and may fail on non-Latin scripts or complex names

subject classification varies by source (MeSH for PubMed, arXiv categories, ACM CCS) — no cross-source mapping

What makes it unique

Implements source-aware metadata extraction that understands each repository's data model (arXiv's category taxonomy, PubMed's MeSH indexing, Google Scholar's ranking signals) and normalizes into a unified schema with confidence scores for missing fields

vs alternatives

More robust than generic metadata extractors because it handles source-specific quirks (e.g., arXiv versioning, PubMed's PMID vs PMCID distinction); enables consistent filtering across sources vs single-source tools that expose raw metadata

mcp protocol integration for llm agent tool calling

Medium confidence

Exposes all paper search, download, and extraction capabilities as MCP tools that Claude and other LLM agents can invoke directly. Implements MCP's tool schema specification with proper input validation, error handling, and streaming support for long-running operations. Enables agents to autonomously discover, retrieve, and analyze papers without human intervention.
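For orientation, this is roughly what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, and tools are called through the `tools/call` method with a `name` and an `arguments` object. The tool name `search_papers` appears in this listing's input description, but the argument names `query` and `max_results` are assumptions, as are the helper function names.

```python
import itertools
import json
from typing import Any, Dict, List

_request_ids = itertools.count(1)


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def text_from_result(response: Dict[str, Any]) -> List[str]:
    """Collect text items from a tools/call response; raise on errors."""
    if "error" in response:  # protocol-level failure
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response.get("result", {})
    texts = [item["text"] for item in result.get("content", [])
             if item.get("type") == "text"]
    if result.get("isError"):  # tool-level failure reported in the result
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    return texts


# Argument names below are hypothetical; check the server's tool schema.
request = build_tool_call(
    "search_papers", {"query": "lattice cryptography", "max_results": 5}
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing. The sketch only shows why tool-level errors (`isError`) have to be checked separately from transport-level ones.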

Solves for

enable Claude to search for papers and cite them in responses

build autonomous research agents that can fetch and analyze papers

integrate paper search into multi-step reasoning chains (search → download → extract → analyze)

allow LLM agents to maintain context across multiple paper retrievals

Best for

developers building Claude-based research assistants

teams creating autonomous literature review agents

LLM application builders who want paper search as a native capability

Requires

MCP-compatible client (Claude Desktop, custom MCP host, or compatible LLM platform)

MCP server implementation (this artifact provides the server)

network connectivity to MCP server and academic source APIs

Limitations

MCP tool invocation adds latency (network round-trip to MCP server) vs direct library calls

tool output size is limited by MCP message size constraints — very large PDFs may need chunking

error handling depends on MCP client implementation; some clients may not handle streaming responses

What makes it unique

Implements MCP server pattern that exposes academic paper operations as first-class tools for LLM agents, enabling multi-step reasoning chains where agents autonomously search, retrieve, and analyze papers as part of larger tasks

vs alternatives

Tighter integration than REST API wrappers because it uses MCP's native tool-calling protocol, enabling Claude to invoke paper search with proper context and error handling; more composable than single-function tools by supporting chained operations

batch paper search and download with progress tracking

Medium confidence

Supports querying multiple search terms or downloading multiple papers in a single operation, with progress tracking and error recovery. Implements rate-limit awareness to avoid triggering source API throttling, uses exponential backoff for retries, and provides detailed status reporting per item. Enables efficient bulk literature discovery without manual iteration.
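The retry behaviour described above can be approximated with a short loop. This is a sketch under assumptions: `run_batch`, its parameters, and the report fields are hypothetical, and delays simply double from `base_delay`, with no jitter and no reading of rate-limit headers.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def run_batch(
    items: Iterable[Any],
    task: Callable[[Any], Any],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Run task on each item, retrying failures with exponential backoff.

    A failing item never stops the batch; its final error is recorded.
    """
    report: List[Dict[str, Any]] = []
    for item in items:
        attempt = 0
        while True:
            try:
                result = task(item)
            except Exception as exc:
                if attempt >= max_retries:
                    report.append({"item": item, "status": "error",
                                   "retries": attempt, "error": str(exc)})
                    break
                # Wait base_delay, then 2x, 4x, ... before the next try.
                sleep(base_delay * 2 ** attempt)
                attempt += 1
            else:
                report.append({"item": item, "status": "success",
                               "retries": attempt, "result": result})
                break
    return report


def summarize(report: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate statistics of the kind listed under this listing's outputs."""
    failed = [row["item"] for row in report if row["status"] == "error"]
    return {"total": len(report), "succeeded": len(report) - len(failed),
            "failed_items": failed}
```

Injecting `sleep` makes the loop testable without waiting. The `failed_items` list is what a caller would feed into the manual retry mentioned in the limitations.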

Solves for

search for papers across multiple related queries in one operation

download a list of papers identified from search results

build literature databases by bulk-importing papers from multiple sources

monitor progress of long-running batch operations

Best for

researchers conducting comprehensive literature reviews

teams building automated paper ingestion pipelines

LLM agents that need to gather papers on multiple related topics

Requires

MCP client with support for long-running operations or streaming responses

list of search queries or paper identifiers

sufficient time for batch operation to complete (may take minutes for large batches)

Limitations

rate limiting is source-specific and may cause batch operations to slow down or fail partway through

no built-in deduplication across batch results — same paper may appear multiple times if found in multiple sources

error recovery is per-item; failure on one paper doesn't prevent others from processing, but partial failures require manual retry

What makes it unique

Implements rate-limit-aware batch processing with exponential backoff and per-item error recovery, allowing efficient bulk operations across multiple sources without triggering API throttling or losing progress on partial failures

vs alternatives

More robust than naive batch loops because it handles rate limiting and retries automatically; provides progress visibility vs fire-and-forget approaches, enabling monitoring of long-running operations

source-specific search parameter mapping and query optimization

Medium confidence

Translates high-level search queries into source-specific query syntax and parameters. Maps common search fields (author, title, year range, subject) to each source's native query language (arXiv's field prefixes, PubMed's MeSH terms, Google Scholar's operators). Optimizes queries for each source's search algorithm to improve result relevance and reduce noise.
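To make the translation step concrete, here is a minimal sketch that turns one unified query object into an arXiv-style and a PubMed-style query string. The unified field names follow this listing's input description. The function names and the choice of which fields to drop are assumptions. The arXiv prefixes (`all:`, `au:`, `ti:`, `cat:`) and PubMed tags (`[All Fields]`, `[Author]`, `[Title]`, `[dp]`) are standard syntax for those services.

```python
from typing import Any, Dict, List, Tuple

Query = Dict[str, Any]


def to_arxiv(q: Query) -> Tuple[str, List[str]]:
    """Translate to arXiv field prefixes; year bounds are dropped here."""
    parts: List[str] = []
    if q.get("keywords"):
        parts.append(f'all:"{q["keywords"]}"')
    if q.get("author"):
        parts.append(f'au:"{q["author"]}"')
    if q.get("title"):
        parts.append(f'ti:"{q["title"]}"')
    if q.get("subject_category"):
        parts.append(f'cat:{q["subject_category"]}')
    # This sketch does not translate date filters for arXiv.
    dropped = [k for k in ("year_min", "year_max") if q.get(k) is not None]
    return " AND ".join(parts), dropped


def to_pubmed(q: Query) -> Tuple[str, List[str]]:
    """Translate to PubMed field tags; subject categories are dropped."""
    parts: List[str] = []
    dropped: List[str] = []
    if q.get("keywords"):
        parts.append(f'{q["keywords"]}[All Fields]')
    if q.get("author"):
        parts.append(f'{q["author"]}[Author]')
    if q.get("title"):
        parts.append(f'{q["title"]}[Title]')
    lo, hi = q.get("year_min"), q.get("year_max")
    if lo is not None and hi is not None:
        parts.append(f"{lo}:{hi}[dp]")  # publication-date range
    else:
        dropped += [k for k in ("year_min", "year_max") if q.get(k) is not None]
    if q.get("subject_category"):
        dropped.append("subject_category")  # would need a MeSH mapping
    return " AND ".join(parts), dropped
```

Returning the dropped fields alongside the query string mirrors the "mapping metadata" output listed for this capability, so a caller can tell when a filter was silently ignored by a source.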

Solves for

search across sources using a unified query syntax without learning source-specific operators

optimize search queries for better result relevance on each source

filter results by structured criteria (year range, author, subject) consistently across sources

build advanced search interfaces that abstract source differences

Best for

teams building unified search interfaces across multiple sources

LLM agents that need to formulate effective searches without source-specific knowledge

researchers who want consistent search behavior across repositories

Requires

search query in unified format or natural language

optional: source hints or preferences

Limitations

not all search fields are supported by all sources — some sources may ignore unsupported filters

query optimization is heuristic-based and may not produce optimal results for all query types

complex boolean queries may not translate correctly across sources with different query languages

What makes it unique

Implements source-aware query translation that understands each repository's native search syntax (arXiv field prefixes like 'cat:cs.AI', PubMed's MeSH hierarchy, Google Scholar's operators) and optimizes queries for each source's ranking algorithm

vs alternatives

More sophisticated than simple string concatenation because it translates structured search parameters into source-specific syntax; enables consistent search behavior vs exposing raw source APIs that require users to learn each source's query language

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts (sharing capabilities)

Artifacts that share capabilities with Paper Search, ranked by overlap. Discovered automatically through the match graph.

MCP Server · 50

paper-search-mcp-openai-v2

Find and download academic papers from leading sources like arXiv, PubMed, bioRxiv, medRxiv, Google Scholar, Semantic Scholar, CrossRef, and IACR. Get standardized results and fetch full-text PDFs when available. Accelerate literature reviews with deep search and effortless retrieval.

multi-source academic paper retrieval, full-text pdf fetching

2 shared capabilities

Product · 40

BrainyPDF

Serves as a valuable resource for students, researchers, and professionals to instantly answer questions and understand research using...

semantic-question-answering-over-pdf-documents, multi-document-context-aggregation-for-comparative-analysis

2 shared capabilities

Agent · 58

Elicit

AI agent for automated systematic literature reviews.

automated-paper-metadata-and-abstract-extraction, semantic-academic-database-search-with-query-expansion

2 shared capabilities

Product · 39

Doclime

Revolutionize research with AI-driven search and PDF...

semantic-search-across-document-collections, pdf-text-extraction-and-indexing

2 shared capabilities

MCP Server · 26

scholarmcp

MCP server: scholarmcp

multi-source-academic-database-aggregation

1 shared capability

Product · 20

genei

Summarise academic articles in seconds and save 80% on your research times.

multi-format-document-ingestion-and-parsing

1 shared capability

Best For

✓ researchers building automated literature discovery systems
✓ LLM agents that need to autonomously search academic literature
✓ teams building knowledge synthesis tools across multiple domains
✓ automated literature review pipelines that need full-text access
✓ researchers building local paper repositories
✓ LLM agents that need to fetch and analyze full paper content
✓ LLM agents that need to reason over full paper content
✓ teams building semantic search systems over academic literature

Known Limitations

⚠ rate limits vary by source; some APIs (Google Scholar) have aggressive throttling requiring backoff strategies
⚠ search result quality and relevance ranking differ significantly between sources; no cross-source result normalization
⚠ some sources (Google Scholar) may require proxy rotation or session management to avoid blocking
⚠ no full-text search capability; limited to metadata and abstract search on most sources
⚠ paywall-protected papers cannot be downloaded unless an open-access version exists in indexed sources
⚠ PDF download success rates vary by source; some repositories have stricter access controls

Requirements

MCP client implementation (e.g., Claude Desktop, custom MCP host)
network access to arXiv, PubMed, bioRxiv, medRxiv, Google Scholar, Semantic Scholar, and IACR APIs
API keys for sources that require authentication (Semantic Scholar API key recommended for higher rate limits)
MCP client with file I/O capabilities
network access to PDF hosting servers
sufficient disk space for storing downloaded PDFs
optional: local PDF storage path configuration
PDF file (local or from download capability)

Input / Output

Accepts: text query string, structured search parameters (author, year range, subject area), paper identifier (DOI, arXiv ID, PubMed ID, or direct URL), source type hint to optimize download strategy, PDF file path or binary PDF content, source-specific metadata objects (XML, JSON, or parsed HTML), MCP tool invocation with JSON parameters, tool names: search_papers, download_pdf, extract_text, get_metadata, array of search query strings, array of paper identifiers (DOI, arXiv ID, etc.), optional: batch configuration (max concurrent requests, timeout per item), unified search query object with fields: keywords, author, title, year_min, year_max, subject_category, or natural language query string

Produces: JSON array of paper metadata objects, structured fields: title, authors, abstract, publication date, source, DOI, URL, binary PDF file, download status/metadata (success, source used, file size, timestamp), plain text string, structured JSON with sections (abstract, introduction, methods, results, discussion, references), metadata (page count, extraction confidence, detected language), normalized JSON object with standard schema, fields: title, authors (normalized array), publication_date, abstract, source, identifiers (DOI, PMID, arXiv_id), subject_categories, url, MCP tool result JSON, structured data (search results, extracted text, metadata), error messages with diagnostic information, array of results (one per input item), per-item status (success, error, retry count), aggregate statistics (total processed, success rate, failed items), source-specific query strings or parameter objects, mapping metadata (which fields were translated, which were dropped)

UnfragileRank

Adoption: 82% (25% weight)

Quality: 39% (25% weight)

Ecosystem: 59% (15% weight)

Match Graph: 25% (23% weight)

Freshness: 60% (12% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
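The page does not state the exact formula, but if the rank is assumed to be a plain weighted sum of the five component scores, the breakdown above reproduces the listed score of 52/100. The function name below is hypothetical, and the values are copied from the breakdown.

```python
from typing import Dict, Tuple

# (score in percent, weight in percent), copied from the breakdown above.
COMPONENTS: Dict[str, Tuple[int, int]] = {
    "Adoption": (82, 25),
    "Quality": (39, 25),
    "Ecosystem": (59, 15),
    "Match Graph": (25, 23),
    "Freshness": (60, 12),
}


def weighted_rank(components: Dict[str, Tuple[int, int]]) -> float:
    """Weighted sum of component scores; the weights must total 100%."""
    total_weight = sum(weight for _, weight in components.values())
    if total_weight != 100:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    return sum(score * weight for score, weight in components.values()) / 100


rank = weighted_rank(COMPONENTS)
print(rank)         # 52.05
print(round(rank))  # 52
```

Term by term: 20.50 + 9.75 + 8.85 + 5.75 + 7.20 = 52.05, which rounds to 52.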

Alternatives to Paper Search

Parallel · 60 · API

Agent-native web APIs: search returning LLM-ready excerpts, deep-research tasks with calibrated evidence.

Apify MCP Server · 56 · MCP Server

Official Apify MCP: 6,000+ scrapers/automations (Actors) callable as agent tools.

Perplexity · 80 · API

AI search engine: direct answers with citations, Pro Search, Focus modes, research Spaces.

GPT Researcher · 57 · Agent

Autonomous agent for comprehensive research reports.

Data Sources

smithery
