Document Indexing And Full Text Search With Keyword Matching

1

LanceDB | Platform | 58/100

via “hybrid search combining vector and full-text retrieval”

Serverless embedded vector DB — Lance format, multimodal, versioning, no server needed.

Unique: Integrates full-text and vector search at the storage layer using Lance's columnar format, avoiding separate indices and enabling single-pass retrieval; combines both modalities without requiring external search engines like Elasticsearch

vs others: Simpler than Elasticsearch + vector plugin because both search modes share the same columnar storage, but less mature than Pinecone's hybrid search in terms of tuning options and performance optimization

2

mongodb-mcp-server | MCP Server | 58/100

via “text search and full-text indexing”

MongoDB Model Context Protocol Server

Unique: Integrates MongoDB's native text search indexes with MCP tools, enabling LLM clients to perform full-text queries without understanding MongoDB's $text operator syntax

vs others: Provides database-native text search (faster than application-level filtering) compared to vector-based semantic search, but lacks semantic understanding — best for keyword-based retrieval
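
As a rough illustration of the `$text` syntax that the server hides from LLM clients, the sketch below builds the filter document by hand. The helper name `build_text_query` is hypothetical and is not part of mongodb-mcp-server; the `$text`, `$search`, and `$meta: "textScore"` keys are standard MongoDB query syntax, and the collection is assumed to already have a text index.

```python
def build_text_query(keywords, phrases=(), excluded=()):
    """Build a MongoDB filter document that uses the $text operator.

    Hypothetical helper for illustration only.
    """
    parts = list(keywords)
    parts += [f'"{p}"' for p in phrases]   # quoted terms match as exact phrases
    parts += [f"-{t}" for t in excluded]   # a leading minus excludes a term
    return {"$text": {"$search": " ".join(parts)}}


# Projection that exposes MongoDB's relevance score for each match.
SCORE_PROJECTION = {"score": {"$meta": "textScore"}}

query = build_text_query(["coffee"], phrases=["cold brew"], excluded=["decaf"])
print(query)
```

With a driver such as PyMongo, a filter like this would typically be passed as `collection.find(query, SCORE_PROJECTION)`.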

3

Glasp | Extension | 56/100

via “full-text-search-across-highlights”

Social web highlighter with AI summarization.

Unique: Implements full-text search with relevance ranking and metadata filtering, indexing highlight text and source metadata to enable fast retrieval across large libraries. Uses a search backend (likely Elasticsearch) to support boolean operators and phrase matching in paid tiers.

vs others: More powerful than browser-based search (Ctrl+F) because it searches across all highlights and sources, not just the current page. More accessible than building a custom search index because search is built-in and requires no configuration.

4

Meilisearch | Repository | 55/100

via “typo-tolerant full-text search with inverted indexes”

Lightning-fast search engine with vector search.

Unique: Uses word_pair_proximity_docids indexes to track word adjacency during indexing, enabling proximity-aware ranking without post-search filtering. Charabia tokenization handles typo tolerance at index time rather than query time, avoiding expensive edit-distance calculations on every search.

vs others: Faster than Elasticsearch for typo-tolerant search because proximity indexes are pre-computed at index time rather than calculated at query time; simpler to deploy than Solr because it's a single Rust binary with no JVM overhead.
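
To make the idea of a pre-computed proximity index concrete, here is a minimal sketch that records, at index time, which documents contain a given ordered word pair at a given distance. It is a simplified stand-in, not Meilisearch's actual `word_pair_proximity_docids` layout; the function names and the `max_distance` cutoff are assumptions.

```python
from collections import defaultdict


def build_pair_proximity_index(docs, max_distance=3):
    """Map (left_word, right_word, distance) to the set of doc ids with that pair."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i, left in enumerate(words):
            for d in range(1, max_distance + 1):
                if i + d < len(words):
                    index[(left, words[i + d], d)].add(doc_id)
    return index


def closest_docs(index, left, right, max_distance=3):
    """Return (distance, doc_ids) for the smallest distance at which the pair occurs."""
    for d in range(1, max_distance + 1):
        hits = index.get((left, right, d))
        if hits:
            return d, hits
    return None, set()


docs = {1: "fast full text search", 2: "full featured fast text editor"}
index = build_pair_proximity_index(docs)
print(closest_docs(index, "full", "text"))
```

Because the pairs are stored when documents are added, ranking by proximity at query time becomes a handful of dictionary lookups rather than a scan over word positions.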

5

Turbopuffer | Product | 54/100

via “bm25 full-text search with metadata filtering”

Low-cost vector database — pay-per-query, S3-backed, up to 10x cheaper at scale.

Unique: Integrates BM25 full-text search as a first-class capability alongside vector search within the same API, enabling hybrid search queries that combine both ranking signals without requiring separate search infrastructure or post-processing to merge results

vs others: Simpler than maintaining separate Elasticsearch/Meilisearch instances for keyword search because full-text and vector search are unified in a single API with shared namespace isolation and S3 storage
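
For readers unfamiliar with BM25, the following is a minimal, self-contained sketch of Okapi BM25 scoring. It does not use or imitate Turbopuffer's API; the function name, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (a non-empty list of strings) against `query`."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "vector search on object storage",
    "keyword search with bm25 ranking",
    "cheap object storage",
]
scores = bm25_scores("bm25 search", docs)
```

In a hybrid query, scores like these would be one ranking signal and vector similarity the other.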

6

RediSearch | MCP Server | 53/100

via “full-text search with boolean operators and phrase matching”

A query and indexing engine for Redis, providing secondary indexing, full-text search, vector similarity search and aggregations.

Unique: Uses a trie-based term dictionary with incremental indexing via Redis keyspace notifications (src/redis_index.c), enabling real-time index updates without batch reindexing, unlike traditional search engines that require explicit commit/refresh cycles

vs others: Faster than Elasticsearch for sub-million-document workloads because it avoids network round-trips and leverages Redis' in-memory architecture; simpler operational model than Solr with no separate JVM process
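
A trie-based term dictionary with incremental inserts can be sketched as follows. This is a toy Python model of the data structure only, not RediSearch's C implementation; the class and method names are hypothetical.

```python
class TermTrie:
    """Minimal trie-based term dictionary that supports incremental inserts."""

    def __init__(self):
        self.children = {}
        self.doc_ids = set()  # non-empty only on nodes that end a term

    def insert(self, term, doc_id):
        node = self
        for ch in term:
            node = node.children.setdefault(ch, TermTrie())
        node.doc_ids.add(doc_id)

    def prefix_search(self, prefix):
        """Return every doc id that has a term starting with `prefix`."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.doc_ids
            stack.extend(current.children.values())
        return found


trie = TermTrie()
for doc_id, text in [(1, "redis search"), (2, "redisearch module")]:
    for term in text.split():
        trie.insert(term, doc_id)

# A later write updates the same structure in place; nothing is rebuilt.
trie.insert("reindex", 3)
```

Each insert touches only the nodes along one term's path, which is why new documents can become searchable without a batch reindex.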

7

orama | Framework | 51/100

via “full-text search with typo tolerance and linguistic normalization”

🌌 A complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.

Unique: Uses a hybrid radix tree + AVL tree architecture for term indexing combined with Levenshtein distance for typo tolerance, all compiled to <2kb core, whereas most full-text engines either sacrifice typo tolerance or require external services. Supports 12+ languages with built-in stemmers without external NLP dependencies.

vs others: Significantly smaller bundle footprint than Lunr.js or MiniSearch while offering better multilingual support and typo tolerance; runs entirely in-browser or edge without backend infrastructure unlike Elasticsearch or Algolia.
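
The Levenshtein-based typo tolerance described above can be illustrated with a short sketch. This is plain Python over a flat vocabulary, not orama's radix-tree implementation; `fuzzy_lookup` and its `tolerance` parameter are hypothetical names.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_lookup(term, vocabulary, tolerance=1):
    """Return indexed terms within `tolerance` edits of `term`, sorted alphabetically."""
    return sorted(w for w in vocabulary if levenshtein(term, w) <= tolerance)


vocabulary = {"search", "vector", "hybrid", "seance"}
print(fuzzy_lookup("serch", vocabulary))
```

A real engine would walk its term tree with a bounded edit budget instead of comparing against every term, but the matching rule is the same.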

8

memvid | Agent | 50/100

via “full-text lexical search with inverted indexing”

Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

Unique: Embeds an inverted index directly in the .mv2 file alongside vector indexes, enabling hybrid lexical+semantic search without external search infrastructure. The append-only design allows incremental index updates as new Smart Frames are added.

vs others: More lightweight and portable than Elasticsearch or Solr for agents that need both keyword and semantic search, since the entire index is self-contained in a single file with no separate infrastructure.
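
The combination of an inverted index with append-only updates can be sketched in a few lines. This in-memory model is an assumption-laden illustration and says nothing about the actual .mv2 file layout; the class name, the integer frame ids, and the AND query semantics are all hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Append-only inverted index: existing postings are never rewritten."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> ascending list of frame ids
        self.next_id = 0

    def append(self, text):
        """Index a new frame and return its id."""
        frame_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            self.postings[term].append(frame_id)
        return frame_id

    def search(self, query):
        """Return ids of frames that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)


idx = InvertedIndex()
idx.append("agent memory layer")
idx.append("long term memory for agents")
before = idx.search("agent memory")
idx.append("agent memory recall")
after = idx.search("agent memory")
```

Since new frame ids are always larger than existing ones, appending keeps every posting list sorted without touching earlier entries.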

9

server | Repository | 47/100

via “full-text search indexing and query execution”

MariaDB server is a community developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry.

Unique: Implements FTS via auxiliary tables (FTS_*_INDEX_*) that store the inverted index separately from the main table, enabling incremental updates without modifying the main table structure. Supports both boolean and natural language search modes with configurable stop words and minimum word length.

vs others: Simpler than Elasticsearch (no distributed indexing, no real-time updates) and faster for small-to-medium datasets; more tightly integrated than external search engines, but less feature-rich

10

llm-app | Template | 42/100

via “hybrid vector and keyword indexing with efficient similarity search”

Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳 Docker-friendly. ⚡ Always in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.

Unique: Implements hybrid search through a unified query interface that abstracts over multiple index types, allowing dynamic selection of retrieval strategy (pure vector, pure keyword, or combined) at query time without re-indexing. Supports metadata filtering as a first-class retrieval primitive alongside similarity scoring.

vs others: More flexible than vector-only systems (Pinecone, Weaviate) for exact matching use cases; simpler than building separate keyword and vector pipelines. Pathway's configuration-driven approach enables switching retrieval strategies without code changes.

11

OSS AI agent that indexes and searches the Epstein files | Agent | 42/100

via “full-text document indexing with semantic embeddings”

Hi HN, I built an open-source AI agent that has already indexed and can search the entire Epstein files, roughly 100M words of publicly released documents. The goal was simple: make a large, messy corpus of PDFs and text files immediately searchable in a precise way, without relying on keyword search

Unique: Combines full-text and semantic search in a single index specifically optimized for investigative document corpora, likely using chunk-aware retrieval that preserves document context and metadata lineage

vs others: More comprehensive than keyword-only search (e.g., Elasticsearch) and faster than pure semantic search because the hybrid approach filters with keywords before running the more expensive vector similarity step
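
The "filter with keywords, then rank by vector similarity" pattern can be sketched as below. The listing only speculates about how this agent works, so this is a generic illustration and not its actual pipeline; the chunk layout, the two-dimensional toy vectors, and the function names are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_then_vector(query_terms, query_vec, chunks, top_k=2):
    """Keep chunks that share a keyword with the query, then rank them by cosine."""
    required = {t.lower() for t in query_terms}
    # Cheap step: set intersection on tokens narrows the candidate pool.
    candidates = [c for c in chunks if required & set(c["text"].lower().split())]
    # Expensive step: similarity is computed only for the survivors.
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["id"] for c in ranked[:top_k]]


chunks = [
    {"id": "a", "text": "contract filing from 2002", "vec": [0.9, 0.1]},
    {"id": "b", "text": "meeting transcript", "vec": [1.0, 0.0]},
    {"id": "c", "text": "contract summary copy", "vec": [0.2, 0.8]},
]
result = keyword_then_vector(["contract"], [1.0, 0.0], chunks)
```

Note the trade-off: chunk "b" is the closest vector match but is dropped because it shares no keyword with the query, so this pattern favors precision and speed over recall.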

12

meilisearch | API | 42/100

via “hybrid keyword-semantic search with weighted fusion”

A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.

Unique: Uses weighted fusion of separate inverted indexes (for keyword) and arroy vector stores (for semantic) with configurable semanticRatio parameter, enabling per-index tuning of keyword vs. semantic weight without requiring external ranking services or re-indexing

vs others: Faster than Elasticsearch's hybrid search because Meilisearch's Rust-based milli engine pre-computes both index types at ingest time rather than computing similarity scores at query time, achieving sub-50ms latency on large datasets
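
One simple way to picture a ratio-controlled fusion is a linear blend of two normalized score sets. The sketch below is an assumption about the general technique, not Meilisearch's actual merge logic; `fuse` and its `semantic_ratio` argument are hypothetical names that only echo the `semanticRatio` parameter mentioned above.

```python
def fuse(keyword_scores, semantic_scores, semantic_ratio=0.5):
    """Blend per-document scores; 0.0 is pure keyword, 1.0 is pure semantic.

    Both inputs map doc id -> score, assumed to be normalized to [0, 1].
    Returns doc ids ordered from best to worst.
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    doc_ids = set(keyword_scores) | set(semantic_scores)
    fused = {
        d: (1 - semantic_ratio) * keyword_scores.get(d, 0.0)
        + semantic_ratio * semantic_scores.get(d, 0.0)
        for d in doc_ids
    }
    # Sort by descending score, breaking ties by doc id for stable output.
    return sorted(fused, key=lambda d: (-fused[d], d))


keyword = {"a": 0.9, "b": 0.2}
semantic = {"b": 0.8, "c": 0.7}
print(fuse(keyword, semantic, semantic_ratio=0.5))
```

Because the ratio is applied at query time, the same pair of indexes can serve keyword-leaning and semantic-leaning queries without re-indexing.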

13

infinity | Product | 39/100

via “sparse-vector-bm25-full-text-search”

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.

Unique: Integrates BM25 ranking directly into the database engine alongside vector search, enabling single-query hybrid retrieval without separate Elasticsearch/Solr instances; uses C++20 modules for compile-time inverted index structure optimization.

vs others: More integrated than Elasticsearch + Pinecone stacks because both search types share transaction semantics and metadata; faster than Milvus for text-heavy workloads due to native BM25 implementation vs. plugin-based approaches.

14

oceanbase | Product | 36/100

via “full-text search indexing and query execution”

The Fastest Distributed Database for Transactional, Analytical, and AI Workloads.

Unique: Implements full-text indexing as a native storage engine feature rather than a separate service, allowing full-text predicates to be pushed down into the query optimizer and executed alongside other filters

vs others: Faster than Elasticsearch for small-to-medium datasets because indexes are co-located with data; simpler than Lucene because it integrates directly with SQL

15

Pocketbase Document Extractor | MCP Server | 35/100

via “search and retrieval of documents”

Extract content from Microsoft Learn and GitHub URLs and store it in PocketBase for easy retrieval and search. Manage documents with tools for extraction, listing, searching, retrieval, and deletion. Benefit from real-time server statistics, dynamic tool management, and multi-transport support inclu

Unique: Leverages PocketBase's native querying capabilities to provide fast and efficient search results, allowing for both keyword and structured searches.

vs others: More efficient than manual search implementations, as it utilizes built-in indexing and querying features of PocketBase.

16

Zettelkasten Knowledge Management Server | MCP Server | 34/100

via “advanced search capabilities”

Manage and explore atomic notes using the Zettelkasten methodology through an MCP-compatible interface. Create, link, search, and synthesize notes with AI assistance to build a rich, interconnected knowledge graph. Enhance your knowledge workflow with bidirectional linking, tagging, and markdown-bas

Unique: Utilizes a full-text search engine specifically tuned for markdown notes, improving retrieval speed and relevance.

vs others: Faster and more relevant than traditional file-based search methods due to its optimization for note structure.

17

taladb | Repository | 33/100

via “multi-field full-text search with configurable tokenization”

Local-first document and vector database for React, React Native, and Node.js

Unique: Provides configurable tokenization and field-specific boosting in a local full-text search engine, whereas the browser's built-in find-in-page (Ctrl+F) lacks relevance ranking and field weighting

vs others: Eliminates Elasticsearch dependency for basic full-text search with simpler API, though with lower performance on very large corpora (>1M documents)

18

Chroma | MCP Server | 32/100

via “full-text search with bm25 ranking”

Embeddings, vector search, document storage, and full-text search with the open-source AI application database

Unique: Chroma integrates BM25 search directly into the same collection API as vector search, allowing developers to query both modalities from a single interface without switching between systems or managing separate indices

vs others: More lightweight than Elasticsearch for simple keyword search while maintaining compatibility with semantic search in the same codebase, reducing operational complexity for small-to-medium applications

19

@convex-dev/rag | Repository | 32/100

via “metadata filtering and hybrid search (semantic + keyword)”

A rag component for Convex.

Unique: Performs metadata filtering within Convex's query engine before similarity computation, reducing the number of documents to score and enabling efficient combination of structured filtering with semantic ranking in a single database query

vs others: More integrated than Elasticsearch hybrid search (no separate index), but less flexible than Pinecone's metadata filtering for complex boolean queries on high-cardinality fields

20

pdf-reader | MCP Server | 31/100

via “keyword search within pdfs”

Read entire PDFs or specific pages on demand. Search documents for keywords and jump to relevant passages. Retrieve metadata to quickly understand document properties.

Unique: Integrates a custom indexing engine that allows for real-time search results as the user types, enhancing user experience over traditional search methods.

vs others: Faster and more responsive than static search implementations because it indexes text dynamically.
