Full Text And Semantic Hybrid Search

1

Weaviate | Platform | 77/100

via “hybrid-search-vector-keyword-fusion”

Open-source vector DB — built-in vectorizers, hybrid search, GraphQL API, multi-tenancy.

Unique: Implements explicit alpha-weighted fusion of vector and keyword scores (not just re-ranking), allowing fine-grained control over semantic vs. lexical matching; built into the database layer rather than requiring post-processing

vs others: More transparent and tunable than Elasticsearch's hybrid search (which uses internal scoring), and simpler to implement than Pinecone's keyword filtering, which requires separate keyword index management
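
The alpha-weighted fusion described above can be sketched as follows. This is a minimal illustration, not Weaviate's actual implementation: the function names and the min-max normalization step are assumptions made here. It follows the convention that alpha = 1 means pure vector search and alpha = 0 means pure keyword search.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_fusion(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Fuse two score maps as alpha * vector + (1 - alpha) * keyword.

    Hypothetical helper: a document missing from one result set
    contributes 0.0 for that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1.0 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


vector_hits = {"a": 1.0, "b": 0.5, "c": 0.0}   # e.g. cosine similarities
keyword_hits = {"b": 12.0, "c": 3.0}           # e.g. raw BM25 scores
balanced = alpha_fusion(vector_hits, keyword_hits, alpha=0.5)
```

Normalizing before mixing matters because raw BM25 scores and cosine similarities live on different scales; without it, one side would dominate regardless of alpha.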

2

Pinecone MCP Server | MCP Server | 64/100

via “sparse-dense-hybrid-vector-search”

Manage Pinecone vector indexes and similarity searches via MCP.

Unique: Official Pinecone MCP server exposes hybrid search as a first-class capability with native sparse-dense vector support, avoiding the need for custom score combination logic in agents. Integrates sparse and dense search through a unified MCP interface.

vs others: More effective than dense-only search for keyword-heavy queries because it preserves exact term matching; simpler than maintaining separate keyword and semantic indexes because Pinecone handles dual indexing internally.

3

LanceDB | Platform | 59/100

via “hybrid search combining vector and full-text retrieval”

Serverless embedded vector DB — Lance format, multimodal, versioning, no server needed.

Unique: Integrates full-text and vector search at the storage layer using Lance's columnar format, avoiding separate indices and enabling single-pass retrieval; combines both modalities without requiring external search engines like Elasticsearch

vs others: Simpler than Elasticsearch + vector plugin because both search modes share the same columnar storage, but less mature than Pinecone's hybrid search in terms of tuning options and performance optimization

4

LangChain RAG Template | Template | 57/100

via “hybrid search combining dense and sparse retrieval”

LangChain reference RAG implementation from scratch.

Unique: Implements hybrid search by running parallel dense (vector similarity) and sparse (BM25) retrieval and merging results using configurable weighting (e.g., 0.7 * dense_score + 0.3 * sparse_score), enabling developers to tune the balance between semantic and lexical relevance.

vs others: More effective than pure semantic search for specialized vocabularies because BM25 captures exact term matches; more practical than pure keyword search because dense retrieval captures semantic relationships and synonyms that keyword search misses.
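
To make the sparse side of this comparison concrete, here is a minimal BM25 scorer. It is a sketch under stated assumptions, not the template's code: the whitespace tokenizer, the parameter defaults (k1 = 1.5, b = 0.75), and the non-negative IDF variant are choices made for illustration.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only exact term matches contribute
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + length_norm)
        scores.append(score)
    return scores


corpus = ["pgvector index tuning", "tuning a guitar", "vector database overview"]
sparse = bm25_scores("pgvector", corpus)
```

Only the first document scores above zero for the query `pgvector`: the third document contains `vector` but not the exact token, which is the kind of specialized-vocabulary match a dense retriever may blur and BM25 preserves.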

5

orama | Framework | 55/100

via “hybrid search combining full-text and vector results”

🌌 A complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.

Unique: Implements score normalization and weighted combination of BM25 and cosine similarity in a single unified query interface, allowing developers to tune the balance without maintaining separate search endpoints. Most vector databases treat hybrid search as an afterthought; Orama makes it a first-class citizen with configurable weighting.

vs others: Simpler API than Elasticsearch's hybrid search, which requires separate queries and manual score combination; more flexible than Pinecone's hybrid search, which uses fixed weighting algorithms.

6

Turbopuffer | Product | 55/100

via “hybrid vector + full-text search with combined ranking”

Low-cost vector database — pay-per-query, S3-backed, up to 10x cheaper at scale.

Unique: Provides native hybrid search combining vector and full-text signals in a single query without requiring application-level result merging or separate API calls, with unified ranking across both modalities within the same namespace isolation model

vs others: More efficient than querying vector and full-text search separately and merging results in application code because ranking is unified server-side, reducing latency and eliminating deduplication logic

7

paraphrase-multilingual-mpnet-base-v2 | Model | 55/100

via “multilingual semantic search with vector indexing”

sentence-similarity model. 4,824,450 downloads.

Unique: Combines paraphrase-optimized embeddings with standard vector database integration patterns, enabling zero-shot multilingual search without language-specific indexing. The embedding space is trained to preserve semantic similarity across languages, allowing a single index to serve queries in any of 50+ supported languages.

vs others: Achieves 2-3x lower search latency than BM25 full-text search on multilingual corpora while maintaining 15-20% higher recall on semantic queries, and requires no language-specific tokenization or stemming

8

llmware | Framework | 54/100

via “semantic and hybrid retrieval with query expansion”

Unified framework for building enterprise RAG pipelines with small, specialized models

Unique: Implements query expansion at retrieval time using small specialized models (SLIM models) to inject synonyms and related concepts, improving recall without expensive reranking. Hybrid retrieval combines vector similarity with keyword matching through configurable alpha weighting, enabling both semantic and exact-match queries in a single call.

vs others: Built-in query expansion via SLIM models improves recall vs static vector-only retrieval; hybrid approach handles both semantic and keyword queries vs pure vector solutions like Pinecone; integrated with llmware's small model ecosystem for on-device expansion.

9

all-MiniLM-L6-v2 | Model | 51/100

via “semantic-text-search-with-ranking”

feature-extraction model. 3,239,437 downloads.

Unique: Combines embedding-based retrieval with similarity ranking to enable semantic search without keyword matching — the distilled BERT model is optimized for semantic similarity, making search results more relevant than BM25 for intent-based queries

vs others: More accurate than BM25 keyword search for semantic relevance; faster than cross-encoder reranking because it uses pre-computed embeddings; simpler than learning-to-rank approaches because it requires no training data

10

pg-aiguide | MCP Server | 49/100

via “hybrid-search-semantic-and-keyword-fallback”

MCP server and Claude plugin for Postgres skills and documentation. Helps AI coding tools generate better PostgreSQL code.

Unique: Implements both semantic (pgvector cosine similarity) and keyword (BM25) search on the same documentation corpus, allowing AI models to choose the most appropriate method per query. Both methods are in-database, avoiding external search service dependencies. Results are returned in the same format, enabling easy comparison and combination.

vs others: More flexible than semantic-only or keyword-only search because it supports both approaches and allows AI models to choose. More cost-effective than external search services because both methods use in-database indexing. More effective than single-method search because it enables fallback strategies and hybrid result combination.

11

txtai | Repository | 48/100

via “multi-backend vector search with hybrid sparse-dense indexing”

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

Unique: Unified sparse-dense index architecture that automatically merges BM25 and neural embeddings without requiring separate systems; supports pluggable ANN backends (Faiss, Annoy, HNSW) with configurable scoring fusion strategies, enabling single-query hybrid search without external orchestration

vs others: More flexible than Pinecone or Weaviate for hybrid search because it lets you choose and swap ANN backends locally, and more integrated than Elasticsearch + separate vector DB because sparse and dense search are co-indexed and merged atomically

12

MineContext | Repository | 46/100

via “semantic-context-retrieval-with-hybrid-search”

MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)

Unique: Implements hybrid search combining vector similarity with structured SQL filters, enabling queries that blend semantic relevance with temporal and categorical constraints. Supports both programmatic API and UI-based search with configurable ranking and filtering.

vs others: More powerful than vector-only search because it enables structured filtering (date range, type) combined with semantic similarity, whereas vector-only databases lack efficient categorical filtering. More intelligent than SQL-only search because it understands semantic meaning rather than just keyword matching.
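
The combination of structured SQL filters with vector similarity can be sketched as below. This is an illustration only, not MineContext's code: the `items` table, its columns, and the helper names are hypothetical, and embeddings are assumed to be unit-normalized so that a dot product equals cosine similarity.

```python
import json
import sqlite3
from typing import List, Sequence


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical schema for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT, created_at TEXT, embedding TEXT)"
    )
    rows = [
        (1, "note", "2024-03-01", json.dumps([1.0, 0.0])),
        (2, "note", "2023-06-01", json.dumps([1.0, 0.0])),
        (3, "note", "2024-05-10", json.dumps([0.0, 1.0])),
        (4, "screenshot", "2024-04-01", json.dumps([1.0, 0.0])),
    ]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    return conn


def filtered_vector_search(
    conn: sqlite3.Connection,
    query_vec: Sequence[float],
    kind: str,
    since: str,
    top_k: int = 5,
) -> List[int]:
    """Apply categorical and date filters in SQL, then rank survivors by similarity."""
    rows = conn.execute(
        "SELECT id, embedding FROM items WHERE kind = ? AND created_at >= ?",
        (kind, since),  # ISO-8601 date strings compare correctly as text
    ).fetchall()
    scored = []
    for row_id, emb_json in rows:
        emb = json.loads(emb_json)
        # Dot product; equals cosine similarity for unit-length vectors.
        scored.append((row_id, sum(q * e for q, e in zip(query_vec, emb))))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [row_id for row_id, _ in scored[:top_k]]


conn = build_demo_db()
recent_notes = filtered_vector_search(conn, [1.0, 0.0], kind="note", since="2024-01-01")
```

Rows 2 and 4 match the query vector perfectly but are excluded by the date and type filters, so only the filtered rows are ranked semantically.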

13

llm-app | Template | 44/100

via “hybrid vector and keyword indexing with efficient similarity search”

Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳 Docker-friendly. ⚡ Always in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.

Unique: Implements hybrid search through a unified query interface that abstracts over multiple index types, allowing dynamic selection of retrieval strategy (pure vector, pure keyword, or combined) at query time without re-indexing. Supports metadata filtering as a first-class retrieval primitive alongside similarity scoring.

vs others: More flexible than vector-only systems (Pinecone, Weaviate) for exact matching use cases; simpler than building separate keyword and vector pipelines. Pathway's configuration-driven approach enables switching retrieval strategies without code changes.

14

memento-mcp | MCP Server | 43/100

via “hybrid semantic and keyword search with adaptive strategy selection”

Memento MCP: A Knowledge Graph Memory System for LLMs

Unique: Implements adaptive strategy selection that automatically routes queries to semantic or keyword search based on query characteristics, rather than requiring explicit user configuration. Combines Neo4j's vector index and full-text index capabilities in a single unified search interface.

vs others: More intelligent than single-strategy search systems; avoids the latency overhead of always running both semantic and keyword searches by adaptively selecting the optimal path.
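
The idea of routing a query to one search path based on its characteristics might look like the sketch below. The listing does not say which signals memento-mcp actually uses, so every rule here (quoted phrases, identifier-like tokens, very short queries) is a hypothetical heuristic chosen for illustration.

```python
import re


def choose_strategy(query: str) -> str:
    """Return "keyword" or "semantic" using simple, hypothetical heuristics."""
    stripped = query.strip()
    if '"' in stripped:
        # A quoted phrase suggests the user wants an exact match.
        return "keyword"
    if re.search(r"\w[_/:]\w|\d", stripped):
        # Identifier-like tokens (snake_case, paths, codes with digits).
        return "keyword"
    if len(stripped.split()) <= 2:
        # Very short queries carry little semantic context.
        return "keyword"
    return "semantic"


route_exact = choose_strategy('"connection refused"')
route_code = choose_strategy("ERR_2041")
route_natural = choose_strategy("what did we decide about the onboarding flow")
```

Running only the selected path saves the cost of the other search, at the risk of a wrong guess; a production router would likely need a fallback when the chosen path returns nothing.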

15

meilisearch | API | 43/100

via “hybrid keyword-semantic search with weighted fusion”

A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.

Unique: Uses weighted fusion of separate inverted indexes (for keyword) and arroy vector stores (for semantic) with configurable semanticRatio parameter, enabling per-index tuning of keyword vs. semantic weight without requiring external ranking services or re-indexing

vs others: Faster than Elasticsearch's hybrid search because Meilisearch's Rust-based milli engine pre-computes both index types at ingest time rather than computing similarity scores at query time, achieving sub-50ms latency on large datasets
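
As a rough sketch of how the `semanticRatio` weight is supplied, the helper below builds a search request body. The field names (`q`, `hybrid`, `semanticRatio`, `embedder`) reflect my understanding of Meilisearch's search parameters and should be checked against the documentation for your version; the function name and the embedder name `"default"` are hypothetical.

```python
from typing import Any, Dict


def build_hybrid_search_body(
    query: str,
    semantic_ratio: float,
    embedder: str = "default",  # hypothetical embedder name
) -> Dict[str, Any]:
    """Build a JSON-serializable body for a hybrid search request.

    A ratio of 0.0 leans fully on keyword results and 1.0 fully on
    semantic results (assumed convention; verify before relying on it).
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    return {
        "q": query,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
    }


body = build_hybrid_search_body("waterproof hiking boots", semantic_ratio=0.7)
```

The body would then be sent to the index's search route with whatever HTTP client the application already uses.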

16

OSS AI agent that indexes and searches the Epstein files | Agent | 43/100

via “full-text document indexing with semantic embeddings”

Hi HN, I built an open-source AI agent that has already indexed and can search the entire Epstein files, roughly 100M words of publicly released documents. The goal was simple: make a large, messy corpus of PDFs and text files immediately searchable in a precise way, without relying on keyword search

Unique: Combines full-text and semantic search in a single index specifically optimized for investigative document corpora, likely using chunk-aware retrieval that preserves document context and metadata lineage

vs others: More comprehensive than keyword-only search (e.g., Elasticsearch) and faster than pure semantic search because the hybrid approach filters with keywords before expensive vector similarity
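
The "filter with keywords, then score with vectors" pattern mentioned above can be sketched as follows. The listing only says the agent "likely" works this way, so this is a generic illustration with hypothetical names and toy data, not the project's code.

```python
import math
from typing import List, Sequence, Tuple

Doc = Tuple[str, str, Sequence[float]]  # (doc_id, text, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prefilter_then_rank(
    query_terms: Sequence[str],
    query_vec: Sequence[float],
    docs: Sequence[Doc],
    top_k: int = 3,
) -> List[str]:
    """Keep documents sharing at least one query term, then rank by cosine."""
    terms = {t.lower() for t in query_terms}
    # Cheap lexical pass: shrink the candidate set before any vector math.
    candidates = [d for d in docs if terms & set(d[1].lower().split())]
    ranked = sorted(candidates, key=lambda d: (-cosine(query_vec, d[2]), d[0]))
    return [d[0] for d in ranked[:top_k]]


toy_docs: List[Doc] = [
    ("d1", "contract renewal terms", [1.0, 0.0]),
    ("d2", "contract summary", [0.6, 0.8]),
    ("d3", "meeting agenda", [1.0, 0.0]),
]
hits = prefilter_then_rank(["contract"], [1.0, 0.0], toy_docs)
```

The trade-off is visible in the toy data: `d3` is a perfect vector match but never gets scored because it shares no query term, so this approach gains speed at some cost in recall.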

17

SurfSense | Web App | 41/100

via “hybrid semantic and full-text search with reranking”

An open source, privacy focused alternative to NotebookLM for teams with no data limits. Join our Discord: https://discord.gg/ejRNvftDp9

Unique: Implements a true hybrid search combining vector embeddings with BM25 full-text indexing and explicit reranking, rather than relying on vector-only search. This architecture allows precise keyword matching (critical for technical documentation) while maintaining semantic understanding, with configurable scoring weights to tune the balance per use case.

vs others: More sophisticated than NotebookLM's document search (semantic-only) and more flexible than Perplexity's web search (which lacks internal document indexing); comparable to enterprise search platforms like Glean but open-source and self-hostable

18

vectra | Repository | 39/100

via “bm25 full-text search with hybrid ranking”

A lightweight, file-backed vector database for Node.js and browsers with Pinecone-compatible filtering and hybrid BM25 search.

Unique: Combines BM25 and vector similarity in a single ranking framework with configurable weighting, avoiding the need for separate lexical and semantic search pipelines. Implements BM25 from scratch rather than wrapping an external library.

vs others: Simpler than Elasticsearch for hybrid search but lacks advanced features like phrase queries, stemming, and distributed indexing. Better integrated with vector search than bolting BM25 onto a pure vector database.

19

ruvector | Repository | 39/100

via “hybrid search combining dense and sparse retrieval”

Self-learning vector database for Node.js — hybrid search, Graph RAG, FlashAttention-3, HNSW, 50+ attention mechanisms

Unique: Implements configurable fusion strategies (RRF, weighted sum) with per-query weight tuning, whereas most vector DBs treat hybrid search as an afterthought or require external re-ranking services

vs others: More flexible than Elasticsearch's dense_vector + text search because fusion weights are tunable per query; simpler than Vespa because it doesn't require complex ranking expressions
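
Reciprocal rank fusion (RRF), one of the fusion strategies named above, can be sketched in a few lines. This is a generic version of the technique, not ruvector's implementation; the constant k = 60 is a commonly used default and an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document ids using score(d) = sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_ranking = ["a", "b", "c"]    # e.g. from vector similarity
sparse_ranking = ["b", "d", "a"]   # e.g. from BM25
fused_ranking = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Because RRF uses only rank positions, it needs no score normalization, which is why it is often preferred over a weighted sum when the two retrievers produce scores on incomparable scales.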

20

infinity | Product | 39/100

via “hybrid-search-with-configurable-fusion”

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.

Unique: Implements hybrid search as a first-class SQL query primitive with query planner support, executing vector and BM25 searches in parallel and fusing results inside the database engine; unlike external fusion (e.g., LangChain), maintains transaction semantics and enables index-aware optimization.

vs others: More integrated than Elasticsearch + Pinecone because both search types share query planning and metadata; faster than sequential searches because vector and BM25 indices are queried in parallel within a single transaction.
