quivr
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration into existing products, with customisation! Any LLM: GPT-4, Groq, Llama. Any vector store: PGVector, FAISS. Any files. Any way you want.
Capabilities (14 decomposed)
multi-format document ingestion with automatic chunking
Medium confidence: Ingests diverse document types (PDF, TXT, Markdown, DOCX) through Brain.from_files() and automatically chunks content into semantically meaningful segments for vector storage. Uses configurable chunking strategies that preserve document structure while optimizing for retrieval performance. Handles file parsing, text extraction, and pre-processing in a unified pipeline before embedding.
Provides opinionated, configuration-driven document ingestion through Brain.from_files() that abstracts away format-specific parsing complexity while maintaining a unified interface across PDF, TXT, Markdown, and DOCX — eliminates need for custom file handlers in most use cases
Simpler than LangChain's document loaders because it bundles ingestion, chunking, and embedding in one call rather than requiring separate loader + splitter + embedding chains
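A minimal ingestion sketch based on the description above. Brain.from_files() is named in the text; the quivr_core import path, the name/file_paths keyword arguments, and the ask() call are assumptions about the surrounding API.

```python
# Minimal ingestion sketch; the import path and keyword arguments are assumed,
# only Brain.from_files() itself is named in the description above.
from quivr_core import Brain

brain = Brain.from_files(
    name="product-docs",                        # assumed keyword argument
    file_paths=["./handbook.pdf", "./faq.md"],  # mixed formats in one call
)

# Once ingested and embedded, the brain can be queried directly (assumed API).
answer = brain.ask("What is our refund policy?")
print(answer)
```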
vector embedding and storage with pluggable backends
Medium confidence: Abstracts vector storage through a configurable backend system supporting PGVector (PostgreSQL), FAISS (local), and other vector databases. Automatically generates embeddings using configured LLM endpoints and persists vectors with metadata. The Brain class manages the lifecycle of vector store initialization, document indexing, and retrieval without exposing backend-specific APIs to the user.
Implements a configuration-driven vector store abstraction that decouples embedding generation from storage backend, allowing seamless switching between PGVector and FAISS without code changes — achieved through a unified VectorStore interface that normalizes backend-specific APIs
More flexible than LangChain's vector store integrations because it treats vector storage as a first-class configurable component rather than an afterthought, enabling production teams to optimize storage independently from retrieval logic
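To make the unified-interface claim concrete, here is an illustrative sketch of what such an abstraction looks like: any backend (a FAISS-style local index, a PGVector-backed table) that implements the same two methods can be swapped in via configuration. The class and method names below are stand-ins, not quivr's actual code.

```python
# Illustrative unified vector-store interface; names are stand-ins, not
# quivr's actual classes. A PGVector-backed class would implement the same
# two methods and be selected purely through configuration.
from typing import Protocol, Sequence

class VectorStore(Protocol):
    def add(self, texts: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...
    def search(self, vector: Sequence[float], k: int = 5) -> list[str]: ...

class InMemoryStore:
    """Stand-in for a FAISS-style local backend."""
    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[Sequence[float]] = []

    def add(self, texts: Sequence[str], vectors: Sequence[Sequence[float]]) -> None:
        self.texts.extend(texts)
        self.vectors.extend(vectors)

    def search(self, vector: Sequence[float], k: int = 5) -> list[str]:
        # Brute-force dot-product ranking, enough to illustrate the interface.
        ranked = sorted(
            range(len(self.texts)),
            key=lambda i: -sum(a * b for a, b in zip(self.vectors[i], vector)),
        )
        return [self.texts[i] for i in ranked[:k]]
```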
brain persistence and state management
Medium confidence: Provides the Brain class as a stateful container for RAG operations, managing document ingestion, vector store lifecycle, conversation history, and pipeline configuration. Brain instances can be serialized and persisted to disk or external storage, enabling recovery of RAG state across application restarts. Supports both in-memory and persistent backends.
Treats Brain as a first-class stateful object that encapsulates all RAG components (documents, vectors, conversation, configuration), enabling atomic persistence and recovery — eliminates need to manage vector store, conversation history, and configuration separately
More cohesive than managing RAG state across separate components because Brain provides a unified interface for persistence, reducing complexity in production deployments
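A persistence sketch under stated assumptions: the description says a Brain can be serialized to disk and restored, but the save()/load() method names and paths below are guesses, not confirmed API.

```python
# Persistence sketch; save()/load() are assumed method names for the
# serialize-and-restore behaviour described above.
from quivr_core import Brain

brain = Brain.from_files(name="support-kb", file_paths=["./tickets.md"])
brain.save("./brains/support-kb")                  # assumed: persist full state

# Later, e.g. after an application restart:
restored = Brain.load("./brains/support-kb")       # assumed: restore documents,
print(restored.ask("How do I reset a password?"))  # vectors, history, config
```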
prompt templating and customization system
Medium confidence: Provides configurable prompt templates for each RAG pipeline step (query rewriting, retrieval, generation) that can be customized via configuration files or programmatically. Templates support variable substitution for query, context, and conversation history. Enables fine-tuning of LLM behavior without code changes.
Exposes prompt templates as configuration artifacts rather than hardcoding them in pipeline code, enabling non-developers to tune generation behavior through YAML without touching Python
More flexible than fixed prompts because it allows per-deployment customization, enabling teams to optimize for domain-specific language and generation quality
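An illustrative generation template with the three variables the description mentions (query, retrieved context, conversation history). The placeholder names and the way a template is attached to a pipeline are assumptions; in a YAML-driven deployment the same text would live in the workflow config instead of code.

```python
# Illustrative prompt template; the placeholder names ({history}, {context},
# {question}) are assumptions, not quivr's documented variable names.
GENERATION_TEMPLATE = """You are a support assistant for the product documentation.
Answer using only the context below. If the answer is not in the context, say so.

Conversation so far:
{history}

Context:
{context}

Question: {question}
Answer:"""
```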
fastapi backend service with rest api
Medium confidence: Provides a production-ready FastAPI backend that exposes Quivr RAG capabilities through REST endpoints. Handles authentication, request validation, error handling, and response formatting. Integrates with Supabase for user management and document storage. Enables deployment of RAG as a scalable web service.
Wraps quivr-core RAG engine in a production-ready FastAPI service with built-in authentication (Supabase), request validation, and error handling — eliminates need to build custom backend infrastructure around RAG
More complete than raw FastAPI wrappers because it includes authentication, multi-user support, and document storage integration out-of-the-box
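A hypothetical client call against the bundled backend. The description only states that REST endpoints, Supabase-backed authentication, and request validation exist; the route, payload shape, and bearer-token header below are illustrative assumptions.

```python
# Hypothetical REST call; the /chat route, JSON payload, and auth header are
# assumptions, not documented endpoints.
import requests

resp = requests.post(
    "http://localhost:5050/chat",                        # assumed route
    headers={"Authorization": "Bearer <supabase-jwt>"},  # assumed auth scheme
    json={"question": "Summarize the onboarding guide"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```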
next.js frontend application with chat ui
Medium confidence: Provides a production-ready Next.js frontend application with a chat interface for interacting with RAG. Includes real-time message streaming, conversation history display, document upload, and configuration management. Integrates with the FastAPI backend and provides a reference implementation for RAG UI patterns.
Provides a complete, production-ready chat UI built with Next.js that demonstrates RAG best practices (streaming, history management, error handling) — serves as both a functional application and a reference implementation
More complete than example code because it's a fully functional application with proper error handling, styling, and UX patterns that can be deployed immediately
langgraph-orchestrated rag pipeline with multi-step workflow
Medium confidence: Implements a sophisticated RAG workflow using LangGraph that chains together four key steps: filter_history (conversation context management), rewrite (query optimization), retrieve (semantic search), and generate_rag (LLM-based answer generation). Each step is a discrete node in a directed acyclic graph, enabling conditional routing, error handling, and extensibility. The QuivrQARAGLangGraph class manages state transitions and data flow between steps.
Uses LangGraph's node-based workflow model to decompose RAG into discrete, composable steps (filter_history → rewrite → retrieve → generate_rag) rather than a monolithic function, enabling conditional routing and step-level customization while maintaining clean state management across the pipeline
More modular than simple RAG chains because LangGraph's explicit node structure allows developers to insert custom logic, conditional branching, or tool calls at any pipeline stage without rewriting the entire flow
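A simplified illustration of the four-step graph, built with LangGraph's public StateGraph API. It mirrors the node names the description gives (filter_history, rewrite, retrieve, generate_rag), but the node bodies and state schema are stand-ins, not quivr's QuivrQARAGLangGraph implementation.

```python
# Simplified four-node RAG graph using LangGraph; node bodies are stubs that
# show where each piece of the described pipeline would run.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict, total=False):
    question: str
    history: list[str]
    rewritten: str
    docs: list[str]
    answer: str

def filter_history(state: RAGState) -> RAGState:
    return {"history": state.get("history", [])[-4:]}    # keep recent turns

def rewrite(state: RAGState) -> RAGState:
    return {"rewritten": state["question"]}               # LLM rewrite goes here

def retrieve(state: RAGState) -> RAGState:
    return {"docs": ["<top-k chunks from the vector store>"]}

def generate_rag(state: RAGState) -> RAGState:
    return {"answer": f"Answer grounded in {len(state['docs'])} chunks"}

graph = StateGraph(RAGState)
for name, fn in [("filter_history", filter_history), ("rewrite", rewrite),
                 ("retrieve", retrieve), ("generate_rag", generate_rag)]:
    graph.add_node(name, fn)
graph.add_edge(START, "filter_history")
graph.add_edge("filter_history", "rewrite")
graph.add_edge("rewrite", "retrieve")
graph.add_edge("retrieve", "generate_rag")
graph.add_edge("generate_rag", END)
pipeline = graph.compile()
```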
query rewriting for improved retrieval
Medium confidence: Automatically rewrites user queries using an LLM before retrieval to improve semantic matching and reduce ambiguity. The rewrite step in the RAG pipeline transforms natural language queries into optimized forms that better align with document content and retrieval model expectations. This step operates within the LangGraph pipeline and uses the configured LLM endpoint.
Integrates query rewriting as a first-class pipeline step in the LangGraph workflow rather than an optional post-processing layer, ensuring all queries benefit from optimization before retrieval and enabling conditional routing based on rewrite confidence
More transparent than implicit query expansion in vector databases because the rewritten query is visible and debuggable, allowing developers to understand and tune retrieval behavior
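A sketch of what the rewrite step does, under assumptions: the prompt wording is illustrative and the llm callable is a stand-in for the configured LLM endpoint, not quivr's actual rewrite prompt.

```python
# Illustrative rewrite step: turn a conversational follow-up into a
# standalone, retrieval-friendly query. Prompt text and the `llm` callable
# are placeholders.
REWRITE_PROMPT = (
    "Rewrite the user question as a standalone search query, resolving "
    "pronouns using the conversation history.\n"
    "History: {history}\nQuestion: {question}\nStandalone query:"
)

def rewrite_query(llm, question: str, history: list[str]) -> str:
    prompt = REWRITE_PROMPT.format(history=" | ".join(history), question=question)
    return llm(prompt).strip()

# e.g. "does it support streaming?" -> "Does quivr support streaming responses?"
```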
semantic search with conversation history filtering
Medium confidence: Performs semantic similarity search against the vector store to retrieve relevant document chunks, with optional filtering based on conversation history to avoid redundant or contradictory context. The retrieve step uses the rewritten query to find top-k similar chunks, and the filter_history step prunes conversation history to fit within token budgets while preserving semantic continuity. Both steps operate within the LangGraph pipeline.
Couples semantic retrieval with conversation history filtering in a single pipeline step, ensuring retrieved context is both semantically relevant AND fits within token budgets — prevents common failure mode where RAG systems retrieve perfect context but exceed LLM limits
More practical than pure semantic search because it explicitly manages conversation context size, a critical constraint in production RAG systems that other frameworks often ignore
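A self-contained sketch of the retrieval half: score stored chunk embeddings against the rewritten query by cosine similarity and keep the top k. In quivr this work is delegated to the configured vector store (PGVector or FAISS); the function names here are illustrative and the embedding step is omitted.

```python
# Top-k semantic retrieval by cosine similarity; illustrative stand-in for
# what the vector store backend does internally.
import numpy as np

def top_k_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray,
                 chunks: list[str], k: int = 5) -> list[str]:
    # Cosine similarity between the query and every stored chunk.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    best = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in best]
```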
multi-provider llm endpoint abstraction
Medium confidence: Abstracts LLM provider integrations through the LLMEndpoint class, supporting OpenAI, Anthropic Claude, Mistral, and local models via Ollama. Provides a unified interface for model inference, streaming, and function calling across providers with automatic fallback and error handling. Configuration-driven provider selection allows switching models without code changes.
Implements a unified LLMEndpoint interface that normalizes API differences across OpenAI, Anthropic, Mistral, and Ollama, enabling true provider-agnostic code — achieved through a provider factory pattern with consistent request/response schemas
More flexible than LangChain's LLM wrappers because it treats provider abstraction as a core architectural concern rather than an adapter layer, enabling seamless model switching without application-level branching logic
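A provider-switching sketch. LLMEndpoint is the only name confirmed above; the import paths, the LLMEndpointConfig class, and the from_config() constructor are assumptions about how selection is wired.

```python
# Provider-switching sketch; import paths, LLMEndpointConfig, and
# from_config() are assumptions. Only the LLMEndpoint name is given above.
from quivr_core.config import LLMEndpointConfig   # assumed module path
from quivr_core.llm import LLMEndpoint            # assumed module path

hosted_llm = LLMEndpoint.from_config(LLMEndpointConfig(model="gpt-4o"))
local_llm = LLMEndpoint.from_config(
    LLMEndpointConfig(model="llama3", llm_base_url="http://localhost:11434")  # assumed Ollama wiring
)
```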
streaming response generation with token-by-token output
Medium confidence: Provides an ask_streaming() method that returns tokens incrementally as the LLM generates them, enabling real-time response display in user interfaces. Implements streaming across the entire RAG pipeline, from query rewriting through final answer generation. Handles provider-specific streaming protocols (Server-Sent Events for OpenAI, etc.) and normalizes them into a unified token stream.
Implements streaming across the entire RAG pipeline (not just final generation), allowing progressive token output from query rewriting and retrieval steps — enables UI to show intermediate reasoning and retrieved context in real-time
More complete than basic LLM streaming because it streams the entire RAG workflow rather than just the final answer, providing users with visibility into retrieval and reasoning steps
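A streaming sketch. ask_streaming() is named in the description; whether it is an async generator and what each chunk exposes are assumptions here.

```python
# Streaming sketch; the async-generator shape and chunk contents are assumed.
import asyncio
from quivr_core import Brain

async def main() -> None:
    brain = Brain.from_files(name="docs", file_paths=["./guide.pdf"])
    async for chunk in brain.ask_streaming("Summarize the guide"):
        print(chunk, end="", flush=True)   # render tokens as they arrive

asyncio.run(main())
```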
configuration-driven rag customization via yaml workflows
Medium confidence: Enables RAG pipeline customization through YAML configuration files that define workflow steps, LLM endpoints, vector stores, and tool integrations without code changes. The configuration system parses YAML specs and instantiates the corresponding Brain and RAG pipeline components. Supports conditional routing, tool definitions, and prompt templates within the configuration layer.
Treats RAG pipeline configuration as a first-class artifact through YAML specs, enabling non-developers to customize behavior without touching code — achieved through a configuration parser that maps YAML to Brain/RAG component instantiation
More accessible than programmatic RAG configuration because YAML is human-readable and editable by non-technical users, reducing deployment friction for teams with diverse skill levels
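An illustrative workflow config to show the shape of YAML-driven customization. The keys below (llm, vector_store, workflow.nodes, top_k) are assumptions, not quivr's documented schema; the sketch loads them with plain PyYAML.

```python
# Illustrative YAML workflow; keys are assumptions meant to show the idea of
# a config-driven pipeline, not quivr's actual schema.
import yaml

WORKFLOW_YAML = """
llm:
  model: gpt-4o
  temperature: 0.2
vector_store:
  backend: pgvector
workflow:
  nodes:
    - name: filter_history
    - name: rewrite
    - name: retrieve
      config: {top_k: 5}
    - name: generate_rag
"""

config = yaml.safe_load(WORKFLOW_YAML)
print([node["name"] for node in config["workflow"]["nodes"]])
```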
tool integration and function calling framework
Medium confidence: Provides a framework for integrating external tools (web search, APIs, custom functions) into the RAG pipeline through a tool registry and function calling interface. Tools are defined declaratively with schemas and can be invoked by the LLM during generation or as separate pipeline steps. Includes built-in web search tools and supports custom tool definitions.
Implements a declarative tool registry that decouples tool definitions from RAG pipeline logic, allowing tools to be added/removed via configuration without code changes — supports both LLM-driven tool selection and explicit pipeline tool steps
More flexible than LangChain's tool calling because it treats tools as first-class pipeline components that can be invoked conditionally or in parallel, rather than only through LLM function calling
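An illustrative declarative tool definition in the common JSON-schema style used for LLM function calling. How quivr actually registers tools is not specified above, so the registration call is left as a commented placeholder.

```python
# Illustrative tool schema; the registration API at the end is hypothetical.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web for up-to-date information.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
        },
        "required": ["query"],
    },
}

# Hypothetical registration step:
# brain.register_tool(web_search_tool)
```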
conversation memory management with context windowing
Medium confidence: Manages multi-turn conversation state through the filter_history pipeline step, which maintains conversation history while respecting token budgets and semantic coherence. Implements heuristic-based history pruning that removes older messages while preserving recent context and key information. Conversation state is tracked in the Brain object and passed through the RAG pipeline.
Integrates conversation history management as a dedicated pipeline step rather than an afterthought, ensuring all conversations benefit from context windowing and enabling conditional routing based on history length
More explicit than implicit history truncation in LLM APIs because the pruning logic is visible and customizable, allowing teams to tune context preservation strategies for their use cases
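A sketch of token-budget-aware history pruning along the lines described. The rough 4-characters-per-token estimate and the budget value are illustrative; quivr's actual heuristics are not specified above.

```python
# Keep the most recent messages that fit a token budget, newest first, then
# restore chronological order. Heuristics here are illustrative only.
def filter_history(history: list[str], max_tokens: int = 1500) -> list[str]:
    kept, used = [], 0
    for message in reversed(history):          # walk from the newest message
        cost = max(1, len(message) // 4)       # rough token estimate
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))                # oldest-to-newest again
```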
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with quivr, ranked by overlap. Discovered automatically through the match graph.
PrivateGPT
Private document Q&A with local LLMs.
bRAG-langchain
Everything you need to know to build your own RAG application
WeKnora
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
R2R
SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.
Open WebUI
Self-hosted ChatGPT-like UI — supports Ollama/OpenAI, RAG, web search, multi-user, plugins.
quivr
Dump all your files and chat with it using your generative AI second brain using LLMs & embeddings.
Best For
- ✓Teams building knowledge bases from heterogeneous document sources
- ✓Developers integrating RAG into existing document management systems
- ✓Non-technical users uploading files without preprocessing
- ✓Teams evaluating different vector databases for production RAG
- ✓Developers building privacy-first applications requiring local-only storage
- ✓Organizations with existing PostgreSQL infrastructure wanting to leverage PGVector
- ✓Production RAG systems requiring state persistence
- ✓Multi-user systems with per-user or per-tenant RAG instances
Known Limitations
- ⚠Chunking strategy is fixed per configuration — no dynamic chunk size adjustment based on content type
- ⚠No built-in OCR for scanned PDFs; requires pre-processing for image-based documents
- ⚠Large files (>100MB) may require external streaming ingestion; in-memory processing limits apply
- ⚠Vector store abstraction adds ~50-100ms overhead per operation due to adapter layer
- ⚠No built-in vector store replication or failover; requires external orchestration
- ⚠Embedding model is fixed per Brain instance — cannot mix embeddings from different models in same store
Repository Details
Last commit: Jul 9, 2025