Retrieval-Augmented Generation (RAG) with pluggable document stores and retrievers
LangChain provides a Retriever abstraction that enables RAG by connecting LLMs to external knowledge sources. The framework supports multiple retrieval strategies: vector similarity search (via VectorStore), BM25 keyword search, hybrid search, and custom retrievers. Documents are chunked, embedded, and stored in vector databases (Pinecone, Weaviate, Chroma, FAISS, etc.). The RetrievalQA chain automatically retrieves relevant documents and passes them as context to the LLM. This enables LLMs to answer questions grounded in custom data without fine-tuning.
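The retrieve-then-generate flow described above can be sketched in plain Python without any framework dependency. This is a minimal illustration, not LangChain's actual API: the term-frequency "embedding", the `VectorRetriever` class, and the `retrieval_qa` helper are all toy stand-ins for a real embedding model, vector store, and RetrievalQA chain.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a term-frequency vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorRetriever:
    """Minimal vector-similarity retriever: embed documents once, rank by cosine."""
    def __init__(self, docs):
        self.docs = docs
        self.vectors = [embed(d) for d in docs]

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(zip(self.docs, self.vectors),
                        key=lambda dv: cosine(q, dv[1]), reverse=True)
        return [d for d, _ in ranked[:k]]

def retrieval_qa(question, retriever, llm):
    """RetrievalQA-style flow: retrieve context, stuff it into a prompt, call the LLM."""
    context = "\n".join(retriever.retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

docs = [
    "FAISS is a library for efficient vector similarity search.",
    "BM25 ranks documents with a keyword-based scoring function.",
    "Chroma is an open-source embedding database.",
]
retriever = VectorRetriever(docs)
# A stub LLM so the sketch runs end to end; a real chain would call a model here.
answer = retrieval_qa("What is FAISS?", retriever, llm=lambda prompt: prompt)
```

Swapping `VectorRetriever` for a BM25 or hybrid implementation leaves `retrieval_qa` unchanged, which is the point of the unified Retriever interface.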
Unique: Provides a unified Retriever interface that abstracts different retrieval strategies (vector, keyword, hybrid, custom) and integrates seamlessly with LLM chains via RetrievalQA. Includes built-in document loaders for 50+ formats (PDF, HTML, Markdown, code files) and automatic chunking strategies, reducing boilerplate for document ingestion.
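The chunking step mentioned above can be illustrated with a simplified fixed-size splitter with overlap; this is a sketch of the general technique, not LangChain's `RecursiveCharacterTextSplitter`, and the `chunk_size`/`overlap` values are arbitrary examples.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Fixed-size character chunking with overlap, so context that straddles
    a chunk boundary still appears whole in at least one chunk."""
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Real splitters additionally try to break on natural boundaries (paragraphs, sentences, code blocks) before falling back to raw character counts.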
vs alternatives: More integrated than building RAG from scratch because document loading, chunking, embedding, and retrieval are unified in one framework; more flexible than committing to a single vector database (Pinecone, Weaviate) because it supports many vector stores behind one interface and allows custom retrieval logic.