RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.
Capabilities (16 decomposed)
foundational-rag-pipeline-implementation
Medium confidence
Implements a standard RAG pipeline architecture with document ingestion, embedding generation, vector storage, semantic retrieval, and LLM-based generation. Uses a modular pattern where each stage (chunking, embedding, retrieval, generation) is independently configurable, allowing developers to swap components (e.g., different embedding models, vector databases, LLM providers) without rewriting the pipeline. The architecture follows a consistent interface across 40+ technique implementations, enabling pedagogical progression from simple RAG to advanced variants.
Provides a unified pedagogical pipeline architecture that all 40+ techniques build upon, with dual-framework implementations (LangChain and LlamaIndex) showing how the same logical pipeline maps to different frameworks, enabling developers to understand RAG concepts independent of framework choice
More comprehensive than single-technique tutorials because it shows the complete pipeline context and how techniques compose, whereas most RAG guides focus on isolated techniques without showing integration points
semantic-chunking-with-size-optimization
Medium confidence
Implements intelligent document chunking strategies that go beyond fixed-size splitting by using semantic boundaries (sentence/paragraph breaks, code blocks) and configurable chunk size optimization. The technique analyzes document structure to preserve semantic coherence while optimizing for embedding model context windows and retrieval performance. Includes methods to test different chunk sizes against a query workload to empirically determine optimal chunk dimensions, with metrics tracking retrieval quality vs. computational cost tradeoffs.
Combines semantic boundary detection with empirical chunk size optimization through query-based testing, rather than just providing fixed-size or rule-based chunking — developers can run A/B tests on chunk sizes against their actual query patterns to find optimal configurations
More sophisticated than LangChain's basic text splitter because it preserves semantic structure and includes optimization methodology, whereas most RAG tutorials use fixed chunk sizes without justification or testing
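A minimal sketch of both ideas, assuming a simplistic regex sentence splitter (a real implementation would use a proper tokenizer or embedding-distance boundaries): sentences are packed into chunks without ever cutting mid-sentence, and a crude sweep scores candidate chunk sizes against a sample query workload.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Simplistic stand-in for a real sentence segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def semantic_chunks(text: str, max_chars: int = 120) -> list[str]:
    # Pack whole sentences into chunks up to max_chars; never split a sentence.
    chunks, current = [], ""
    for sent in split_sentences(text):
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks

def best_chunk_size(text, queries, sizes=(60, 120, 240)):
    # Crude empirical sweep: score a size by how many test queries
    # appear intact inside some single chunk (a toy retrieval proxy).
    def hits(chunks):
        return sum(any(q in c for c in chunks) for q in queries)
    return max(sizes, key=lambda s: hits(semantic_chunks(text, s)))
```

The `hits` proxy here is deliberately crude; the repository's notebooks track real retrieval-quality metrics instead, but the A/B structure is the same.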
self-correcting-rag-with-answer-validation
Medium confidence
Implements Self-RAG and Corrective RAG (CRAG) techniques where the system generates answers, then validates them against retrieved context and self-corrects if validation fails. The system uses learned or rule-based validators to assess whether generated answers are supported by retrieved context, and if validation fails, triggers retrieval refinement (new queries, different retrieval strategies) and regeneration. This approach creates a feedback loop within the generation process, enabling the system to detect and correct hallucinations or unsupported claims without requiring external feedback.
Implements Self-RAG and CRAG techniques that validate generated answers against retrieved context and trigger self-correction (re-retrieval and regeneration) if validation fails, creating an internal feedback loop that detects and corrects hallucinations without external validators
More proactive than post-hoc fact-checking because it validates during generation and corrects immediately, and more practical than requiring external validators because it uses the LLM itself for validation
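The control flow can be sketched with stub components. Everything below is a toy: the corpus, the keyword "retriever", and the groundedness check (real CRAG uses an LLM or trained evaluator as the validator, and an LLM for query rewriting). What matters is the loop: generate, validate against context, and on failure rewrite the query and retry.

```python
# Toy corpus and keyword retriever; stand-ins for a real vector store.
CORPUS = {
    "paris": "Paris is the capital of France.",
    "rome": "Rome is the capital of Italy.",
}

def retrieve(query: str) -> str:
    for key, doc in CORPUS.items():
        if key in query.lower():
            return doc
    return ""

def generate(query: str, context: str) -> str:
    # Stand-in for an LLM call conditioned on retrieved context.
    return context or "I don't know."

def grounded(answer: str, context: str) -> bool:
    # Toy validator: every answer token must appear in the context.
    words = set(answer.lower().split())
    return bool(context) and words <= set(context.lower().split())

def self_correcting_answer(query: str, rewrite, max_rounds: int = 2) -> str:
    for _ in range(max_rounds):
        ctx = retrieve(query)
        answer = generate(query, ctx)
        if grounded(answer, ctx):
            return answer
        query = rewrite(query)  # refinement step, e.g. an LLM query rewrite
    return "No grounded answer found."
```

On the first failed validation the loop re-enters retrieval with the rewritten query, which is the CRAG "corrective" step in miniature.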
multi-modal-rag-with-image-and-text
Medium confidence
Extends RAG to handle multi-modal documents containing both text and images by using multi-modal embedding models that encode images and text into a shared embedding space, enabling retrieval across modalities. The system processes images (extracting text via OCR, generating captions, or using vision models) and text separately, embeds them into a unified space, and retrieves relevant content regardless of modality. This approach enables queries to find relevant images when asking text questions and vice versa, supporting richer document understanding.
Implements multi-modal RAG using shared embedding spaces for text and images, enabling cross-modal retrieval where text queries find images and image queries find text — a unified approach that treats modalities symmetrically
More comprehensive than text-only RAG because it handles visual content, and more practical than separate text and image pipelines because it uses unified embeddings for symmetric cross-modal retrieval
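The retrieval mechanics can be illustrated with a toy shared space. A real system would run a CLIP-style encoder over pixels; here images are represented by their captions and both modalities go through the same bag-of-words embedder, purely to show how a single index serves cross-modal queries symmetrically.

```python
import math

def embed(text):
    # Toy shared-space embedder; stand-in for a CLIP-style model.
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cos(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One unified index holds both modalities; images enter via captions here.
items = [
    {"modality": "text", "content": "The transformer architecture uses attention."},
    {"modality": "image", "content": "diagram of transformer attention heads"},
    {"modality": "image", "content": "photo of a mountain lake"},
]
index = [(item, embed(item["content"])) for item in items]

def cross_modal_search(query, want=None, k=2):
    # want=None searches both modalities; "text"/"image" filters one.
    qv = embed(query)
    hits = [(it, v) for it, v in index if want is None or it["modality"] == want]
    return [it for it, v in sorted(hits, key=lambda t: cos(qv, t[1]), reverse=True)[:k]]
```

Because both modalities live in one index, a text query ranks images and text on the same scale, which is the symmetric behavior the description highlights.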
rag-evaluation-with-deepeval-framework
Medium confidence
Provides a comprehensive evaluation framework (DeepEval) for assessing RAG system quality across multiple dimensions: retrieval quality (precision, recall, NDCG), answer quality (faithfulness, relevance, coherence), and end-to-end performance. The framework includes pre-built metrics, dataset management, and evaluation pipelines that can be integrated into development workflows. Developers can define evaluation criteria, run automated evaluations against test datasets, and track metrics over time to monitor RAG system quality and detect regressions.
Provides an integrated evaluation framework (DeepEval) with pre-built metrics for retrieval quality, answer quality, and end-to-end performance, enabling systematic RAG evaluation without building custom evaluation pipelines — a comprehensive approach to RAG quality assurance
More comprehensive than ad-hoc evaluation because it provides standardized metrics and automated evaluation pipelines, and more practical than building custom evaluators because it includes pre-built metrics for common RAG quality dimensions
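To make the metric dimensions concrete without depending on DeepEval's API or an LLM key, here is a from-scratch sketch of two of them: retrieval recall@k, and a crude token-overlap stand-in for faithfulness (frameworks like DeepEval use LLM judges for the latter; this proxy is only illustrative).

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    # Fraction of relevant documents found in the top-k retrieved list.
    hit = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hit / len(relevant_ids) if relevant_ids else 0.0

def faithfulness_proxy(answer, contexts):
    # Toy groundedness score: share of answer tokens present in context.
    ans = set(answer.lower().split())
    ctx = set(w for c in contexts for w in c.lower().split())
    return len(ans & ctx) / len(ans) if ans else 0.0

case = {
    "retrieved": ["d2", "d7", "d1"],
    "relevant": ["d1", "d2"],
    "answer": "embeddings capture semantics",
    "contexts": ["embeddings capture semantics of text"],
}
print(recall_at_k(case["retrieved"], case["relevant"], k=2))  # 0.5
print(faithfulness_proxy(case["answer"], case["contexts"]))   # 1.0
```

A real evaluation run loops such metrics over a test dataset and logs scores per commit, which is what turns ad-hoc spot checks into regression detection.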
rag-benchmarking-with-test-datasets
Medium confidence
Provides standardized benchmark datasets and evaluation protocols for comparing RAG techniques and implementations. The repository includes curated test datasets with queries, expected answers, and ground-truth retrieved documents, enabling developers to benchmark their RAG systems against known baselines. Benchmarks cover different domains (general knowledge, technical documentation, research papers) and query types (factual, conceptual, reasoning), allowing developers to assess RAG performance across diverse scenarios and compare their implementations against published baselines.
Provides curated benchmark datasets with ground-truth annotations for standardized RAG evaluation, enabling developers to compare implementations against known baselines and across different domains/query types — a structured approach to RAG benchmarking
More rigorous than ad-hoc testing because it uses standardized datasets and protocols, and more practical than building custom benchmarks because datasets are pre-curated with ground truth
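The benchmark-runner shape is simple enough to sketch. The dataset and the two "retrievers" below are hypothetical; the comparison metric is mean reciprocal rank (MRR), a standard choice when each query has one gold document.

```python
# Illustrative dataset of (query, gold doc id) pairs.
DATASET = [
    {"query": "capital of france", "gold": "doc_paris"},
    {"query": "largest planet", "gold": "doc_jupiter"},
]

def mrr(retriever, dataset):
    # Mean reciprocal rank: 1/rank of the gold doc, averaged over queries.
    total = 0.0
    for ex in dataset:
        ranking = retriever(ex["query"])
        if ex["gold"] in ranking:
            total += 1.0 / (ranking.index(ex["gold"]) + 1)
    return total / len(dataset)

# Two hypothetical systems returning fixed rankings, for comparison.
baseline = lambda q: ["doc_misc", "doc_paris", "doc_jupiter"]
improved = lambda q: ["doc_paris", "doc_jupiter", "doc_misc"]
```

Replaying the same fixed dataset against both systems is what makes the comparison meaningful; with ad-hoc queries the numbers would not be comparable across runs.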
dual-framework-implementation-with-langchain-and-llamaindex
Medium confidence
Provides parallel implementations of all RAG techniques using both LangChain and LlamaIndex frameworks, showing how the same logical RAG concepts map to different framework abstractions. Each technique has implementations in both frameworks, allowing developers to understand RAG architecture independent of framework choice and to compare framework approaches. This dual-implementation strategy helps developers make informed framework choices and understand how to port RAG implementations between frameworks.
Provides parallel implementations of all 40+ RAG techniques in both LangChain and LlamaIndex, showing how the same logical RAG architecture maps to different framework abstractions — a framework-agnostic approach to RAG education
More educational than single-framework tutorials because it shows framework-independent RAG concepts, and more practical than framework-specific guides because it enables developers to choose frameworks based on understanding rather than framework lock-in
production-ready-runnable-scripts-for-rag-techniques
Medium confidence
Provides standalone, executable Python scripts for each RAG technique that can be run immediately without modification (with API keys configured). Scripts include all necessary imports, configuration, and error handling, demonstrating production-ready patterns. Each script is self-contained and can serve as a template for implementing the technique in production systems. Scripts include examples with real data, showing end-to-end execution from document loading through answer generation.
Provides standalone, immediately-executable Python scripts for each RAG technique with all necessary configuration and error handling, serving as production-ready templates rather than just educational notebooks — a practical approach to RAG implementation
More practical than notebooks because scripts are immediately runnable and production-oriented, and more complete than code snippets because they include full implementations with error handling and configuration
query-transformation-and-enhancement
Medium confidence
Implements multiple query transformation techniques (query rewriting, expansion, decomposition) that improve retrieval by reformulating user queries into forms more likely to match relevant documents. Techniques include HyDE (Hypothetical Document Embeddings), which generates synthetic relevant documents from queries, HyPE, which generates hypothetical passages, and multi-query expansion, which creates semantically similar query variants. Each transformation is applied before retrieval to increase the likelihood of finding relevant chunks, with optional fusion of results from multiple query variants.
Provides implementations of HyDE and HyPE techniques that use LLMs to generate synthetic documents or passages from queries, improving retrieval without modifying the embedding model or document index — a novel approach compared to traditional query expansion
More effective than simple query expansion (synonyms, stemming) because it uses LLM understanding to generate contextually relevant synthetic documents, whereas traditional methods rely on lexical similarity
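The HyDE trick is small enough to sketch end to end. The `fake_llm` below is a hard-coded stand-in for a real model call, and the corpus and embedder are toys; the essential move is that the *hypothetical answer text* is embedded for retrieval, not the terse query itself.

```python
import math

def embed(text):
    # Toy bag-of-words embedder; stand-in for a real embedding model.
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cos(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "Reciprocal rank fusion merges ranked lists from multiple retrievers.",
    "Bananas are rich in potassium.",
]
INDEX = [(d, embed(d)) for d in DOCS]

def fake_llm(prompt):
    # Stand-in for a real LLM drafting a hypothetical answer document.
    return "Fusion merges ranked lists from multiple retrievers into one."

def hyde_retrieve(query, k=1):
    hypothetical = fake_llm(f"Write a passage answering: {query}")
    qv = embed(hypothetical)  # embed the synthetic document, not the query
    return [d for d, v in sorted(INDEX, key=lambda t: cos(qv, t[1]), reverse=True)[:k]]
```

The hypothetical document shares vocabulary and structure with real answer passages, so it lands closer to them in embedding space than the original short query would.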
contextual-chunk-enrichment-with-headers
Medium confidence
Enhances retrieved chunks with contextual metadata by automatically generating or extracting chunk headers, parent document context, and hierarchical position information. When a chunk is retrieved, the system includes its semantic context (what section of the document it belongs to, what the surrounding chunks discuss) alongside the chunk content itself. This enrichment happens during indexing (headers are computed and stored with chunks) and retrieval (context is appended to retrieved chunks before passing to the LLM), improving the LLM's ability to understand chunk meaning without requiring larger context windows.
Automatically enriches chunks with hierarchical context and semantic headers during indexing, allowing the LLM to understand chunk meaning from context rather than requiring larger chunks or longer context windows — a preprocessing approach rather than prompt-engineering
More efficient than increasing chunk size because it preserves semantic context without proportionally increasing embedding costs or context window usage, whereas naive approaches just make chunks larger
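A minimal sketch of the indexing side, assuming markdown-style headings (real documents would need a proper structure parser): each chunk stores the heading path it sits under, and at retrieval time that path is prepended so the LLM sees where the chunk came from.

```python
def index_with_headers(markdown: str):
    # Walk the document, tracking the current heading path per level,
    # and attach that path to every non-heading line (a "chunk" here).
    chunks, path = [], []
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            path = path[:level - 1] + [line.lstrip("# ").strip()]
        elif line.strip():
            chunks.append({"header": " > ".join(path), "text": line.strip()})
    return chunks

def enrich(chunk):
    # Prepend the stored heading path before handing the chunk to the LLM.
    return f"[{chunk['header']}]\n{chunk['text']}"

doc = "# Retrieval\n## Reranking\nCross-encoders rescore candidates."
```

The header costs a few tokens per chunk, versus the proportional embedding and context cost of simply making every chunk larger.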
fusion-retrieval-with-multi-strategy-ranking
Medium confidence
Combines results from multiple retrieval strategies (dense semantic search, sparse BM25 keyword search, hypothetical document embeddings) using fusion algorithms (Reciprocal Rank Fusion, weighted scoring) to produce a unified ranked result set. Each retrieval strategy is executed independently, then results are merged using configurable fusion methods that balance semantic relevance (from dense retrieval) with keyword matching (from sparse retrieval). This approach captures both semantic and lexical relevance without requiring a single unified index.
Implements Reciprocal Rank Fusion and weighted scoring to combine dense semantic retrieval with sparse keyword retrieval, allowing developers to balance semantic understanding with exact-match precision without choosing one strategy — a hybrid approach that's more robust than single-strategy retrieval
More comprehensive than pure semantic search because it captures both meaning and keywords, and more practical than pure BM25 because it includes semantic understanding; fusion is more maintainable than building a custom unified ranking function
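Reciprocal Rank Fusion itself is only a few lines; this sketch assumes each retriever has already produced an ordered list of document ids. Each list contributes `1 / (k + rank)` per document and scores are summed; `k = 60` is the constant from the original RRF paper, which damps the influence of top ranks.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Sum 1/(k + rank) across all input rankings, then sort by total score.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d5"]   # semantic retriever ranking
sparse = ["d1", "d2", "d3"]  # BM25 ranking
print(rrf([dense, sparse]))  # ['d1', 'd3', 'd2', 'd5']
```

Note that RRF uses only ranks, never raw scores, so the dense and sparse retrievers' incomparable score scales never need to be normalized against each other.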
intelligent-reranking-with-cross-encoders
Medium confidence
Implements a two-stage retrieval pipeline where an initial retriever (fast, approximate) returns candidate chunks, then a cross-encoder reranker (slower, more accurate) scores and reorders results based on query-document relevance. The reranker uses transformer models that jointly encode the query and document to compute relevance scores, providing more accurate ranking than embedding-based similarity. This approach maintains retrieval speed (initial retrieval is still fast) while improving result quality through expensive but accurate reranking on a smaller candidate set.
Implements a two-stage retrieval pipeline with cross-encoder reranking that jointly encodes query-document pairs for more accurate relevance scoring than embedding similarity, allowing developers to use expensive but accurate models on a small candidate set rather than all documents
More accurate than single-stage embedding-based retrieval because cross-encoders directly model query-document relevance, but more efficient than applying cross-encoders to all documents because reranking only operates on initial retrieval candidates
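A sketch of the two-stage shape with stubs throughout: `first_stage` is a toy recall-oriented word-overlap filter standing in for a vector search, and `score_pair` is a Jaccard overlap standing in for a real cross-encoder (e.g. a sentence-transformers `CrossEncoder`) that would jointly encode each (query, document) pair.

```python
DOCS = [
    "RAG pipelines retrieve context before generation.",
    "Cross encoders jointly encode query and document.",
    "Cats sleep most of the day.",
]

def first_stage(query, k=3):
    # Cheap recall-oriented stage: keep any doc sharing a word with the query.
    qwords = set(query.lower().split())
    return [d for d in DOCS if qwords & set(d.lower().split())][:k]

def score_pair(query, doc):
    # Jaccard overlap as a stand-in for a cross-encoder's joint relevance score.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query, k=1):
    # Expensive scoring runs only on the small first-stage candidate set.
    candidates = first_stage(query)
    return sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)[:k]
```

The cost structure is the point: the accurate scorer touches only the handful of candidates the cheap stage surfaces, never the full corpus.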
hierarchical-index-construction-and-traversal
Medium confidence
Builds multi-level document indices where documents are recursively summarized into hierarchies (leaf chunks → summaries → higher-level summaries) and retrieval traverses this hierarchy top-down. The system first retrieves relevant high-level summaries, then recursively retrieves more detailed chunks from relevant branches, reducing the number of embeddings needed and improving retrieval efficiency. This approach is particularly effective for large document collections where flat indices become inefficient, enabling both faster retrieval and better handling of documents with varying levels of detail.
Implements recursive document summarization to build multi-level hierarchies that enable top-down retrieval traversal, reducing embedding computations and improving efficiency for large collections — a structural approach to retrieval efficiency rather than algorithmic optimization
More efficient than flat indices for large collections because it reduces embeddings computed per query, and more effective than simple filtering because it uses semantic hierarchies rather than metadata-based pruning
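The traversal can be sketched with a hand-built two-level tree and a toy word-overlap scorer (real systems would use embedding similarity and LLM-generated summaries): summaries are scored first, and only leaves under the best-matching branch are scored in detail.

```python
# Hand-built hierarchy: summary -> leaf chunks. Illustrative content only.
TREE = {
    "summary: European capitals and geography": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
    "summary: Planetary science facts": [
        "Jupiter is the largest planet.",
        "Mars has two moons.",
    ],
}

def overlap(a, b):
    # Toy relevance score; stand-in for embedding similarity.
    return len(set(a.lower().split()) & set(b.lower().split()))

def hierarchical_retrieve(query, k=1):
    best_branch = max(TREE, key=lambda s: overlap(query, s))  # score summaries
    leaves = TREE[best_branch]                                # descend one level
    return sorted(leaves, key=lambda c: overlap(query, c), reverse=True)[:k]
```

With B branches of L leaves each, a query scores roughly B + L items instead of B × L, which is where the efficiency gain comes from as collections grow.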
adaptive-retrieval-with-query-routing
Medium confidence
Implements dynamic retrieval strategies that adapt based on query characteristics, routing different query types to different retrieval methods. The system analyzes incoming queries to determine the optimal retrieval strategy (e.g., simple keyword search for factual lookups, semantic search for conceptual questions, graph-based retrieval for relationship queries) and applies the appropriate method. This routing can be rule-based (query classification) or learned (trained classifier), enabling the system to use the most efficient and effective retrieval method for each query type without requiring all queries to use the same strategy.
Implements query-aware routing that dynamically selects retrieval strategies based on query characteristics, allowing different query types to use optimized methods rather than forcing all queries through a single pipeline — an adaptive approach that improves both efficiency and quality
More efficient than applying all retrieval strategies to every query (fusion) because it selects the most appropriate strategy, and more effective than single-strategy systems because it adapts to query type
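The rule-based variant can be sketched directly; the trigger words, strategy names, and placeholder strategies below are all illustrative, and a learned router would replace `classify` with a trained classifier while keeping the same dispatch structure.

```python
def classify(query: str) -> str:
    # Toy rule-based query classifier; a production router might be learned.
    q = query.lower()
    if any(w in q for w in ("who", "when", "where", "how many")):
        return "keyword"       # factual lookup -> sparse/BM25
    if any(w in q for w in ("related", "connected", "relationship")):
        return "graph"         # relational query -> graph retrieval
    return "semantic"          # default -> dense retrieval

# Placeholder strategies; each would wrap a real retriever.
STRATEGIES = {
    "keyword": lambda q: f"BM25 results for {q!r}",
    "semantic": lambda q: f"dense results for {q!r}",
    "graph": lambda q: f"graph results for {q!r}",
}

def route(query: str) -> str:
    return STRATEGIES[classify(query)](query)
```

Keeping strategies in a dict means adding a new retrieval method is one entry plus one classifier rule, with no change to the dispatch path.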
retrieval-with-feedback-loops-and-iteration
Medium confidence
Implements iterative retrieval where initial retrieval results are evaluated, and based on that evaluation (relevance feedback, answer quality assessment), the system refines queries or retrieval parameters and retrieves again. The feedback loop can be explicit (the user indicates whether results are relevant) or implicit (the system evaluates answer quality and decides whether to retrieve more context). This approach enables the system to improve results through iteration without requiring perfect initial retrieval, which is particularly useful for complex queries that may need multiple retrieval rounds to gather sufficient context.
Implements explicit feedback loops where retrieval results are evaluated and used to trigger query refinement and re-retrieval, enabling iterative improvement without requiring perfect initial retrieval — a feedback-driven approach that's more robust for complex queries
More effective for complex queries than single-shot retrieval because it allows refinement based on intermediate results, and more practical than requiring users to formulate perfect queries upfront
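An implicit-feedback loop can be sketched with toy components: after each round the system checks whether the accumulated context covers the query terms, and if not, it expands the query with terms from what it found so far and retrieves again. The corpus, coverage check, and one-hit-per-round retriever are all illustrative stand-ins.

```python
CORPUS = [
    "RAPTOR builds recursive summary trees.",
    "Summary trees speed up retrieval over large corpora.",
]

def retrieve(query, exclude):
    # Toy retriever: one new word-overlapping doc per round.
    qwords = set(query.lower().split())
    hits = [d for d in CORPUS if d not in exclude
            and qwords & set(d.lower().split())]
    return hits[:1]

def covered(query, context):
    # Implicit feedback signal: do the docs cover every query term?
    qwords = set(query.lower().split())
    cwords = set(w for d in context for w in d.lower().split())
    return qwords <= cwords

def iterative_retrieve(query, max_rounds=3):
    context, q = [], query
    for _ in range(max_rounds):
        context += retrieve(q, exclude=context)
        if covered(query, context):
            break
        # Refinement: expand the query with terms found so far.
        q = query + " " + " ".join(w for d in context for w in d.split())
    return context
```

A single-shot retriever here would stop after the first document; the second round only succeeds because the expanded query picks up vocabulary from the first hit.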
graph-based-rag-with-knowledge-graphs
Medium confidence
Implements RAG using knowledge graphs (GraphRAG, RAPTOR) where documents are converted into structured knowledge graphs with entities and relationships, and retrieval operates on graph structure rather than flat chunks. The system extracts entities and relationships from documents, builds a graph index, and retrieves relevant subgraphs based on query entities and relationship patterns. This approach enables relationship-aware retrieval (finding documents about related entities) and supports complex queries that depend on understanding connections between concepts, not just individual chunks.
Converts documents into structured knowledge graphs with entities and relationships, enabling retrieval based on graph structure and relationship patterns rather than text similarity — a structural approach that captures semantic relationships explicitly
More effective for relationship-dependent queries than text-based retrieval because it explicitly models connections between entities, and more scalable than storing full documents because it stores compressed graph representations
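The retrieval side can be sketched over hand-written triples (in a real system an LLM or NER pipeline would extract them from documents): the query's entities seed a frontier, and the subgraph is expanded hop by hop to collect connected facts.

```python
# Hand-written (subject, relation, object) triples; extraction is stubbed out.
TRIPLES = [
    ("Marie Curie", "discovered", "polonium"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("polonium", "named after", "Poland"),
]

def neighbors(entity):
    # All triples touching an entity, as subject or object.
    return [(s, r, o) for s, r, o in TRIPLES if s == entity or o == entity]

def retrieve_subgraph(entities, hops=2):
    # Breadth-first expansion: collect facts reachable within `hops` steps.
    frontier, seen, facts = set(entities), set(), []
    for _ in range(hops):
        new_frontier = set()
        for e in frontier:
            for s, r, o in neighbors(e):
                if (s, r, o) not in seen:
                    seen.add((s, r, o))
                    facts.append((s, r, o))
                    new_frontier.update({s, o})
        frontier = new_frontier - frontier
    return facts
```

Text similarity alone would miss the Poland fact for a query about Marie Curie, since the connecting document never mentions her; the two-hop graph walk surfaces it through the shared `polonium` entity.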
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with RAG_Techniques, ranked by overlap. Discovered automatically through the match graph.
postgresml
Postgres with GPUs for ML/AI apps.
AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
LlamaIndex
A data framework for building LLM applications over external data.
Crawl4AI
AI-optimized web crawler — clean markdown extraction, JS rendering, structured output for RAG.
LlamaParse
Document parsing API — complex PDFs with tables and charts to structured markdown for RAG.
@memberjunction/ai-vectordb
MemberJunction: AI Vector Database Module
Best For
- ✓ developers building their first RAG system
- ✓ teams evaluating RAG frameworks and needing an architectural reference
- ✓ researchers prototyping new RAG techniques within a standardized pipeline
- ✓ teams optimizing RAG retrieval quality and cost
- ✓ developers working with domain-specific documents (code, legal, medical) where semantic boundaries matter
- ✓ practitioners tuning RAG systems for production deployment
- ✓ high-stakes applications where answer correctness is critical
- ✓ systems where hallucination detection is important
Known Limitations
- ⚠ Pipeline assumes synchronous processing — no built-in support for streaming or async document ingestion at scale
- ⚠ Standard pipeline doesn't handle multi-modal documents natively; multi-modal RAG is a separate technique
- ⚠ No built-in persistence layer — requires external vector database and document store configuration
- ⚠ Semantic chunking adds preprocessing latency (typically 2-5x slower than fixed-size splitting) due to boundary detection
- ⚠ Optimal chunk size is workload-dependent — there is no universal best size; empirical testing with your specific queries is required
- ⚠ Doesn't handle overlapping chunks natively; overlap must be implemented as a separate post-processing step
Repository Details
Last commit: Apr 15, 2026