RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.
Capabilities (16 decomposed)
foundational-rag-pipeline-implementation
Medium confidence
Implements a standard RAG pipeline architecture with document ingestion, embedding generation, vector storage, semantic retrieval, and LLM-based generation. Uses a modular pattern where each stage (chunking, embedding, retrieval, generation) is independently configurable, allowing developers to swap components (e.g., different embedding models, vector databases, LLM providers) without rewriting the pipeline. The architecture follows a consistent interface across 40+ technique implementations, enabling pedagogical progression from simple RAG to advanced variants.
Provides a unified pedagogical pipeline architecture that all 40+ techniques build upon, with dual-framework implementations (LangChain and LlamaIndex) showing how the same logical pipeline maps to different frameworks, enabling developers to understand RAG concepts independent of framework choice
More comprehensive than single-technique tutorials because it shows the complete pipeline context and how techniques compose, whereas most RAG guides focus on isolated techniques without showing integration points
semantic-chunking-with-size-optimization
Medium confidence
Implements intelligent document chunking strategies that go beyond fixed-size splitting by using semantic boundaries (sentence/paragraph breaks, code blocks) and configurable chunk size optimization. The technique analyzes document structure to preserve semantic coherence while optimizing for embedding model context windows and retrieval performance. Includes methods to test different chunk sizes against a query workload to empirically determine optimal chunk dimensions, with metrics tracking retrieval quality vs. computational cost tradeoffs.
Combines semantic boundary detection with empirical chunk size optimization through query-based testing, rather than just providing fixed-size or rule-based chunking — developers can run A/B tests on chunk sizes against their actual query patterns to find optimal configurations
More sophisticated than LangChain's basic text splitter because it preserves semantic structure and includes optimization methodology, whereas most RAG tutorials use fixed chunk sizes without justification or testing
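A minimal sketch of both ideas, assuming a simplistic regex sentence splitter (a real implementation would use a proper tokenizer or embedding-distance boundaries): sentences are packed into chunks without ever cutting mid-sentence, and a crude sweep scores candidate chunk sizes against a sample query workload.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Simplistic stand-in for a real sentence segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def semantic_chunks(text: str, max_chars: int = 120) -> list[str]:
    # Pack whole sentences into chunks up to max_chars; never split a sentence.
    chunks, current = [], ""
    for sent in split_sentences(text):
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks

def best_chunk_size(text, queries, sizes=(60, 120, 240)):
    # Crude empirical sweep: score a size by how many test queries
    # appear intact inside some single chunk (a toy retrieval proxy).
    def hits(chunks):
        return sum(any(q in c for c in chunks) for q in queries)
    return max(sizes, key=lambda s: hits(semantic_chunks(text, s)))
```

The `hits` proxy here is deliberately crude; the repository's notebooks track real retrieval-quality metrics instead, but the A/B structure is the same.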
self-correcting-rag-with-answer-validation
Medium confidence
Implements Self-RAG and Corrective RAG (CRAG) techniques where the system generates answers, then validates them against retrieved context and self-corrects if validation fails. The system uses learned or rule-based validators to assess whether generated answers are supported by retrieved context, and if validation fails, triggers retrieval refinement (new queries, different retrieval strategies) and regeneration. This approach creates a feedback loop within the generation process, enabling the system to detect and correct hallucinations or unsupported claims without requiring external feedback.
Implements Self-RAG and CRAG techniques that validate generated answers against retrieved context and trigger self-correction (re-retrieval and regeneration) if validation fails, creating an internal feedback loop that detects and corrects hallucinations without external validators
More proactive than post-hoc fact-checking because it validates during generation and corrects immediately, and more practical than requiring external validators because it uses the LLM itself for validation
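The control flow can be sketched with stub components. Everything below is a toy: the corpus, the keyword "retriever", and the groundedness check (real CRAG uses an LLM or trained evaluator as the validator, and an LLM for query rewriting). What matters is the loop: generate, validate against context, and on failure rewrite the query and retry.

```python
# Toy corpus and keyword retriever; stand-ins for a real vector store.
CORPUS = {
    "paris": "Paris is the capital of France.",
    "rome": "Rome is the capital of Italy.",
}

def retrieve(query: str) -> str:
    for key, doc in CORPUS.items():
        if key in query.lower():
            return doc
    return ""

def generate(query: str, context: str) -> str:
    # Stand-in for an LLM call conditioned on retrieved context.
    return context or "I don't know."

def grounded(answer: str, context: str) -> bool:
    # Toy validator: every answer token must appear in the context.
    words = set(answer.lower().split())
    return bool(context) and words <= set(context.lower().split())

def self_correcting_answer(query: str, rewrite, max_rounds: int = 2) -> str:
    for _ in range(max_rounds):
        ctx = retrieve(query)
        answer = generate(query, ctx)
        if grounded(answer, ctx):
            return answer
        query = rewrite(query)  # refinement step, e.g. an LLM query rewrite
    return "No grounded answer found."
```

On the first failed validation the loop re-enters retrieval with the rewritten query, which is the CRAG "corrective" step in miniature.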
multi-modal-rag-with-image-and-text
Medium confidence
Extends RAG to handle multi-modal documents containing both text and images by using multi-modal embedding models that encode images and text into a shared embedding space, enabling retrieval across modalities. The system processes images (extracting text via OCR, generating captions, or using vision models) and text separately, embeds them into a unified space, and retrieves relevant content regardless of modality. This approach enables queries to find relevant images when asking text questions and vice versa, supporting richer document understanding.
Implements multi-modal RAG using shared embedding spaces for text and images, enabling cross-modal retrieval where text queries find images and image queries find text — a unified approach that treats modalities symmetrically
More comprehensive than text-only RAG because it handles visual content, and more practical than separate text and image pipelines because it uses unified embeddings for symmetric cross-modal retrieval
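The retrieval mechanics can be illustrated with a toy shared space. A real system would run a CLIP-style encoder over pixels; here images are represented by their captions and both modalities go through the same bag-of-words embedder, purely to show how a single index serves cross-modal queries symmetrically.

```python
import math

def embed(text):
    # Toy shared-space embedder; stand-in for a CLIP-style model.
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cos(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One unified index holds both modalities; images enter via captions here.
items = [
    {"modality": "text", "content": "The transformer architecture uses attention."},
    {"modality": "image", "content": "diagram of transformer attention heads"},
    {"modality": "image", "content": "photo of a mountain lake"},
]
index = [(item, embed(item["content"])) for item in items]

def cross_modal_search(query, want=None, k=2):
    # want=None searches both modalities; "text"/"image" filters one.
    qv = embed(query)
    hits = [(it, v) for it, v in index if want is None or it["modality"] == want]
    return [it for it, v in sorted(hits, key=lambda t: cos(qv, t[1]), reverse=True)[:k]]
```

Because both modalities live in one index, a text query ranks images and text on the same scale, which is the symmetric behavior the description highlights.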
rag-evaluation-with-deepeval-framework
Medium confidence
Provides a comprehensive evaluation framework (DeepEval) for assessing RAG system quality across multiple dimensions: retrieval quality (precision, recall, NDCG), answer quality (faithfulness, relevance, coherence), and end-to-end performance. The framework includes pre-built metrics, dataset management, and evaluation pipelines that can be integrated into development workflows. Developers can define evaluation criteria, run automated evaluations against test datasets, and track metrics over time to monitor RAG system quality and detect regressions.
Provides an integrated evaluation framework (DeepEval) with pre-built metrics for retrieval quality, answer quality, and end-to-end performance, enabling systematic RAG evaluation without building custom evaluation pipelines — a comprehensive approach to RAG quality assurance
More comprehensive than ad-hoc evaluation because it provides standardized metrics and automated evaluation pipelines, and more practical than building custom evaluators because it includes pre-built metrics for common RAG quality dimensions
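To make the metric dimensions concrete without depending on DeepEval's API or an LLM key, here is a from-scratch sketch of two of them: retrieval recall@k, and a crude token-overlap stand-in for faithfulness (frameworks like DeepEval use LLM judges for the latter; this proxy is only illustrative).

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    # Fraction of relevant documents found in the top-k retrieved list.
    hit = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hit / len(relevant_ids) if relevant_ids else 0.0

def faithfulness_proxy(answer, contexts):
    # Toy groundedness score: share of answer tokens present in context.
    ans = set(answer.lower().split())
    ctx = set(w for c in contexts for w in c.lower().split())
    return len(ans & ctx) / len(ans) if ans else 0.0

case = {
    "retrieved": ["d2", "d7", "d1"],
    "relevant": ["d1", "d2"],
    "answer": "embeddings capture semantics",
    "contexts": ["embeddings capture semantics of text"],
}
print(recall_at_k(case["retrieved"], case["relevant"], k=2))  # 0.5
print(faithfulness_proxy(case["answer"], case["contexts"]))   # 1.0
```

A real evaluation run loops such metrics over a test dataset and logs scores per commit, which is what turns ad-hoc spot checks into regression detection.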
rag-benchmarking-with-test-datasets
Medium confidence
Provides standardized benchmark datasets and evaluation protocols for comparing RAG techniques and implementations. The repository includes curated test datasets with queries, expected answers, and ground-truth retrieved documents, enabling developers to benchmark their RAG systems against known baselines. Benchmarks cover different domains (general knowledge, technical documentation, research papers) and query types (factual, conceptual, reasoning), allowing developers to assess RAG performance across diverse scenarios and compare their implementations against published baselines.
Provides curated benchmark datasets with ground-truth annotations for standardized RAG evaluation, enabling developers to compare implementations against known baselines and across different domains/query types — a structured approach to RAG benchmarking
More rigorous than ad-hoc testing because it uses standardized datasets and protocols, and more practical than building custom benchmarks because datasets are pre-curated with ground truth
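The benchmark-runner shape is simple enough to sketch. The dataset and the two "retrievers" below are hypothetical; the comparison metric is mean reciprocal rank (MRR), a standard choice when each query has one gold document.

```python
# Illustrative dataset of (query, gold doc id) pairs.
DATASET = [
    {"query": "capital of france", "gold": "doc_paris"},
    {"query": "largest planet", "gold": "doc_jupiter"},
]

def mrr(retriever, dataset):
    # Mean reciprocal rank: 1/rank of the gold doc, averaged over queries.
    total = 0.0
    for ex in dataset:
        ranking = retriever(ex["query"])
        if ex["gold"] in ranking:
            total += 1.0 / (ranking.index(ex["gold"]) + 1)
    return total / len(dataset)

# Two hypothetical systems returning fixed rankings, for comparison.
baseline = lambda q: ["doc_misc", "doc_paris", "doc_jupiter"]
improved = lambda q: ["doc_paris", "doc_jupiter", "doc_misc"]
```

Replaying the same fixed dataset against both systems is what makes the comparison meaningful; with ad-hoc queries the numbers would not be comparable across runs.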
dual-framework-implementation-with-langchain-and-llamaindex
Medium confidence
Provides parallel implementations of all RAG techniques using both LangChain and LlamaIndex frameworks, showing how the same logical RAG concepts map to different framework abstractions. Each technique has implementations in both frameworks, allowing developers to understand RAG architecture independent of framework choice and to compare framework approaches. This dual-implementation strategy helps developers make informed framework choices and understand how to port RAG implementations between frameworks.
Provides parallel implementations of all 40+ RAG techniques in both LangChain and LlamaIndex, showing how the same logical RAG architecture maps to different framework abstractions — a framework-agnostic approach to RAG education
More educational than single-framework tutorials because it shows framework-independent RAG concepts, and more practical than framework-specific guides because it enables developers to choose frameworks based on understanding rather than framework lock-in
production-ready-runnable-scripts-for-rag-techniques
Medium confidence
Provides standalone, executable Python scripts for each RAG technique that can be run immediately without modification (with API keys configured). Scripts include all necessary imports, configuration, and error handling, demonstrating production-ready patterns. Each script is self-contained and can serve as a template for implementing the technique in production systems. Scripts include examples with real data, showing end-to-end execution from document loading through answer generation.
Provides standalone, immediately-executable Python scripts for each RAG technique with all necessary configuration and error handling, serving as production-ready templates rather than just educational notebooks — a practical approach to RAG implementation
More practical than notebooks because scripts are immediately runnable and production-oriented, and more complete than code snippets because they include full implementations with error handling and configuration
query-transformation-and-enhancement
Medium confidence
Implements multiple query transformation techniques (query rewriting, expansion, decomposition) that improve retrieval by reformulating user queries into forms more likely to match relevant documents. Techniques include HyDE (Hypothetical Document Embeddings), which generates synthetic relevant documents from queries, HyPE, which generates hypothetical passages, and multi-query expansion, which creates semantically similar query variants. Each transformation is applied before retrieval to increase the likelihood of finding relevant chunks, with optional fusion of results from multiple query variants.
Provides implementations of HyDE and HyPE techniques that use LLMs to generate synthetic documents or passages from queries, improving retrieval without modifying the embedding model or document index — a novel approach compared to traditional query expansion
More effective than simple query expansion (synonyms, stemming) because it uses LLM understanding to generate contextually relevant synthetic documents, whereas traditional methods rely on lexical similarity
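The HyDE trick is small enough to sketch end to end. The `fake_llm` below is a hard-coded stand-in for a real model call, and the corpus and embedder are toys; the essential move is that the *hypothetical answer text* is embedded for retrieval, not the terse query itself.

```python
import math

def embed(text):
    # Toy bag-of-words embedder; stand-in for a real embedding model.
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cos(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "Reciprocal rank fusion merges ranked lists from multiple retrievers.",
    "Bananas are rich in potassium.",
]
INDEX = [(d, embed(d)) for d in DOCS]

def fake_llm(prompt):
    # Stand-in for a real LLM drafting a hypothetical answer document.
    return "Fusion merges ranked lists from multiple retrievers into one."

def hyde_retrieve(query, k=1):
    hypothetical = fake_llm(f"Write a passage answering: {query}")
    qv = embed(hypothetical)  # embed the synthetic document, not the query
    return [d for d, v in sorted(INDEX, key=lambda t: cos(qv, t[1]), reverse=True)[:k]]
```

The hypothetical document shares vocabulary and structure with real answer passages, so it lands closer to them in embedding space than the original short query would.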
contextual-chunk-enrichment-with-headers
Medium confidence
Enhances retrieved chunks with contextual metadata by automatically generating or extracting chunk headers, parent document context, and hierarchical position information. When a chunk is retrieved, the system includes its semantic context (what section of the document it belongs to, what the surrounding chunks discuss) alongside the chunk content itself. This enrichment happens during indexing (headers are computed and stored with chunks) and retrieval (context is appended to retrieved chunks before passing to the LLM), improving the LLM's ability to understand chunk meaning without requiring larger context windows.
Automatically enriches chunks with hierarchical context and semantic headers during indexing, allowing the LLM to understand chunk meaning from context rather than requiring larger chunks or longer context windows — a preprocessing approach rather than prompt-engineering
More efficient than increasing chunk size because it preserves semantic context without proportionally increasing embedding costs or context window usage, whereas naive approaches just make chunks larger
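A minimal sketch of the indexing side, assuming markdown-style headings (real documents would need a proper structure parser): each chunk stores the heading path it sits under, and at retrieval time that path is prepended so the LLM sees where the chunk came from.

```python
def index_with_headers(markdown: str):
    # Walk the document, tracking the current heading path per level,
    # and attach that path to every non-heading line (a "chunk" here).
    chunks, path = [], []
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            path = path[:level - 1] + [line.lstrip("# ").strip()]
        elif line.strip():
            chunks.append({"header": " > ".join(path), "text": line.strip()})
    return chunks

def enrich(chunk):
    # Prepend the stored heading path before handing the chunk to the LLM.
    return f"[{chunk['header']}]\n{chunk['text']}"

doc = "# Retrieval\n## Reranking\nCross-encoders rescore candidates."
```

The header costs a few tokens per chunk, versus the proportional embedding and context cost of simply making every chunk larger.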
fusion-retrieval-with-multi-strategy-ranking
Medium confidence
Combines results from multiple retrieval strategies (dense semantic search, sparse BM25 keyword search, hypothetical document embeddings) using fusion algorithms (Reciprocal Rank Fusion, weighted scoring) to produce a unified ranked result set. Each retrieval strategy is executed independently, then results are merged using configurable fusion methods that balance semantic relevance (from dense retrieval) with keyword matching (from sparse retrieval). This approach captures both semantic and lexical relevance without requiring a single unified index.
Implements Reciprocal Rank Fusion and weighted scoring to combine dense semantic retrieval with sparse keyword retrieval, allowing developers to balance semantic understanding with exact-match precision without choosing one strategy — a hybrid approach that's more robust than single-strategy retrieval
More comprehensive than pure semantic search because it captures both meaning and keywords, and more practical than pure BM25 because it includes semantic understanding; fusion is more maintainable than building a custom unified ranking function
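Reciprocal Rank Fusion itself is only a few lines; this sketch assumes each retriever has already produced an ordered list of document ids. Each list contributes `1 / (k + rank)` per document and scores are summed; `k = 60` is the constant from the original RRF paper, which damps the influence of top ranks.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Sum 1/(k + rank) across all input rankings, then sort by total score.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d5"]   # semantic retriever ranking
sparse = ["d1", "d2", "d3"]  # BM25 ranking
print(rrf([dense, sparse]))  # ['d1', 'd3', 'd2', 'd5']
```

Note that RRF uses only ranks, never raw scores, so the dense and sparse retrievers' incomparable score scales never need to be normalized against each other.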
intelligent-reranking-with-cross-encoders
Medium confidence
Implements a two-stage retrieval pipeline where an initial retriever (fast, approximate) returns candidate chunks, then a cross-encoder reranker (slower, more accurate) scores and reorders results based on query-document relevance. The reranker uses transformer models that jointly encode the query and document to compute relevance scores, providing more accurate ranking than embedding-based similarity. This approach maintains retrieval speed (initial retrieval is still fast) while improving result quality through expensive but accurate reranking on a smaller candidate set.
Implements a two-stage retrieval pipeline with cross-encoder reranking that jointly encodes query-document pairs for more accurate relevance scoring than embedding similarity, allowing developers to use expensive but accurate models on a small candidate set rather than all documents
More accurate than single-stage embedding-based retrieval because cross-encoders directly model query-document relevance, but more efficient than applying cross-encoders to all documents because reranking only operates on initial retrieval candidates
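A sketch of the two-stage shape with stubs throughout: `first_stage` is a toy recall-oriented word-overlap filter standing in for a vector search, and `score_pair` is a Jaccard overlap standing in for a real cross-encoder (e.g. a sentence-transformers `CrossEncoder`) that would jointly encode each (query, document) pair.

```python
DOCS = [
    "RAG pipelines retrieve context before generation.",
    "Cross encoders jointly encode query and document.",
    "Cats sleep most of the day.",
]

def first_stage(query, k=3):
    # Cheap recall-oriented stage: keep any doc sharing a word with the query.
    qwords = set(query.lower().split())
    return [d for d in DOCS if qwords & set(d.lower().split())][:k]

def score_pair(query, doc):
    # Jaccard overlap as a stand-in for a cross-encoder's joint relevance score.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query, k=1):
    # Expensive scoring runs only on the small first-stage candidate set.
    candidates = first_stage(query)
    return sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)[:k]
```

The cost structure is the point: the accurate scorer touches only the handful of candidates the cheap stage surfaces, never the full corpus.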
hierarchical-index-construction-and-traversal
Medium confidence
Builds multi-level document indices where documents are recursively summarized into hierarchies (leaf chunks → summaries → higher-level summaries) and retrieval traverses this hierarchy top-down. The system first retrieves relevant high-level summaries, then recursively retrieves more detailed chunks from relevant branches, reducing the number of embeddings needed and improving retrieval efficiency. This approach is particularly effective for large document collections where flat indices become inefficient, enabling both faster retrieval and better handling of documents with varying levels of detail.
Implements recursive document summarization to build multi-level hierarchies that enable top-down retrieval traversal, reducing embedding computations and improving efficiency for large collections — a structural approach to retrieval efficiency rather than algorithmic optimization
More efficient than flat indices for large collections because it reduces embeddings computed per query, and more effective than simple filtering because it uses semantic hierarchies rather than metadata-based pruning
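The traversal can be sketched with a hand-built two-level tree and a toy word-overlap scorer (real systems would use embedding similarity and LLM-generated summaries): summaries are scored first, and only leaves under the best-matching branch are scored in detail.

```python
# Hand-built hierarchy: summary -> leaf chunks. Illustrative content only.
TREE = {
    "summary: European capitals and geography": [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ],
    "summary: Planetary science facts": [
        "Jupiter is the largest planet.",
        "Mars has two moons.",
    ],
}

def overlap(a, b):
    # Toy relevance score; stand-in for embedding similarity.
    return len(set(a.lower().split()) & set(b.lower().split()))

def hierarchical_retrieve(query, k=1):
    best_branch = max(TREE, key=lambda s: overlap(query, s))  # score summaries
    leaves = TREE[best_branch]                                # descend one level
    return sorted(leaves, key=lambda c: overlap(query, c), reverse=True)[:k]
```

With B branches of L leaves each, a query scores roughly B + L items instead of B × L, which is where the efficiency gain comes from as collections grow.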
adaptive-retrieval-with-query-routing
Medium confidence
Implements dynamic retrieval strategies that adapt based on query characteristics, routing different query types to different retrieval methods. The system analyzes incoming queries to determine the optimal retrieval strategy (e.g., simple keyword search for factual lookups, semantic search for conceptual questions, graph-based retrieval for relationship queries) and applies the appropriate method. This routing can be rule-based (query classification) or learned (trained classifier), enabling the system to use the most efficient and effective retrieval method for each query type without requiring all queries to use the same strategy.
Implements query-aware routing that dynamically selects retrieval strategies based on query characteristics, allowing different query types to use optimized methods rather than forcing all queries through a single pipeline — an adaptive approach that improves both efficiency and quality
More efficient than applying all retrieval strategies to every query (fusion) because it selects the most appropriate strategy, and more effective than single-strategy systems because it adapts to query type
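The rule-based variant can be sketched directly; the trigger words, strategy names, and placeholder strategies below are all illustrative, and a learned router would replace `classify` with a trained classifier while keeping the same dispatch structure.

```python
def classify(query: str) -> str:
    # Toy rule-based query classifier; a production router might be learned.
    q = query.lower()
    if any(w in q for w in ("who", "when", "where", "how many")):
        return "keyword"       # factual lookup -> sparse/BM25
    if any(w in q for w in ("related", "connected", "relationship")):
        return "graph"         # relational query -> graph retrieval
    return "semantic"          # default -> dense retrieval

# Placeholder strategies; each would wrap a real retriever.
STRATEGIES = {
    "keyword": lambda q: f"BM25 results for {q!r}",
    "semantic": lambda q: f"dense results for {q!r}",
    "graph": lambda q: f"graph results for {q!r}",
}

def route(query: str) -> str:
    return STRATEGIES[classify(query)](query)
```

Keeping strategies in a dict means adding a new retrieval method is one entry plus one classifier rule, with no change to the dispatch path.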
retrieval-with-feedback-loops-and-iteration
Medium confidence
Implements iterative retrieval where initial retrieval results are evaluated, and based on that evaluation (relevance feedback, answer quality assessment), the system refines queries or retrieval parameters and retrieves again. The feedback loop can be explicit (the user indicates whether results are relevant) or implicit (the system evaluates answer quality and decides whether to retrieve more context). This approach enables the system to improve results through iteration without requiring perfect initial retrieval, which is particularly useful for complex queries that may need multiple retrieval rounds to gather sufficient context.
Implements explicit feedback loops where retrieval results are evaluated and used to trigger query refinement and re-retrieval, enabling iterative improvement without requiring perfect initial retrieval — a feedback-driven approach that's more robust for complex queries
More effective for complex queries than single-shot retrieval because it allows refinement based on intermediate results, and more practical than requiring users to formulate perfect queries upfront
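An implicit-feedback loop can be sketched with toy components: after each round the system checks whether the accumulated context covers the query terms, and if not, it expands the query with terms from what it found so far and retrieves again. The corpus, coverage check, and one-hit-per-round retriever are all illustrative stand-ins.

```python
CORPUS = [
    "RAPTOR builds recursive summary trees.",
    "Summary trees speed up retrieval over large corpora.",
]

def retrieve(query, exclude):
    # Toy retriever: one new word-overlapping doc per round.
    qwords = set(query.lower().split())
    hits = [d for d in CORPUS if d not in exclude
            and qwords & set(d.lower().split())]
    return hits[:1]

def covered(query, context):
    # Implicit feedback signal: do the docs cover every query term?
    qwords = set(query.lower().split())
    cwords = set(w for d in context for w in d.lower().split())
    return qwords <= cwords

def iterative_retrieve(query, max_rounds=3):
    context, q = [], query
    for _ in range(max_rounds):
        context += retrieve(q, exclude=context)
        if covered(query, context):
            break
        # Refinement: expand the query with terms found so far.
        q = query + " " + " ".join(w for d in context for w in d.split())
    return context
```

A single-shot retriever here would stop after the first document; the second round only succeeds because the expanded query picks up vocabulary from the first hit.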
graph-based-rag-with-knowledge-graphs
Medium confidence
Implements RAG using knowledge graphs (GraphRAG, RAPTOR) where documents are converted into structured knowledge graphs with entities and relationships, and retrieval operates on graph structure rather than flat chunks. The system extracts entities and relationships from documents, builds a graph index, and retrieves relevant subgraphs based on query entities and relationship patterns. This approach enables relationship-aware retrieval (finding documents about related entities) and supports complex queries that depend on understanding connections between concepts, not just individual chunks.
Converts documents into structured knowledge graphs with entities and relationships, enabling retrieval based on graph structure and relationship patterns rather than text similarity — a structural approach that captures semantic relationships explicitly
More effective for relationship-dependent queries than text-based retrieval because it explicitly models connections between entities, and more scalable than storing full documents because it stores compressed graph representations
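The retrieval side can be sketched over hand-written triples (in a real system an LLM or NER pipeline would extract them from documents): the query's entities seed a frontier, and the subgraph is expanded hop by hop to collect connected facts.

```python
# Hand-written (subject, relation, object) triples; extraction is stubbed out.
TRIPLES = [
    ("Marie Curie", "discovered", "polonium"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("polonium", "named after", "Poland"),
]

def neighbors(entity):
    # All triples touching an entity, as subject or object.
    return [(s, r, o) for s, r, o in TRIPLES if s == entity or o == entity]

def retrieve_subgraph(entities, hops=2):
    # Breadth-first expansion: collect facts reachable within `hops` steps.
    frontier, seen, facts = set(entities), set(), []
    for _ in range(hops):
        new_frontier = set()
        for e in frontier:
            for s, r, o in neighbors(e):
                if (s, r, o) not in seen:
                    seen.add((s, r, o))
                    facts.append((s, r, o))
                    new_frontier.update({s, o})
        frontier = new_frontier - frontier
    return facts
```

Text similarity alone would miss the Poland fact for a query about Marie Curie, since the connecting document never mentions her; the two-hop graph walk surfaces it through the shared `polonium` entity.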
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with RAG_Techniques, ranked by overlap. Discovered automatically through the match graph.
postgresml
Postgres with GPUs for ML/AI apps.
AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
LlamaIndex
A data framework for building LLM applications over external data.
Crawl4AI
AI-optimized web crawler — clean markdown extraction, JS rendering, structured output for RAG.
LlamaParse
Document parsing API — complex PDFs with tables and charts to structured markdown for RAG.
@memberjunction/ai-vectordb
MemberJunction: AI Vector Database Module
Best For
- ✓ developers building their first RAG system
- ✓ teams evaluating RAG frameworks and needing an architectural reference
- ✓ researchers prototyping new RAG techniques within a standardized pipeline
- ✓ teams optimizing RAG retrieval quality and cost
- ✓ developers working with domain-specific documents (code, legal, medical) where semantic boundaries matter
- ✓ practitioners tuning RAG systems for production deployment
- ✓ high-stakes applications where answer correctness is critical
- ✓ systems where hallucination detection is important
Known Limitations
- ⚠ Pipeline assumes synchronous processing — no built-in support for streaming or async document ingestion at scale
- ⚠ Standard pipeline doesn't handle multi-modal documents natively; multi-modal RAG is a separate technique
- ⚠ No built-in persistence layer — requires external vector database and document store configuration
- ⚠ Semantic chunking adds preprocessing latency (typically 2-5x slower than fixed-size splitting) due to boundary detection
- ⚠ Optimal chunk size is workload-dependent — there is no universal best size; empirical testing with your specific queries is required
- ⚠ Doesn't handle overlapping chunks natively; overlap must be implemented as a separate post-processing step
Repository Details
Last commit: Apr 15, 2026