Configurable Chunk Size And Overlap Control

1

R2R · Repository · 51/100

via “configurable chunking strategies with semantic awareness”

SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.

Unique: Supports multiple chunking strategies (fixed, semantic, code-aware) selectable via configuration, enabling optimization for different document types without code changes. Semantic chunking uses embeddings to identify natural breakpoints, preserving semantic units better than fixed-size windows.

vs others: More flexible than LangChain's fixed-size chunking because it supports semantic and code-aware strategies; more integrated than using external chunking libraries because strategy selection is built into R2R.
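As a rough illustration of embedding-based breakpoints, the sketch below starts a new chunk wherever the cosine similarity between adjacent sentence embeddings falls below a threshold. This is not R2R's implementation: the function names, the `threshold` parameter, and the assumption that embeddings are supplied by the caller as plain lists of floats are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences; break where adjacent similarity < threshold.

    `embeddings[i]` is assumed to be the embedding of `sentences[i]`,
    produced by whatever model the caller prefers.
    """
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            groups.append([])  # similarity dropped: treat as a breakpoint
        groups[-1].append(sentences[i])
    return [" ".join(group) for group in groups]
```

The threshold is a tuning knob: a higher value yields more, smaller chunks, while a lower value merges loosely related sentences.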

2

mcp-local-rag · MCP Server · 42/100

via “configurable-document-chunking-with-overlap”

Local RAG MCP Server - Easy-to-setup document search with minimal configuration

Unique: Maintains rich chunk metadata, including source offsets and document references, so that search results can be attributed to a precise source location and clients can retrieve the full context around a result when needed

vs others: More configurable than fixed-size splitting and more efficient than overlapping all documents, while providing better context preservation than non-overlapping chunks
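A minimal sketch of overlapping chunks that keep their source offsets, assuming character-based sizes. The `Chunk` fields and the `chunk_with_offsets` function are hypothetical names chosen for illustration, not the mcp-local-rag API.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int   # inclusive character offset in the source document
    end: int     # exclusive character offset in the source document
    doc_id: str  # reference back to the originating document


def chunk_with_offsets(text, doc_id, size=200, overlap=50):
    """Split `text` into chunks of up to `size` characters.

    Consecutive chunks share `overlap` characters, and each chunk records
    where it came from so a client can fetch surrounding context later.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append(Chunk(text[start:end], start, end, doc_id))
        if end == len(text):
            break  # the final chunk already reaches the end of the document
    return chunks
```

Because `start` and `end` index directly into the original string, a client can widen a hit by slicing `text[max(0, c.start - n):c.end + n]`.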

3

RAG-chunk – A CLI to test RAG chunking strategies · CLI Tool · 38/100

via “sliding-window chunking with configurable stride”

Show HN: RAG-chunk – A CLI to test RAG chunking strategies

Unique: Provides explicit sliding-window implementation with independent control of window size and stride, enabling fine-grained tuning of chunk overlap and coverage without code modification

vs others: More flexible than fixed-size chunking for controlling overlap, and simpler to tune than semantic chunking while providing predictable chunk sizes
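The relationship between window size and stride can be sketched as follows. This is an assumption-laden illustration over a pre-tokenized list, not RAG-chunk's actual code; `sliding_window`, `window`, and `stride` are hypothetical names.

```python
def sliding_window(tokens, window, stride):
    """Return windows of up to `window` tokens, advancing `stride` tokens each time.

    Overlap between neighbours is window - stride when stride < window;
    stride == window gives non-overlapping chunks; stride > window leaves gaps.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already covers the tail of the input
    return windows
```

For example, `window=4, stride=2` makes each chunk share two tokens with its neighbour, so every token except those at the edges appears in two chunks.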

4

recursive-llm-ts · Repository · 34/100

via “context-window-aware-chunking-with-overlap”

TypeScript bridge for recursive-llm: Recursive Language Models for unbounded context processing with structured outputs

Unique: Combines token-aware chunking with semantic boundary detection and configurable overlap, rather than naive fixed-size chunking

vs others: More sophisticated than simple character-based chunking and preserves context across boundaries, whereas most frameworks use fixed-size chunks

5

llm-splitter · Repository · 29/100

Efficient, configurable text chunking utility for LLM vectorization. Returns rich chunk metadata.

Unique: Provides explicit, validated configuration parameters for chunk size, overlap, and strategy selection, allowing non-destructive experimentation with chunking behavior without modifying splitting logic

vs others: More flexible than fixed-strategy splitters by exposing configuration as first-class parameters, enabling easier integration into hyperparameter optimization pipelines
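One way to treat chunking parameters as validated, first-class configuration is sketched below, together with a small grid builder of the kind a hyperparameter sweep might use. The `ChunkConfig` class, the strategy names, and `config_grid` are hypothetical and do not reflect llm-splitter's actual interface.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical strategy names, used only for this illustration.
STRATEGIES = ("character", "sentence", "paragraph")


@dataclass(frozen=True)
class ChunkConfig:
    chunk_size: int = 512
    overlap: int = 64
    strategy: str = "character"

    def __post_init__(self):
        # Reject invalid settings at construction time, before any splitting runs.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.overlap < self.chunk_size:
            raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")


def config_grid(sizes, overlaps, strategy="character"):
    """Build every valid (size, overlap) combination for a parameter sweep."""
    grid = []
    for size, overlap in product(sizes, overlaps):
        if 0 <= overlap < size:  # skip combinations the config would reject
            grid.append(ChunkConfig(size, overlap, strategy))
    return grid
```

Because the dataclass is frozen, a config can be shared across runs of an experiment without risk of being mutated mid-sweep.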

6

llm-chunk · Repository · 26/100

via “configurable-chunk-size-and-overlap-management”

A super simple text splitter for LLM

Unique: Provides explicit, user-controlled overlap parameter rather than fixed or automatic overlap strategies, giving developers direct control over redundancy vs storage tradeoff without hidden heuristics

vs others: More transparent and predictable than LangChain's overlap implementation because parameters are explicit and not abstracted behind document-type detection, but requires more manual tuning

7

Private GPT · Product · 25/100

via “document-chunking-with-overlap”

Tool for private interaction with your documents

Unique: Implements structure-aware chunking that respects paragraph and section boundaries rather than naive token-based splitting, combined with configurable overlap to preserve context, and attaches rich metadata for source attribution

vs others: More sophisticated than simple fixed-size chunking used in basic RAG implementations; comparable to LangChain's recursive character splitter but with tighter integration to Private GPT's embedding and retrieval pipeline
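A simplified sketch of structure-aware chunking with overlap: paragraphs are never split, they are packed into chunks up to a size budget, and the trailing paragraph(s) of each chunk are repeated at the start of the next. It assumes paragraphs are separated by blank lines; `paragraph_chunks` and its parameters are hypothetical and not taken from Private GPT.

```python
import re


def paragraph_chunks(text, max_chars=500, overlap_paragraphs=1):
    """Pack whole paragraphs into chunks of roughly `max_chars` characters.

    The last `overlap_paragraphs` paragraphs of a chunk are carried into the
    next one to preserve context. A single paragraph longer than `max_chars`
    is kept whole rather than split, so the budget is a soft limit.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Measuring overlap in paragraphs rather than characters keeps the repeated context aligned with the same boundaries the splitter respects.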
