@rag-forge/shared
Repository · Free
Internal shared utilities for RAG-Forge packages
Capabilities (9 decomposed)
RAG pipeline type definitions and schema validation
Medium confidence
Provides shared TypeScript type definitions and runtime schema validators for RAG pipeline components across the RAG-Forge ecosystem. Implements a centralized type system that enforces consistency across document loaders, chunking strategies, embedding providers, and retrieval components, using TypeScript interfaces and potentially Zod or a similar validation library for runtime safety.
Centralizes RAG-specific type definitions (Document, Chunk, EmbeddingResult, RetrievalResult) in a single shared package, eliminating type duplication across document loaders, chunking, embedding, and retrieval modules while maintaining runtime validation for configuration objects
Stronger than ad-hoc type sharing because it enforces a single source of truth for RAG data contracts, preventing silent type mismatches between loosely-coupled pipeline stages
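As an illustration of pairing a shared type with a runtime check, here is a minimal sketch. The `Chunk` fields and the `isChunk` and `parseChunk` helpers are assumptions made for this example, not the package's published exports, and a hand-written type guard stands in for Zod or whichever validation library the package actually uses.

```typescript
// Hypothetical shape; the real @rag-forge/shared types may differ.
interface Chunk {
  id: string;
  documentId: string;
  text: string;
  index: number;
}

// Runtime guard: narrows unknown input (for example parsed JSON) to Chunk.
function isChunk(value: unknown): value is Chunk {
  if (typeof value !== "object" || value === null) return false;
  const { id, documentId, text, index } = value as Record<string, unknown>;
  return (
    typeof id === "string" &&
    typeof documentId === "string" &&
    typeof text === "string" &&
    typeof index === "number" &&
    Number.isInteger(index) &&
    index >= 0
  );
}

// Throws instead of returning a half-valid object to the next stage.
function parseChunk(value: unknown): Chunk {
  if (!isChunk(value)) throw new TypeError("Invalid Chunk payload");
  return value;
}
```

A stage that receives data from another package could call `parseChunk` at its boundary, so a mismatch surfaces as an immediate error rather than a silent type drift.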
document and chunk abstraction interfaces
Medium confidence
Defines unified interfaces for Document and Chunk objects that abstract over different source formats (PDFs, web pages, markdown, databases) and chunking strategies (fixed-size, semantic, recursive). Provides a normalized representation layer so downstream embedding and retrieval components can operate on a consistent data model regardless of input source or chunking method.
Provides a source-agnostic Document/Chunk abstraction that preserves both content and metadata (source URI, chunk index, byte offsets) while remaining flexible enough to support custom chunking strategies and document loaders without modification
Goes beyond a content-plus-metadata document object such as LangChain's Document by explicitly modeling chunk relationships (source URI, chunk index, byte offsets) alongside arbitrary metadata, which improves traceability in retrieval results
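The sketch below shows one way a source-agnostic document and chunk model could carry provenance through a chunker. All names (`SourceDocument`, `DocumentChunk`, `fixedSizeChunker`) are hypothetical, and the offsets are character offsets for simplicity, whereas the description above mentions byte offsets.

```typescript
// Hypothetical interfaces; not the package's actual exports.
interface SourceDocument {
  id: string;
  sourceUri: string;
  content: string;
  metadata: Record<string, unknown>;
}

interface DocumentChunk {
  documentId: string;
  sourceUri: string;
  chunkIndex: number;
  startOffset: number; // character offset into SourceDocument.content
  endOffset: number; // exclusive
  text: string;
  metadata: Record<string, unknown>;
}

// Any chunking strategy only has to satisfy this signature.
type Chunker = (doc: SourceDocument) => DocumentChunk[];

function fixedSizeChunker(size: number): Chunker {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  return (doc) => {
    const chunks: DocumentChunk[] = [];
    for (let start = 0; start < doc.content.length; start += size) {
      const end = Math.min(start + size, doc.content.length);
      chunks.push({
        documentId: doc.id,
        sourceUri: doc.sourceUri,
        chunkIndex: chunks.length,
        startOffset: start,
        endOffset: end,
        text: doc.content.slice(start, end),
        metadata: { ...doc.metadata }, // copy so chunks do not share one object
      });
    }
    return chunks;
  };
}
```

Because every chunk records its document, index, and offsets, a retrieval hit can be traced back to the exact span of the original source regardless of which loader produced the document.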
embedding provider interface and adapter pattern
Medium confidence
Defines a standardized interface for embedding providers (OpenAI, Anthropic, local models, etc.) with an adapter pattern that allows swapping embedding backends without changing application code. Handles provider-specific API details (authentication, rate limiting, batch sizing, dimension handling) behind a unified abstraction layer.
Implements a provider-agnostic embedding interface with built-in adapters for multiple backends (OpenAI, Anthropic, local models), allowing runtime provider selection and fallback without code changes, plus explicit handling of dimension mismatches and batch optimization
More modular than LangChain's Embeddings class because it separates provider logic into discrete adapters, making it easier to add new providers and test provider-specific behavior in isolation
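A rough sketch of the adapter idea follows. The `EmbeddingProvider` interface, the toy `HashEmbeddingProvider`, and the `embedAll` helper are invented for illustration; the toy adapter produces deterministic vectors locally and makes no network calls, so it only stands in for a real backend adapter.

```typescript
// Hypothetical provider contract; not the package's actual interface.
interface EmbeddingProvider {
  readonly name: string;
  readonly dimensions: number;
  readonly maxBatchSize: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Toy local adapter: sums character codes into a fixed-size vector.
class HashEmbeddingProvider implements EmbeddingProvider {
  readonly name = "local-hash";
  readonly maxBatchSize = 2;
  readonly dimensions: number;

  constructor(dimensions: number) {
    this.dimensions = dimensions;
  }

  embedOne(text: string): number[] {
    const vector = new Array<number>(this.dimensions).fill(0);
    for (let i = 0; i < text.length; i++) {
      vector[i % this.dimensions] += text.charCodeAt(i);
    }
    return vector;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((t) => this.embedOne(t));
  }
}

// Splits items into batches no larger than `size`.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Provider-agnostic caller: respects the batch limit and checks dimensions.
async function embedAll(provider: EmbeddingProvider, texts: string[]): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(texts, provider.maxBatchSize)) {
    for (const vector of await provider.embed(batch)) {
      if (vector.length !== provider.dimensions) {
        throw new Error(
          `${provider.name}: expected ${provider.dimensions} dimensions, got ${vector.length}`,
        );
      }
      vectors.push(vector);
    }
  }
  return vectors;
}
```

Application code depends only on `embedAll` and the interface, so swapping backends means constructing a different adapter rather than editing call sites.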
vector store abstraction and retrieval interface
Medium confidence
Defines a unified interface for vector stores (Pinecone, Weaviate, Milvus, in-memory) that abstracts over different storage backends and retrieval strategies. Handles similarity search, filtering, metadata queries, and result ranking through a consistent API, allowing applications to swap vector stores without changing retrieval logic.
Provides a backend-agnostic vector store interface with adapters for multiple storage systems (Pinecone, Weaviate, Milvus, in-memory), supporting both similarity search and metadata filtering through a unified query API that hides backend-specific syntax
More flexible than LangChain's VectorStore because it explicitly models metadata filtering and result ranking as first-class operations, not afterthoughts, enabling more sophisticated retrieval strategies
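To make the unified query idea concrete, here is a small in-memory sketch. The `VectorStore` interface and its query shape are assumptions for this example. The methods are synchronous for brevity, whereas an interface that also covers hosted backends would almost certainly return promises.

```typescript
// Hypothetical contract; not the package's actual interface.
type Metadata = Record<string, string | number | boolean>;

interface VectorRecord {
  id: string;
  vector: number[];
  metadata: Metadata;
}

interface VectorQuery {
  vector: number[];
  topK: number;
  filter?: Metadata; // exact-match filter on metadata fields
}

interface ScoredRecord {
  id: string;
  score: number;
  metadata: Metadata;
}

interface VectorStore {
  upsert(records: VectorRecord[]): void;
  query(q: VectorQuery): ScoredRecord[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

class InMemoryVectorStore implements VectorStore {
  private records = new Map<string, VectorRecord>();

  upsert(records: VectorRecord[]): void {
    for (const record of records) this.records.set(record.id, record);
  }

  // Filter first, then rank by cosine similarity, then keep the top K.
  query(q: VectorQuery): ScoredRecord[] {
    const filter = q.filter ?? {};
    return Array.from(this.records.values())
      .filter((r) => Object.entries(filter).every(([key, value]) => r.metadata[key] === value))
      .map((r) => ({ id: r.id, score: cosineSimilarity(q.vector, r.vector), metadata: r.metadata }))
      .sort((a, b) => b.score - a.score)
      .slice(0, q.topK);
  }
}
```

An adapter for a hosted store would translate the same `VectorQuery` into that backend's own filter syntax, which is where the abstraction earns its keep.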
RAG pipeline orchestration and composition
Medium confidence
Provides utilities for composing RAG pipelines from discrete components (loaders, chunkers, embedders, retrievers) with explicit data flow and error handling. Likely uses a builder pattern or functional composition to chain stages, with support for parallel processing, caching, and observability hooks at each stage.
Provides a composable pipeline abstraction that chains RAG stages (load → chunk → embed → retrieve) with explicit error handling, caching, and observability hooks, using a builder or functional composition pattern to avoid deeply nested callbacks
Simpler than full workflow orchestration tools (Airflow, Prefect) because it's purpose-built for RAG pipelines, but more flexible than monolithic RAG frameworks because stages are independently testable and swappable
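Since the description only says the package "likely" uses a builder or functional composition, the following is a guess at what a typed builder could look like rather than a reflection of the real API. `Stage`, `StageHook`, and `PipelineBuilder` are hypothetical names, and the stages are synchronous to keep the sketch short.

```typescript
// Hypothetical names throughout; this is not the package's actual API.
interface Stage<In, Out> {
  name: string;
  run(input: In): Out;
}

type StageHook = (stageName: string, elapsedMs: number) => void;

class PipelineBuilder<In, Out> {
  private readonly runAll: (input: In) => Out;
  private readonly hook: StageHook | undefined;

  private constructor(runAll: (input: In) => Out, hook: StageHook | undefined) {
    this.runAll = runAll;
    this.hook = hook;
  }

  // Starts an empty pipeline whose output type equals its input type.
  static start<T>(hook?: StageHook): PipelineBuilder<T, T> {
    return new PipelineBuilder<T, T>((input) => input, hook);
  }

  // Appends a stage; the compiler checks that its input matches the current output.
  add<Next>(stage: Stage<Out, Next>): PipelineBuilder<In, Next> {
    const previous = this.runAll;
    const hook = this.hook;
    return new PipelineBuilder<In, Next>((input) => {
      const intermediate = previous(input);
      const started = Date.now();
      let output: Next;
      try {
        output = stage.run(intermediate);
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        throw new Error(`Stage "${stage.name}" failed: ${reason}`);
      }
      if (hook) hook(stage.name, Date.now() - started);
      return output;
    }, hook);
  }

  build(): (input: In) => Out {
    return this.runAll;
  }
}
```

A pipeline is then assembled with `PipelineBuilder.start<string>(hook).add(loadStage).add(chunkStage).build()`, and each stage object can be unit-tested on its own because it is just a named function.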
configuration management and environment variable handling
Medium confidence
Provides utilities for loading, validating, and managing RAG pipeline configuration from environment variables, config files, or runtime objects. Handles secrets management (API keys, database credentials) with support for different environments (dev, staging, prod) and configuration validation against defined schemas.
Centralizes RAG-specific configuration management with schema validation, environment-specific overrides, and secrets handling, allowing different embedding providers, vector stores, and chunking strategies to be selected via configuration without code changes
More specialized than generic config libraries (dotenv, convict) because it understands RAG-specific configuration patterns (provider selection, model names, batch sizes) and validates them against RAG component schemas
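A minimal sketch of configuration-driven component selection is shown below. The environment variable names (`RAG_EMBEDDING_PROVIDER`, `RAG_VECTOR_STORE`, `RAG_BATCH_SIZE`, `RAG_API_KEY`), the defaults, and the `RagConfig` shape are all invented for this example; the package's real keys are not documented in this listing.

```typescript
// Hypothetical config shape and variable names; not taken from the package.
type Env = Record<string, string | undefined>;

const PROVIDERS = ["openai", "local"] as const;
const STORES = ["pinecone", "weaviate", "milvus", "memory"] as const;

interface RagConfig {
  embeddingProvider: (typeof PROVIDERS)[number];
  vectorStore: (typeof STORES)[number];
  batchSize: number;
  apiKey: string | undefined;
}

// Returns the value if it is in the allowed set, the fallback if unset, else throws.
function oneOf<T extends string>(
  name: string,
  value: string | undefined,
  allowed: readonly T[],
  fallback: T,
): T {
  if (value === undefined || value === "") return fallback;
  if (allowed.some((candidate) => candidate === value)) return value as T;
  throw new Error(`${name} must be one of ${allowed.join(", ")}; got "${value}"`);
}

function loadConfig(env: Env): RagConfig {
  const embeddingProvider = oneOf("RAG_EMBEDDING_PROVIDER", env.RAG_EMBEDDING_PROVIDER, PROVIDERS, "local");
  const vectorStore = oneOf("RAG_VECTOR_STORE", env.RAG_VECTOR_STORE, STORES, "memory");

  const batchSize = Number(env.RAG_BATCH_SIZE ?? "32");
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("RAG_BATCH_SIZE must be a positive integer");
  }

  // Secrets are validated for presence only and never echoed in error messages.
  const apiKey = env.RAG_API_KEY;
  if (embeddingProvider !== "local" && !apiKey) {
    throw new Error("RAG_API_KEY is required for hosted embedding providers");
  }

  return { embeddingProvider, vectorStore, batchSize, apiKey };
}
```

In a Node.js process this would be called as `loadConfig(process.env)` at startup, so a misconfigured deployment fails before any documents are processed.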
logging and observability utilities
Medium confidence
Provides structured logging and observability hooks for RAG pipelines, including timing information, error tracking, and metrics collection at each stage. Likely integrates with common logging frameworks and supports different log levels, formatters, and output destinations (console, files, external services).
Provides RAG-specific logging utilities that track execution time, token consumption, and error details at each pipeline stage, with structured output compatible with common logging frameworks and optional integration with external observability services
More focused than generic logging libraries because it understands RAG pipeline stages and automatically instruments them with relevant metrics (embedding dimensions, retrieval latency, chunk count)
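One way to picture per-stage instrumentation is a wrapper that times a stage function and emits a structured entry to a pluggable sink. `StageLogEntry`, `LogSink`, and `instrument` are hypothetical names, and the clock is injectable only so the behavior can be checked deterministically.

```typescript
// Hypothetical logging shapes; not the package's actual API.
interface StageLogEntry {
  stage: string;
  level: "info" | "error";
  durationMs: number;
  metrics: Record<string, number>;
  error?: string;
}

type LogSink = (entry: StageLogEntry) => void;

// Wraps a stage function so every call reports timing, metrics, and failures.
function instrument<In, Out>(
  stage: string,
  fn: (input: In) => Out,
  sink: LogSink,
  metricsOf: (output: Out) => Record<string, number> = () => ({}),
  now: () => number = Date.now,
): (input: In) => Out {
  return (input) => {
    const started = now();
    let output: Out;
    try {
      output = fn(input);
    } catch (err) {
      sink({
        stage,
        level: "error",
        durationMs: now() - started,
        metrics: {},
        error: err instanceof Error ? err.message : String(err),
      });
      throw err; // logging must not swallow the failure
    }
    sink({ stage, level: "info", durationMs: now() - started, metrics: metricsOf(output) });
    return output;
  };
}

// Example sink: one JSON object per line on stdout.
const jsonConsoleSink: LogSink = (entry) => console.log(JSON.stringify(entry));
```

Swapping `jsonConsoleSink` for a sink that forwards to a logging framework or an external service changes the destination without touching the instrumented stages.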
error handling and retry strategies
Medium confidence
Provides utilities for handling errors in RAG pipelines with configurable retry strategies, exponential backoff, and fallback mechanisms. Handles transient failures (API rate limits, network timeouts) differently from permanent failures (invalid API keys, unsupported document formats) with appropriate recovery strategies.
Implements RAG-specific error handling that distinguishes between transient failures (rate limits, timeouts) and permanent failures (invalid credentials, unsupported formats), with configurable retry strategies and optional fallback provider support
More sophisticated than basic try-catch because it understands API-specific error codes and implements exponential backoff with jitter, reducing thundering herd problems when multiple clients retry simultaneously
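The transient-versus-permanent split and the jittered backoff can be sketched as follows. The function names and default values are assumptions, classification is reduced to HTTP status codes for brevity, and the delay uses the "full jitter" variant (a uniform draw below the exponential ceiling), which is one common choice rather than necessarily the one the package implements.

```typescript
// Hypothetical helpers; not the package's actual API.
type FailureKind = "transient" | "permanent";

// 408, 429, and 5xx usually indicate conditions worth retrying;
// other statuses (for example 401 for a bad API key) will not fix themselves.
function classifyHttpStatus(status: number): FailureKind {
  if (status === 408 || status === 429 || (status >= 500 && status <= 599)) {
    return "transient";
  }
  return "permanent";
}

// Full jitter: uniform delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 10000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries only transient failures, up to maxAttempts total calls.
async function withRetry<T>(
  operation: () => Promise<T>,
  statusOf: (err: unknown) => number | undefined,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && classifyHttpStatus(status) === "transient";
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Randomizing the delay spreads out clients that failed at the same moment, which is what keeps simultaneous retries from arriving as a synchronized burst.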
utility functions for text processing and normalization
Medium confidence
Provides helper functions for common text processing tasks in RAG pipelines: tokenization, text normalization (lowercasing, removing punctuation), whitespace handling, and encoding/decoding. These utilities ensure consistent text preprocessing across different document loaders and chunking strategies.
Provides RAG-specific text utilities (tokenization, normalization, encoding handling) that work consistently across different document sources and embedding models, with optional integration with model-specific tokenizers for accurate token counting
More focused than general NLP libraries (NLTK, spaCy) because it's optimized for RAG preprocessing tasks and integrates with embedding model tokenizers for accurate token counting
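The helpers below sketch the kind of normalization described above. The function names and options are hypothetical, and `estimateTokenCount` is a crude characters-per-token heuristic: accurate counts require the embedding model's own tokenizer, which this example does not attempt to integrate.

```typescript
// Hypothetical helpers; not the package's actual API.
interface NormalizeOptions {
  lowercase?: boolean;
  stripPunctuation?: boolean;
}

// Unicode-normalizes, optionally lowercases and strips punctuation,
// then collapses runs of whitespace into single spaces.
function normalizeText(text: string, options: NormalizeOptions = {}): string {
  let out = text.normalize("NFKC");
  if (options.lowercase) out = out.toLowerCase();
  if (options.stripPunctuation) out = out.replace(/[\p{P}\p{S}]/gu, " ");
  return out.replace(/\s+/g, " ").trim();
}

// Naive whitespace tokenization over the normalized text.
function whitespaceTokens(text: string): string[] {
  const normalized = normalizeText(text);
  return normalized === "" ? [] : normalized.split(" ");
}

// Rough size estimate only; real tokenizers vary by model and language.
function estimateTokenCount(text: string, charsPerToken = 4): number {
  return Math.ceil(normalizeText(text).length / charsPerToken);
}
```

Running every loader's output through the same `normalizeText` call before chunking is what keeps chunk boundaries and size estimates comparable across sources.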
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with @rag-forge/shared, ranked by overlap. Discovered automatically through the match graph.
@kb-labs/mind-engine
Mind engine adapter for KB Labs Mind (RAG, embeddings, vector store integration).
Awesome RAG Production
A curated list of tools and resources for building production RAG systems.
Unstructured
Document preprocessing for RAG — parse PDFs, DOCX, images into clean structured elements.
@roadiehq/rag-ai-backend-embeddings-aws
The AWS (Bedrock) backend module for the @roadiehq/rag-ai plugin.
RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed notebook tutorial.
create-llama
LlamaIndex CLI to scaffold full-stack RAG applications.
Best For
- ✓ RAG-Forge package maintainers building interconnected document processing pipelines
- ✓ Teams implementing multi-stage RAG systems that require consistent data contracts between stages
- ✓ RAG systems ingesting heterogeneous document types (PDFs, web content, structured data)
- ✓ Teams building pluggable chunking strategies that need to work with any document loader
- ✓ RAG systems that need the flexibility to change embedding providers based on cost/latency tradeoffs
- ✓ Teams building multi-provider RAG systems with fallback strategies
- ✓ RAG systems that need to support multiple vector store backends (cloud-hosted vs. self-hosted)
- ✓ Teams that are evaluating different vector stores and need to avoid vendor lock-in
Known Limitations
- ⚠ Type definitions are TypeScript-only; non-TS consumers must rely on runtime validation or manual type mapping
- ⚠ Schema changes require coordinated updates across all dependent packages in the monorepo
- ⚠ No automatic migration path for breaking schema changes in production deployments
- ⚠ Abstraction may lose source-specific metadata if not explicitly preserved in the interface
- ⚠ Performance overhead from normalization layer when processing large document batches
- ⚠ Requires careful design to balance flexibility with usability; overly generic interfaces become hard to work with