general-purpose text embedding generation with 32K token context
Converts unstructured text into dense vector representations using the voyage-3.5 model, supporting up to 32K tokens of context per input. The model is optimized for retrieval-augmented generation (RAG) pipelines and produces 3x-8x shorter vectors than competing embeddings while maintaining superior accuracy on benchmark tasks. Handles arbitrary text length by chunking internally and returning normalized vector outputs compatible with any vector database.
Unique: Supports 32K token context window (claimed as longest commercial context for embeddings) and produces 3x-8x shorter vectors than competitors while maintaining benchmark-leading accuracy, enabling more efficient vector storage and faster similarity search operations.
vs alternatives: Outperforms OpenAI text-embedding-3-large and Cohere embed-english-v3.0 on MTEB benchmarks while producing significantly shorter vectors, cutting vector database storage overhead and query latency roughly in proportion to the 3x-8x reduction in vector size.
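A minimal sketch of the embedding call described above, assuming the official voyageai Python client (pip install voyageai) and a VOYAGE_API_KEY environment variable; treat it as illustrative rather than canonical:

```python
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

docs = [
    "Voyage embeddings are optimized for retrieval-augmented generation.",
    "Shorter vectors reduce storage and speed up similarity search.",
]

# input_type tells the model whether it is embedding corpus documents
# or search queries, which retrieval-tuned models treat differently.
result = vo.embed(docs, model="voyage-3.5", input_type="document")

for vec in result.embeddings:
    print(len(vec), vec[:4])  # dimensionality and a short preview
```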
lightweight text embedding generation with reduced model footprint
Provides the voyage-3.5-lite variant, a compressed version of the general-purpose embedding model optimized for inference speed and reduced computational requirements. Maintains competitive accuracy on retrieval benchmarks while requiring roughly 4x less compute, enabling deployment on edge devices, in serverless functions, and in cost-constrained environments. Produces the same vector format as voyage-3.5 for seamless integration into existing RAG pipelines.
Unique: Explicitly optimized for 4x faster inference with reduced computational footprint compared to voyage-3.5, enabling deployment in resource-constrained environments (serverless, edge, mobile) while maintaining competitive retrieval accuracy.
vs alternatives: Faster and cheaper than OpenAI text-embedding-3-small for high-volume workloads, with claimed superior accuracy, making it well suited to cost- and latency-sensitive RAG systems.
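Because the lite variant shares the general-purpose model's interface, switching is a one-parameter change; a sketch under the same client assumption:

```python
import voyageai

vo = voyageai.Client()
result = vo.embed(
    ["edge-deployed FAQ snippet"],
    model="voyage-3.5-lite",  # only the model name changes
    input_type="document",
)
print(len(result.embeddings[0]))
```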
llm-agnostic embedding and reranking for rag pipelines
Voyage AI embeddings and reranking models are designed to integrate with any large language model (OpenAI, Anthropic, Ollama, open-source LLMs, etc.) without vendor-specific adapters. The embedding and reranking outputs are plain vectors and relevance scores in standard formats, so retrieved context can be handed to any LLM, enabling flexible RAG pipeline composition. Organizations can combine Voyage embeddings with their choice of LLM without architectural constraints or proprietary integrations.
Unique: Embeddings and reranking designed to integrate with any LLM provider without vendor-specific adapters, enabling flexible RAG pipeline composition and LLM provider switching without architectural changes.
vs alternatives: Provides greater flexibility than LLM-specific embedding solutions (e.g., OpenAI embeddings tied to OpenAI LLMs) by working with any LLM, enabling organizations to optimize each component independently.
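A sketch of the LLM-agnostic retrieval step, assuming the same voyageai client plus numpy; the corpus and prompt wiring are invented for illustration:

```python
import numpy as np
import voyageai

vo = voyageai.Client()

corpus = [
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 300 requests per minute.",
]
doc_vecs = np.array(
    vo.embed(corpus, model="voyage-3.5", input_type="document").embeddings
)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = np.array(
        vo.embed([query], model="voyage-3.5", input_type="query").embeddings[0]
    )
    # Voyage vectors are normalized, so a dot product is cosine similarity.
    scores = doc_vecs @ q
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

context = "\n".join(retrieve("How long do refunds take?"))
prompt = f"Context:\n{context}\n\nAnswer the user's question."
# `prompt` is plain text: send it to OpenAI, Anthropic, a local Ollama
# model, or any other LLM -- no Voyage-specific adapter is needed.
```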
domain-specific embedding models for finance, legal, and code
Provides specialized embedding models fine-tuned for specific domains (finance, legal, code) that outperform general-purpose embeddings on domain-specific retrieval benchmarks. Each model is trained on domain-relevant corpora and optimized for terminology, context, and semantic relationships unique to that field. Integrates seamlessly into RAG pipelines by replacing the general-purpose embedding model while maintaining the same vector database interface.
Unique: Fine-tuned embeddings for finance, legal, and code domains that optimize for domain-specific terminology and semantic relationships, outperforming general-purpose embeddings on domain benchmarks while maintaining compatibility with standard vector database infrastructure.
vs alternatives: Outperforms general-purpose embeddings (OpenAI, Cohere) on domain-specific retrieval tasks by incorporating domain-relevant training data and terminology, reducing false positives and improving precision for specialized RAG applications.
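A sketch of swapping in a domain-tuned model; the model identifiers below (voyage-finance-2, voyage-law-2, voyage-code-3) are assumptions rather than names taken from this section, so check the provider's current model list:

```python
import voyageai

vo = voyageai.Client()

# Hypothetical mapping from domain to model identifier.
DOMAIN_MODELS = {
    "finance": "voyage-finance-2",
    "legal": "voyage-law-2",
    "code": "voyage-code-3",
}

def embed_for_domain(texts: list[str], domain: str) -> list[list[float]]:
    # Same call shape as the general-purpose model; only the model
    # identifier changes, so the vector DB interface stays the same.
    return vo.embed(
        texts, model=DOMAIN_MODELS[domain], input_type="document"
    ).embeddings

vectors = embed_for_domain(["EBITDA margin compressed 200 bps QoQ"], "finance")
print(len(vectors[0]))
```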
custom company-specific embedding models via fine-tuning
Enables organizations to request custom fine-tuned embedding models tailored to their proprietary data, terminology, and domain-specific requirements. The fine-tuning process leverages Voyage AI's base models and adapts them to company-specific semantic relationships, enabling superior retrieval performance on internal knowledge bases and proprietary corpora. Custom models are deployed via the same API interface as standard models, requiring no changes to downstream RAG infrastructure.
Unique: Offers custom fine-tuning service to adapt base embedding models to proprietary company data and terminology, enabling superior retrieval performance on internal knowledge bases while maintaining API compatibility with standard Voyage models.
vs alternatives: Provides enterprise-grade customization beyond what general-purpose embedding providers offer, enabling organizations to achieve domain-specific retrieval accuracy that off-the-shelf models cannot match.
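Since custom models sit behind the same API, adopting one is again just a model-identifier change; acme-embed-v1 below is a hypothetical name a customer's fine-tuned model might be deployed under:

```python
import voyageai

vo = voyageai.Client()
vectors = vo.embed(
    ["internal design doc excerpt"],
    model="acme-embed-v1",  # hypothetical custom model identifier
    input_type="document",
).embeddings
# Downstream indexing and retrieval code is untouched: the custom
# model returns the same vector format as the standard models.
```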
multimodal embedding generation for text and images
The voyage-multimodal-3.5 model generates embeddings for both text and images in a shared vector space, enabling cross-modal retrieval where text queries can retrieve relevant images and vice versa. The model is trained to align text and image semantics, producing vectors that preserve both modalities' semantic relationships. Integrates into RAG pipelines to support hybrid document collections containing both text and visual content.
Unique: Announced multimodal embedding model that generates vectors in a shared text-image space, enabling cross-modal retrieval where text queries retrieve images and vice versa, extending RAG capabilities beyond text-only systems.
vs alternatives: Enables true cross-modal search capabilities that text-only embedding providers (OpenAI, Cohere) cannot offer, supporting hybrid document collections with mixed content types in a single vector space.
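A sketch of a cross-modal embedding call, assuming the client exposes a multimodal_embed method that accepts interleaved text and PIL images; the method signature is an assumption, and the model identifier is taken from this section:

```python
import voyageai
from PIL import Image

vo = voyageai.Client()

inputs = [
    ["A diagram of the billing pipeline", Image.open("billing.png")],
    ["Quarterly revenue summary"],  # text-only inputs share the same space
]
result = vo.multimodal_embed(
    inputs, model="voyage-multimodal-3.5", input_type="document"
)

# Text and image content land in one shared vector space, so a text
# query vector can be scored directly against these document vectors.
print(len(result.embeddings), len(result.embeddings[0]))
```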
context-aware chunk-level embeddings with global document context
The voyage-context-3 model generates embeddings that preserve both chunk-level details and global document context, addressing the limitation of standard embeddings that lose document-level semantics when chunking. The model is trained to understand how individual chunks relate to the overall document structure and meaning, improving retrieval accuracy for systems that chunk documents into smaller units. Outputs embeddings compatible with standard vector databases while maintaining awareness of document-level context.
Unique: Explicitly designed to preserve global document context in chunk-level embeddings, addressing the semantic loss that occurs when documents are chunked for vector database storage, improving retrieval accuracy for chunked document collections.
vs alternatives: Outperforms standard embeddings on chunked document retrieval by maintaining document-level context awareness, reducing false positives and improving precision compared to embeddings that treat chunks as independent units.
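A sketch of contextualized chunk embedding, assuming a contextualized_embed method that takes each document as its ordered list of chunks; both the method name and the return shape are assumptions here:

```python
import voyageai

vo = voyageai.Client()

document_chunks = [
    "Section 1: The warranty covers manufacturing defects.",
    "Section 2: It excludes damage from unauthorized repairs.",
]

result = vo.contextualized_embed(
    inputs=[document_chunks],  # one ordered chunk list per document
    model="voyage-context-3",
)

# One vector per chunk, each computed with the whole document in view;
# stored in a vector DB exactly like standard chunk embeddings.
chunk_vectors = result.results[0].embeddings
print(len(chunk_vectors), len(chunk_vectors[0]))
```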
general-purpose reranking with instruction-following capability
The rerank-2.5 model re-orders retrieved search results to improve relevance ranking, using instruction-following capabilities to adapt reranking behavior based on user intent. The model takes a query and a list of candidate documents, scores each document's relevance to the query, and returns a ranked list optimized for precision. Integrates into RAG pipelines as a post-retrieval step to refine results from vector database queries before passing to the LLM.
Unique: Reranking model with explicit instruction-following capability, enabling dynamic reranking behavior based on query intent or custom ranking criteria, beyond simple relevance scoring.
vs alternatives: Outperforms Cohere rerank and Jina reranker on MTEB ranking benchmarks while supporting instruction-following for custom ranking logic, enabling more flexible and precise result ranking.
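A sketch of the post-retrieval reranking step, assuming the same voyageai client; whether rerank-2.5 takes instructions as a dedicated parameter is not assumed, so the instruction is folded into the query text:

```python
import voyageai

vo = voyageai.Client()

candidates = [
    "Refund policy: 30 days, original receipt required.",
    "Careers: we are hiring support engineers.",
    "Refunds for digital goods are unavailable after download.",
]
query = (
    "Prefer passages that state concrete policy terms. "
    "Question: can I get a refund on an e-book?"
)

reranked = vo.rerank(query, candidates, model="rerank-2.5", top_k=2)
for r in reranked.results:
    # Each result carries the original index, document text, and score.
    print(f"{r.relevance_score:.3f}  {r.document}")
```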