Embedding Model Selection Guide

Choose the right embedding model for your RAG system

Overview

Choosing the right embedding model is critical for RAG system performance. This guide helps you select the optimal model based on your use case requirements, with clear recommendations and trade-off analysis.

Model Comparison

Available Models

| Model | Dimensions | Context Length | Model Size | Quality Score | Best For |
| --- | --- | --- | --- | --- | --- |
| Stella-EN-1.5B-v5 | 1024D | 512 tokens | ~5.5GB | 71.19 | State-of-the-art performance, 1.5B parameters |
| BGE-Large-EN-v1.5 | 1024D | 512 tokens | ~1.34GB | ~64 | Enterprise-grade, English optimized |
| E5-Large-v2 | 1024D | 512 tokens | ~1.34GB | ~64 | Multilingual support, versatile |
| All-MPNet-Base-v2 | 768D | 384 tokens | ~110M | Good | General-purpose semantic search |
| BGE-M3 | 1024D | 8192 tokens | ~2.24GB | Good | Long documents, multilingual, dense+sparse |
| Jina-Embeddings-v2-Base-EN | 768D | 8192 tokens | ~550MB | Good | Long documents, balanced performance |
| All-MiniLM-L6-v2 | 384D | 256 tokens | ~90MB | Fair | Prototyping, speed-optimized |
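The comparison table can also be encoded as data to drive model selection programmatically. A minimal sketch (the `pick_model` helper and its selection rule are hypothetical; the dimension and context figures are taken from the table above):

```python
# Sketch: the comparison table as a lookup, plus a hypothetical helper that
# picks the lowest-dimension (cheapest) model whose context window is large
# enough. Figures are copied from the table above.

MODELS = {
    "Stella-EN-1.5B-v5":          {"dims": 1024, "context_tokens": 512},
    "BGE-Large-EN-v1.5":          {"dims": 1024, "context_tokens": 512},
    "E5-Large-v2":                {"dims": 1024, "context_tokens": 512},
    "All-MPNet-Base-v2":          {"dims": 768,  "context_tokens": 384},
    "BGE-M3":                     {"dims": 1024, "context_tokens": 8192},
    "Jina-Embeddings-v2-Base-EN": {"dims": 768,  "context_tokens": 8192},
    "All-MiniLM-L6-v2":           {"dims": 384,  "context_tokens": 256},
}

def pick_model(min_context: int) -> str:
    """Return the lowest-dimension model whose context window fits min_context."""
    candidates = [(m["dims"], name) for name, m in MODELS.items()
                  if m["context_tokens"] >= min_context]
    return min(candidates)[1]

print(pick_model(256))   # All-MiniLM-L6-v2 (smallest model that fits)
print(pick_model(4000))  # Jina-Embeddings-v2-Base-EN (768D with 8K context)
```

In practice you would extend the candidate filter with quality and memory constraints; this sketch only illustrates the context-length dimension of the decision.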

Model Details

Stella-EN-1.5B-v5 (1024D)

Characteristics:

  • State-of-the-art performance (71.19 MTEB score)
  • 1.5B parameters for deep semantic understanding
  • Excellent for complex queries and nuanced relationships

Best For:

  • Production systems requiring highest accuracy
  • Complex domain-specific applications
  • Research and academic use cases
  • Critical business applications

Limitations:

  • Larger model size (~5.5GB)
  • Requires more processing resources

BGE-Large-EN-v1.5 (1024D)

Characteristics:

  • Solid performance (~64 MTEB score)
  • Enterprise-grade reliability
  • English language optimized

Best For:

  • Enterprise deployments
  • English-only content
  • Balanced performance requirements

Limitations:

  • Limited multilingual support
  • May struggle with non-English content

E5-Large-v2 (1024D)

Characteristics:

  • Good performance (~64 MTEB score)
  • Multilingual support
  • Versatile across domains

Best For:

  • Multilingual content
  • Diverse document collections
  • General-purpose applications

Limitations:

  • May not excel in specialized domains

All-MPNet-Base-v2 (768D)

Characteristics:

  • Balanced size and performance
  • Good semantic understanding
  • Lightweight (~110M parameters)

Best For:

  • Medium-scale applications
  • Local deployment scenarios
  • Resource-constrained environments

Limitations:

  • May struggle with domain-specific terminology
  • Lower accuracy than 1024D models

BGE-M3 (1024D) 📄 Long Context

Characteristics:

  • Massive 8K context window (8192 tokens)
  • Multilingual support
  • Dense + sparse retrieval capabilities

Best For:

  • Long document processing
  • Legal and compliance documents
  • Academic papers
  • Multi-language content

Limitations:

  • Larger model size (~2.24GB)
  • May be overkill for short documents

Jina-Embeddings-v2-Base-EN (768D) 📄 Long Context

Characteristics:

  • 8K context window (8192 tokens)
  • Balanced performance
  • Lightweight (~550MB)

Best For:

  • Long documents with resource constraints
  • Cost-effective long-context processing
  • English long-form content

Limitations:

  • English-only support
  • Lower dimensionality than BGE-M3

All-MiniLM-L6-v2 (384D)

Characteristics:

  • Smallest and fastest model
  • Minimal memory footprint (~90MB)
  • Quick processing

Best For:

  • Prototyping and development
  • Small document collections (< 1,000 chunks)
  • Cost-sensitive applications
  • Simple FAQ systems

Limitations:

  • May miss nuanced semantic relationships
  • Lower accuracy for complex queries
  • Limited context window (256 tokens)

When to Use Higher Dimensions (1024D)

Indicators for Upgrade

Consider using 1024D models (Stella, BGE-Large, E5-Large) when:

  • Complex domain with nuanced semantic relationships
  • Multi-language content requiring fine-grained distinctions
  • Critical applications where retrieval quality is paramount
  • Large document collections where precision matters
  • Current accuracy below 80% with smaller models

Expected Improvement

Upgrading from 384D to 1024D:

  • 15-25% increase in retrieval accuracy
  • Better handling of complex, multi-part queries
  • More relevant similarity_score rankings
  • Improved domain-specific terminology matching
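The similarity_score rankings mentioned above are typically cosine similarities between the query embedding and each chunk embedding. A stdlib sketch with toy 4D vectors standing in for real 384D-1024D embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range -1.0 to 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: higher-dimensional models give these scores more room to
# separate near-matches from unrelated text.
query = [0.1, 0.9, 0.2, 0.0]
doc_a = [0.1, 0.8, 0.3, 0.1]   # semantically close to the query
doc_b = [0.9, 0.1, 0.0, 0.4]   # unrelated

assert cosine_similarity(query, query) > 0.999
assert cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b)
```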

Trade-offs

| Factor | Impact |
| --- | --- |
| Processing | 30-50% slower |
| Storage | ~2.7x more vector storage (1024D vs 384D) |
| Memory | Larger model footprint |
| Accuracy | Significant improvement |

When to Use Lower Dimensions (384D-768D)

Indicators for Lower Dimensions

Consider lower dimensions when:

  • Simple FAQ or fact-retrieval systems
  • Cost-sensitive applications
  • Real-time requirements with strict latency SLAs
  • Small to medium document collections (< 10,000 chunks)
  • Prototype/development phase
  • Long document processing at lower cost (Jina-Embeddings-v2-Base-EN offers an 8K context at 768D)

Expected Behavior

Advantages:

  • 50-70% faster processing (All-MiniLM)
  • Lower memory requirements
  • Less storage for embeddings

Trade-offs:

  • Lower retrieval accuracy for complex queries
  • May struggle with domain-specific terminology
  • May miss nuanced semantic matches

Migration Path

Prototype → Production → Optimized
    ↓           ↓           ↓
  384D       768D       1024D

Stage 1: Prototype (384D)

Model: All-MiniLM-L6-v2

Goals:

  • Validate concept quickly
  • Minimize processing time during development
  • Test basic functionality

Duration: 1-4 weeks

Stage 2: Production (768D)

Model: All-MPNet-Base-v2 or Jina-Embeddings-v2-Base-EN

Goals:

  • Good balance of quality and speed
  • Suitable for most use cases
  • Production-ready performance

Duration: Ongoing or when better accuracy needed

Stage 3: Optimized (1024D)

Model: Stella-EN-1.5B-v5 or BGE-Large-EN-v1.5

Goals:

  • Maximum accuracy for production
  • Best-in-class performance
  • State-of-the-art results

Duration: When accuracy requirements demand it

Domain-Specific Recommendations

Customer Support

Recommended: BGE-Large-EN-v1.5 (1024D) or All-MPNet-Base-v2 (768D)

Why:

  • Balance of quality and processing speed for high-volume queries
  • Good understanding of support terminology
  • Fast enough for real-time responses

Consider BM25: Yes, for product names and error codes

Legal/Compliance

Recommended: Stella-EN-1.5B-v5 (1024D) + BGE-M3 for long documents

Why:

  • Precision critical for legal language
  • Nuanced understanding required
  • Long context support for lengthy legal documents

Consider BM25: Yes, for specific legal terms and case citations

Technical Documentation

Recommended: BGE-Large-EN-v1.5 (1024D) + BM25

Why:

  • Technical terms benefit from hybrid search
  • Good balance for developer queries
  • Function names match better with BM25

Consider BM25: Yes, essential for technical terms and API names
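The "Consider BM25" recommendations above amount to hybrid retrieval: normalize the dense (semantic) and BM25 (lexical) scores onto the same scale, then blend them. A stdlib sketch of the fusion step only (the `alpha` weight and the toy scores are illustrative; real BM25 scores would come from a lexical index):

```python
def minmax(scores):
    """Scale a score list to [0, 1] so dense and BM25 scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(dense, bm25, alpha=0.6):
    """Weighted fusion: alpha favors dense (semantic), 1-alpha favors BM25 (lexical)."""
    d, b = minmax(dense), minmax(bm25)
    return [alpha * x + (1 - alpha) * y for x, y in zip(d, b)]

# Doc 2 is only mid-ranked semantically but contains the exact API name,
# so its high BM25 score pulls it to the top of the fused ranking.
dense = [0.82, 0.75, 0.78]
bm25  = [1.2,  0.4,  9.8]
fused = hybrid_scores(dense, bm25)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # 2
```

Min-max normalization is one of several common fusion schemes; reciprocal rank fusion is a popular alternative that avoids score-scale assumptions entirely.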

Research/Academic

Recommended: Stella-EN-1.5B-v5 (1024D) or E5-Large-v2 (1024D)

Why:

  • Complex concepts across domains
  • Cross-disciplinary relationships
  • High accuracy requirements
  • Multilingual support for international research

Consider BM25: Optional, depends on field

Healthcare

Recommended: Stella-EN-1.5B-v5 (1024D)

Why:

  • Critical accuracy requirements
  • Medical terminology precision
  • Complex semantic relationships

Consider BM25: Yes, for medical codes and drug names

Long Document Processing

Recommended: BGE-M3 (1024D) or Jina-Embeddings-v2-Base-EN (768D)

Why:

  • 8K context window support
  • Can process entire sections without losing context
  • Maintains coherence across long passages

Consider BM25: Yes, for document-specific terminology
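Even with an 8K window, long documents must be split into chunks that fit the budget. A rough stdlib sketch that approximates tokens by word count (an assumption — a real pipeline should count tokens with the embedding model's own tokenizer, and the 6000-word budget is a hypothetical margin under 8192 tokens):

```python
def chunk_by_budget(text, max_words=6000):
    """Greedily split text into chunks of at most max_words words.

    Word count is a rough stand-in for the model's token count; 6000 words
    leaves headroom under an 8192-token context window.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

doc = "lorem " * 13000          # a 13,000-word stand-in document
chunks = chunk_by_budget(doc)
print(len(chunks))              # 3 chunks: 6000 + 6000 + 1000 words
```

Greedy splitting loses cross-chunk context at the boundaries; overlapping chunks or section-aware splitting mitigate that at the cost of extra storage.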

Testing Your Model Choice

Benchmark Comparison

Run the same query set with different models:

| Model | Average Similarity | Accuracy | Processing Time |
| --- | --- | --- | --- |
| All-MiniLM-L6-v2 (384D) | 0.65 | 70% | Fast |
| All-MPNet-Base-v2 (768D) | 0.74 | 78% | Medium |
| BGE-Large-EN-v1.5 (1024D) | 0.82 | 86% | Medium-Slow |
| Stella-EN-1.5B-v5 (1024D) | 0.85 | 89% | Slow |
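Numbers like these come from running a labelled query set against each model and summarizing the results. A minimal stdlib harness sketch (the `benchmark` function and the toy results are hypothetical):

```python
def benchmark(results):
    """Summarize one model's run over a labelled query set.

    results: list of (similarity, was_correct) pairs, one per query,
    where was_correct means the top-ranked chunk answered the query.
    """
    avg_sim = sum(s for s, _ in results) / len(results)
    accuracy = sum(1 for _, ok in results if ok) / len(results)
    return round(avg_sim, 2), round(accuracy, 2)

# Toy run standing in for one model's results over five queries:
runs = [(0.71, True), (0.65, True), (0.58, False), (0.66, True), (0.65, False)]
avg_sim, acc = benchmark(runs)
print(avg_sim, acc)  # 0.65 0.6
```

Use the same query set and the same relevance labels for every model, otherwise the comparison is meaningless.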

Decision Framework

If accuracy gain > 10%: Upgrade recommended

If accuracy gain < 5%: Current model sufficient

Consider cost/performance trade-off:

  • Calculate cost per accurate answer
  • Factor in user experience impact
  • Consider latency requirements
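The decision framework above can be written down directly. A sketch (the function name and the middle "weigh trade-offs" branch for gains between 5% and 10% are an interpretation of the rules above):

```python
def upgrade_decision(current_acc, candidate_acc):
    """Apply the decision framework: gain > 10% -> upgrade, gain < 5% -> keep."""
    gain = candidate_acc - current_acc
    if gain > 0.10:
        return "upgrade"
    if gain < 0.05:
        return "keep current model"
    return "weigh cost/performance trade-off"

print(upgrade_decision(0.70, 0.86))  # upgrade (gain 16%)
print(upgrade_decision(0.84, 0.86))  # keep current model (gain 2%)
print(upgrade_decision(0.78, 0.85))  # weigh cost/performance trade-off (gain 7%)
```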

Storage Requirements

Vector storage requirements per 1M vectors:

| Model | Dimensions | Storage |
| --- | --- | --- |
| All-MiniLM-L6-v2 | 384D | ~1.5 GB |
| All-MPNet-Base-v2 | 768D | ~3 GB |
| BGE-Large-EN-v1.5 | 1024D | ~4 GB |
| Stella-EN-1.5B-v5 | 1024D | ~4 GB |
| BGE-M3 | 1024D | ~4 GB |
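These figures follow from storing each dimension as a 4-byte float32. A quick calculator sketch (assumes raw vectors only — index structures such as HNSW graphs add overhead on top):

```python
def vector_storage_gb(dims, n_vectors, bytes_per_value=4):
    """Raw embedding storage in GB, assuming float32 values and no index overhead."""
    return dims * bytes_per_value * n_vectors / 1e9

for dims in (384, 768, 1024):
    print(f"{dims}D x 1M vectors: {vector_storage_gb(dims, 1_000_000):.2f} GB")
# 384D x 1M vectors: 1.54 GB
# 768D x 1M vectors: 3.07 GB
# 1024D x 1M vectors: 4.10 GB
```

Scalar or product quantization can cut these numbers by 4x or more at some accuracy cost.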

Processing Considerations

Model choice affects:

  • Latency: Larger models = slower processing
  • Memory: Model size ranges from 90MB to 5.5GB
  • Throughput: Smaller models process more documents per minute
  • GPU Requirements: 1024D models benefit from GPU acceleration

Quick Decision Guide

Choose 384D (All-MiniLM-L6-v2) When:

  • Prototyping or development
  • Speed is primary constraint
  • Simple queries only
  • < 1,000 documents
  • Limited compute resources

Choose 768D (All-MPNet-Base-v2 or Jina) When:

  • Balanced requirements
  • Local deployment needed
  • Medium document collection
  • Good enough accuracy
  • Long documents (Jina with 8K context)

Choose 1024D (BGE-Large, Stella, E5-Large) When:

  • Production deployment
  • Maximum accuracy needed
  • Complex domain terminology
  • Multi-language content
  • Critical business application

Choose 1024D with 8K Context (BGE-M3) When:

  • Long legal documents
  • Academic papers
  • Need to preserve long-range context
  • Multi-language support required

Tips for Success

  1. Start Small: Begin with 384D or 768D for prototyping
  2. Benchmark Early: Test multiple models with your data
  3. Measure Everything: Track accuracy, latency, and cost
  4. Consider Total Cost: Include storage and query costs
  5. Plan for Growth: Choose a model that scales with your needs