Embedding Model Selection Guide
Choose the right embedding model for your RAG system
Overview
Choosing the right embedding model is critical for RAG system performance. This guide helps you select the optimal model based on your use case requirements, with clear recommendations and trade-off analysis.
Model Comparison
Available Models
| Model | Dimensions | Context Length | Model Size | Quality (MTEB avg where known) | Best For |
|---|---|---|---|---|---|
| Stella-EN-1.5B-v5 | 1024D | 512 tokens | ~5.5GB | 71.19 | Maximum-accuracy production systems |
| BGE-Large-EN-v1.5 | 1024D | 512 tokens | ~1.34GB | ~64 | Enterprise-grade, English optimized |
| E5-Large-v2 | 1024D | 512 tokens | ~1.34GB | ~64 | Multilingual support, versatile |
| All-MPNet-Base-v2 | 768D | 384 tokens | ~420MB | Good | General-purpose semantic search |
| BGE-M3 | 1024D | 8192 tokens | ~2.24GB | Good | Long documents, multilingual, dense+sparse |
| Jina-Embeddings-v2-Base-EN | 768D | 8192 tokens | ~550MB | Good | Long documents, balanced performance |
| All-MiniLM-L6-v2 | 384D | 256 tokens | ~90MB | Fair | Prototyping, speed-optimized |
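The figures in the table can be kept in one place as a small lookup, so application code doesn't scatter magic numbers. A minimal sketch, with dimensions and context lengths copied from the table above (the registry and helper names are illustrative, not part of any library):

```python
# Illustrative registry of the models above: (dimensions, max input tokens).
MODELS = {
    "Stella-EN-1.5B-v5":          (1024, 512),
    "BGE-Large-EN-v1.5":          (1024, 512),
    "E5-Large-v2":                (1024, 512),
    "All-MPNet-Base-v2":          (768, 384),
    "BGE-M3":                     (1024, 8192),
    "Jina-Embeddings-v2-Base-EN": (768, 8192),
    "All-MiniLM-L6-v2":           (384, 256),
}

def embedding_dim(model: str) -> int:
    """Vector dimensionality for a known model."""
    return MODELS[model][0]

def max_context(model: str) -> int:
    """Maximum input length in tokens for a known model."""
    return MODELS[model][1]
```

Centralizing these numbers also makes later model migrations a one-line change.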
Model Details
Stella-EN-1.5B-v5 (1024D) ⭐ Recommended
Characteristics:
- State-of-the-art performance (71.19 MTEB score)
- 1.5B parameters for deep semantic understanding
- Excellent for complex queries and nuanced relationships
Best For:
- Production systems requiring highest accuracy
- Complex domain-specific applications
- Research and academic use cases
- Critical business applications
Limitations:
- Larger model size (~5.5GB)
- Requires more processing resources
BGE-Large-EN-v1.5 (1024D)
Characteristics:
- Solid performance (~64 MTEB score)
- Enterprise-grade reliability
- English language optimized
Best For:
- Enterprise deployments
- English-only content
- Balanced performance requirements
Limitations:
- Limited multilingual support
- May struggle with non-English content
E5-Large-v2 (1024D)
Characteristics:
- Good performance (~64 MTEB score)
- Multilingual support
- Versatile across domains
Best For:
- Multilingual content
- Diverse document collections
- General-purpose applications
Limitations:
- May not excel in specialized domains
All-MPNet-Base-v2 (768D)
Characteristics:
- Balanced size and performance
- Good semantic understanding
- Lightweight (~110M parameters)
Best For:
- Medium-scale applications
- Local deployment scenarios
- Resource-constrained environments
Limitations:
- May struggle with domain-specific terminology
- Lower accuracy than 1024D models
BGE-M3 (1024D) 📄 Long Context
Characteristics:
- Massive 8K context window (8192 tokens)
- Multilingual support
- Dense + sparse retrieval capabilities
Best For:
- Long document processing
- Legal and compliance documents
- Academic papers
- Multi-language content
Limitations:
- Larger model size (~2.24GB)
- May be overkill for short documents
Jina-Embeddings-v2-Base-EN (768D) 📄 Long Context
Characteristics:
- 8K context window (8192 tokens)
- Balanced performance
- Lightweight (~550MB)
Best For:
- Long documents with resource constraints
- Cost-effective long-context processing
- English long-form content
Limitations:
- English-only support
- Lower dimensionality than BGE-M3
All-MiniLM-L6-v2 (384D)
Characteristics:
- Smallest and fastest model
- Minimal memory footprint (~90MB)
- Quick processing
Best For:
- Prototyping and development
- Small document collections (< 1,000 chunks)
- Cost-sensitive applications
- Simple FAQ systems
Limitations:
- May miss nuanced semantic relationships
- Lower accuracy for complex queries
- Limited context window (256 tokens)
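One practical consequence of the context windows listed above: chunks longer than a model's window are silently truncated at embedding time. A small sketch that shortlists models able to embed a chunk whole (context lengths copied from the model details above; the helpers are illustrative):

```python
# Context windows (tokens) from the model details above.
CONTEXT = {
    "All-MiniLM-L6-v2": 256,
    "All-MPNet-Base-v2": 384,
    "Stella-EN-1.5B-v5": 512,
    "BGE-Large-EN-v1.5": 512,
    "E5-Large-v2": 512,
    "BGE-M3": 8192,
    "Jina-Embeddings-v2-Base-EN": 8192,
}

def fits_without_truncation(model: str, chunk_tokens: int) -> bool:
    """True if a chunk of this length fits the model's context window."""
    return chunk_tokens <= CONTEXT[model]

def shortlist(chunk_tokens: int) -> list:
    """Models that can embed the chunk whole, shortest context first."""
    return sorted((m for m in CONTEXT if fits_without_truncation(m, chunk_tokens)),
                  key=lambda m: CONTEXT[m])
```

For a 2,000-token chunk, only the 8K-context models survive the filter, which is exactly the long-document case below.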
When to Use Higher Dimensions (1024D)
Indicators for Upgrade
Consider using 1024D models (Stella, BGE-Large, E5-Large) when:
- Complex domain with nuanced semantic relationships
- Multi-language content requiring fine-grained distinctions
- Critical applications where retrieval quality is paramount
- Large document collections where precision matters
- Current accuracy below 80% with smaller models
Expected Improvement
Upgrading from 384D to 1024D:
- 15-25% increase in retrieval accuracy
- Better handling of complex, multi-part queries
- More relevant similarity_score rankings
- Improved domain-specific terminology matching
Trade-offs
| Factor | Impact |
|---|---|
| Processing | 30-50% slower |
| Storage | ~2.7x more vector storage (1024/384) |
| Memory | Larger model footprint |
| Accuracy | Significant improvement |
When to Use Lower Dimensions (384D-768D)
Indicators for Lower Dimensions
Consider lower dimensions when:
- Simple FAQ or fact-retrieval systems
- Cost-sensitive applications
- Real-time requirements with strict latency SLAs
- Small to medium document collections (< 10,000 chunks)
- Prototype/development phase
- Long document processing (use Jina or BGE-M3 for 8K context)
Expected Behavior
Advantages:
- 50-70% faster processing (All-MiniLM)
- Lower memory requirements
- Less storage for embeddings
Trade-offs:
- Lower retrieval accuracy for complex queries
- May struggle with domain-specific terminology
- May miss nuanced semantic matches
Migration Path
Recommended Progression
Prototype (384D) → Production (768D) → Optimized (1024D)
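One operational note on this progression: embedding spaces from different models are not mutually comparable, so each stage change means re-embedding the whole corpus and rebuilding the index. A sketch of a guard that makes an accidental mismatch fail loudly (the stage map and function names are illustrative):

```python
# Illustrative stage -> model mapping for the progression above.
STAGE_MODEL = {
    "prototype":  "All-MiniLM-L6-v2",   # 384D
    "production": "All-MPNet-Base-v2",  # 768D
    "optimized":  "Stella-EN-1.5B-v5",  # 1024D
}

def check_compatible(index_model: str, query_model: str) -> None:
    """Refuse to query an index with vectors from a different model."""
    if index_model != query_model:
        raise ValueError(
            f"index built with {index_model!r}, query embedded with "
            f"{query_model!r}; re-embed the corpus before switching"
        )
```

Storing the model name alongside the index metadata is a cheap way to keep this check honest.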
Stage 1: Prototype (384D)
Model: All-MiniLM-L6-v2
Goals:
- Validate concept quickly
- Minimize processing time during development
- Test basic functionality
Duration: 1-4 weeks
Stage 2: Production (768D)
Model: All-MPNet-Base-v2 or Jina-Embeddings-v2-Base-EN
Goals:
- Good balance of quality and speed
- Suitable for most use cases
- Production-ready performance
Duration: Ongoing or when better accuracy needed
Stage 3: Optimized (1024D)
Model: Stella-EN-1.5B-v5 or BGE-Large-EN-v1.5
Goals:
- Maximum accuracy for production
- Best-in-class performance
- State-of-the-art results
Duration: When accuracy requirements demand it
Domain-Specific Recommendations
Customer Support
Recommended: BGE-Large-EN-v1.5 (1024D) or All-MPNet-Base-v2 (768D)
Why:
- Balance of quality and processing speed for high-volume queries
- Good understanding of support terminology
- Fast enough for real-time responses
Consider BM25: Yes, for product names and error codes
Legal/Compliance
Recommended: Stella-EN-1.5B-v5 (1024D) + BGE-M3 for long documents
Why:
- Precision critical for legal language
- Nuanced understanding required
- Long context support for lengthy legal documents
Consider BM25: Yes, for specific legal terms and case citations
Technical Documentation
Recommended: BGE-Large-EN-v1.5 (1024D) + BM25
Why:
- Technical terms benefit from hybrid search
- Good balance for developer queries
- Function names match better with BM25
Consider BM25: Yes, essential for technical terms and API names
Research/Academic
Recommended: Stella-EN-1.5B-v5 (1024D) or E5-Large-v2 (1024D)
Why:
- Complex concepts across domains
- Cross-disciplinary relationships
- High accuracy requirements
- Multilingual support for international research
Consider BM25: Optional, depends on field
Healthcare
Recommended: Stella-EN-1.5B-v5 (1024D)
Why:
- Critical accuracy requirements
- Medical terminology precision
- Complex semantic relationships
Consider BM25: Yes, for medical codes and drug names
Long Document Processing
Recommended: BGE-M3 (1024D) or Jina-Embeddings-v2-Base-EN (768D)
Why:
- 8K context window support
- Can process entire sections without losing context
- Maintains coherence across long passages
Consider BM25: Yes, for document-specific terminology
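Several of the recommendations above pair dense embeddings with BM25. The idea can be sketched as a weighted blend of a dense cosine score and a sparse keyword score. This toy version uses term overlap as a stand-in for real BM25 (a production system would use a proper implementation, e.g. the rank_bm25 package; the weights and function names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if not na or not nb:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def keyword_overlap(query: str, doc: str) -> float:
    """Crude sparse score: fraction of query terms appearing in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(dense_sim: float, sparse: float, alpha: float = 0.7) -> float:
    """Blend dense and sparse scores; alpha weights the dense side."""
    return alpha * dense_sim + (1 - alpha) * sparse
```

Exact-match terms like error codes or API names get credit from the sparse side even when the embedding model has never seen them.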
Testing Your Model Choice
Benchmark Comparison
Run the same query set with different models:
| Model | Avg Similarity | Accuracy | Processing Time |
|---|---|---|---|
| All-MiniLM-L6-v2 (384D) | 0.65 | 70% | Fast |
| All-MPNet-Base-v2 (768D) | 0.74 | 78% | Medium |
| BGE-Large-EN-v1.5 (1024D) | 0.82 | 86% | Medium-Slow |
| Stella-EN-1.5B-v5 (1024D) | 0.85 | 89% | Slow |
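A comparison like this can be produced by a small harness that accepts any embedding function and a labelled query set. A self-contained sketch with a toy bag-of-words stand-in for a real model (swap in an actual encoder to get real numbers; all names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if not na or not nb:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def benchmark(embed, queries, expected_doc, docs):
    """Average top-1 similarity and top-1 accuracy for one embed function.
    `embed` maps text -> vector; `expected_doc[q]` is the relevant doc id."""
    doc_vecs = {d: embed(text) for d, text in docs.items()}
    hits, sims = 0, []
    for q in queries:
        qv = embed(q)
        best = max(doc_vecs, key=lambda d: cosine(qv, doc_vecs[d]))
        sims.append(cosine(qv, doc_vecs[best]))
        hits += best == expected_doc[q]
    return sum(sims) / len(sims), hits / len(queries)

# Toy usage with a bag-of-words "embedder" (stand-in for a real model):
VOCAB = ["apple", "fruit", "car", "engine"]
toy_embed = lambda text: [text.split().count(w) for w in VOCAB]
docs = {"a": "apple fruit", "b": "car engine"}
avg_sim, accuracy = benchmark(toy_embed, ["apple"], {"apple": "a"}, docs)
```

Keeping the harness model-agnostic lets you rerun the same labelled queries every time you consider an upgrade.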
Decision Framework
If accuracy gain > 10%: Upgrade recommended
If accuracy gain < 5%: Current model sufficient
Consider cost/performance trade-off:
- Calculate cost per accurate answer
- Factor in user experience impact
- Consider latency requirements
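The thresholds above translate directly into code. A sketch, with the 10%/5% cut-offs taken from the framework and the cost metric dividing spend per query by accuracy as suggested (function names are illustrative):

```python
def upgrade_decision(current_acc: float, candidate_acc: float) -> str:
    """Apply the guide's thresholds: >10 points -> upgrade, <5 -> keep,
    otherwise weigh cost and latency against the gain."""
    gain = candidate_acc - current_acc
    if gain > 0.10:
        return "upgrade"
    if gain < 0.05:
        return "keep current model"
    return "weigh cost/latency against the gain"

def cost_per_accurate_answer(cost_per_query: float, accuracy: float) -> float:
    """Expected spend per correct retrieval, the guide's cost metric."""
    return cost_per_query / accuracy
```

Using the example figures above, moving from All-MiniLM-L6-v2 (70%) to BGE-Large-EN-v1.5 (86%) is a 16-point gain, comfortably past the upgrade threshold.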
Storage Requirements
Vector storage requirements per 1M vectors:
| Model | Dimensions | Storage |
|---|---|---|
| All-MiniLM-L6-v2 | 384D | ~1.5 GB |
| All-MPNet-Base-v2 | 768D | ~3 GB |
| BGE-Large-EN-v1.5 | 1024D | ~4 GB |
| Stella-EN-1.5B-v5 | 1024D | ~4 GB |
| BGE-M3 | 1024D | ~4 GB |
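The figures above are plain float32 arithmetic: dimensions × 4 bytes × vector count, before any index overhead or metadata. A one-function sketch:

```python
def storage_gb(dimensions: int, n_vectors: int, bytes_per_float: int = 4) -> float:
    """Raw float32 vector storage in GB, excluding index overhead."""
    return dimensions * bytes_per_float * n_vectors / 1e9
```

For 1M vectors this gives 1.536 GB at 384D and 4.096 GB at 1024D, matching the table; real deployments should budget extra for the index structure itself.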
Processing Considerations
Model choice affects:
- Latency: Larger models = slower processing
- Memory: Model size ranges from 90MB to 5.5GB
- Throughput: Smaller models process more documents per minute
- GPU Requirements: 1024D models benefit from GPU acceleration
Quick Decision Guide
Choose 384D (All-MiniLM-L6-v2) When:
- Prototyping or development
- Speed is primary constraint
- Simple queries only
- < 1,000 documents
- Limited compute resources
Choose 768D (All-MPNet-Base-v2 or Jina) When:
- Balanced requirements
- Local deployment needed
- Medium document collection
- Good enough accuracy
- Long documents (Jina with 8K context)
Choose 1024D (BGE-Large, Stella, E5-Large) When:
- Production deployment
- Maximum accuracy needed
- Complex domain terminology
- Multi-language content
- Critical business application
Choose 1024D with 8K Context (BGE-M3) When:
- Long legal documents
- Academic papers
- Need to preserve long-range context
- Multi-language support required
Tips for Success
- Start Small: Begin with 384D or 768D for prototyping
- Benchmark Early: Test multiple models with your data
- Measure Everything: Track accuracy, latency, and cost
- Consider Total Cost: Include storage and query costs
- Plan for Growth: Choose model that scales with needs
