Embedding Model Selection Guide
Choose the right embedding model for your RAG system
Overview
Choosing the right embedding model is critical for RAG system performance. This guide helps you select the optimal model based on your use case requirements, with clear recommendations and trade-off analysis.
Model Comparison
Available Models
| Model | Dimensions | Context Length | Model Size | Quality (MTEB avg where known) | Best For |
|---|---|---|---|---|---|
| Stella-EN-1.5B-v5 | 1024D | 512 tokens | ~5.5GB | 71.19 | Maximum-accuracy production systems |
| BGE-Large-EN-v1.5 | 1024D | 512 tokens | ~1.34GB | ~64 | Enterprise-grade, English optimized |
| E5-Large-v2 | 1024D | 512 tokens | ~1.34GB | ~64 | Multilingual support, versatile |
| All-MPNet-Base-v2 | 768D | 384 tokens | ~420MB | Good | General-purpose semantic search |
| BGE-M3 | 1024D | 8192 tokens | ~2.24GB | Good | Long documents, multilingual, dense+sparse |
| Jina-Embeddings-v2-Base-EN | 768D | 8192 tokens | ~550MB | Good | Long documents, balanced performance |
| All-MiniLM-L6-v2 | 384D | 256 tokens | ~90MB | Fair | Prototyping, speed-optimized |
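The figures in the table can be kept in one place as a small lookup, so application code doesn't scatter magic numbers. A minimal sketch, with dimensions and context lengths copied from the table above (the registry and helper names are illustrative, not part of any library):

```python
# Illustrative registry of the models above: (dimensions, max input tokens).
MODELS = {
    "Stella-EN-1.5B-v5":          (1024, 512),
    "BGE-Large-EN-v1.5":          (1024, 512),
    "E5-Large-v2":                (1024, 512),
    "All-MPNet-Base-v2":          (768, 384),
    "BGE-M3":                     (1024, 8192),
    "Jina-Embeddings-v2-Base-EN": (768, 8192),
    "All-MiniLM-L6-v2":           (384, 256),
}

def embedding_dim(model: str) -> int:
    """Vector dimensionality for a known model."""
    return MODELS[model][0]

def max_context(model: str) -> int:
    """Maximum input length in tokens for a known model."""
    return MODELS[model][1]
```

Centralizing these numbers also makes later model migrations a one-line change.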
Model Details
Stella-EN-1.5B-v5 (1024D) ⭐ Recommended
Characteristics:
- State-of-the-art performance (71.19 MTEB score)
- 1.5B parameters for deep semantic understanding
- Excellent for complex queries and nuanced relationships
Best For:
- Production systems requiring highest accuracy
- Complex domain-specific applications
- Research and academic use cases
- Critical business applications
Limitations:
- Larger model size (~5.5GB)
- Requires more processing resources
BGE-Large-EN-v1.5 (1024D)
Characteristics:
- Solid performance (~64 MTEB score)
- Enterprise-grade reliability
- English language optimized
Best For:
- Enterprise deployments
- English-only content
- Balanced performance requirements
Limitations:
- Limited multilingual support
- May struggle with non-English content
E5-Large-v2 (1024D)
Characteristics:
- Good performance (~64 MTEB score)
- Multilingual support
- Versatile across domains
Best For:
- Multilingual content
- Diverse document collections
- General-purpose applications
Limitations:
- May not excel in specialized domains
All-MPNet-Base-v2 (768D)
Characteristics:
- Balanced size and performance
- Good semantic understanding
- Lightweight (~110M parameters)
Best For:
- Medium-scale applications
- Local deployment scenarios
- Resource-constrained environments
Limitations:
- May struggle with domain-specific terminology
- Lower accuracy than 1024D models
BGE-M3 (1024D) 📄 Long Context
Characteristics:
- Massive 8K context window (8192 tokens)
- Multilingual support
- Dense + sparse retrieval capabilities
Best For:
- Long document processing
- Legal and compliance documents
- Academic papers
- Multi-language content
Limitations:
- Larger model size (~2.24GB)
- May be overkill for short documents
Jina-Embeddings-v2-Base-EN (768D) 📄 Long Context
Characteristics:
- 8K context window (8192 tokens)
- Balanced performance
- Lightweight (~550MB)
Best For:
- Long documents with resource constraints
- Cost-effective long-context processing
- English long-form content
Limitations:
- English-only support
- Lower dimensionality than BGE-M3
All-MiniLM-L6-v2 (384D)
Characteristics:
- Smallest and fastest model
- Minimal memory footprint (~90MB)
- Quick processing
Best For:
- Prototyping and development
- Small document collections (< 1,000 chunks)
- Cost-sensitive applications
- Simple FAQ systems
Limitations:
- May miss nuanced semantic relationships
- Lower accuracy for complex queries
- Limited context window (256 tokens)
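One practical consequence of the context windows listed above: chunks longer than a model's window are silently truncated at embedding time. A small sketch that shortlists models able to embed a chunk whole (context lengths copied from the model details above; the helpers are illustrative):

```python
# Context windows (tokens) from the model details above.
CONTEXT = {
    "All-MiniLM-L6-v2": 256,
    "All-MPNet-Base-v2": 384,
    "Stella-EN-1.5B-v5": 512,
    "BGE-Large-EN-v1.5": 512,
    "E5-Large-v2": 512,
    "BGE-M3": 8192,
    "Jina-Embeddings-v2-Base-EN": 8192,
}

def fits_without_truncation(model: str, chunk_tokens: int) -> bool:
    """True if a chunk of this length fits the model's context window."""
    return chunk_tokens <= CONTEXT[model]

def shortlist(chunk_tokens: int) -> list:
    """Models that can embed the chunk whole, shortest context first."""
    return sorted((m for m in CONTEXT if fits_without_truncation(m, chunk_tokens)),
                  key=lambda m: CONTEXT[m])
```

For a 2,000-token chunk, only the 8K-context models survive the filter, which is exactly the long-document case below.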
When to Use Higher Dimensions (1024D)
Indicators for Upgrade
Consider using 1024D models (Stella, BGE-Large, E5-Large) when:
- Complex domain with nuanced semantic relationships
- Multi-language content requiring fine-grained distinctions
- Critical applications where retrieval quality is paramount
- Large document collections where precision matters
- Current accuracy below 80% with smaller models
Expected Improvement
Upgrading from 384D to 1024D:
- 15-25% increase in retrieval accuracy
- Better handling of complex, multi-part queries
- More relevant similarity_score rankings
- Improved domain-specific terminology matching
Trade-offs
| Factor | Impact |
|---|---|
| Processing | 30-50% slower |
| Storage | ~2.7x more vector storage (1024/384) |
| Memory | Larger model footprint |
| Accuracy | Significant improvement |
When to Use Lower Dimensions (384D-768D)
Indicators for Lower Dimensions
Consider lower dimensions when:
- Simple FAQ or fact-retrieval systems
- Cost-sensitive applications
- Real-time requirements with strict latency SLAs
- Small to medium document collections (< 10,000 chunks)
- Prototype/development phase
- Long document processing (use Jina or BGE-M3 for 8K context)
Expected Behavior
Advantages:
- 50-70% faster processing (All-MiniLM)
- Lower memory requirements
- Less storage for embeddings
Trade-offs:
- Lower retrieval accuracy for complex queries
- May struggle with domain-specific terminology
- May miss nuanced semantic matches
Migration Path
Recommended Progression
Prototype (384D) → Production (768D) → Optimized (1024D)
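One operational note on this progression: embedding spaces from different models are not mutually comparable, so each stage change means re-embedding the whole corpus and rebuilding the index. A sketch of a guard that makes an accidental mismatch fail loudly (the stage map and function names are illustrative):

```python
# Illustrative stage -> model mapping for the progression above.
STAGE_MODEL = {
    "prototype":  "All-MiniLM-L6-v2",   # 384D
    "production": "All-MPNet-Base-v2",  # 768D
    "optimized":  "Stella-EN-1.5B-v5",  # 1024D
}

def check_compatible(index_model: str, query_model: str) -> None:
    """Refuse to query an index with vectors from a different model."""
    if index_model != query_model:
        raise ValueError(
            f"index built with {index_model!r}, query embedded with "
            f"{query_model!r}; re-embed the corpus before switching"
        )
```

Storing the model name alongside the index metadata is a cheap way to keep this check honest.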
Stage 1: Prototype (384D)
Model: All-MiniLM-L6-v2
Goals:
- Validate concept quickly
- Minimize processing time during development
- Test basic functionality
Duration: 1-4 weeks
Stage 2: Production (768D)
Model: All-MPNet-Base-v2 or Jina-Embeddings-v2-Base-EN
Goals:
- Good balance of quality and speed
- Suitable for most use cases
- Production-ready performance
Duration: Ongoing or when better accuracy needed
Stage 3: Optimized (1024D)
Model: Stella-EN-1.5B-v5 or BGE-Large-EN-v1.5
Goals:
- Maximum accuracy for production
- Best-in-class performance
- State-of-the-art results
Duration: When accuracy requirements demand it
Domain-Specific Recommendations
Customer Support
Recommended: BGE-Large-EN-v1.5 (1024D) or All-MPNet-Base-v2 (768D)
Why:
- Balance of quality and processing speed for high-volume queries
- Good understanding of support terminology
- Fast enough for real-time responses
Consider BM25: Yes, for product names and error codes
Legal/Compliance
Recommended: Stella-EN-1.5B-v5 (1024D) + BGE-M3 for long documents
Why:
- Precision critical for legal language
- Nuanced understanding required
- Long context support for lengthy legal documents
Consider BM25: Yes, for specific legal terms and case citations
Technical Documentation
Recommended: BGE-Large-EN-v1.5 (1024D) + BM25
Why:
- Technical terms benefit from hybrid search
- Good balance for developer queries
- Function names match better with BM25
Consider BM25: Yes, essential for technical terms and API names
Research/Academic
Recommended: Stella-EN-1.5B-v5 (1024D) or E5-Large-v2 (1024D)
Why:
- Complex concepts across domains
- Cross-disciplinary relationships
- High accuracy requirements
- Multilingual support for international research
Consider BM25: Optional, depends on field
Healthcare
Recommended: Stella-EN-1.5B-v5 (1024D)
Why:
- Critical accuracy requirements
- Medical terminology precision
- Complex semantic relationships
Consider BM25: Yes, for medical codes and drug names
Long Document Processing
Recommended: BGE-M3 (1024D) or Jina-Embeddings-v2-Base-EN (768D)
Why:
- 8K context window support
- Can process entire sections without losing context
- Maintains coherence across long passages
Consider BM25: Yes, for document-specific terminology
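Several of the recommendations above pair dense embeddings with BM25. The idea can be sketched as a weighted blend of a dense cosine score and a sparse keyword score. This toy version uses term overlap as a stand-in for real BM25 (a production system would use a proper implementation, e.g. the rank_bm25 package; the weights and function names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if not na or not nb:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def keyword_overlap(query: str, doc: str) -> float:
    """Crude sparse score: fraction of query terms appearing in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(dense_sim: float, sparse: float, alpha: float = 0.7) -> float:
    """Blend dense and sparse scores; alpha weights the dense side."""
    return alpha * dense_sim + (1 - alpha) * sparse
```

Exact-match terms like error codes or API names get credit from the sparse side even when the embedding model has never seen them.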
Testing Your Model Choice
Benchmark Comparison
Run the same query set with different models:
| Model | Avg Similarity | Accuracy | Processing Time |
|---|---|---|---|
| All-MiniLM-L6-v2 (384D) | 0.65 | 70% | Fast |
| All-MPNet-Base-v2 (768D) | 0.74 | 78% | Medium |
| BGE-Large-EN-v1.5 (1024D) | 0.82 | 86% | Medium-Slow |
| Stella-EN-1.5B-v5 (1024D) | 0.85 | 89% | Slow |
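A comparison like this can be produced by a small harness that accepts any embedding function and a labelled query set. A self-contained sketch with a toy bag-of-words stand-in for a real model (swap in an actual encoder to get real numbers; all names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if not na or not nb:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def benchmark(embed, queries, expected_doc, docs):
    """Average top-1 similarity and top-1 accuracy for one embed function.
    `embed` maps text -> vector; `expected_doc[q]` is the relevant doc id."""
    doc_vecs = {d: embed(text) for d, text in docs.items()}
    hits, sims = 0, []
    for q in queries:
        qv = embed(q)
        best = max(doc_vecs, key=lambda d: cosine(qv, doc_vecs[d]))
        sims.append(cosine(qv, doc_vecs[best]))
        hits += best == expected_doc[q]
    return sum(sims) / len(sims), hits / len(queries)

# Toy usage with a bag-of-words "embedder" (stand-in for a real model):
VOCAB = ["apple", "fruit", "car", "engine"]
toy_embed = lambda text: [text.split().count(w) for w in VOCAB]
docs = {"a": "apple fruit", "b": "car engine"}
avg_sim, accuracy = benchmark(toy_embed, ["apple"], {"apple": "a"}, docs)
```

Keeping the harness model-agnostic lets you rerun the same labelled queries every time you consider an upgrade.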
Decision Framework
If accuracy gain > 10%: Upgrade recommended
If accuracy gain < 5%: Current model sufficient
Consider cost/performance trade-off:
- Calculate cost per accurate answer
- Factor in user experience impact
- Consider latency requirements
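The thresholds above translate directly into code. A sketch, with the 10%/5% cut-offs taken from the framework and the cost metric dividing spend per query by accuracy as suggested (function names are illustrative):

```python
def upgrade_decision(current_acc: float, candidate_acc: float) -> str:
    """Apply the guide's thresholds: >10 points -> upgrade, <5 -> keep,
    otherwise weigh cost and latency against the gain."""
    gain = candidate_acc - current_acc
    if gain > 0.10:
        return "upgrade"
    if gain < 0.05:
        return "keep current model"
    return "weigh cost/latency against the gain"

def cost_per_accurate_answer(cost_per_query: float, accuracy: float) -> float:
    """Expected spend per correct retrieval, the guide's cost metric."""
    return cost_per_query / accuracy
```

Using the example figures above, moving from All-MiniLM-L6-v2 (70%) to BGE-Large-EN-v1.5 (86%) is a 16-point gain, comfortably past the upgrade threshold.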
Storage Requirements
Vector storage requirements per 1M vectors:
| Model | Dimensions | Storage |
|---|---|---|
| All-MiniLM-L6-v2 | 384D | ~1.5 GB |
| All-MPNet-Base-v2 | 768D | ~3 GB |
| BGE-Large-EN-v1.5 | 1024D | ~4 GB |
| Stella-EN-1.5B-v5 | 1024D | ~4 GB |
| BGE-M3 | 1024D | ~4 GB |
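The figures above are plain float32 arithmetic: dimensions × 4 bytes × vector count, before any index overhead or metadata. A one-function sketch:

```python
def storage_gb(dimensions: int, n_vectors: int, bytes_per_float: int = 4) -> float:
    """Raw float32 vector storage in GB, excluding index overhead."""
    return dimensions * bytes_per_float * n_vectors / 1e9
```

For 1M vectors this gives 1.536 GB at 384D and 4.096 GB at 1024D, matching the table; real deployments should budget extra for the index structure itself.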
Processing Considerations
Model choice affects:
- Latency: Larger models = slower processing
- Memory: Model size ranges from 90MB to 5.5GB
- Throughput: Smaller models process more documents per minute
- GPU Requirements: 1024D models benefit from GPU acceleration
Quick Decision Guide
Choose 384D (All-MiniLM-L6-v2) When:
- Prototyping or development
- Speed is primary constraint
- Simple queries only
- < 1,000 documents
- Limited compute resources
Choose 768D (All-MPNet-Base-v2 or Jina) When:
- Balanced requirements
- Local deployment needed
- Medium document collection
- Good enough accuracy
- Long documents (Jina with 8K context)
Choose 1024D (BGE-Large, Stella, E5-Large) When:
- Production deployment
- Maximum accuracy needed
- Complex domain terminology
- Multi-language content
- Critical business application
Choose 1024D with 8K Context (BGE-M3) When:
- Long legal documents
- Academic papers
- Need to preserve long-range context
- Multi-language support required
Tips for Success
- Start Small: Begin with 384D or 768D for prototyping
- Benchmark Early: Test multiple models with your data
- Measure Everything: Track accuracy, latency, and cost
- Consider Total Cost: Include storage and query costs
- Plan for Growth: Choose model that scales with needs
