Step 2 - Test Embedding Search
Verify your embedding model produces good similarity scores before deployment
Purpose
Before deploying your RAG system, verify that your chosen embedding model produces good similarity scores for your typical queries. This testing step is critical because it:
- Catches poor configurations early, before users experience bad results
- Validates your embedding model choice for your specific content domain
- Provides baseline metrics for future benchmarking
- Prevents costly rework after deployment
Entry Point: Document Processing tab → "Test Embedding Search" button
Prerequisites: Documents must be processed (Step 1 complete)
Expected Outcome: Confirmation that similarity scores are acceptable (0.7 or higher)
Accessing Embedding Search
Location
Navigate to your RAG project and find the Document Processing tab. Click the "Test Embedding Search" button to open the testing interface.
UI Overview
The Embedding Search interface includes:
- Query Input Box - Enter your test queries
- Results Panel - Displays retrieved chunks with similarity scores
- Score Indicator - Visual representation of score quality
- Document Filter - Optional filter by specific documents
How Embedding Search Works
User enters query → System converts to vector → Searches chunks → Returns similarity scores
- Your query is converted to a vector using the selected embedding model
- The system searches all processed chunks for similar vectors
- Results are ranked by similarity score (0.0 to 1.0)
- Top results are displayed with their scores and source information
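The sketch below illustrates the same flow outside the UI: embed a query, score it against pre-computed chunk vectors with cosine similarity, and rank the results. It assumes a sentence-transformers model and in-memory chunk vectors; your project's actual model and vector store may differ.

```python
# Minimal sketch of the embed -> search -> rank flow (assumes sentence-transformers).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Electronics returns accepted within 30 days...",
    "Refund processing timeline...",
    "Shipping rates for international orders...",
]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

query = "What is the return policy for electronics?"
query_vector = model.encode([query], normalize_embeddings=True)[0]

# With normalized vectors, the dot product equals cosine similarity.
scores = chunk_vectors @ query_vector
for idx in np.argsort(scores)[::-1]:
    print(f"[Score: {scores[idx]:.2f}] {chunks[idx][:50]}")
```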
Understanding Similarity Scores
Similarity scores (0.0 to 1.0) indicate how well each chunk matches your query:
| Score Range | Quality | Action Required |
|---|---|---|
| 0.8 - 1.0 | Excellent | Ready to proceed |
| 0.7 - 0.8 | Good | Acceptable for most use cases |
| 0.5 - 0.7 | Fair | Consider different embedding model |
| Below 0.5 | Poor | Change embedding model required |
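If you script your tests, a small helper (hypothetical, mirroring the table above) can translate raw scores into these quality bands:

```python
def score_quality(score: float) -> str:
    """Map a similarity score to the quality bands used in this guide."""
    if score >= 0.8:
        return "Excellent - ready to proceed"
    if score >= 0.7:
        return "Good - acceptable for most use cases"
    if score >= 0.5:
        return "Fair - consider a different embedding model"
    return "Poor - change embedding model required"

print(score_quality(0.82))  # Excellent - ready to proceed
```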
Score Interpretation
Score: 0.8 - 1.0 (Excellent)
The chunk is highly relevant to your query. API responses using this configuration will return accurate, on-topic results.
Score: 0.7 - 0.8 (Good)
The chunk is moderately relevant. Acceptable for most production use cases.
Score: 0.5 - 0.7 (Fair)
The chunk has some relevance but may not be what users expect. Consider adjusting your embedding model or chunking settings.
Score: Below 0.5 (Poor)
The chunk is not relevant. Your RAG system will return poor quality responses with this configuration. Change embedding model immediately.
Testing Workflow
Step 1: Prepare Test Queries
Create 5-10 representative queries your users will ask:
| Query Type | Description | Example |
|---|---|---|
| Simple Factual | Direct question with single answer | "What is the return policy?" |
| Multi-Part | Question with multiple components | "What are the return policy and refund timeline?" |
| Domain-Specific | Uses industry terminology | "What is the SLA for enterprise tier?" |
| Edge Case | Unusual or boundary query | "Can I return opened software?" |
Tips for Test Queries:
- Include queries from each category
- Use actual user questions if available
- Cover the full range of expected query types
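If you plan to script your test runs, keeping the query set in a small, category-tagged list makes it easy to reuse in the steps below. The queries here are illustrative examples, not required values:

```python
# Illustrative test set covering each query category.
test_queries = [
    {"type": "simple_factual",  "query": "What is the return policy?"},
    {"type": "multi_part",      "query": "What are the return policy and refund timeline?"},
    {"type": "domain_specific", "query": "What is the SLA for enterprise tier?"},
    {"type": "edge_case",       "query": "Can I return opened software?"},
]
```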
Step 2: Run Tests
For each query:
- Enter query in the Embedding Search input box
- Press Enter or click Search
- Review top 5 results for relevance
- Note similarity scores for each result
- Mark results as relevant or irrelevant
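If your project exposes embedding search over HTTP, a loop like this sketch can run the test_queries list from Step 1 and record scores for each query. The endpoint path and response shape are assumptions based on the API example later in this guide; adjust them to match your deployment.

```python
import requests

SEARCH_URL = "https://your-rag-host/api/embedding-search"  # hypothetical endpoint path

results_log = []
for item in test_queries:
    response = requests.post(SEARCH_URL, json={"query": item["query"], "top_k": 5})
    response.raise_for_status()
    hits = response.json()["results"]  # assumed shape: [{content, similarity_score, source}, ...]
    results_log.append({
        "query": item["query"],
        "scores": [hit["similarity_score"] for hit in hits],
    })
```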
Step 3: Evaluate Results
Ask yourself these questions:
Relevance Check:
- Are the top 3 results actually relevant to the query?
- Do similarity scores match your intuition about relevance?
- Are important documents appearing in results?
Score Distribution:
- What is the average similarity score across all queries?
- Are scores consistently above 0.7?
- Are there any queries with all scores below 0.5?
Coverage Check:
- Do results cover all your key documents?
- Are critical documents appearing for relevant queries?
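Using the results_log collected in Step 2 above, a short summary like this answers the score-distribution questions; relevance and coverage still need a manual read of the retrieved chunks.

```python
from statistics import mean

all_scores = [s for entry in results_log for s in entry["scores"]]
print(f"Average similarity score: {mean(all_scores):.2f}")

# Flag queries whose results all fall below 0.5 (the "Poor" band).
weak_queries = [e["query"] for e in results_log if max(e["scores"]) < 0.5]
print(f"Queries with all scores below 0.5: {weak_queries or 'none'}")

# Share of queries whose best result clears the 0.7 threshold.
strong = sum(1 for e in results_log if max(e["scores"]) >= 0.7)
print(f"{strong}/{len(results_log)} queries have a top score of 0.7 or higher")
```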
Example Test Session
Query: "What is the return policy for electronics?"
Results:
```
1. [Score: 0.89] "Electronics returns accepted within 30 days..."
   Source: policy.pdf, Chunk 3
2. [Score: 0.85] "Return policy overview: All products..."
   Source: policy.pdf, Chunk 1
3. [Score: 0.82] "Electronics category specific rules..."
   Source: electronics-faq.md, Chunk 2
4. [Score: 0.78] "Refund processing timeline..."
   Source: policy.pdf, Chunk 5
5. [Score: 0.75] "Exception items: Software, DVDs..."
   Source: returns.md, Chunk 4
```
Assessment: ✓ PASS
- All top 5 results are relevant
- Average score: 0.82 (Excellent)
- Key documents appearing in results
Tips for Better Results
If Scores Are Too Low
Option 1: Try a Larger Embedding Model
| Current Model | Upgrade To | Expected Improvement |
|---|---|---|
| all-MiniLM-L6-v2 (384D) | text-embedding-3-small (1536D) | +0.1-0.15 scores |
| text-embedding-3-small (1536D) | text-embedding-3-large (3072D) | +0.05-0.1 scores |
Expected API Improvement:
- 10-20% increase in retrieval accuracy
- Better handling of complex, multi-part queries
- More relevant similarity_score rankings
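To see whether a larger model is worth the switch, score the same query against the same chunk with both models and compare. This sketch uses sentence-transformers for the local 384-dimension model and the OpenAI embeddings API for text-embedding-3-small; both hosting choices are assumptions, and the example texts are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

query = "What is the SLA for enterprise tier?"
chunk = "Enterprise tier includes a 99.9% uptime SLA with a 1-hour response target..."

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Local 384-dimension model.
small = SentenceTransformer("all-MiniLM-L6-v2")
q_s, c_s = small.encode([query, chunk])
print(f"all-MiniLM-L6-v2:       {cosine(q_s, c_s):.2f}")

# Hosted 1536-dimension model (requires OPENAI_API_KEY).
client = OpenAI()
resp = client.embeddings.create(model="text-embedding-3-small", input=[query, chunk])
q_l, c_l = (np.array(d.embedding) for d in resp.data)
print(f"text-embedding-3-small: {cosine(q_l, c_l):.2f}")
```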
Option 2: Adjust Chunk Size
| Chunk Size | Effect | Recommendation |
|---|---|---|
| Too small (< 256 tokens) | May lose context | Increase to 512 |
| Too large (> 1024 tokens) | Diluted embeddings | Decrease to 768 |
| Sweet spot (512-768 tokens) | Balanced | Use for most cases |
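Chunk size is a document-processing setting (Step 1), so changing it means re-processing your documents. As an illustration only, a chunking configuration along these lines targets the sweet spot; the parameter names are hypothetical and depend on your pipeline.

```python
# Hypothetical chunking settings -- actual parameter names depend on your pipeline.
chunking_config = {
    "chunk_size": 512,     # tokens per chunk; the 512-768 range balances context and focus
    "chunk_overlap": 64,   # small overlap preserves context across chunk boundaries
}
```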
Option 3: Enable BM25 Hybrid Search
Enable BM25 when your content includes:
- Technical documents with proper nouns
- Code repositories with function names
- Legal documents with specific terms
Benefits:
- Combines keyword + semantic matching
- Helps when exact terms matter
- Improves results for terminology-heavy content
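A common way to combine the two signals is a weighted sum of a normalized BM25 keyword score and the embedding similarity. The sketch below uses the rank_bm25 package; the 0.5/0.5 weighting, the normalization, and the example dense scores are illustrative assumptions, and your project's hybrid implementation may differ.

```python
import numpy as np
from rank_bm25 import BM25Okapi

chunks = [
    "Electronics returns accepted within 30 days of purchase.",
    "The getRefundStatus endpoint returns refund processing state.",
    "Exception items: software, DVDs, and opened media.",
]
dense_scores = np.array([0.62, 0.48, 0.55])  # embedding similarities for the same chunks (example values)

query = "getRefundStatus endpoint"
bm25 = BM25Okapi([c.lower().split() for c in chunks])
keyword_scores = bm25.get_scores(query.lower().split())

# Normalize BM25 scores to 0-1 so they are comparable with cosine similarity.
if keyword_scores.max() > 0:
    keyword_scores = keyword_scores / keyword_scores.max()

hybrid = 0.5 * dense_scores + 0.5 * keyword_scores
for idx in np.argsort(hybrid)[::-1]:
    print(f"[Hybrid: {hybrid[idx]:.2f}] {chunks[idx]}")
```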
If Results Are Inconsistent
Problem: Some queries get good scores, others get poor scores
Solutions:
- Check if poor-scoring queries use different terminology
- Verify all relevant documents are processed
- Consider using a more general embedding model
- Add synonyms or alternative phrasings to documents
Expected API Behavior
The similarity scores you see in Embedding Search directly translate to API responses:
API Response Example
```json
{
  "query": "What is the return policy?",
  "results": [
    {
      "content": "Returns accepted within 30 days...",
      "similarity_score": 0.87,
      "source": "policy.pdf"
    },
    {
      "content": "Return policy overview...",
      "similarity_score": 0.82,
      "source": "policy.pdf"
    }
  ]
}
```
Key Points:
- similarity_score in the API matches the Embedding Search scores
- The same ranking algorithm is used
- What you see in testing is what the API returns
Decision Criteria
When to Proceed to Step 3
Proceed to Pipeline Configuration when:
- ✅ Average similarity score > 0.7 across all test queries
- ✅ Top 3 results are relevant for most queries
- ✅ Key documents appear in results for relevant queries
- ✅ No queries have all scores below 0.5
When to Iterate
Stay in Step 2 and iterate when:
- ❌ Average similarity score < 0.5
- ❌ Top results frequently irrelevant
- ❌ Important documents missing from results
- ❌ Scores vary wildly between queries
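These criteria are easy to check automatically against the results_log collected earlier, plus your manual relevance judgments. This helper is a sketch, not part of the product; the thresholds mirror the lists above, "most queries" is interpreted as 80% (an assumption), and document coverage remains a manual check.

```python
from statistics import mean

def ready_for_step_3(results_log, relevance_flags):
    """Return True when test results meet the Step 3 entry criteria.

    results_log:     list of {"query": str, "scores": [float, ...]} from the test runs
    relevance_flags: list of bools, True when a query's top-3 results were judged relevant
    """
    all_scores = [s for entry in results_log for s in entry["scores"]]
    avg_ok = mean(all_scores) > 0.7
    relevance_ok = sum(relevance_flags) / len(relevance_flags) >= 0.8  # "most queries" (assumption)
    no_dead_queries = all(max(entry["scores"]) >= 0.5 for entry in results_log)
    return avg_ok and relevance_ok and no_dead_queries
```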
Common Issues and Solutions
| Issue | Possible Cause | Solution |
|---|---|---|
| All scores below 0.5 | Wrong embedding model | Upgrade to larger model |
| Irrelevant top results | Chunk size too large | Reduce to 512 tokens |
| Missing key documents | Document not processed | Check processing status |
| Inconsistent scores | Mixed content types | Enable BM25 hybrid |
| Good scores but wrong answers | Chunk too small | Increase chunk size |
Next Step: Configure RAG Pipeline
Once similarity scores are acceptable, proceed to Step 3: Configure RAG Pipeline to test how embedding settings work together with retrieval configuration.
What to Bring to Step 3
- Your test queries from this step
- Baseline scores for comparison
- Any notes on problematic queries
- Selected embedding model confirmation
Tips for Success
- Test with Real Queries: Use actual user questions when available
- Document Your Baseline: Record scores for future comparison
- Test Edge Cases: Include unusual queries in your test set
- Iterate Quickly: Don't hesitate to try different models
- Trust the Scores: Low scores predict poor API performance
