Step 3 - Configure RAG Pipeline
Purpose
Test how your embedding settings work together with retrieval configuration to produce final answers. While Step 2 tested embedding search in isolation, this step validates the complete pipeline from query to answer.
Entry Point: Pipeline Configuration tab → "Test Pipeline" button
Prerequisites: Embedding search completed with acceptable scores (Step 2)
Expected Outcome: Optimized Top-K and retrieval settings for your use case
Accessing Pipeline Test
Location
Navigate to your RAG project and find the Pipeline Configuration tab. Click the "Test Pipeline" button to open the testing interface.
UI Overview
The Pipeline Test interface includes:
- Query Input Box - Enter test queries
- Configuration Panel - Adjust Top-K, BM25, retrieval method
- Results Panel - Shows retrieved chunks and generated answer
- Processing Metrics - Display latency and token usage
Pipeline Components
User Query → [Query Processing] → [Vector Search] → [BM25 (optional)] → [Re-ranking] → [Top-K Selection] → [Context Assembly] → LLM → Answer
Component Overview
| Component | Purpose | Configurable |
|---|---|---|
| Query Processing | Prepares user query | No |
| Vector Search | Semantic similarity search | Embedding model |
| BM25 Search | Keyword matching | On/Off toggle |
| Re-ranking | Sorts combined results | Ranking method |
| Top-K Selection | Selects chunks for context | K value |
| Context Assembly | Formats chunks for LLM | Template |
| LLM | Generates final answer | Model, temperature |
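The stages above can be sketched as a single function. This is a toy illustration only, not the platform's actual API: the function names are invented, and the "vector search" is stubbed with word overlap standing in for embedding similarity.

```python
# Minimal sketch of the pipeline stages above; all names are
# illustrative, not the platform's real API.

def run_pipeline(query, chunks, top_k=5, use_bm25=False):
    """Toy version of: query -> search -> rank -> top-k -> context."""
    # Vector search stub: word overlap stands in for cosine
    # similarity over real embeddings.
    def vector_score(chunk):
        q, c = set(query.lower().split()), set(chunk.lower().split())
        return len(q & c) / max(len(q), 1)

    scored = [(vector_score(c), c) for c in chunks]
    if use_bm25:
        # A real hybrid pipeline would blend a BM25 keyword score
        # into each tuple here before re-ranking.
        pass

    # Re-ranking + Top-K selection: highest score first, keep k chunks.
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

    # Context assembly: join the selected chunks into the LLM prompt.
    return "\n---\n".join(chunk for _, chunk in ranked)

chunks = ["returns accepted within 30 days", "shipping takes 5 days"]
print(run_pipeline("returns within 30 days", chunks, top_k=1))
```

The LLM call itself is omitted: it simply consumes the assembled context plus the query.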
Configuration Options
Top-K Results
Top-K determines how many chunks are included in the context sent to the LLM.
| Top-K Value | Use Case | Trade-offs |
|---|---|---|
| 3-5 | Simple Q&A, concise answers | Faster, may miss context |
| 5-10 | Standard use case | Balanced |
| 10-20 | Complex analysis, research | More context, slower, higher token cost |
Recommendations:
- Start with Top-K = 5 for most use cases
- Increase if answers lack context or detail
- Decrease if answers are verbose or off-topic
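A rough way to see the trade-off is that every extra chunk adds prompt tokens, and therefore cost and latency. The token figure below is an assumption for illustration, not a platform measurement:

```python
# Rough illustration of the Top-K trade-off: each extra chunk adds
# prompt tokens (cost and latency) but may supply missing context.
# avg_tokens_per_chunk is an assumed figure, not a real measurement.

def context_budget(ranked_chunks, k, avg_tokens_per_chunk=150):
    """Return (selected chunks, approximate prompt tokens)."""
    selected = ranked_chunks[:k]
    return selected, len(selected) * avg_tokens_per_chunk

ranked = [(0.89, "A"), (0.85, "B"), (0.82, "C"), (0.78, "D"),
          (0.75, "E"), (0.71, "F"), (0.66, "G")]

for k in (3, 5, 7):
    selected, tokens = context_budget(ranked, k)
    print(f"Top-K={k}: {len(selected)} chunks, ~{tokens} prompt tokens")
```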
BM25 Hybrid Search
BM25 scores exact keyword matches; when enabled, its score is combined with the semantic (vector) score to produce a hybrid ranking.
| Scenario | BM25 Recommended | Why |
|---|---|---|
| Technical documentation | ✅ Yes | Proper nouns, version numbers |
| Code repositories | ✅ Yes | Exact function names |
| Legal documents | ✅ Yes | Specific terms matter |
| General FAQ | ❌ No | Semantic search sufficient |
| Creative content | ❌ No | Meaning over exact terms |
Expected Impact:
- Improves results for term-specific queries
- May slightly increase processing time
- Adds a keyword_score field to each retrieved result
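How the two scores are blended is not documented here; a common approach, shown as a hedged sketch below, is a weighted average of the vector and BM25 scores, with the weight as a tunable assumption:

```python
# Hedged sketch of hybrid scoring. The platform's exact weighting is
# not documented here; a weighted average is a common choice, and
# alpha (the vector-side weight) is an assumed parameter.

def combined_score(similarity_score, keyword_score, alpha=0.5):
    """Blend semantic and keyword relevance into one ranking score."""
    return alpha * similarity_score + (1 - alpha) * keyword_score

# Mirrors the result shape shown in "BM25 Impact" later on this page.
result = {"similarity_score": 0.82, "keyword_score": 0.91}
result["combined_score"] = combined_score(
    result["similarity_score"], result["keyword_score"])
print(result)
```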
Retrieval Methods
| Method | Description | Best For |
|---|---|---|
| Standard Vector | Pure semantic search | General purpose |
| Hybrid (BM25 + Vector) | Keyword + semantic combined | Technical, specific terms |
| Contextual Retrieval | LLM-enhanced context | Complex documents |
| ML-Optimized | Multi-level summaries | Hierarchical content |
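When recording the settings you settle on, it can help to keep them in one place. The config object below is purely hypothetical: the field names are illustrative and do not reflect the platform's actual schema.

```python
# Hypothetical container for the settings tuned in this step.
# Field names and method strings are illustrative, not the
# platform's real configuration schema.
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    top_k: int = 5                 # chunks passed to the LLM context
    bm25_enabled: bool = False     # hybrid keyword + vector search
    retrieval_method: str = "standard_vector"  # or "hybrid",
                                   # "contextual", "ml_optimized"

cfg = PipelineConfig(top_k=5, bm25_enabled=True,
                     retrieval_method="hybrid")
print(cfg)
```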
Testing Workflow
Step 1: Use Same Test Queries from Step 2
Using the same queries enables:
- Direct comparison (embedding-only vs. full pipeline)
- Identification of pipeline-specific issues
- Validation that pipeline improves results
Step 2: Adjust Top-K
Start with Top-K = 5:
- Run test query with Top-K = 5
- Review answer quality
- Note processing time
Increase if:
- Answers lack context or detail
- Important information is missing
- Users would need follow-up questions
Decrease if:
- Answers are verbose or repetitive
- Processing time is too slow (> 3s)
- Irrelevant information is included
Step 3: Toggle BM25
Test with BM25 Off:
- Run test query
- Note results and scores
- Review answer
Test with BM25 On:
- Run same test query
- Compare results and scores
- Note any changes in answer quality
Decision Criteria:
- If BM25 improves technical query results → Enable
- If BM25 adds no value → Disable
- If BM25 slows processing significantly → Consider disabling
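The decision criteria above amount to a small A/B comparison. In the sketch below, `run_test` is a stand-in for whatever test call your platform exposes; it is stubbed with canned results so the comparison logic itself is runnable, and the thresholds are assumptions you should tune.

```python
# A/B harness for the BM25 toggle. `run_test` is a stub standing in
# for the platform's pipeline-test call; the canned numbers and the
# decision thresholds are assumptions for illustration.

def run_test(query, bm25_enabled):
    canned = {
        False: {"top_score": 0.85, "processing_time": 1.1},
        True:  {"top_score": 0.87, "processing_time": 1.3},
    }
    return canned[bm25_enabled]

def compare_bm25(query):
    off = run_test(query, bm25_enabled=False)
    on = run_test(query, bm25_enabled=True)
    gain = on["top_score"] - off["top_score"]
    slowdown = on["processing_time"] - off["processing_time"]
    # Mirror the criteria above: enable only if the score gain is
    # real and the extra latency stays tolerable.
    return "enable" if gain > 0.01 and slowdown < 1.0 else "disable"

print(compare_bm25("API version 2.1 breaking changes"))
```

Run the same comparison over several representative queries, not just one, before committing to the toggle.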
Step 4: Review Full Response
Check these aspects:
Answer Accuracy:
- Does the answer correctly address the query?
- Are facts accurate based on source documents?
- Is the answer complete?
Source Attribution:
- Are sources correctly identified?
- Do sources actually contain the information?
- Is the source list comprehensive?
Response Coherence:
- Is the answer well-structured?
- Does it flow logically?
- Is the tone appropriate?
Processing Time:
- Is response time acceptable (< 2s typical)?
- Does time increase with higher Top-K?
- Is BM25 impact acceptable?
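To make the timing checks repeatable, wrap your test call in a timer and record the baseline. `fake_pipeline` below is a stand-in for a real pipeline call; the 2-second threshold echoes the guidance above.

```python
# Sketch of recording a latency baseline while tuning. fake_pipeline
# is a stand-in for a real pipeline call; the 2 s threshold matches
# the "< 2s typical" guidance above.
import time

def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def fake_pipeline(query):
    return f"answer to: {query}"

answer, elapsed = timed(fake_pipeline, "return policy?")
print(f"{elapsed:.3f}s", "OK" if elapsed < 2.0 else "too slow")
```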
Expected Output
Complete Pipeline Test Output
Query: "What is the return policy for electronics?"
Settings: Top-K=5, BM25=Enabled
─────────────────────────────────────────────────
Retrieved Chunks:
1. [Score: 0.89] "Electronics returns accepted within 30 days..."
Source: policy.pdf, Chunk 3
2. [Score: 0.85] "Return policy overview: All products..."
Source: policy.pdf, Chunk 1
3. [Score: 0.82] "Electronics category specific rules..."
Source: electronics-faq.md, Chunk 2
4. [Score: 0.78] "Refund processing timeline..."
Source: policy.pdf, Chunk 5
5. [Score: 0.75] "Exception items: Software, DVDs..."
Source: returns.md, Chunk 4
Generated Answer:
"Electronics can be returned within 30 days of purchase.
Refunds are processed within 5-7 business days. Note that
opened software and DVDs are exceptions and cannot be returned."
Sources:
- policy.pdf (page 3)
- electronics-faq.md
Processing Time: 1.23s
Token Usage: 156 tokens
How Settings Affect API Output
Top-K Impact
Top-K = 3:
{
  "results": [/* 3 chunks */],
  "response": "Concise answer with limited context",
  "processing_time": 0.8
}
Top-K = 10:
{
  "results": [/* 10 chunks */],
  "response": "Comprehensive answer with more detail",
  "processing_time": 2.1
}
BM25 Impact
BM25 Disabled:
{
  "results": [
    { "similarity_score": 0.85, "content": "..." }
  ]
}
BM25 Enabled:
{
  "results": [
    {
      "similarity_score": 0.82,
      "keyword_score": 0.91,
      "combined_score": 0.87,
      "content": "..."
    }
  ]
}
Optimization Tips
If Answers Lack Context
Solutions:
- Increase Top-K (5 → 10)
- Enable Contextual Retrieval
- Check chunk overlap settings
- Verify all relevant documents are processed
If Answers Are Too Verbose
Solutions:
- Decrease Top-K (10 → 5)
- Reduce max_tokens in API settings
- Check for duplicate chunks
- Review chunk size (may be too large)
If Wrong Documents Retrieved
Solutions:
- Revisit Step 2 (Embedding Search)
- Enable BM25 for term-specific queries
- Check document metadata filters
- Consider different embedding model
If Processing Is Too Slow
Solutions:
- Decrease Top-K
- Disable BM25 if not needed
- Use smaller embedding model
- Check server resources
Decision Criteria
When to Proceed to Step 4
Proceed to API Deployment when:
- ✅ Answers are accurate and coherent
- ✅ Top-K optimized for your use case
- ✅ BM25 setting appropriate
- ✅ Processing time acceptable (< 2s typical)
- ✅ Source attribution is correct
When to Iterate
Stay in Step 3 and iterate when:
- ❌ Answers frequently inaccurate
- ❌ Processing time too slow (> 5s)
- ❌ Sources don't match answer
- ❌ Top-K not optimized
Common Issues and Solutions
| Issue | Possible Cause | Solution |
|---|---|---|
| Answer doesn't match sources | Irrelevant chunks in context | Adjust Top-K; revisit Step 2 |
| Slow processing | High Top-K | Reduce to 5-10 |
| Missing key info | BM25 disabled | Enable for technical terms |
| Verbose answers | Top-K too high | Decrease to 3-5 |
| Inconsistent answers | Retrieval method mismatch | Try different method |
Next Step: Deploy API Endpoint
Once pipeline configuration is optimized, proceed to Step 4: Deploy API Endpoint to configure and test your production API.
What to Bring to Step 4
- Optimized Top-K value for your use case
- BM25 setting (enabled/disabled)
- Selected retrieval method
- Baseline processing time for monitoring
Tips for Success
- Start Conservative: Begin with Top-K = 5, BM25 off
- Test Incrementally: Change one setting at a time
- Measure Everything: Record processing times and scores
- Use Real Queries: Test with actual user questions
- Document Decisions: Note why you chose each setting
