Step 3 - Configure RAG Pipeline
Purpose
Test how your embedding settings work together with retrieval configuration to produce final answers. While Step 2 tested embedding search in isolation, this step validates the complete pipeline from query to answer.
Entry Point: Pipeline Configuration tab → "Test Pipeline" button
Prerequisites: Embedding search completed with acceptable scores (Step 2)
Expected Outcome: Optimized Top-K and retrieval settings for your use case
Accessing Pipeline Test
Location
Navigate to your RAG project and find the Pipeline Configuration tab. Click the "Test Pipeline" button to open the testing interface.
UI Overview
The Pipeline Test interface includes:
- Query Input Box - Enter test queries
- Configuration Panel - Adjust Top-K, BM25, retrieval method
- Results Panel - Shows retrieved chunks and generated answer
- Processing Metrics - Display latency and token usage
Pipeline Components
User Query → [Query Processing] → [Vector Search] → [BM25 (optional)] → [Re-ranking] → [Top-K Selection] → [Context Assembly] → LLM → Answer
Component Overview
| Component | Purpose | Configurable |
|---|---|---|
| Query Processing | Prepares user query | No |
| Vector Search | Semantic similarity search | Embedding model |
| BM25 Search | Keyword matching | On/Off toggle |
| Re-ranking | Sorts combined results | Ranking method |
| Top-K Selection | Selects chunks for context | K value |
| Context Assembly | Formats chunks for LLM | Template |
| LLM | Generates final answer | Model, temperature |
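The stages above can be sketched as a single function. This is a toy illustration only, not the platform's actual API: the function names are invented, and the "vector search" is stubbed with word overlap standing in for embedding similarity.

```python
# Minimal sketch of the pipeline stages above; all names are
# illustrative, not the platform's real API.

def run_pipeline(query, chunks, top_k=5, use_bm25=False):
    """Toy version of: query -> search -> rank -> top-k -> context."""
    # Vector search stub: word overlap stands in for cosine
    # similarity over real embeddings.
    def vector_score(chunk):
        q, c = set(query.lower().split()), set(chunk.lower().split())
        return len(q & c) / max(len(q), 1)

    scored = [(vector_score(c), c) for c in chunks]
    if use_bm25:
        # A real hybrid pipeline would blend a BM25 keyword score
        # into each tuple here before re-ranking.
        pass

    # Re-ranking + Top-K selection: highest score first, keep k chunks.
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

    # Context assembly: join the selected chunks into the LLM prompt.
    return "\n---\n".join(chunk for _, chunk in ranked)

chunks = ["returns accepted within 30 days", "shipping takes 5 days"]
print(run_pipeline("returns within 30 days", chunks, top_k=1))
```

The LLM call itself is omitted: it simply consumes the assembled context plus the query.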
Configuration Options
Top-K Results
Top-K determines how many chunks are included in the context sent to the LLM.
| Top-K Value | Use Case | Trade-offs |
|---|---|---|
| 3-5 | Simple Q&A, concise answers | Faster, may miss context |
| 5-10 | Standard use case | Balanced |
| 10-20 | Complex analysis, research | More context, slower, higher token cost |
Recommendations:
- Start with Top-K = 5 for most use cases
- Increase if answers lack context or detail
- Decrease if answers are verbose or off-topic
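A rough way to see the trade-off is that every extra chunk adds prompt tokens, and therefore cost and latency. The token figure below is an assumption for illustration, not a platform measurement:

```python
# Rough illustration of the Top-K trade-off: each extra chunk adds
# prompt tokens (cost and latency) but may supply missing context.
# avg_tokens_per_chunk is an assumed figure, not a real measurement.

def context_budget(ranked_chunks, k, avg_tokens_per_chunk=150):
    """Return (selected chunks, approximate prompt tokens)."""
    selected = ranked_chunks[:k]
    return selected, len(selected) * avg_tokens_per_chunk

ranked = [(0.89, "A"), (0.85, "B"), (0.82, "C"), (0.78, "D"),
          (0.75, "E"), (0.71, "F"), (0.66, "G")]

for k in (3, 5, 7):
    selected, tokens = context_budget(ranked, k)
    print(f"Top-K={k}: {len(selected)} chunks, ~{tokens} prompt tokens")
```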
BM25 Hybrid Search
BM25 scores exact keyword matches; when enabled, its score is combined with the semantic (vector) score to produce a hybrid ranking.
| Scenario | BM25 Recommended | Why |
|---|---|---|
| Technical documentation | ✅ Yes | Proper nouns, version numbers |
| Code repositories | ✅ Yes | Exact function names |
| Legal documents | ✅ Yes | Specific terms matter |
| General FAQ | ❌ No | Semantic search sufficient |
| Creative content | ❌ No | Meaning over exact terms |
Expected Impact:
- Improves results for term-specific queries
- May slightly increase processing time
- Adds a keyword_score field to each retrieved result
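How the two scores are blended is not documented here; a common approach, shown as a hedged sketch below, is a weighted average of the vector and BM25 scores, with the weight as a tunable assumption:

```python
# Hedged sketch of hybrid scoring. The platform's exact weighting is
# not documented here; a weighted average is a common choice, and
# alpha (the vector-side weight) is an assumed parameter.

def combined_score(similarity_score, keyword_score, alpha=0.5):
    """Blend semantic and keyword relevance into one ranking score."""
    return alpha * similarity_score + (1 - alpha) * keyword_score

# Mirrors the result shape shown in "BM25 Impact" later on this page.
result = {"similarity_score": 0.82, "keyword_score": 0.91}
result["combined_score"] = combined_score(
    result["similarity_score"], result["keyword_score"])
print(result)
```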
Retrieval Methods
| Method | Description | Best For |
|---|---|---|
| Standard Vector | Pure semantic search | General purpose |
| Hybrid (BM25 + Vector) | Keyword + semantic combined | Technical, specific terms |
| Contextual Retrieval | LLM-enhanced context | Complex documents |
| ML-Optimized | Multi-level summaries | Hierarchical content |
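When recording the settings you settle on, it can help to keep them in one place. The config object below is purely hypothetical: the field names are illustrative and do not reflect the platform's actual schema.

```python
# Hypothetical container for the settings tuned in this step.
# Field names and method strings are illustrative, not the
# platform's real configuration schema.
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    top_k: int = 5                 # chunks passed to the LLM context
    bm25_enabled: bool = False     # hybrid keyword + vector search
    retrieval_method: str = "standard_vector"  # or "hybrid",
                                   # "contextual", "ml_optimized"

cfg = PipelineConfig(top_k=5, bm25_enabled=True,
                     retrieval_method="hybrid")
print(cfg)
```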
Testing Workflow
Step 1: Use Same Test Queries from Step 2
Using the same queries enables:
- Direct comparison (embedding-only vs. full pipeline)
- Identification of pipeline-specific issues
- Validation that pipeline improves results
Step 2: Adjust Top-K
Start with Top-K = 5:
- Run test query with Top-K = 5
- Review answer quality
- Note processing time
Increase if:
- Answers lack context or detail
- Important information is missing
- Users would need follow-up questions
Decrease if:
- Answers are verbose or repetitive
- Processing time is too slow (> 3s)
- Irrelevant information is included
Step 3: Toggle BM25
Test with BM25 Off:
- Run test query
- Note results and scores
- Review answer
Test with BM25 On:
- Run same test query
- Compare results and scores
- Note any changes in answer quality
Decision Criteria:
- If BM25 improves technical query results → Enable
- If BM25 adds no value → Disable
- If BM25 slows processing significantly → Consider disabling
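The decision criteria above amount to a small A/B comparison. In the sketch below, `run_test` is a stand-in for whatever test call your platform exposes; it is stubbed with canned results so the comparison logic itself is runnable, and the thresholds are assumptions you should tune.

```python
# A/B harness for the BM25 toggle. `run_test` is a stub standing in
# for the platform's pipeline-test call; the canned numbers and the
# decision thresholds are assumptions for illustration.

def run_test(query, bm25_enabled):
    canned = {
        False: {"top_score": 0.85, "processing_time": 1.1},
        True:  {"top_score": 0.87, "processing_time": 1.3},
    }
    return canned[bm25_enabled]

def compare_bm25(query):
    off = run_test(query, bm25_enabled=False)
    on = run_test(query, bm25_enabled=True)
    gain = on["top_score"] - off["top_score"]
    slowdown = on["processing_time"] - off["processing_time"]
    # Mirror the criteria above: enable only if the score gain is
    # real and the extra latency stays tolerable.
    return "enable" if gain > 0.01 and slowdown < 1.0 else "disable"

print(compare_bm25("API version 2.1 breaking changes"))
```

Run the same comparison over several representative queries, not just one, before committing to the toggle.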
Step 4: Review Full Response
Check these aspects:
Answer Accuracy:
- Does the answer correctly address the query?
- Are facts accurate based on source documents?
- Is the answer complete?
Source Attribution:
- Are sources correctly identified?
- Do sources actually contain the information?
- Is the source list comprehensive?
Response Coherence:
- Is the answer well-structured?
- Does it flow logically?
- Is the tone appropriate?
Processing Time:
- Is response time acceptable (< 2s typical)?
- Does time increase with higher Top-K?
- Is BM25 impact acceptable?
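To make the timing checks repeatable, wrap your test call in a timer and record the baseline. `fake_pipeline` below is a stand-in for a real pipeline call; the 2-second threshold echoes the guidance above.

```python
# Sketch of recording a latency baseline while tuning. fake_pipeline
# is a stand-in for a real pipeline call; the 2 s threshold matches
# the "< 2s typical" guidance above.
import time

def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def fake_pipeline(query):
    return f"answer to: {query}"

answer, elapsed = timed(fake_pipeline, "return policy?")
print(f"{elapsed:.3f}s", "OK" if elapsed < 2.0 else "too slow")
```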
Expected Output
Complete Pipeline Test Output
Query: "What is the return policy for electronics?"
Settings: Top-K=5, BM25=Enabled
─────────────────────────────────────────────────
Retrieved Chunks:
1. [Score: 0.89] "Electronics returns accepted within 30 days..."
Source: policy.pdf, Chunk 3
2. [Score: 0.85] "Return policy overview: All products..."
Source: policy.pdf, Chunk 1
3. [Score: 0.82] "Electronics category specific rules..."
Source: electronics-faq.md, Chunk 2
4. [Score: 0.78] "Refund processing timeline..."
Source: policy.pdf, Chunk 5
5. [Score: 0.75] "Exception items: Software, DVDs..."
Source: returns.md, Chunk 4
Generated Answer:
"Electronics can be returned within 30 days of purchase.
Refunds are processed within 5-7 business days. Note that
opened software and DVDs are exceptions and cannot be returned."
Sources:
- policy.pdf (page 3)
- electronics-faq.md
Processing Time: 1.23s
Token Usage: 156 tokens
How Settings Affect API Output
Top-K Impact
Top-K = 3:
{
  "results": [/* 3 chunks */],
  "response": "Concise answer with limited context",
  "processing_time": 0.8
}
Top-K = 10:
{
  "results": [/* 10 chunks */],
  "response": "Comprehensive answer with more detail",
  "processing_time": 2.1
}
BM25 Impact
BM25 Disabled:
{
  "results": [
    { "similarity_score": 0.85, "content": "..." }
  ]
}
BM25 Enabled:
{
  "results": [
    {
      "similarity_score": 0.82,
      "keyword_score": 0.91,
      "combined_score": 0.87,
      "content": "..."
    }
  ]
}
Optimization Tips
If Answers Lack Context
Solutions:
- Increase Top-K (5 → 10)
- Enable Contextual Retrieval
- Check chunk overlap settings
- Verify all relevant documents are processed
If Answers Are Too Verbose
Solutions:
- Decrease Top-K (10 → 5)
- Reduce max_tokens in API settings
- Check for duplicate chunks
- Review chunk size (may be too large)
If Wrong Documents Retrieved
Solutions:
- Revisit Step 2 (Embedding Search)
- Enable BM25 for term-specific queries
- Check document metadata filters
- Consider different embedding model
If Processing Is Too Slow
Solutions:
- Decrease Top-K
- Disable BM25 if not needed
- Use smaller embedding model
- Check server resources
Decision Criteria
When to Proceed to Step 4
Proceed to API Deployment when:
- ✅ Answers are accurate and coherent
- ✅ Top-K optimized for your use case
- ✅ BM25 setting appropriate
- ✅ Processing time acceptable (< 2s typical)
- ✅ Source attribution is correct
When to Iterate
Stay in Step 3 and iterate when:
- ❌ Answers frequently inaccurate
- ❌ Processing time too slow (> 5s)
- ❌ Sources don't match answer
- ❌ Top-K not optimized
Common Issues and Solutions
| Issue | Possible Cause | Solution |
|---|---|---|
| Answer doesn't match sources | Irrelevant chunks in context | Adjust Top-K; revisit Step 2 |
| Slow processing | High Top-K | Reduce to 5-10 |
| Missing key info | BM25 disabled | Enable for technical terms |
| Verbose answers | Top-K too high | Decrease to 3-5 |
| Inconsistent answers | Retrieval method mismatch | Try different method |
Next Step: Deploy API Endpoint
Once pipeline configuration is optimized, proceed to Step 4: Deploy API Endpoint to configure and test your production API.
What to Bring to Step 4
- Optimized Top-K value for your use case
- BM25 setting (enabled/disabled)
- Selected retrieval method
- Baseline processing time for monitoring
Tips for Success
- Start Conservative: Begin with Top-K = 5, BM25 off
- Test Incrementally: Change one setting at a time
- Measure Everything: Record processing times and scores
- Use Real Queries: Test with actual user questions
- Document Decisions: Note why you chose each setting
