Step 3: Configure RAG Pipeline

Purpose

Test how your embedding settings work together with retrieval configuration to produce final answers. While Step 2 tested embedding search in isolation, this step validates the complete pipeline from query to answer.

Entry Point: Pipeline Configuration tab → "Test Pipeline" button

Prerequisites: Embedding search completed with acceptable scores (Step 2)

Expected Outcome: Optimized Top-K and retrieval settings for your use case

Accessing Pipeline Test

Location

Navigate to your RAG project and find the Pipeline Configuration tab. Click the "Test Pipeline" button to open the testing interface.

UI Overview

The Pipeline Test interface includes:

  1. Query Input Box - Enter test queries
  2. Configuration Panel - Adjust Top-K, BM25, retrieval method
  3. Results Panel - Shows retrieved chunks and generated answer
  4. Processing Metrics - Displays latency and token usage

Pipeline Components

User Query → [Query Processing] → [Vector Search] → [BM25 (optional)] → [Re-ranking] → [Top-K Selection] → [Context Assembly] → LLM → Answer

Component Overview

| Component | Purpose | Configurable |
| --- | --- | --- |
| Query Processing | Prepares the user query | No |
| Vector Search | Semantic similarity search | Embedding model |
| BM25 Search | Keyword matching | On/Off toggle |
| Re-ranking | Sorts combined results | Ranking method |
| Top-K Selection | Selects chunks for context | K value |
| Context Assembly | Formats chunks for the LLM | Template |
| LLM | Generates the final answer | Model, temperature |
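The stages above can be sketched end to end. The following is a minimal runnable illustration using toy stand-ins for every component; none of these functions are this product's actual API.

```python
# End-to-end sketch of the pipeline stages. Every function and document here
# is an illustrative stand-in, not this product's API.

DOCS = [
    {"content": "Electronics returns accepted within 30 days"},
    {"content": "Refund processing takes 5-7 business days"},
    {"content": "Opened software cannot be returned"},
]

def vector_search(query):
    # Stand-in for semantic search: score by word overlap with the query.
    q = set(query.split())
    return [
        {"content": d["content"],
         "score": len(q & set(d["content"].lower().split())) / len(q)}
        for d in DOCS
    ]

def run_pipeline(query, top_k=5):
    processed = query.strip().lower()                              # Query Processing
    hits = vector_search(processed)                                # Vector Search (+ optional BM25)
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)  # Re-ranking
    chunks = ranked[:top_k]                                        # Top-K Selection
    context = "\n\n".join(c["content"] for c in chunks)            # Context Assembly
    return context  # the real pipeline sends this context to the LLM

context = run_pipeline("electronics returns", top_k=2)
```

Only the last stage calls the LLM; everything before it is retrieval and formatting, which is why Top-K and BM25 can be tuned without touching the model.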

Configuration Options

Top-K Results

Top-K determines how many chunks are included in the context sent to the LLM.

| Top-K Value | Use Case | Trade-offs |
| --- | --- | --- |
| 3-5 | Simple Q&A, concise answers | Faster, may miss context |
| 5-10 | Standard use cases | Balanced |
| 10-20 | Complex analysis, research | More context, slower, higher token cost |

Recommendations:

  • Start with Top-K = 5 for most use cases
  • Increase if answers lack context or detail
  • Decrease if answers are verbose or off-topic
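To see why higher Top-K raises token cost, here is a back-of-envelope estimate. The 1,200-character average chunk size and the ~4-characters-per-token ratio are assumptions, not product defaults.

```python
# Back-of-envelope for how Top-K drives context size. The 1,200-character
# average chunk and ~4 characters per token are assumptions, not measured
# values from this product.

def approx_context_tokens(top_k, avg_chunk_chars=1200, chars_per_token=4):
    return top_k * avg_chunk_chars // chars_per_token

for k in (3, 5, 10, 20):
    print(f"Top-K={k}: ~{approx_context_tokens(k)} context tokens")
```

Context tokens grow linearly with K, so doubling Top-K roughly doubles the retrieval portion of your token bill.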

BM25 Hybrid Search

Enabling BM25 adds keyword matching alongside semantic vector search (hybrid retrieval).

| Scenario | BM25 Recommended | Why |
| --- | --- | --- |
| Technical documentation | ✅ Yes | Proper nouns, version numbers |
| Code repositories | ✅ Yes | Exact function names |
| Legal documents | ✅ Yes | Specific terms matter |
| General FAQ | ❌ No | Semantic search is sufficient |
| Creative content | ❌ No | Meaning matters more than exact terms |

Expected Impact:

  • Improves results for term-specific queries
  • May slightly increase processing time
  • Adds keyword_score to results
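For intuition on what the BM25 component computes, here is a minimal from-scratch scorer using the standard BM25 formula (k1 and b at their usual defaults). It is an illustration of the algorithm, not this product's implementation.

```python
import math

# From-scratch BM25 scorer showing what the keyword component adds for
# term-specific queries; k1=1.5 and b=0.75 are common defaults.

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    scores = [0.0] * n
    for term in query.lower().split():
        df = sum(term in d for d in tokenized)           # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # rarity weight
        for i, doc in enumerate(tokenized):
            tf = doc.count(term)                         # term frequency
            norm = k1 * (1 - b + b * len(doc) / avgdl)   # length normalization
            scores[i] += idf * tf * (k1 + 1) / (tf + norm)
    return scores

docs = [
    "install version 2.4.1 of the parser",
    "the parser handles nested tokens",
    "returns are accepted within 30 days",
]
scores = bm25_scores("parser version 2.4.1", docs)
```

The document containing the exact version string outranks the merely related one, which is why BM25 helps with version numbers, function names, and other exact terms.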

Retrieval Methods

| Method | Description | Best For |
| --- | --- | --- |
| Standard Vector | Pure semantic search | General purpose |
| Hybrid (BM25 + Vector) | Keyword + semantic combined | Technical content, specific terms |
| Contextual Retrieval | LLM-enhanced context | Complex documents |
| ML-Optimized | Multi-level summaries | Hierarchical content |

Testing Workflow

Step 1: Use Same Test Queries from Step 2

Using the same queries enables:

  • Direct comparison (embedding-only vs. full pipeline)
  • Identification of pipeline-specific issues
  • Validation that pipeline improves results

Step 2: Adjust Top-K

Start with Top-K = 5:

  1. Run test query with Top-K = 5
  2. Review answer quality
  3. Note processing time

Increase if:

  • Answers lack context or detail
  • Important information is missing
  • Users would need to ask follow-up questions

Decrease if:

  • Answers are verbose or repetitive
  • Processing time is too slow (> 3s)
  • Irrelevant information is included
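The adjust-and-measure loop above can be automated. In the sketch below, query_pipeline is a hypothetical stand-in for the Test Pipeline call; a real harness would invoke your deployment instead.

```python
import time

# Sketch of the "adjust Top-K, measure, repeat" loop. `query_pipeline` is a
# hypothetical stand-in for the product's Test Pipeline call, not a real API.

def query_pipeline(query, top_k):
    # Stand-in: pretend retrieval cost grows with Top-K.
    time.sleep(0.001 * top_k)
    return {"answer": f"answer assembled from {top_k} chunks"}

def sweep_top_k(query, values=(3, 5, 10)):
    report = []
    for k in values:
        start = time.perf_counter()
        result = query_pipeline(query, top_k=k)
        elapsed = time.perf_counter() - start
        report.append({"top_k": k,
                       "seconds": round(elapsed, 3),
                       "answer": result["answer"]})
    return report

report = sweep_top_k("What is the return policy for electronics?")
```

Recording latency per K value gives you the numbers needed for the "decrease if processing time is too slow" decision rather than a gut feeling.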

Step 3: Toggle BM25

Test with BM25 Off:

  1. Run test query
  2. Note results and scores
  3. Review answer

Test with BM25 On:

  1. Run same test query
  2. Compare results and scores
  3. Note any changes in answer quality

Decision Criteria:

  • If BM25 improves technical query results → Enable
  • If BM25 adds no value → Disable
  • If BM25 slows processing significantly → Consider disabling
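These criteria can be encoded directly. The 0.03 score-gain and 0.5-second slowdown thresholds below are illustrative assumptions, not product defaults.

```python
# Encoding the BM25 decision criteria. The 0.03 score-gain and 0.5 s
# slowdown thresholds are illustrative assumptions, not product defaults.

def bm25_decision(off_run, on_run, min_score_gain=0.03, max_slowdown=0.5):
    gain = on_run["top_score"] - off_run["top_score"]
    slowdown = on_run["seconds"] - off_run["seconds"]
    if slowdown > max_slowdown:
        return "consider disabling (significant slowdown)"
    if gain >= min_score_gain:
        return "enable"
    return "disable (no value added)"

decision = bm25_decision(
    {"top_score": 0.82, "seconds": 0.9},  # BM25 off
    {"top_score": 0.87, "seconds": 1.1},  # BM25 on
)
```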

Step 4: Review Full Response

Check these aspects:

Answer Accuracy:

  • Does the answer correctly address the query?
  • Are facts accurate based on source documents?
  • Is the answer complete?

Source Attribution:

  • Are sources correctly identified?
  • Do sources actually contain the information?
  • Is the source list comprehensive?

Response Coherence:

  • Is the answer well-structured?
  • Does it flow logically?
  • Is the tone appropriate?

Processing Time:

  • Is response time acceptable (< 2s typical)?
  • Does time increase with higher Top-K?
  • Is BM25 impact acceptable?
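Parts of this checklist can be automated. The sketch below assumes response fields named answer, sources, and processing_time; these names are illustrative, not the product's documented schema.

```python
# Automating part of the review checklist. Field names (`answer`, `sources`,
# `processing_time`) are assumptions, not the product's documented schema.

def review_response(resp, max_seconds=2.0):
    issues = []
    if resp.get("processing_time", 0.0) > max_seconds:
        issues.append("slow response")
    if not resp.get("sources"):
        issues.append("missing source attribution")
    if not resp.get("answer", "").strip():
        issues.append("empty answer")
    return issues

issues = review_response({
    "answer": "Electronics can be returned within 30 days of purchase.",
    "sources": ["policy.pdf", "electronics-faq.md"],
    "processing_time": 1.23,
})
```

Accuracy and coherence still need a human (or an LLM judge), but latency, attribution, and emptiness checks are cheap to run on every test query.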

Expected Output

Complete Pipeline Test Output

```text
Query: "What is the return policy for electronics?"

Settings: Top-K=5, BM25=Enabled
─────────────────────────────────────────────────

Retrieved Chunks:
1. [Score: 0.89] "Electronics returns accepted within 30 days..."
   Source: policy.pdf, Chunk 3

2. [Score: 0.85] "Return policy overview: All products..."
   Source: policy.pdf, Chunk 1

3. [Score: 0.82] "Electronics category specific rules..."
   Source: electronics-faq.md, Chunk 2

4. [Score: 0.78] "Refund processing timeline..."
   Source: policy.pdf, Chunk 5

5. [Score: 0.75] "Exception items: Software, DVDs..."
   Source: returns.md, Chunk 4

Generated Answer:
"Electronics can be returned within 30 days of purchase.
Refunds are processed within 5-7 business days. Note that
opened software and DVDs are exceptions and cannot be returned."

Sources:
- policy.pdf (page 3)
- electronics-faq.md

Processing Time: 1.23s
Token Usage: 156 tokens
```

How Settings Affect API Output

Top-K Impact

Top-K = 3:

```json
{
  "results": [/* 3 chunks */],
  "response": "Concise answer with limited context",
  "processing_time": 0.8
}
```

Top-K = 10:

```json
{
  "results": [/* 10 chunks */],
  "response": "Comprehensive answer with more detail",
  "processing_time": 2.1
}
```

BM25 Impact

BM25 Disabled:

```json
{
  "results": [
    { "similarity_score": 0.85, "content": "..." }
  ]
}
```

BM25 Enabled:

```json
{
  "results": [
    {
      "similarity_score": 0.82,
      "keyword_score": 0.91,
      "combined_score": 0.87,
      "content": "..."
    }
  ]
}
```
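One common way to blend the two scores is a weighted average. The product's actual weighting is not documented here, but equal weights happen to reproduce the example values above: 0.5 × 0.82 + 0.5 × 0.91 = 0.865, displayed as 0.87.

```python
# Blending similarity and keyword scores with a weighted average. Equal
# weights are an assumption (the product's exact weighting is undocumented),
# but they match the example: 0.5*0.82 + 0.5*0.91 = 0.865, shown as 0.87.

def combined_score(similarity, keyword, vector_weight=0.5):
    return vector_weight * similarity + (1 - vector_weight) * keyword

score = combined_score(0.82, 0.91)
```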

Optimization Tips

If Answers Lack Context

Solutions:

  1. Increase Top-K (5 → 10)
  2. Enable Contextual Retrieval
  3. Check chunk overlap settings
  4. Verify all relevant documents are processed

If Answers Are Too Verbose

Solutions:

  1. Decrease Top-K (10 → 5)
  2. Reduce max_tokens in API settings
  3. Check for duplicate chunks
  4. Review chunk size (may be too large)

If Wrong Documents Retrieved

Solutions:

  1. Revisit Step 2 (Embedding Search)
  2. Enable BM25 for term-specific queries
  3. Check document metadata filters
  4. Consider different embedding model

If Processing Is Too Slow

Solutions:

  1. Decrease Top-K
  2. Disable BM25 if not needed
  3. Use smaller embedding model
  4. Check server resources

Decision Criteria

When to Proceed to Step 4

Proceed to API Deployment when:

  • ✅ Answers are accurate and coherent
  • ✅ Top-K optimized for your use case
  • ✅ BM25 setting appropriate
  • ✅ Processing time acceptable (< 2s typical)
  • ✅ Source attribution is correct

When to Iterate

Stay in Step 3 and iterate when:

  • ❌ Answers frequently inaccurate
  • ❌ Processing time too slow (> 5s)
  • ❌ Sources don't match answer
  • ❌ Top-K not optimized

Common Issues and Solutions

| Issue | Possible Cause | Solution |
| --- | --- | --- |
| Answer doesn't match sources | Wrong Top-K | Adjust the Top-K value |
| Slow processing | High Top-K | Reduce to 5-10 |
| Missing key info | BM25 disabled | Enable for technical terms |
| Verbose answers | Top-K too high | Decrease to 3-5 |
| Inconsistent answers | Retrieval method mismatch | Try a different method |

Next Step: Deploy API Endpoint

Once pipeline configuration is optimized, proceed to Step 4: Deploy API Endpoint to configure and test your production API.

What to Bring to Step 4

  1. Optimized Top-K value for your use case
  2. BM25 setting (enabled/disabled)
  3. Selected retrieval method
  4. Baseline processing time for monitoring
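These four hand-off items fit in a small config record that can be versioned alongside the Step 4 deployment; field names and values below are illustrative.

```python
# The four Step 4 hand-off items as a plain config record (illustrative
# values), so deployment settings can be versioned with the project.

pipeline_config = {
    "top_k": 5,                           # optimized in this step
    "bm25_enabled": True,                 # hybrid retrieval toggle
    "retrieval_method": "hybrid",         # selected retrieval method
    "baseline_processing_seconds": 1.23,  # baseline for monitoring
}
```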

Tips for Success

  1. Start Conservative: Begin with Top-K = 5, BM25 off
  2. Test Incrementally: Change one setting at a time
  3. Measure Everything: Record processing times and scores
  4. Use Real Queries: Test with actual user questions
  5. Document Decisions: Note why you chose each setting