Pipeline Configuration
Configure embeddings, retrieval methods, and RAG pipeline settings
The Pipeline Configuration step sets up the core components of your RAG system: embedding models, retrieval methods, similarity functions, and query processing templates. This step transforms your processed chunks into a searchable knowledge base.
Embedding Configuration
Embedding Model Selection
text-embedding-ada-002 (OpenAI)
- Vector Size: 1536 dimensions
- Context Length: 8192 tokens
- Best Use Cases: General-purpose embedding with strong semantic understanding
- Performance: High quality, moderate speed
- Cost: Medium
- Notes: Excellent for most applications, well-balanced performance
text-embedding-3-small (OpenAI)
- Vector Size: 1536 dimensions
- Context Length: 8192 tokens
- Best Use Cases: Cost-effective embedding for large-scale applications
- Performance: Good quality, faster processing
- Cost: Lower
- Notes: Optimized for efficiency while maintaining quality
text-embedding-3-large (OpenAI)
- Vector Size: 3072 dimensions
- Context Length: 8192 tokens
- Best Use Cases: High-precision applications requiring maximum accuracy
- Performance: Highest quality, slower processing
- Cost: Higher
- Notes: Best-in-class embedding model for critical applications
all-MiniLM-L6-v2 (Sentence Transformers)
- Vector Size: 384 dimensions
- Context Length: 512 tokens (note that the Sentence Transformers default for this model truncates input at 256 word pieces)
- Best Use Cases: Lightweight applications, local deployment
- Performance: Good quality for size, very fast
- Cost: Low (can run locally)
- Notes: Excellent for resource-constrained environments
all-mpnet-base-v2 (Sentence Transformers)
- Vector Size: 768 dimensions
- Context Length: 512 tokens (note that the Sentence Transformers default for this model truncates input at 384 word pieces)
- Best Use Cases: Balanced performance and efficiency
- Performance: High quality, good speed
- Cost: Low (can run locally)
- Notes: Strong semantic understanding with reasonable compute requirements
e5-large-v2 (Microsoft)
- Vector Size: 1024 dimensions
- Context Length: 512 tokens
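- Best Use Cases: English-language retrieval across diverse content types
- Performance: High quality on English text
- Cost: Medium
- Notes: The published e5-large-v2 checkpoint is trained on English text; for international or multi-language content, the separate multilingual-e5-large model is the usual choice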
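Vector size affects storage as well as quality. The following is a rough sketch of the arithmetic, using the dimensions listed above and assuming vectors are stored as 32-bit floats; the function name is hypothetical, and real vector databases add index and metadata overhead on top of this figure.

```python
# Dimensions come from the model list above; float32 storage is an assumption.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
    "e5-large-v2": 1024,
}

BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(model: str, num_chunks: int) -> int:
    """Raw size of the embedding matrix, excluding index overhead and metadata."""
    return MODEL_DIMENSIONS[model] * BYTES_PER_FLOAT32 * num_chunks


if __name__ == "__main__":
    for name in MODEL_DIMENSIONS:
        mib = raw_vector_bytes(name, 100_000) / (1024 ** 2)
        print(f"{name}: {mib:.1f} MiB for 100,000 chunks")
```

Under these assumptions, moving from a 384-dimension model to a 3072-dimension model multiplies raw vector storage by eight for the same number of chunks.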
Similarity Methods
Cosine Similarity
- Use Case: Most common choice for text embeddings
- Strengths: Normalizes for vector magnitude, focuses on direction
- Best For: General text similarity, semantic search
- Performance: Fast computation, widely supported
Euclidean Distance
- Use Case: When absolute magnitude matters
- Strengths: Intuitive distance measurement
- Best For: Numerical data, specific domain applications
- Performance: Fast computation, simple interpretation
Dot Product
- Use Case: When vector magnitudes carry meaning
- Strengths: Fastest computation, preserves magnitude information
- Best For: Specialized applications, performance-critical systems
- Performance: Fastest option, minimal computation
Manhattan Distance
- Use Case: When embeddings contain outliers or noisy dimensions
- Strengths: Less sensitive to individual dimension spikes
- Best For: Noisy data, high-dimensional spaces
- Performance: Moderate computation, good stability
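To make the four similarity methods concrete, here is a minimal pure-Python sketch of each one. The function names are illustrative rather than part of the product; in practice the vector database computes these measures internally.

```python
import math
from typing import Sequence


def _check_lengths(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Higher is more similar; sensitive to vector magnitude."""
    _check_lengths(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after scaling each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; lower is more similar."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute per-dimension differences; lower is more similar."""
    _check_lengths(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))
```

Note that when embeddings are already normalized to unit length, dot product and cosine similarity produce the same ranking, so the cheaper dot product is often sufficient.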
Retrieval Methods
Custom Document Template
Description: Simple template-based retrieval using document chunks directly.
When to Use:
- Straightforward question-answering systems
- Well-structured, consistent documents
- Direct information retrieval needs
- Performance-critical applications
Configuration:
- Document Template: Customize how chunks are formatted for retrieval
- BM25 Integration: Optional hybrid search combining semantic and lexical matching
- Template Variables: Use {context} for chunk content, plus custom metadata fields
Template Example:
Document: {title}
Section: {section}
Content: {context}
Source: {filename}
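The placeholder syntax in the template above matches Python's `str.format` fields, so variable substitution can be sketched as follows. This is an illustration only: the `render_chunk` helper is hypothetical, and the product may substitute variables differently.

```python
DOCUMENT_TEMPLATE = (
    "Document: {title}\n"
    "Section: {section}\n"
    "Content: {context}\n"
    "Source: {filename}"
)


def render_chunk(template: str, context: str, metadata: dict) -> str:
    """Fill {context} with the chunk text and the other fields from metadata.

    Raises KeyError if the template names a field that metadata lacks.
    """
    fields = dict(metadata)
    fields["context"] = context
    return template.format_map(fields)


if __name__ == "__main__":
    print(render_chunk(
        DOCUMENT_TEMPLATE,
        context="Refunds are processed within 14 days.",
        metadata={"title": "Returns Policy", "section": "Refunds",
                  "filename": "returns.pdf"},
    ))
```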
BM25 Hybrid Search:
- Benefits: Combines exact term matching with semantic similarity
- Use Cases: Technical documents, proper nouns, specific terminology
- Performance: Slightly slower but more comprehensive results
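This page does not say how the semantic and BM25 result lists are merged. One common technique is reciprocal rank fusion, sketched below as an assumption rather than a description of the product's behavior; the function name and the constant `k = 60` are conventional choices, not documented settings.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic_hits = ["a", "b", "c"]  # ordered by embedding similarity
    lexical_hits = ["b", "d", "a"]   # ordered by BM25 score
    print(reciprocal_rank_fusion([semantic_hits, lexical_hits]))
```

Because the fusion uses ranks rather than raw scores, it avoids having to put BM25 scores and similarity scores on a common scale.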
Contextual Retrieval
Description: Enhances chunks with surrounding document context using LLM processing.
When to Use:
- Documents where context significantly affects meaning
- Complex narrative or analytical content
- Multi-step reasoning requirements
- Use cases where response quality matters more than speed
Configuration:
- LLM Model Selection: Choose model for context generation
- Context Template: Define how context is incorporated
- Processing Parameters: Control context generation quality vs. speed
Required Template Variables:
- {full_document}: Complete document content for context
- {chunk_context}: Specific chunk content being processed
Template Example:
Document Context: {full_document}
Relevant Section: {chunk_context}
Context Summary: This section relates to [LLM-generated context]
LLM Model Options:
- GPT-4: Highest quality context generation, slower processing
- GPT-3.5-turbo: Balanced quality and speed
- Claude-3-sonnet: Strong reasoning, good context understanding
- Llama-2-70B: Open-source option, good for specific domains
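The contextual retrieval flow can be sketched as: build a prompt from the full document and each chunk, ask an LLM for a short context statement, and store that statement alongside the chunk before embedding. The sketch below is an assumption about the general technique, not the product's implementation. The `generate` callable stands in for whichever LLM is selected, and the template is adapted from the example above so that the model completes the summary.

```python
from typing import Callable, List

CONTEXT_TEMPLATE = (
    "Document Context: {full_document}\n"
    "Relevant Section: {chunk_context}\n"
    "Context Summary:"
)


def contextualize_chunks(
    full_document: str,
    chunks: List[str],
    generate: Callable[[str], str],
) -> List[str]:
    """Prefix each chunk with LLM-generated context about its place in the document.

    `generate` is a hypothetical wrapper: it takes a prompt and returns text.
    """
    enriched = []
    for chunk in chunks:
        prompt = CONTEXT_TEMPLATE.format(
            full_document=full_document, chunk_context=chunk
        )
        summary = generate(prompt).strip()
        enriched.append(f"{summary}\n\n{chunk}")
    return enriched
```

Each chunk triggers one LLM call that includes the whole document, which is why this method trades processing speed and cost for retrieval quality.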
ML-Optimized Contextual Retrieval
Description: Advanced contextual retrieval using machine learning to optimize context generation.
When to Use:
- High-volume applications needing contextual understanding
- Complex documents with intricate relationships
- Applications where context quality directly impacts business outcomes
- Systems requiring consistent context generation
Configuration:
- ML Context Templates: Sophisticated templates with multiple summary levels
- Optimization Parameters: Quality vs. performance trade-offs
- Batch Processing: Efficient handling of large document collections
Available Template Variables:
- {chunk_context}: Direct chunk content
- {full_document_summary}: AI-generated document summary
- {parent_section_summary}: Summary of containing section
- {parent_paragraph_summary}: Summary of containing paragraph
- {paragraph_summary_topic_sentence_only}: Key topic sentences only
Template Example:
Context: {chunk_context}
Document Summary: {full_document_summary}
Section Context: {parent_section_summary}
Key Points: {paragraph_summary_topic_sentence_only}
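One way to picture the multiple summary levels is as a record with one field per template variable. The sketch below is illustrative: the `ChunkSummaries` class and `render_ml_context` helper are hypothetical, and the summaries themselves would come from the platform's ML processing.

```python
from dataclasses import asdict, dataclass

ML_CONTEXT_TEMPLATE = (
    "Context: {chunk_context}\n"
    "Document Summary: {full_document_summary}\n"
    "Section Context: {parent_section_summary}\n"
    "Key Points: {paragraph_summary_topic_sentence_only}"
)


@dataclass
class ChunkSummaries:
    """One field per available template variable."""
    chunk_context: str
    full_document_summary: str
    parent_section_summary: str
    parent_paragraph_summary: str
    paragraph_summary_topic_sentence_only: str


def render_ml_context(template: str, summaries: ChunkSummaries) -> str:
    # Fields the template does not reference are simply ignored.
    return template.format(**asdict(summaries))
```

A template does not need to use every variable; the example template above omits {parent_paragraph_summary}, and the rendering step tolerates that.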
Query Configuration
Query Template System
Query templates control how user queries are processed and formatted for retrieval.
Basic Template Structure:
Query: {query}
Instructions: [Your specific instructions for how to handle the query]
Advanced Template Options:
User Question: {query}
Context Instructions: Search for information that directly answers the user's question.
Response Guidelines: Provide factual, concise answers based on the retrieved documents.
Fallback: If no relevant information is found, indicate this clearly.
Pre-built Query Templates
Question Answering
Question: {query}
Instructions: Find the most relevant information to answer this question directly and concisely.
Research Assistant
Research Query: {query}
Instructions: Gather comprehensive information on this topic, including multiple perspectives and supporting evidence.
Technical Support
Technical Issue: {query}
Instructions: Find troubleshooting steps, solutions, or relevant documentation for this technical problem.
Content Summarization
Summarization Request: {query}
Instructions: Locate and synthesize key information to provide a comprehensive summary of the requested topic.
Comparative Analysis
Comparison Query: {query}
Instructions: Find information that allows for comparison and analysis of the specified topics or items.
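The pre-built templates can be thought of as a lookup table keyed by use case. A minimal sketch follows, with hypothetical key names and only two of the five templates shown for brevity.

```python
QUERY_TEMPLATES = {
    # Key names are illustrative; the template text is taken from the list above.
    "question_answering": (
        "Question: {query}\n"
        "Instructions: Find the most relevant information to answer this "
        "question directly and concisely."
    ),
    "technical_support": (
        "Technical Issue: {query}\n"
        "Instructions: Find troubleshooting steps, solutions, or relevant "
        "documentation for this technical problem."
    ),
}


def build_query(template_name: str, query: str) -> str:
    """Insert the user's query into the chosen template."""
    try:
        template = QUERY_TEMPLATES[template_name]
    except KeyError:
        raise ValueError(f"unknown query template: {template_name!r}") from None
    return template.format(query=query)


if __name__ == "__main__":
    print(build_query("technical_support", "Upload fails with a timeout"))
```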
Advanced Configuration
Metadata Integration
Custom Metadata Fields:
- Use extracted or custom metadata in templates
- Filter retrieval based on metadata criteria
- Enhance context with structured information
Example with Metadata:
Document: {title}
Author: {author}
Date: {date}
Department: {department}
Content: {context}
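Filtering retrieval on metadata criteria can be sketched as an exact-match check applied to candidate chunks. The chunk structure used here (a dict with `text` and `metadata` keys) and the function name are assumptions for illustration; most vector databases apply such filters inside the index rather than in application code.

```python
from typing import Any, Dict, List

Chunk = Dict[str, Any]  # assumed shape: {"text": str, "metadata": dict}


def filter_by_metadata(chunks: List[Chunk], criteria: Dict[str, Any]) -> List[Chunk]:
    """Keep chunks whose metadata equals every key/value pair in criteria."""
    return [
        chunk for chunk in chunks
        if all(chunk["metadata"].get(key) == value
               for key, value in criteria.items())
    ]


if __name__ == "__main__":
    chunks = [
        {"text": "Q3 revenue grew.", "metadata": {"department": "Finance", "author": "Kim"}},
        {"text": "New onboarding flow.", "metadata": {"department": "HR", "author": "Lee"}},
    ]
    print(filter_by_metadata(chunks, {"department": "Finance"}))
```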
Performance Optimization
Embedding Batch Size:
- Configure batch processing for efficient embedding generation
- Balance memory usage with processing speed
- Optimize for your hardware capabilities
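Batch processing amounts to splitting the chunk list into fixed-size groups and embedding one group per call. The sketch below assumes a hypothetical `embed_batch` callable that wraps whichever embedding model is selected; the default batch size of 64 is an arbitrary starting point, not a recommended setting.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 64,
) -> List[Vector]:
    """Embed every text, one batch per call, preserving input order."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

Larger batches reduce per-call overhead but hold more text and vectors in memory at once, which is the trade-off the settings above control.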
Caching Strategy:
- Cache embeddings for frequently accessed content
- Implement smart cache invalidation
- Balance storage costs with retrieval speed
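A simple form of embedding cache keys each entry on a hash of the model name and the text. This is a sketch of one possible design, not the product's cache: the `EmbeddingCache` class is hypothetical and stores entries in memory only.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """In-memory cache keyed on model name plus exact text content."""

    def __init__(self, embed: Callable[[str], Vector], model: str) -> None:
        self._embed = embed
        self._model = model
        self._store: Dict[str, Vector] = {}
        self.misses = 0

    def _key(self, text: str) -> str:
        # Including the model name keeps vectors from different models apart.
        payload = f"{self._model}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, text: str) -> Vector:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]
```

Keying on content means an edited chunk or a changed model naturally misses the cache, which covers the most common invalidation cases without explicit bookkeeping. Stale entries still need eviction to bound storage.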
Index Optimization:
- Configure vector database index settings
- Optimize for your query patterns
- Balance index build time with search performance
Quality Assurance
Retrieval Testing:
- Test configurations with sample queries
- Validate retrieval quality and relevance
- A/B test different configurations
Performance Monitoring:
- Track retrieval accuracy metrics
- Monitor response times and system load
- Identify optimization opportunities
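Two widely used retrieval accuracy metrics are recall@k and reciprocal rank. The sketch below assumes you have a small set of test queries with hand-labeled relevant document IDs; the function names are illustrative.

```python
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7"]
    relevant = {"d1", "d2"}
    print(recall_at_k(retrieved, relevant, k=2))  # one of two relevant docs found
    print(reciprocal_rank(retrieved, relevant))   # first relevant hit at rank 2
```

Averaging these values over the same query set before and after a configuration change gives a basis for the A/B comparisons described above.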
Feedback Integration:
- Incorporate user feedback into system improvements
- Implement relevance scoring and adjustment
- Refine the configuration continuously as feedback accumulates
Template Best Practices
Variable Usage
- Always include required variables ({query} for query templates, {context} for document templates)
- Use descriptive context around variables to guide processing
- Test templates with representative queries and documents
- Keep templates focused - avoid overly complex instructions
Context Optimization
- Match retrieval method complexity to your use case requirements
- Consider processing costs vs. quality benefits
- Test with real data to validate template effectiveness
- Monitor performance and adjust based on usage patterns
Metadata Enhancement
- Identify valuable metadata fields for your domain
- Use metadata consistently across your document collection
- Balance metadata richness with template complexity
- Validate metadata quality before relying on it
Validation and Testing
Configuration Validation
Automatic Checks:
- ✅ Required template variables present
- ✅ Embedding model compatibility
- ✅ Retrieval method configuration complete
- ✅ Performance requirements feasible
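The first automatic check, that required template variables are present, can be approximated with Python's standard `string.Formatter`. This is a sketch of the idea rather than the platform's validator, and the helper names are hypothetical.

```python
from string import Formatter
from typing import Iterable, Set


def template_variables(template: str) -> Set[str]:
    """Return the placeholder names that appear in a template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skips literal-only segments and empty "{}" fields
    }


def missing_variables(template: str, required: Iterable[str]) -> Set[str]:
    """Return the required variables the template does not reference."""
    return set(required) - template_variables(template)


if __name__ == "__main__":
    query_template = "Query: {query}\nInstructions: Answer concisely."
    document_template = "Document: {title}\nSource: {filename}"
    print(missing_variables(query_template, ["query"]))       # empty set
    print(missing_variables(document_template, ["context"]))  # {'context'}
```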
Manual Validation:
- Test queries with sample documents
- Verify template rendering and variable substitution
- Validate retrieval quality and relevance
- Confirm performance meets requirements
Performance Testing
Retrieval Speed:
- Measure query response times
- Test with varying query complexity
- Validate under expected load conditions
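Query response times can be measured with a small timing harness. The sketch below assumes a hypothetical `run_query` callable that executes one retrieval request end to end; it runs queries sequentially, so it measures per-query latency rather than behavior under concurrent load.

```python
import math
import statistics
import time
from typing import Callable, Dict, List, Sequence


def measure_latencies(
    run_query: Callable[[str], object],
    queries: Sequence[str],
    repeats: int = 3,
) -> List[float]:
    """Return wall-clock seconds for each query execution."""
    samples = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Mean, median, nearest-rank 95th percentile, and maximum latency."""
    if not samples:
        raise ValueError("no latency samples to summarize")
    ordered = sorted(samples)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Mixing short and long queries in the sample set helps show how latency varies with query complexity, and the 95th percentile is usually more informative than the mean when a few slow queries dominate.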
Quality Assessment:
- Evaluate relevance of retrieved results
- Test edge cases and complex queries
- Compare different configuration options
Prerequisites for Next Step
Before proceeding to API Setup & Endpoints:
- ✅ Embedding model selected and configured
- ✅ Similarity method chosen
- ✅ Retrieval method configured with templates
- ✅ Query templates created and tested
- ✅ Configuration validation passed
- ✅ Performance testing completed satisfactorily
The pipeline configuration created in this step will be exposed through API endpoints in the final step, where you'll set up authentication, rate limiting, and integration documentation.