AI Agent Integration Patterns

Best practices for integrating RAG with AI agents

Overview

Integrating RAG functionality with AI agents enhances their capabilities with domain-specific knowledge. This guide covers integration patterns, examples, and optimization tips for building RAG-augmented agents.

Why Integrate RAG with AI Agents

Benefits

  • Grounded Responses: Agents answer based on your documents, not just training data
  • Reduced Hallucinations: Facts come from verified sources
  • Domain Expertise: Agents become experts in your specific domain
  • Up-to-Date Knowledge: Update documents without retraining

Common Use Cases

  • Customer Support Bot: Answer support questions from knowledge base
  • Research Assistant: Summarize and analyze research papers
  • Documentation Helper: Navigate and explain technical docs
  • Business Analyst: Query business reports and metrics

Integration Patterns

Pattern 1: Direct Query Passthrough

Use Case: Simple Q&A with document context

Flow:

User asks agent → Agent calls RAG API → Agent formats response

Example:

def handle_user_query(user_query: str) -> str:
    # Call RAG API
    response = rag_client.query(user_query)
    
    # Use retrieved context to answer
    context = "\n".join([r.content for r in response.results])
    
    # Generate answer using context
    prompt = f"""Answer the user's question using this context:
    
Context:
{context}
 
Question: {user_query}
 
Answer:"""
    
    return llm.generate(prompt)

When to Use:

  • Factual Q&A systems
  • Customer support bots
  • Knowledge base queries

Pros:

  • Simple implementation
  • Fast response time
  • Easy to debug

Cons:

  • Limited reasoning capability
  • No multi-hop queries

Pattern 2: Structured Graph Reasoning

Use Case: GraphRAG with entity/relationship analysis

Flow:

User asks agent → Agent calls RAG API → Agent extracts graph data → Multi-hop reasoning

Example:

def handle_graph_query(user_query: str) -> str:
    response = rag_client.query(user_query)
    
    # Extract entities and relationships
    entities = response.graph_results.entities
    relationships = response.graph_results.relationships
    
    # Use graph structure for multi-hop reasoning
    if len(entities) > 1:
        path = find_shortest_path(entities[0], entities[1], relationships)
        return f"Connection: {' → '.join(path)}"
    
    # Fall back to standard response
    return format_standard_response(response)
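
The `find_shortest_path` helper above is left undefined. A minimal breadth-first-search sketch, assuming each relationship exposes hypothetical `source` and `target` attributes holding entity names:

```python
from collections import deque

def find_shortest_path(start, goal, relationships):
    """BFS over an undirected entity graph.

    Assumes each relationship has `source` and `target` attributes
    (hypothetical names). Returns the entity names on the shortest
    path, or an empty list if no path exists.
    """
    # Build an adjacency map from the relationship list
    neighbors = {}
    for rel in relationships:
        neighbors.setdefault(rel.source, set()).add(rel.target)
        neighbors.setdefault(rel.target, set()).add(rel.source)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbors.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return []
```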

When to Use:

  • Organizational analysis
  • Relationship discovery
  • Complex entity queries

Pros:

  • Multi-hop reasoning
  • Relationship discovery
  • Structured output

Cons:

  • Requires GraphRAG setup
  • More complex implementation

Pattern 3: Community-Focused Queries

Use Case: Large knowledge graphs with community detection

Flow:

User asks agent → Agent finds community → Agent queries within community

Example:

def handle_community_query(user_query: str, topic: str) -> str:
    # First, find relevant community
    communities = rag_client.search_communities(topic)
    
    if not communities:
        return "No relevant community found."
    
    # Query within community context
    response = rag_client.query(
        user_query,
        filters={"community_id": communities[0].id}
    )
    
    return format_answer(response)

When to Use:

  • Department-specific queries
  • Topic-focused analysis
  • Large graph navigation

Pros:

  • Focused results
  • Reduced noise
  • Faster queries

Cons:

  • Requires community detection
  • May miss cross-community info

MCP Integration

What is MCP

Model Context Protocol (MCP) enables AI agents to discover and use tools automatically.

Tool Discovery

Agents automatically discover RAG tools via MCP manifest:

{
  "tools": [
    {
      "name": "rag_query",
      "description": "Query the RAG knowledge base",
      "inputSchema": {
        "query": {"type": "string"},
        "project_id": {"type": "string"}
      }
    },
    {
      "name": "search_communities",
      "description": "Search for communities in the knowledge graph",
      "inputSchema": {
        "query": {"type": "string"},
        "project_id": {"type": "string"}
      }
    }
  ]
}

Available Tools

  • rag_query: Standard RAG query. Parameters: query, project_id, options
  • search_communities: Community search (GraphRAG). Parameters: query, project_id
  • edit_graph: Graph editing (GraphRAG). Parameters: command, project_id

MCP Example with Claude

User: "What is the return policy?"

Claude: [Automatically discovers and calls rag_query tool]
Claude: "Based on the documents, the return policy states..."
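
Under the hood, the agent matches the model's tool call against the manifest and routes it to a handler. A minimal sketch of that dispatch step, assuming a `handlers` map you register yourself (in practice an MCP SDK does this wiring for you):

```python
import json

# Tool names as advertised in the manifest above
MANIFEST = {
    "tools": [
        {"name": "rag_query", "description": "Query the RAG knowledge base"},
        {"name": "search_communities", "description": "Search for communities"},
    ]
}

def dispatch_tool_call(name: str, arguments: dict, handlers: dict) -> str:
    """Route a tool call from the model to the registered handler."""
    known = {t["name"] for t in MANIFEST["tools"]}
    if name not in known:
        # Reject calls to tools the manifest never advertised
        return json.dumps({"error": f"unknown tool: {name}"})
    return handlers[name](**arguments)
```

Each handler would wrap the corresponding RAG client call, e.g. `handlers = {"rag_query": lambda query, project_id: rag_client.query(query)}`.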

Response Handling

Parsing API Responses

def parse_rag_response(response) -> dict:
    return {
        "answer": response.response,
        "sources": [
            {
                "title": s.title,
                "score": s.similarity_score,
                "content": s.content[:200],  # Preview
                "document_id": s.document_id
            }
            for s in response.sources
        ],
        "confidence": calculate_confidence(response),
        "processing_time": response.processing_time
    }
 
def calculate_confidence(response) -> str:
    if not response.sources:
        return "no_sources"
    
    avg_score = sum(s.similarity_score for s in response.sources) / len(response.sources)
    
    if avg_score > 0.8:
        return "high"
    elif avg_score > 0.6:
        return "medium"
    else:
        return "low"

Confidence-Based Responses

def format_response_with_confidence(parsed_response) -> str:
    confidence = parsed_response["confidence"]
    
    if confidence == "high":
        prefix = "Based on the documents, "
    elif confidence == "medium":
        prefix = "According to available information, "
    elif confidence == "low":
        prefix = "I found limited information, but "
    else:
        prefix = "I don't have specific information about this, but "
    
    return f"{prefix}{parsed_response['answer']}"

Error Handling

Common Errors

  • 401 Unauthorized: "API authentication failed. Please check your API key."
  • 429 Rate Limit: "Too many requests. Please try again in a moment."
  • No results: "I couldn't find relevant information in the documents."
  • Timeout: "The query is taking longer than expected. Please try again."

Graceful Degradation

def handle_rag_error(error, user_query) -> str:
    if error.status == 429:
        return f"""I'm experiencing high traffic right now. Let me answer from my general knowledge:
        
[LLM generates answer without RAG context]
 
Note: This answer is from my general knowledge, not your documents."""
    
    elif error.status == 401:
        return "I'm having trouble accessing the knowledge base. Please contact support to resolve this issue."
    
    elif error.status == 500:
        return "There seems to be a technical issue. Please try again in a moment."
    
    else:
        return f"""I couldn't access the documents for this query. Based on my general knowledge:
 
[LLM generates answer without RAG context]"""
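
For transient errors such as 429 and 500, it is often worth retrying with exponential backoff before degrading to a general-knowledge answer. A sketch, assuming the client raises an exception with a `status` attribute (hypothetical, matching the error object used above):

```python
import time

def query_with_retry(query_fn, query: str, max_attempts: int = 3,
                     base_delay: float = 1.0):
    """Retry transient failures with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(max_attempts):
        try:
            return query_fn(query)
        except Exception as error:
            status = getattr(error, "status", None)
            if status not in (429, 500) or attempt == max_attempts - 1:
                raise  # non-transient, or out of retries
            time.sleep(base_delay * (2 ** attempt))
```

If the final attempt still fails, the exception propagates and `handle_rag_error` can take over.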

Optimization Tips

Caching

from functools import lru_cache
 
@lru_cache(maxsize=1000)
def cached_rag_query(query: str) -> Response:
    # On a cache hit, lru_cache returns the stored response without
    # touching the API; on a miss, it runs the live query and caches it.
    return rag_client.query(query)
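
`lru_cache` never expires entries, so stale answers can persist after documents change. A simple time-based cache sketch for when freshness matters:

```python
import time

class TTLCache:
    """Cache query results for a limited time."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (timestamp, response)

    def get(self, query: str):
        entry = self._store.get(query)
        if entry is None:
            return None
        ts, response = entry
        if time.time() - ts > self.ttl:
            del self._store[query]  # expired, drop it
            return None
        return response

    def set(self, query: str, response) -> None:
        self._store[query] = (time.time(), response)
```

Check `cache.get(query)` before calling the API, and `cache.set(query, response)` after a live query.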

Query Reformulation

def reformulate_query(user_query: str, conversation_history: list) -> str:
    """Add context from the conversation to improve retrieval."""
    
    if not conversation_history:
        return user_query
    
    # Extract relevant context from history
    context = extract_context(conversation_history)
    if not context:
        return user_query
    
    # Reformulate with context
    return f"{context}. {user_query}"
 
def extract_context(history: list) -> str:
    """Return a short context string, or "" if there is none.

    A simple heuristic, assuming messages are dicts with "role" and
    "content" keys: reuse the last few user messages as context.
    Stronger versions extract entities or topics with the LLM.
    """
    recent = [m["content"] for m in history[-3:] if m.get("role") == "user"]
    return " ".join(recent)

Response Streaming

async def stream_rag_response(query: str):
    # Stream the response for better UX
    yield "Searching documents..."
    
    response = await rag_client.query_async(query)
    
    yield f"Found {len(response.sources)} relevant documents.\n\n"
    
    # Stream the answer
    async for chunk in stream_llm_response(response):
        yield chunk

Security Considerations

API Key Management

Do:

  • Store API keys in environment variables
  • Use server-side proxy for client apps
  • Rotate keys periodically

Don't:

  • Expose API keys to end users
  • Commit keys to version control
  • Share keys across projects

# Good: Environment variable
import os
API_KEY = os.environ.get("RAG_API_KEY")
 
# Bad: Hardcoded
API_KEY = "sk-..."  # Never do this

Query Validation

def validate_query(query: str) -> bool:
    # Reject overly long queries
    if len(query) > 4000:
        return False
    
    # Coarse heuristic check for injection-style input; note that
    # blocking ";" or "--" may also reject some legitimate queries
    dangerous_patterns = ["DROP TABLE", "DELETE FROM", "--", ";"]
    for pattern in dangerous_patterns:
        if pattern in query.upper():
            return False
    
    return True

Rate Limiting

from datetime import datetime, timedelta
 
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = []
    
    def is_allowed(self) -> bool:
        now = datetime.now()
        self.requests = [r for r in self.requests if now - r < self.window]
        
        if len(self.requests) >= self.max_requests:
            return False
        
        self.requests.append(now)
        return True
 
# Usage
limiter = RateLimiter(max_requests=100, window_seconds=60)
 
if not limiter.is_allowed():
    raise RateLimitExceeded()

Integration Examples

Python Client

from guidedmind import RAGClient
 
client = RAGClient(
    api_key="your-api-key",
    project_id="your-project-id"
)
 
response = client.query(
    query="What are the main features?",
    options={
        "max_results": 3,
        "include_sources": True,
        "temperature": 0.7
    }
)
 
print(f"Answer: {response.response}")
for source in response.sources:
    print(f"Source: {source.title} ({source.similarity_score:.2f})")

JavaScript/TypeScript

import { RAGClient } from "@guidedmind/rag-client";
 
const client = new RAGClient({
  apiKey: "your-api-key",
  projectId: "your-project-id",
});
 
async function queryRAG(query: string) {
  try {
    const response = await client.query({
      query,
      options: {
        maxResults: 5,
        includeSources: true,
      },
    });
 
    console.log("Answer:", response.response);
    console.log("Sources:", response.sources);
  } catch (error) {
    console.error("Query failed:", error);
  }
}

LangChain Integration

from langchain_community.utilities import RAGAPIWrapper
 
rag = RAGAPIWrapper(
    api_key="your-api-key",
    project_id="your-project-id"
)
 
# Use as a retriever
retriever = rag.as_retriever()
 
# Use in a chain
from langchain.chains import RetrievalQA
 
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)
 
result = qa_chain.run("What is the return policy?")

Tips for Success

  1. Start Simple: Begin with direct query passthrough
  2. Add Confidence: Show users when answers are uncertain
  3. Handle Errors Gracefully: Never expose raw errors to users
  4. Cache Frequently: Reduce API calls and latency
  5. Monitor Usage: Track query patterns and errors
  6. Test Thoroughly: Validate with real user queries
  7. Document Limitations: Be clear about what the agent can do
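
Putting several of these tips together, an end-to-end handler might look like this sketch. All names here are illustrative (the stub client returns plain dicts rather than the response objects used elsewhere in this guide):

```python
def answer_user(query: str, rag_client, cache: dict) -> str:
    """Validate, check the cache, query RAG, and hedge by confidence."""
    # 1. Validate input before spending an API call
    if not query or len(query) > 4000:
        return "Sorry, I can't process that query."

    # 2. Serve repeated questions from the cache
    if query in cache:
        return cache[query]

    # 3. Query the knowledge base and hedge by source quality
    response = rag_client.query(query)
    if not response["sources"]:
        answer = "I couldn't find relevant information in the documents."
    else:
        avg = sum(s["score"] for s in response["sources"]) / len(response["sources"])
        prefix = ("Based on the documents, " if avg > 0.8
                  else "According to available information, ")
        answer = prefix + response["answer"]

    cache[query] = answer
    return answer
```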