AI Agent Integration Patterns

Best practices for integrating RAG with AI agents

Overview

Integrating RAG functionality with AI agents enhances their capabilities with domain-specific knowledge. This guide covers integration patterns, examples, and optimization tips for building RAG-augmented agents.

Why Integrate RAG with AI Agents

Benefits

  • Grounded Responses: Agents answer based on your documents, not just training data
  • Reduced Hallucinations: Facts come from verified sources
  • Domain Expertise: Agents become experts in your specific domain
  • Up-to-Date Knowledge: Update documents without retraining

Common Use Cases

  • Customer Support Bot: Answer support questions from knowledge base
  • Research Assistant: Summarize and analyze research papers
  • Documentation Helper: Navigate and explain technical docs
  • Business Analyst: Query business reports and metrics

Integration Patterns

Pattern 1: Direct Query Passthrough

Use Case: Simple Q&A with document context

Flow:

User asks agent → Agent calls RAG API → Agent formats response

Example:

def handle_user_query(user_query: str) -> str:
    # Call RAG API
    response = rag_client.query(user_query)
    
    # Use retrieved context to answer
    context = "\n".join([r.content for r in response.results])
    
    # Generate answer using context
    prompt = f"""Answer the user's question using this context:
    
Context:
{context}
 
Question: {user_query}
 
Answer:"""
    
    return llm.generate(prompt)

When to Use:

  • Factual Q&A systems
  • Customer support bots
  • Knowledge base queries

Pros:

  • Simple implementation
  • Fast response time
  • Easy to debug

Cons:

  • Limited reasoning capability
  • No multi-hop queries

Pattern 2: Structured Graph Reasoning

Use Case: GraphRAG with entity/relationship analysis

Flow:

User asks agent → Agent calls RAG API → Agent extracts graph data → Multi-hop reasoning

Example:

def handle_graph_query(user_query: str) -> str:
    response = rag_client.query(user_query)
    
    # Extract entities and relationships
    entities = response.graph_results.entities
    relationships = response.graph_results.relationships
    
    # Use graph structure for multi-hop reasoning
    if len(entities) > 1:
        path = find_shortest_path(entities[0], entities[1], relationships)
        return f"Connection: {' → '.join(path)}"
    
    # Fall back to standard response
    return format_standard_response(response)
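
The `find_shortest_path` helper above is left undefined. A minimal breadth-first-search sketch, assuming each relationship exposes hypothetical `source` and `target` attributes holding entity names:

```python
from collections import deque

def find_shortest_path(start, goal, relationships):
    """BFS over an undirected entity graph.

    Assumes each relationship has `source` and `target` attributes
    (hypothetical names). Returns the entity names on the shortest
    path, or an empty list if no path exists.
    """
    # Build an adjacency map from the relationship list
    neighbors = {}
    for rel in relationships:
        neighbors.setdefault(rel.source, set()).add(rel.target)
        neighbors.setdefault(rel.target, set()).add(rel.source)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbors.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return []
```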

When to Use:

  • Organizational analysis
  • Relationship discovery
  • Complex entity queries

Pros:

  • Multi-hop reasoning
  • Relationship discovery
  • Structured output

Cons:

  • Requires GraphRAG setup
  • More complex implementation

Pattern 3: Community-Focused Queries

Use Case: Large knowledge graphs with community detection

Flow:

User asks agent → Agent finds community → Agent queries within community

Example:

def handle_community_query(user_query: str, topic: str) -> str:
    # First, find relevant community
    communities = rag_client.search_communities(topic)
    
    if not communities:
        return "No relevant community found."
    
    # Query within community context
    response = rag_client.query(
        user_query,
        filters={"community_id": communities[0].id}
    )
    
    return format_answer(response)

When to Use:

  • Department-specific queries
  • Topic-focused analysis
  • Large graph navigation

Pros:

  • Focused results
  • Reduced noise
  • Faster queries

Cons:

  • Requires community detection
  • May miss cross-community info

MCP Integration

What is MCP

Model Context Protocol (MCP) enables AI agents to discover and use tools automatically.

Tool Discovery

Agents automatically discover RAG tools via MCP manifest:

{
  "tools": [
    {
      "name": "rag_query",
      "description": "Query the RAG knowledge base",
      "inputSchema": {
        "query": {"type": "string"},
        "project_id": {"type": "string"}
      }
    },
    {
      "name": "search_communities",
      "description": "Search for communities in the knowledge graph",
      "inputSchema": {
        "query": {"type": "string"},
        "project_id": {"type": "string"}
      }
    }
  ]
}

Available Tools

  • rag_query: Standard RAG query. Parameters: query, project_id, options
  • search_communities: Community search (GraphRAG). Parameters: query, project_id
  • edit_graph: Graph editing (GraphRAG). Parameters: command, project_id

MCP Example with Claude

User: "What is the return policy?"

Claude: [Automatically discovers and calls rag_query tool]
Claude: "Based on the documents, the return policy states..."
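
Under the hood, the agent matches the model's tool call against the manifest and routes it to a handler. A minimal sketch of that dispatch step, assuming a `handlers` map you register yourself (in practice an MCP SDK does this wiring for you):

```python
import json

# Tool names as advertised in the manifest above
MANIFEST = {
    "tools": [
        {"name": "rag_query", "description": "Query the RAG knowledge base"},
        {"name": "search_communities", "description": "Search for communities"},
    ]
}

def dispatch_tool_call(name: str, arguments: dict, handlers: dict) -> str:
    """Route a tool call from the model to the registered handler."""
    known = {t["name"] for t in MANIFEST["tools"]}
    if name not in known:
        # Reject calls to tools the manifest never advertised
        return json.dumps({"error": f"unknown tool: {name}"})
    return handlers[name](**arguments)
```

Each handler would wrap the corresponding RAG client call, e.g. `handlers = {"rag_query": lambda query, project_id: rag_client.query(query)}`.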

Response Handling

Parsing API Responses

def parse_rag_response(response) -> dict:
    return {
        "answer": response.response,
        "sources": [
            {
                "title": s.title,
                "score": s.similarity_score,
                "content": s.content[:200],  # Preview
                "document_id": s.document_id
            }
            for s in response.sources
        ],
        "confidence": calculate_confidence(response),
        "processing_time": response.processing_time
    }
 
def calculate_confidence(response) -> str:
    if not response.sources:
        return "no_sources"
    
    avg_score = sum(s.similarity_score for s in response.sources) / len(response.sources)
    
    if avg_score > 0.8:
        return "high"
    elif avg_score > 0.6:
        return "medium"
    else:
        return "low"

Confidence-Based Responses

def format_response_with_confidence(parsed_response) -> str:
    confidence = parsed_response["confidence"]
    
    if confidence == "high":
        prefix = "Based on the documents, "
    elif confidence == "medium":
        prefix = "According to available information, "
    elif confidence == "low":
        prefix = "I found limited information, but "
    else:
        prefix = "I don't have specific information about this, but "
    
    return f"{prefix}{parsed_response['answer']}"

Error Handling

Common Errors

  • 401 Unauthorized: "API authentication failed. Please check your API key."
  • 429 Rate Limit: "Too many requests. Please try again in a moment."
  • No results: "I couldn't find relevant information in the documents."
  • Timeout: "The query is taking longer than expected. Please try again."

Graceful Degradation

def handle_rag_error(error, user_query) -> str:
    if error.status == 429:
        return f"""I'm experiencing high traffic right now. Let me answer from my general knowledge:
        
[LLM generates answer without RAG context]
 
Note: This answer is from my general knowledge, not your documents."""
    
    elif error.status == 401:
        return "I'm having trouble accessing the knowledge base. Please contact support to resolve this issue."
    
    elif error.status == 500:
        return "There seems to be a technical issue. Please try again in a moment."
    
    else:
        return f"""I couldn't access the documents for this query. Based on my general knowledge:
 
[LLM generates answer without RAG context]"""
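
For transient errors such as 429 and 500, it is often worth retrying with exponential backoff before degrading to a general-knowledge answer. A sketch, assuming the client raises an exception with a `status` attribute (hypothetical, matching the error object used above):

```python
import time

def query_with_retry(query_fn, query: str, max_attempts: int = 3,
                     base_delay: float = 1.0):
    """Retry transient failures with exponential backoff (1s, 2s, 4s...)."""
    for attempt in range(max_attempts):
        try:
            return query_fn(query)
        except Exception as error:
            status = getattr(error, "status", None)
            if status not in (429, 500) or attempt == max_attempts - 1:
                raise  # non-transient, or out of retries
            time.sleep(base_delay * (2 ** attempt))
```

If the final attempt still fails, the exception propagates and `handle_rag_error` can take over.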

Optimization Tips

Caching

from functools import lru_cache
 
@lru_cache(maxsize=1000)
def cached_rag_query(query: str) -> Response:
    # On a cache hit, lru_cache returns the stored response without
    # touching the API; on a miss, it runs the live query and caches it.
    return rag_client.query(query)
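
`lru_cache` never expires entries, so stale answers can persist after documents change. A simple time-based cache sketch for when freshness matters:

```python
import time

class TTLCache:
    """Cache query results for a limited time."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (timestamp, response)

    def get(self, query: str):
        entry = self._store.get(query)
        if entry is None:
            return None
        ts, response = entry
        if time.time() - ts > self.ttl:
            del self._store[query]  # expired, drop it
            return None
        return response

    def set(self, query: str, response) -> None:
        self._store[query] = (time.time(), response)
```

Check `cache.get(query)` before calling the API, and `cache.set(query, response)` after a live query.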

Query Reformulation

def reformulate_query(user_query: str, conversation_history: list) -> str:
    """Add context from the conversation to improve retrieval."""
    
    if not conversation_history:
        return user_query
    
    # Extract relevant context from history
    context = extract_context(conversation_history)
    if not context:
        return user_query
    
    # Reformulate with context
    return f"{context}. {user_query}"
 
def extract_context(history: list) -> str:
    """Return a short context string, or "" if there is none.

    A simple heuristic, assuming messages are dicts with "role" and
    "content" keys: reuse the last few user messages as context.
    Stronger versions extract entities or topics with the LLM.
    """
    recent = [m["content"] for m in history[-3:] if m.get("role") == "user"]
    return " ".join(recent)

Response Streaming

async def stream_rag_response(query: str):
    # Stream the response for better UX
    yield "Searching documents..."
    
    response = await rag_client.query_async(query)
    
    yield f"Found {len(response.sources)} relevant documents.\n\n"
    
    # Stream the answer
    async for chunk in stream_llm_response(response):
        yield chunk

Security Considerations

API Key Management

Do:

  • Store API keys in environment variables
  • Use server-side proxy for client apps
  • Rotate keys periodically

Don't:

  • Expose API keys to end users
  • Commit keys to version control
  • Share keys across projects

# Good: Environment variable
import os
API_KEY = os.environ.get("RAG_API_KEY")
 
# Bad: Hardcoded
API_KEY = "sk-..."  # Never do this

Query Validation

def validate_query(query: str) -> bool:
    # Reject overly long queries
    if len(query) > 4000:
        return False
    
    # Coarse heuristic check for injection-style input; note that
    # blocking ";" or "--" may also reject some legitimate queries
    dangerous_patterns = ["DROP TABLE", "DELETE FROM", "--", ";"]
    for pattern in dangerous_patterns:
        if pattern in query.upper():
            return False
    
    return True

Rate Limiting

from datetime import datetime, timedelta
 
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = []
    
    def is_allowed(self) -> bool:
        now = datetime.now()
        self.requests = [r for r in self.requests if now - r < self.window]
        
        if len(self.requests) >= self.max_requests:
            return False
        
        self.requests.append(now)
        return True
 
# Usage
limiter = RateLimiter(max_requests=100, window_seconds=60)
 
if not limiter.is_allowed():
    raise RateLimitExceeded()

Integration Examples

Python Client

from guidedmind import RAGClient
 
client = RAGClient(
    api_key="your-api-key",
    project_id="your-project-id"
)
 
response = client.query(
    query="What are the main features?",
    options={
        "max_results": 3,
        "include_sources": True,
        "temperature": 0.7
    }
)
 
print(f"Answer: {response.response}")
for source in response.sources:
    print(f"Source: {source.title} ({source.similarity_score:.2f})")

JavaScript/TypeScript

import { RAGClient } from "@guidedmind/rag-client";
 
const client = new RAGClient({
  apiKey: "your-api-key",
  projectId: "your-project-id",
});
 
async function queryRAG(query: string) {
  try {
    const response = await client.query({
      query,
      options: {
        maxResults: 5,
        includeSources: true,
      },
    });
 
    console.log("Answer:", response.response);
    console.log("Sources:", response.sources);
  } catch (error) {
    console.error("Query failed:", error);
  }
}

LangChain Integration

from langchain_community.utilities import RAGAPIWrapper
 
rag = RAGAPIWrapper(
    api_key="your-api-key",
    project_id="your-project-id"
)
 
# Use as a retriever
retriever = rag.as_retriever()
 
# Use in a chain
from langchain.chains import RetrievalQA
 
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)
 
result = qa_chain.run("What is the return policy?")

Tips for Success

  1. Start Simple: Begin with direct query passthrough
  2. Add Confidence: Show users when answers are uncertain
  3. Handle Errors Gracefully: Never expose raw errors to users
  4. Cache Frequently: Reduce API calls and latency
  5. Monitor Usage: Track query patterns and errors
  6. Test Thoroughly: Validate with real user queries
  7. Document Limitations: Be clear about what the agent can do
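
Putting several of these tips together, an end-to-end handler might look like this sketch. All names here are illustrative (the stub client returns plain dicts rather than the response objects used elsewhere in this guide):

```python
def answer_user(query: str, rag_client, cache: dict) -> str:
    """Validate, check the cache, query RAG, and hedge by confidence."""
    # 1. Validate input before spending an API call
    if not query or len(query) > 4000:
        return "Sorry, I can't process that query."

    # 2. Serve repeated questions from the cache
    if query in cache:
        return cache[query]

    # 3. Query the knowledge base and hedge by source quality
    response = rag_client.query(query)
    if not response["sources"]:
        answer = "I couldn't find relevant information in the documents."
    else:
        avg = sum(s["score"] for s in response["sources"]) / len(response["sources"])
        prefix = ("Based on the documents, " if avg > 0.8
                  else "According to available information, ")
        answer = prefix + response["answer"]

    cache[query] = answer
    return answer
```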