AI Agent Integration Patterns
Best practices for integrating RAG with AI agents
Overview
Integrating RAG functionality with AI agents enhances their capabilities with domain-specific knowledge. This guide covers integration patterns, examples, and optimization tips for building RAG-augmented agents.
Why Integrate RAG with AI Agents
Benefits
- Grounded Responses: Agents answer based on your documents, not just training data
- Reduced Hallucinations: Facts come from verified sources
- Domain Expertise: Agents become experts in your specific domain
- Up-to-Date Knowledge: Update documents without retraining
Common Use Cases
| Use Case | Description |
|---|---|
| Customer Support Bot | Answer support questions from knowledge base |
| Research Assistant | Summarize and analyze research papers |
| Documentation Helper | Navigate and explain technical docs |
| Business Analyst | Query business reports and metrics |
Integration Patterns
Pattern 1: Direct Query Passthrough
Use Case: Simple Q&A with document context
Flow:
User asks agent → Agent calls RAG API → Agent formats response
Example:
```python
def handle_user_query(user_query: str) -> str:
    # Call RAG API
    response = rag_client.query(user_query)

    # Use retrieved context to answer
    context = "\n".join([r.content for r in response.results])

    # Generate answer using context
    prompt = f"""Answer the user's question using this context:

Context:
{context}

Question: {user_query}

Answer:"""
    return llm.generate(prompt)
```
When to Use:
- Factual Q&A systems
- Customer support bots
- Knowledge base queries
Pros:
- Simple implementation
- Fast response time
- Easy to debug
Cons:
- Limited reasoning capability
- No multi-hop queries
Pattern 2: Structured Graph Reasoning
Use Case: GraphRAG with entity/relationship analysis
Flow:
User asks agent → Agent calls RAG API → Agent extracts graph data → Multi-hop reasoning
Example:
```python
def handle_graph_query(user_query: str) -> str:
    response = rag_client.query(user_query)

    # Extract entities and relationships
    entities = response.graph_results.entities
    relationships = response.graph_results.relationships

    # Use graph structure for multi-hop reasoning
    if len(entities) > 1:
        path = find_shortest_path(entities[0], entities[1], relationships)
        return f"Connection: {' → '.join(path)}"

    # Fall back to standard response
    return format_standard_response(response)
```
When to Use:
- Organizational analysis
- Relationship discovery
- Complex entity queries
Pros:
- Multi-hop reasoning
- Relationship discovery
- Structured output
Cons:
- Requires GraphRAG setup
- More complex implementation
Pattern 3: Community-Focused Queries
Use Case: Large knowledge graphs with community detection
Flow:
User asks agent → Agent finds community → Agent queries within community
Example:
```python
def handle_community_query(user_query: str, topic: str) -> str:
    # First, find relevant community
    communities = rag_client.search_communities(topic)
    if not communities:
        return "No relevant community found."

    # Query within community context
    response = rag_client.query(
        user_query,
        filters={"community_id": communities[0].id}
    )
    return format_answer(response)
```
When to Use:
- Department-specific queries
- Topic-focused analysis
- Large graph navigation
Pros:
- Focused results
- Reduced noise
- Faster queries
Cons:
- Requires community detection
- May miss cross-community info
MCP Integration
What is MCP
Model Context Protocol (MCP) enables AI agents to discover and use tools automatically.
Tool Discovery
Agents automatically discover RAG tools via MCP manifest:
```json
{
  "tools": [
    {
      "name": "rag_query",
      "description": "Query the RAG knowledge base",
      "inputSchema": {
        "query": {"type": "string"},
        "project_id": {"type": "string"}
      }
    },
    {
      "name": "search_communities",
      "description": "Search for communities in the knowledge graph",
      "inputSchema": {
        "query": {"type": "string"},
        "project_id": {"type": "string"}
      }
    }
  ]
}
```
Available Tools
| Tool | Description | Parameters |
|---|---|---|
| rag_query | Standard RAG query | query, project_id, options |
| search_communities | Community search (GraphRAG) | query, project_id |
| edit_graph | Graph editing (GraphRAG) | command, project_id |
MCP Example with Claude
User: "What is the return policy?"
Claude: [Automatically discovers and calls rag_query tool]
Claude: "Based on the documents, the return policy states..."
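Under the hood, an agent runtime matches the model's tool calls against the manifest and routes them to handlers. Below is a minimal sketch of that dispatch step, with stub handlers standing in for the real RAG backend; the names and return values are illustrative, not the actual MCP SDK:

```python
import json

# Stub handlers standing in for the real RAG backend (illustrative only)
def rag_query(query: str, project_id: str) -> str:
    return f"[answer for {query!r} in project {project_id}]"

def search_communities(query: str, project_id: str) -> str:
    return f"[communities matching {query!r}]"

HANDLERS = {"rag_query": rag_query, "search_communities": search_communities}

# A trimmed-down manifest like the one shown above
MANIFEST = json.loads("""
{"tools": [{"name": "rag_query"}, {"name": "search_communities"}]}
""")

def dispatch(tool_name: str, **arguments) -> str:
    # Only dispatch tools that the manifest actually advertises
    advertised = {tool["name"] for tool in MANIFEST["tools"]}
    if tool_name not in advertised:
        raise ValueError(f"Unknown tool: {tool_name}")
    return HANDLERS[tool_name](**arguments)
```

When the model emits a `rag_query` call, `dispatch("rag_query", query="What is the return policy?", project_id="p1")` runs the handler and returns its result as the tool output.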
Response Handling
Parsing API Responses
```python
def parse_rag_response(response) -> dict:
    return {
        "answer": response.response,
        "sources": [
            {
                "title": s.title,
                "score": s.similarity_score,
                "content": s.content[:200],  # Preview
                "document_id": s.document_id
            }
            for s in response.sources
        ],
        "confidence": calculate_confidence(response),
        "processing_time": response.processing_time
    }

def calculate_confidence(response) -> str:
    if not response.sources:
        return "no_sources"
    avg_score = sum(s.similarity_score for s in response.sources) / len(response.sources)
    if avg_score > 0.8:
        return "high"
    elif avg_score > 0.6:
        return "medium"
    else:
        return "low"
```
Confidence-Based Responses
```python
def format_response_with_confidence(parsed_response) -> str:
    confidence = parsed_response["confidence"]
    if confidence == "high":
        prefix = "Based on the documents, "
    elif confidence == "medium":
        prefix = "According to available information, "
    elif confidence == "low":
        prefix = "I found limited information, but "
    else:
        prefix = "I don't have specific information about this, but "
    return f"{prefix}{parsed_response['answer']}"
```
Error Handling
Common Errors
| Error | Agent Response |
|---|---|
| 401 Unauthorized | "API authentication failed. Please check your API key." |
| 429 Rate Limit | "Too many requests. Please try again in a moment." |
| No results | "I couldn't find relevant information in the documents." |
| Timeout | "The query is taking longer than expected. Please try again." |
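For the transient cases in this table (429 and timeout), retrying with exponential backoff often succeeds before the user notices. A minimal sketch, where `TransientError` is a stand-in for whatever rate-limit or timeout exception your client raises:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for rate-limit (429) or timeout errors from the client."""

def query_with_retry(client, query: str, max_attempts: int = 3, base_delay: float = 1.0):
    # Retry transient failures with exponential backoff plus jitter
    for attempt in range(max_attempts):
        try:
            return client.query(query)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

Non-transient errors (401, validation failures) should not be retried; let them fall through to the graceful-degradation handler below.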
Graceful Degradation
```python
def handle_rag_error(error, user_query) -> str:
    if error.status == 429:
        return """I'm experiencing high traffic right now. Let me answer from my general knowledge:

[LLM generates answer without RAG context]

Note: This answer is from my general knowledge, not your documents."""
    elif error.status == 401:
        return "I'm having trouble accessing the knowledge base. Please contact support to resolve this issue."
    elif error.status == 500:
        return "There seems to be a technical issue. Please try again in a moment."
    else:
        return """I couldn't access the documents for this query. Based on my general knowledge:

[LLM generates answer without RAG context]"""
```
Optimization Tips
Caching
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_rag_query(query: str) -> Response:
    # lru_cache memoizes results, so repeated queries skip the API call
    return rag_client.query(query)

def smart_query(query: str) -> Response:
    # Normalize the query so trivial variations share a cache entry
    normalized = " ".join(query.lower().split())
    return cached_rag_query(normalized)
```
Query Reformulation
```python
def reformulate_query(user_query: str, conversation_history: list) -> str:
    """Add context from conversation to improve retrieval."""
    if not conversation_history:
        return user_query

    # Extract relevant context from history
    context = extract_context(conversation_history)
    if not context:
        return user_query

    # Reformulate with context
    return f"{context}. {user_query}"

def extract_context(history: list) -> str:
    # Simple heuristic: carry over recent user messages as context.
    # A production version might extract entities or topics instead.
    recent = [m["content"] for m in history[-3:] if m.get("role") == "user"]
    return " ".join(recent)
```
Response Streaming
```python
async def stream_rag_response(query: str):
    # Stream the response for better UX
    yield "Searching documents..."

    response = await rag_client.query_async(query)
    yield f"Found {len(response.sources)} relevant documents.\n\n"

    # Stream the answer
    async for chunk in stream_llm_response(response):
        yield chunk
```
Security Considerations
API Key Management
Do:
- Store API keys in environment variables
- Use server-side proxy for client apps
- Rotate keys periodically
Don't:
- Expose API keys to end users
- Commit keys to version control
- Share keys across projects
```python
# Good: Environment variable
import os
API_KEY = os.environ.get("RAG_API_KEY")

# Bad: Hardcoded
API_KEY = "sk-..."  # Never do this
```
Query Validation
```python
def validate_query(query: str) -> bool:
    # Check length
    if len(query) > 4000:
        return False

    # Check for injection attempts
    dangerous_patterns = ["DROP TABLE", "DELETE FROM", "--", ";"]
    for pattern in dangerous_patterns:
        if pattern in query.upper():
            return False

    return True
```
Rate Limiting
```python
from datetime import datetime, timedelta

class RateLimitExceeded(Exception):
    pass

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = []

    def is_allowed(self) -> bool:
        now = datetime.now()
        # Drop timestamps that have aged out of the window
        self.requests = [r for r in self.requests if now - r < self.window]
        if len(self.requests) >= self.max_requests:
            return False
        self.requests.append(now)
        return True

# Usage
limiter = RateLimiter(max_requests=100, window_seconds=60)
if not limiter.is_allowed():
    raise RateLimitExceeded()
```
Integration Examples
Python Client
```python
from guidedmind import RAGClient

client = RAGClient(
    api_key="your-api-key",
    project_id="your-project-id"
)

response = client.query(
    query="What are the main features?",
    options={
        "max_results": 3,
        "include_sources": True,
        "temperature": 0.7
    }
)

print(f"Answer: {response.response}")
for source in response.sources:
    print(f"Source: {source.title} ({source.similarity_score:.2f})")
```
JavaScript/TypeScript
```typescript
import { RAGClient } from "@guidedmind/rag-client";

const client = new RAGClient({
  apiKey: "your-api-key",
  projectId: "your-project-id",
});

async function queryRAG(query: string) {
  try {
    const response = await client.query({
      query,
      options: {
        maxResults: 5,
        includeSources: true,
      },
    });
    console.log("Answer:", response.response);
    console.log("Sources:", response.sources);
  } catch (error) {
    console.error("Query failed:", error);
  }
}
```
LangChain Integration
```python
from langchain_community.utilities import RAGAPIWrapper
from langchain.chains import RetrievalQA

rag = RAGAPIWrapper(
    api_key="your-api-key",
    project_id="your-project-id"
)

# Use as a retriever
retriever = rag.as_retriever()

# Use in a chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)
result = qa_chain.run("What is the return policy?")
```
Tips for Success
- Start Simple: Begin with direct query passthrough
- Add Confidence: Show users when answers are uncertain
- Handle Errors Gracefully: Never expose raw errors to users
- Cache Frequently: Reduce API calls and latency
- Monitor Usage: Track query patterns and errors
- Test Thoroughly: Validate with real user queries
- Document Limitations: Be clear about what the agent can do
