# RAG Introduction

Learn Retrieval-Augmented Generation (RAG): how to give LLMs access to your own documents.
## What is RAG?
Retrieval-Augmented Generation (RAG) combines two powerful techniques:
- **Retrieval**: Find relevant information from a knowledge base
- **Generation**: Use an LLM to generate answers based on the retrieved information
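The two steps can be sketched end to end with a toy word-overlap retriever and a stubbed-out generator (the names `retrieve`, `rag_answer`, and the `generate` callback are illustrative, not part of any library; a real system would replace `generate` with an LLM call):

```python
def retrieve(query, knowledge_base, top_k=1):
    # Toy retrieval: rank documents by how many words they share with the query
    query_words = set(query.lower().split())
    return sorted(
        knowledge_base,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def rag_answer(query, knowledge_base, generate):
    # Retrieval step: find relevant documents
    context = "\n".join(retrieve(query, knowledge_base))
    # Generation step: hand the retrieved context to the model
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

docs = [
    "Paris is the capital of France.",
    "The Nile is the longest river in Africa.",
]
# `generate` would normally call an LLM; here it just echoes the prompt.
print(rag_answer("What is the capital of France?", docs, generate=lambda p: p))
```

The full example at the bottom of this page replaces the word-overlap scorer with vector similarity and the echo stub with a real model call.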
## Why RAG?
LLMs have limitations:
- **Knowledge cutoff**: They don't know about recent events
- **No private data**: They don't have access to your company's documents
- **Hallucinations**: They may confidently state false information
RAG solves these by grounding the LLM's responses in real, retrieved documents.
## The RAG Pipeline
```text
Documents → Chunking → Embedding → Vector Store
                                        ↓
Query → Embedding → Similar Chunks → LLM → Answer
```

## Common Use Cases
- Customer support chatbots that know your product documentation
- Legal research assistants
- Internal knowledge base Q&A
- Medical information systems
- Code documentation assistants
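The chunking step in the pipeline above splits long documents into retrievable pieces. A minimal sketch, assuming fixed-size character windows with overlap (the `chunk_text` helper is hypothetical; production systems often split on sentence or token boundaries instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap,
    so content cut at a boundary still appears whole in one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

doc = "word " * 100  # a 500-character toy document
for chunk in chunk_text(doc):
    print(len(chunk))  # chunk lengths: 200, 200, 200, 50
```

The overlap trades some index size for recall: a sentence severed by one chunk boundary survives intact in the neighboring chunk.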
## Example
```python
# Simple RAG pipeline from scratch
from anthropic import Anthropic
import numpy as np

client = Anthropic()

# Step 1: Your knowledge base (documents)
documents = [
    "Python was created by Guido van Rossum and first released in 1991.",
    "Python supports multiple programming paradigms including procedural, object-oriented, and functional.",
    "Python's design philosophy emphasizes code readability with significant indentation.",
    "Python has a large standard library and a vibrant ecosystem of third-party packages.",
    "Django is a high-level Python web framework that encourages rapid development.",
    "NumPy provides support for large multi-dimensional arrays and mathematical functions.",
    "TensorFlow and PyTorch are the most popular deep learning frameworks in Python.",
]

# Step 2: Create embeddings for the documents.
# This character-trigram "embedding" is deliberately naive, to keep the
# example dependency-free. In practice, use a real embedding model
# (e.g. sentence-transformers or a hosted embeddings API).
def simple_tfidf_embed(texts, vocab=None):
    """Naive 'embedding': L2-normalized character-trigram counts."""
    if vocab is None:
        vocab = sorted({text[i:i + 3] for text in texts
                        for i in range(len(text) - 2)})
    embeddings = []
    for text in texts:
        vec = np.array([text.count(ngram) for ngram in vocab], dtype=float)
        embeddings.append(vec / (np.linalg.norm(vec) + 1e-10))
    return np.array(embeddings), vocab

doc_embeddings, vocab = simple_tfidf_embed(documents)

# Step 3: Retrieve the most relevant documents
def retrieve(query, top_k=3):
    # Embed the query with the *document* vocabulary so dimensions match
    query_emb, _ = simple_tfidf_embed([query], vocab=vocab)
    scores = np.dot(doc_embeddings, query_emb[0])
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

# Step 4: Generate an answer grounded in the retrieved context
def rag_query(question):
    context_docs = retrieve(question)
    context = "\n".join(f"- {doc}" for doc in context_docs)
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Answer the question based ONLY on the provided context.

Context:
{context}

Question: {question}

If the context doesn't contain the answer, say "I don't have that information."
""",
        }],
    )
    return response.content[0].text

print(rag_query("Who created Python?"))
```
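A note on the scoring in `retrieve`: because `simple_tfidf_embed` L2-normalizes every vector, the plain dot product is exactly cosine similarity. A standalone check of that equivalence (the example vectors are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 0.0, 1.0])

# Dot product of the L2-normalized vectors...
cos = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

# ...equals the textbook cosine-similarity formula
assert np.isclose(cos, np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cos)
```

This is why vector stores commonly normalize embeddings at index time: similarity search then reduces to a fast matrix-vector dot product.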