Getting Started

RAG Introduction

Learn Retrieval-Augmented Generation — how to give LLMs access to your own documents.

What is RAG?

Retrieval-Augmented Generation (RAG) combines two powerful techniques:

  1. Retrieval: Find relevant information from a knowledge base
  2. Generation: Use an LLM to generate answers based on the retrieved information
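The retrieval step is usually a nearest-neighbor search over embedding vectors, scored with cosine similarity. A minimal sketch of that scoring step (the 3-dimensional vectors are toy values for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 for identical direction, 0.0 for orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" — real ones come from an embedding model
query = np.array([1.0, 0.0, 1.0])
doc_a = np.array([1.0, 0.1, 0.9])   # similar direction -> high score
doc_b = np.array([0.0, 1.0, 0.0])   # orthogonal -> score near 0

print(cosine_similarity(query, doc_a))  # close to 1.0
print(cosine_similarity(query, doc_b))  # close to 0.0
```

The documents with the highest scores against the query are passed to the LLM in step 2.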

Why RAG?

LLMs have limitations:

  • Knowledge cutoff: Don't know about recent events
  • No private data: Don't have access to your company's documents
  • Hallucinations: May confidently state false information

RAG solves these by grounding the LLM's responses in real, retrieved documents.

The RAG Pipeline

text
Documents → Chunking → Embedding → Vector Store
                                         ↓
Query → Embedding → Similar Chunks → LLM → Answer
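The chunking stage above can be sketched as a sliding window over characters; this is a minimal illustration, and production systems usually split on sentence or token boundaries instead:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap,
    so information at a boundary appears in two chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

long_doc = "RAG pipelines split documents into chunks before embedding. " * 20
chunks = chunk_text(long_doc)
print(len(chunks), len(chunks[0]))  # → 8 200
```

Each chunk is then embedded and stored; the overlap trades a little storage for fewer answers lost at chunk boundaries.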

Common Use Cases

  • Customer support chatbots that know your product documentation
  • Legal research assistants
  • Internal knowledge base Q&A
  • Medical information systems
  • Code documentation assistants

Example

python
# Simple RAG pipeline from scratch
from anthropic import Anthropic
import numpy as np

client = Anthropic()

# Step 1: Your knowledge base (documents)
documents = [
    "Python was created by Guido van Rossum and first released in 1991.",
    "Python supports multiple programming paradigms including procedural, object-oriented, and functional.",
    "Python's design philosophy emphasizes code readability with significant indentation.",
    "Python has a large standard library and a vibrant ecosystem of third-party packages.",
    "Django is a high-level Python web framework that encourages rapid development.",
    "NumPy provides support for large multi-dimensional arrays and mathematical functions.",
    "TensorFlow and PyTorch are the most popular deep learning frameworks in Python.",
]

# Step 2: Create embeddings for the documents
# This toy "embedding" keeps the example dependency-free — in practice,
# use a real embedding model (e.g. sentence-transformers or an embeddings API)
def simple_tfidf_embed(texts, vocab=None):
    """Naive 'embedding': normalized character-trigram count vectors."""
    if vocab is None:
        vocab = set()
        for text in texts:
            for i in range(len(text) - 2):
                vocab.add(text[i:i+3])
        vocab = sorted(vocab)

    embeddings = []
    for text in texts:
        vec = np.array([text.count(ngram) for ngram in vocab], dtype=float)
        norm = np.linalg.norm(vec)
        embeddings.append(vec / (norm + 1e-10))
    return np.array(embeddings), vocab

doc_embeddings, vocab = simple_tfidf_embed(documents)

# Step 3: Retrieve the most relevant documents for a query
def retrieve(query, top_k=3):
    # Embed the query against the SAME vocabulary as the documents,
    # otherwise the query vector would have a different dimension
    query_emb, _ = simple_tfidf_embed([query], vocab)
    scores = np.dot(doc_embeddings, query_emb[0])
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

# Step 4: Generate answer with context
def rag_query(question):
    context_docs = retrieve(question)
    context = "\n".join(f"- {doc}" for doc in context_docs)

    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Answer the question based ONLY on the provided context.

Context:
{context}

Question: {question}

If the context doesn't contain the answer, say "I don't have that information."
"""
        }]
    )
    return response.content[0].text

print(rag_query("Who created Python?"))