Building Your First RAG Pipeline: A Step-by-Step Tutorial
Build a complete Retrieval-Augmented Generation system from scratch. Learn embeddings, chunking, vector search, and how to make your AI answers grounded in real data.

DevForge Team
AI Development Educators

What is RAG and Why Do You Need It?
Large language models know a lot, but they don't know YOUR data. They don't know your company's documentation, your codebase's specifics, this week's news, or the contents of that 200-page PDF you need to query.
Retrieval-Augmented Generation (RAG) solves this by dynamically injecting relevant information into the LLM's context at query time. Instead of the model guessing, it reasons over documents you've selected as relevant to the user's question.
The result: AI that can accurately answer questions about your specific data, with citations, without hallucinating facts that aren't in your documents.
In this tutorial, you'll build a complete RAG pipeline from scratch using:
- Supabase (vector database via pgvector)
- OpenAI text-embedding-3-small (embeddings)
- Claude claude-opus-4-5 (generation)
- TypeScript
By the end, you'll have a system that can answer questions about any set of documents you provide.
Understanding the RAG Architecture
A RAG system has two phases:
Phase 1: Indexing (offline)
- Load your documents
- Split them into chunks
- Convert chunks to embedding vectors
- Store vectors in a vector database
Phase 2: Retrieval + Generation (online, per query)
- Convert user query to embedding vector
- Find most similar document chunks (vector search)
- Inject relevant chunks into LLM context
- LLM generates answer grounded in retrieved context
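The "most similar" step above means cosine similarity between embedding vectors. pgvector will compute this for us later (its `<=>` operator returns cosine distance, so similarity is `1 - distance`), but the math is simple enough to see in plain TypeScript:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; higher = more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing the same way → 1; orthogonal vectors → 0.
console.log(cosineSimilarity([1, 0], [2, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 5])); // 0
```

Embeddings place semantically similar text near each other in this vector space, so ranking chunks by cosine similarity to the query embedding surfaces the most relevant ones.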
Setting Up Your Environment
mkdir rag-tutorial && cd rag-tutorial
npm init -y
npm install @anthropic-ai/sdk openai @supabase/supabase-js
npm install -D typescript @types/node ts-node
Create your .env:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_SERVICE_ROLE_KEY=eyJ...
Step 1: Setting Up the Vector Database
In Supabase, enable the pgvector extension and create your documents table:
-- Enable pgvector
create extension if not exists vector;
-- Documents table
create table documents (
id uuid primary key default gen_random_uuid(),
content text not null,
metadata jsonb default '{}',
embedding vector(1536), -- OpenAI text-embedding-3-small
created_at timestamptz default now()
);
-- Create index for fast approximate similarity search
-- (tip: build ivfflat indexes after loading data, so the clusters reflect it)
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
-- Similarity search function
create or replace function match_documents(
query_embedding vector(1536),
match_threshold float default 0.7,
match_count int default 5
)
returns table (
id uuid,
content text,
metadata jsonb,
similarity float
)
language sql stable
as $$
select
id,
content,
metadata,
1 - (embedding <=> query_embedding) as similarity
from documents
where 1 - (embedding <=> query_embedding) > match_threshold
order by embedding <=> query_embedding
limit match_count;
$$;
Step 2: Document Chunking
Chunking strategy dramatically affects RAG quality. Chunks that are too large dilute relevance (one embedding has to average over many topics); chunks that are too small lack the context needed to answer.
interface Chunk {
content: string;
metadata: {
source: string;
chunkIndex: number;
totalChunks: number;
};
}
function chunkDocument(
text: string,
source: string,
chunkSize = 500, // max characters per chunk
overlap = 20 // note: measured in words, not characters
): Chunk[] {
const chunks: Chunk[] = [];
const sentences = text.split(/(?<=[.!?])\s+/);
let currentChunk = '';
let chunkIndex = 0;
for (const sentence of sentences) {
if ((currentChunk + sentence).length > chunkSize && currentChunk) {
chunks.push({
content: currentChunk.trim(),
metadata: { source, chunkIndex, totalChunks: 0 }
});
// Overlap: carry the last `overlap` words into the next chunk for context
const words = currentChunk.split(' ');
currentChunk = words.slice(-overlap).join(' ') + ' ' + sentence;
chunkIndex++;
} else {
currentChunk += (currentChunk ? ' ' : '') + sentence;
}
}
if (currentChunk.trim()) {
chunks.push({
content: currentChunk.trim(),
metadata: { source, chunkIndex, totalChunks: 0 }
});
}
// Update totalChunks
chunks.forEach(chunk => {
chunk.metadata.totalChunks = chunks.length;
});
return chunks;
}
Step 3: Generating Embeddings
import OpenAI from 'openai';
const openai = new OpenAI();
async function generateEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text.replace(/\n/g, ' '),
});
return response.data[0].embedding;
}
async function generateEmbeddingsBatch(
texts: string[]
): Promise<number[][]> {
// Note: the embeddings endpoint caps the number of inputs per request
// (2,048 at the time of writing) — split very large batches before calling this.
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts.map(t => t.replace(/\n/g, ' ')),
});
return response.data.map(item => item.embedding);
}
Step 4: Indexing Documents
import { createClient } from '@supabase/supabase-js';
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!
);
async function indexDocument(
text: string,
source: string
): Promise<void> {
console.log(`Indexing: ${source}`);
const chunks = chunkDocument(text, source);
console.log(`Created ${chunks.length} chunks`);
// Generate embeddings in batch (more efficient)
const contents = chunks.map(c => c.content);
const embeddings = await generateEmbeddingsBatch(contents);
// Insert into Supabase
const rows = chunks.map((chunk, i) => ({
content: chunk.content,
metadata: chunk.metadata,
embedding: embeddings[i],
}));
const { error } = await supabase
.from('documents')
.insert(rows);
if (error) throw error;
console.log(`Indexed ${chunks.length} chunks from ${source}`);
}
Step 5: Retrieval
interface RetrievedDocument {
id: string;
content: string;
metadata: Record<string, unknown>;
similarity: number;
}
async function retrieveRelevantDocs(
query: string,
threshold = 0.7,
limit = 5
): Promise<RetrievedDocument[]> {
const queryEmbedding = await generateEmbedding(query);
const { data, error } = await supabase.rpc('match_documents', {
query_embedding: queryEmbedding,
match_threshold: threshold,
match_count: limit,
});
if (error) throw error;
return data || [];
}
Step 6: Generation with Context
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic();
async function answerQuestion(
question: string
): Promise<{ answer: string; sources: string[] }> {
// Retrieve relevant context
const docs = await retrieveRelevantDocs(question);
if (docs.length === 0) {
return {
answer: "I don't have enough relevant information to answer that question.",
sources: []
};
}
// Build context from retrieved docs
const context = docs
.map((doc, i) => `[Source ${i + 1}: ${doc.metadata.source}]\n${doc.content}`)
.join('\n\n---\n\n');
// Generate answer with Claude
const response = await anthropic.messages.create({
model: 'claude-opus-4-5',
max_tokens: 2048,
system: `You are a helpful assistant that answers questions based on provided context.
Rules:
- Only use information from the provided context
- If the context doesn't contain the answer, say so explicitly
- Cite your sources using [Source N] notation
- Be concise and accurate`,
messages: [{
role: 'user',
content: `Context:
${context}
Question: ${question}`
}]
});
const answer = response.content[0].type === 'text'
? response.content[0].text
: '';
const sources = [...new Set(docs.map(d => d.metadata.source as string))];
return { answer, sources };
}
Step 7: Putting It All Together
async function main() {
// Index some sample documents
const docs = [
{
text: `DevForge Academy offers 40+ programming tutorials covering web development,
data science, and AI. Courses include HTML, CSS, JavaScript, Python, SQL,
React, and our exclusive AI development curriculum.`,
source: 'about.txt'
},
{
text: `Our AI development courses cover prompt engineering, the Claude API,
RAG pipelines, vector databases, AI agents, and deploying AI apps.
These are DevForge exclusive courses not found elsewhere.`,
source: 'ai-courses.txt'
}
];
for (const doc of docs) {
await indexDocument(doc.text, doc.source);
}
// Query
const result = await answerQuestion(
"What AI topics does DevForge Academy cover?"
);
console.log('Answer:', result.answer);
console.log('Sources:', result.sources);
}
main().catch(console.error);
Key RAG Optimizations
Hybrid Search
Combine vector search with keyword search (BM25) for better retrieval:
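One common way to merge the two result lists is reciprocal rank fusion (RRF), which uses only each document's rank in each list, so you never have to put BM25 scores and cosine similarities on a common scale. A sketch (the `k = 60` default is the conventional choice from the RRF literature):

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank + 1) per doc;
// docs ranked well in BOTH lists accumulate the highest fused score.
function reciprocalRankFusion(
  keywordResults: string[], // doc ids, best first
  vectorResults: string[],  // doc ids, best first
  k = 60
): string[] {
  const scores = new Map<string, number>();
  for (const list of [keywordResults, vectorResults]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'b' ranks near the top of both lists, so it wins overall.
console.log(reciprocalRankFusion(['a', 'b', 'c'], ['b', 'c', 'd']));
```

The SQL below takes the other approach: blending the raw scores directly with fixed weights, which also works but requires tuning those weights.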
-- Postgres full-text search + vector search
-- (a subquery is needed: ORDER BY can reference an output alias on its own,
-- but not inside an arithmetic expression)
SELECT * FROM (
SELECT *,
ts_rank(to_tsvector(content), plainto_tsquery($1)) as text_rank,
1 - (embedding <=> $2) as vector_rank
FROM documents
WHERE to_tsvector(content) @@ plainto_tsquery($1)
OR 1 - (embedding <=> $2) > 0.7
) ranked
ORDER BY (0.3 * text_rank + 0.7 * vector_rank) DESC
LIMIT 10;
Reranking
After initial retrieval, use a cross-encoder model to rerank results by actual relevance.
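The shape of a rerank step is independent of which model scores the pairs: retrieve broadly (say, top 20), score each (query, document) pair, keep the best few. Here's a sketch with a pluggable async scorer — in practice you'd plug in a cross-encoder or a hosted reranking API; the toy word-overlap scorer below is purely illustrative:

```typescript
// Any async function that scores a (query, document) pair.
type Scorer = (query: string, doc: string) => Promise<number>;

// Score all candidates, then keep the topK highest-scoring documents.
async function rerank(
  query: string,
  docs: string[],
  scorer: Scorer,
  topK = 5
): Promise<string[]> {
  const scored = await Promise.all(
    docs.map(async doc => ({ doc, score: await scorer(query, doc) }))
  );
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.doc);
}

// Toy scorer: counts query words appearing in the document. A real
// cross-encoder attends over both texts jointly, which is what makes
// reranking more accurate than embedding similarity alone.
const overlapScorer: Scorer = async (query, doc) => {
  const queryWords = new Set(query.toLowerCase().split(/\s+/));
  return doc.toLowerCase().split(/\s+/).filter(w => queryWords.has(w)).length;
};
```

The extra per-pair scoring cost is why you rerank a small candidate set rather than the whole corpus.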
Query Expansion
Expand the user query before retrieval:
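The `expandQuery` helper isn't defined in this tutorial; in practice you'd ask an LLM for related terms, but the shape is simply "query in, enriched query out". A minimal dictionary-based stand-in (the synonym table is illustrative):

```typescript
// Minimal stand-in for expandQuery — real pipelines usually generate the
// related terms with an LLM call instead of a fixed map.
const EXPANSIONS: Record<string, string[]> = {
  rag: ['Retrieval-Augmented Generation', 'vector search', 'document retrieval'],
  llm: ['large language model'],
};

function expandQuery(query: string): string {
  const extras = query
    .toLowerCase()
    .split(/\W+/) // split on non-word characters so punctuation doesn't block matches
    .flatMap(word => EXPANSIONS[word] ?? []);
  return extras.length > 0 ? `${query} (${extras.join(', ')})` : query;
}

console.log(expandQuery('What is RAG?'));
// → "What is RAG? (Retrieval-Augmented Generation, vector search, document retrieval)"
```

An LLM-backed version would be async, which is why the usage snippet awaits it: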
const expandedQuery = await expandQuery(userQuery);
// "RAG" → "Retrieval-Augmented Generation, vector search, document retrieval"
const docs = await retrieveRelevantDocs(expandedQuery);
Metadata Filtering
Add filters to narrow the search space. Note that this assumes an extended retrieveRelevantDocs signature that forwards a filter into the SQL function (e.g., as a jsonb containment check on the metadata column):
const docs = await retrieveRelevantDocs(query, {
filter: { source: 'documentation', date: { gte: '2024-01-01' } }
});
What to Build Next
Now that you have a working RAG pipeline, try these extensions:
- Document Q&A App — Upload any PDF, ask questions about it
- Codebase Chat — Index your codebase, ask architectural questions
- Customer Support Bot — Index your help docs, answer user tickets
- Research Assistant — Index papers, synthesize findings across documents
RAG is the foundation of most practical AI applications. Master it and you'll be able to build a huge class of genuinely useful AI-powered products.