Tutorials · 18 min read · February 8, 2025

Building Your First RAG Pipeline: A Step-by-Step Tutorial

Build a complete Retrieval-Augmented Generation system from scratch. Learn embeddings, chunking, vector search, and how to make your AI answers grounded in real data.

DevForge Team

AI Development Educators

What is RAG and Why Do You Need It?

Large language models know a lot, but they don't know YOUR data. They don't know your company's documentation, your codebase's specifics, this week's news, or the contents of that 200-page PDF you need to query.

Retrieval-Augmented Generation (RAG) solves this by dynamically injecting relevant information into the LLM's context at query time. Instead of the model guessing, it reasons over documents you've selected as relevant to the user's question.

The result: AI that can accurately answer questions about your specific data, with citations, without hallucinating facts that aren't in your documents.

In this tutorial, you'll build a complete RAG pipeline from scratch using:

  • Supabase (vector database via pgvector)
  • OpenAI text-embedding-3-small (embeddings)
  • Anthropic claude-opus-4-5 (generation)
  • TypeScript

By the end, you'll have a system that can answer questions about any set of documents you provide.

Understanding the RAG Architecture

A RAG system has two phases:

Phase 1: Indexing (offline)

  1. Load your documents
  2. Split them into chunks
  3. Convert chunks to embedding vectors
  4. Store vectors in a vector database

Phase 2: Retrieval + Generation (online, per query)

  1. Convert user query to embedding vector
  2. Find most similar document chunks (vector search)
  3. Inject relevant chunks into LLM context
  4. LLM generates answer grounded in retrieved context
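Under the hood, "most similar" in step 2 usually means cosine similarity between embedding vectors; pgvector's `<=>` operator, used later in this tutorial, computes the corresponding cosine distance. A minimal sketch of the math:

```typescript
// Cosine similarity: 1.0 = same direction, 0 = orthogonal (unrelated).
// pgvector's `<=>` operator returns the cosine *distance* (1 - similarity),
// which is why the SQL below computes `1 - (embedding <=> query_embedding)`.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

You'll never compute this in application code here, since the database does it, but it's worth knowing exactly what the operator measures.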

Setting Up Your Environment

bash
mkdir rag-tutorial && cd rag-tutorial
npm init -y
npm install @anthropic-ai/sdk openai @supabase/supabase-js
npm install -D typescript @types/node ts-node

Create your .env:

text
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_SERVICE_ROLE_KEY=eyJ...

Step 1: Setting Up the Vector Database

In Supabase, enable the pgvector extension and create your documents table:

sql
-- Enable pgvector
create extension if not exists vector;

-- Documents table
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  metadata jsonb default '{}',
  embedding vector(1536), -- OpenAI text-embedding-3-small
  created_at timestamptz default now()
);

-- Create index for fast similarity search
create index on documents
  using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);

-- Similarity search function
create or replace function match_documents(
  query_embedding vector(1536),
  match_threshold float default 0.7,
  match_count int default 5
)
returns table (
  id uuid,
  content text,
  metadata jsonb,
  similarity float
)
language sql stable
as $$
  select
    id,
    content,
    metadata,
    1 - (embedding <=> query_embedding) as similarity
  from documents
  where 1 - (embedding <=> query_embedding) > match_threshold
  order by embedding <=> query_embedding
  limit match_count;
$$;

Step 2: Document Chunking

Chunking strategy dramatically affects RAG quality: chunks that are too large dilute relevance (one embedding averages over many topics), while chunks that are too small strip away the context the model needs.

typescript
interface Chunk {
  content: string;
  metadata: {
    source: string;
    chunkIndex: number;
    totalChunks: number;
  };
}

function chunkDocument(
  text: string,
  source: string,
  chunkSize = 500,
  overlap = 50
): Chunk[] {
  const chunks: Chunk[] = [];
  const sentences = text.split(/(?<=[.!?])\s+/);

  let currentChunk = '';
  let chunkIndex = 0;

  for (const sentence of sentences) {
    if ((currentChunk + sentence).length > chunkSize && currentChunk) {
      chunks.push({
        content: currentChunk.trim(),
        metadata: { source, chunkIndex, totalChunks: 0 }
      });

      // Overlap: carry roughly the last `overlap` characters into the
      // next chunk so context isn't lost at chunk boundaries
      currentChunk = currentChunk.slice(-overlap).trimStart() + ' ' + sentence;
      chunkIndex++;
    } else {
      currentChunk += (currentChunk ? ' ' : '') + sentence;
    }
  }

  if (currentChunk.trim()) {
    chunks.push({
      content: currentChunk.trim(),
      metadata: { source, chunkIndex, totalChunks: 0 }
    });
  }

  // Update totalChunks
  chunks.forEach(chunk => {
    chunk.metadata.totalChunks = chunks.length;
  });

  return chunks;
}

Step 3: Generating Embeddings

typescript
import OpenAI from 'openai';

const openai = new OpenAI();

async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text.replace(/\n/g, ' '),
  });
  return response.data[0].embedding;
}

async function generateEmbeddingsBatch(
  texts: string[]
): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map(t => t.replace(/\n/g, ' ')),
  });
  return response.data.map(item => item.embedding);
}

Step 4: Indexing Documents

typescript
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

async function indexDocument(
  text: string,
  source: string
): Promise<void> {
  console.log(`Indexing: ${source}`);

  const chunks = chunkDocument(text, source);
  console.log(`Created ${chunks.length} chunks`);

  // Generate embeddings in batch (more efficient)
  const contents = chunks.map(c => c.content);
  const embeddings = await generateEmbeddingsBatch(contents);

  // Insert into Supabase
  const rows = chunks.map((chunk, i) => ({
    content: chunk.content,
    metadata: chunk.metadata,
    embedding: embeddings[i],
  }));

  const { error } = await supabase
    .from('documents')
    .insert(rows);

  if (error) throw error;
  console.log(`Indexed ${chunks.length} chunks from ${source}`);
}

Step 5: Retrieval

typescript
interface RetrievedDocument {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
  similarity: number;
}

async function retrieveRelevantDocs(
  query: string,
  threshold = 0.7,
  limit = 5
): Promise<RetrievedDocument[]> {
  const queryEmbedding = await generateEmbedding(query);

  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: queryEmbedding,
    match_threshold: threshold,
    match_count: limit,
  });

  if (error) throw error;
  return data || [];
}

Step 6: Generation with Context

typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

async function answerQuestion(
  question: string
): Promise<{ answer: string; sources: string[] }> {
  // Retrieve relevant context
  const docs = await retrieveRelevantDocs(question);

  if (docs.length === 0) {
    return {
      answer: "I don't have enough relevant information to answer that question.",
      sources: []
    };
  }

  // Build context from retrieved docs
  const context = docs
    .map((doc, i) => `[Source ${i + 1}: ${doc.metadata.source}]\n${doc.content}`)
    .join('\n\n---\n\n');

  // Generate answer with Claude
  const response = await anthropic.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 2048,
    system: `You are a helpful assistant that answers questions based on provided context.

Rules:
- Only use information from the provided context
- If the context doesn't contain the answer, say so explicitly
- Cite your sources using [Source N] notation
- Be concise and accurate`,
    messages: [{
      role: 'user',
      content: `Context:
${context}

Question: ${question}`
    }]
  });

  const answer = response.content[0].type === 'text'
    ? response.content[0].text
    : '';

  const sources = [...new Set(docs.map(d => d.metadata.source as string))];

  return { answer, sources };
}

Step 7: Putting It All Together

typescript
async function main() {
  // Index some sample documents
  const docs = [
    {
      text: `DevForge Academy offers 40+ programming tutorials covering web development,
             data science, and AI. Courses include HTML, CSS, JavaScript, Python, SQL,
             React, and our exclusive AI development curriculum.`,
      source: 'about.txt'
    },
    {
      text: `Our AI development courses cover prompt engineering, the Claude API,
             RAG pipelines, vector databases, AI agents, and deploying AI apps.
             These are DevForge exclusive courses not found elsewhere.`,
      source: 'ai-courses.txt'
    }
  ];

  for (const doc of docs) {
    await indexDocument(doc.text, doc.source);
  }

  // Query
  const result = await answerQuestion(
    "What AI topics does DevForge Academy cover?"
  );

  console.log('Answer:', result.answer);
  console.log('Sources:', result.sources);
}

main().catch(console.error);

Key RAG Optimizations

Hybrid Search

Combine vector search with keyword search for better retrieval. Postgres full-text ranking (ts_rank) stands in for BM25 here:

sql
-- Postgres full-text search + vector search.
-- The subquery is needed because Postgres doesn't allow SELECT
-- aliases inside ORDER BY expressions.
SELECT * FROM (
  SELECT *,
    ts_rank(to_tsvector(content), plainto_tsquery($1)) AS text_rank,
    1 - (embedding <=> $2) AS vector_rank
  FROM documents
  WHERE to_tsvector(content) @@ plainto_tsquery($1)
     OR 1 - (embedding <=> $2) > 0.7
) ranked
ORDER BY 0.3 * text_rank + 0.7 * vector_rank DESC
LIMIT 10;
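A weighted sum like this assumes the two scores live on comparable scales, which ts_rank and cosine similarity do not. An alternative that sidesteps the problem is Reciprocal Rank Fusion, which combines result *positions* rather than raw scores. A sketch (the `k = 60` constant is the conventional RRF default, not something specific to this pipeline):

```typescript
// Reciprocal Rank Fusion: merge several ranked ID lists. Each list
// contributes 1 / (k + rank) to an item's score, so items that rank
// well across multiple lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based, so the top item gets 1 / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Run your keyword query and vector query separately, then fuse the two ID lists and fetch the winning chunks.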

Reranking

After initial retrieval, use a cross-encoder model to rerank results by actual relevance.
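A real deployment would call a hosted reranking API or run a local cross-encoder model. As an illustration of the control flow only, here is a sketch in which a toy word-overlap scorer stands in for the model (`scorePair` is a placeholder, not a real cross-encoder):

```typescript
// Toy relevance scorer: fraction of document words that appear in the
// query. A stand-in for a cross-encoder, which would score the
// (query, document) pair jointly with a neural model.
function scorePair(query: string, doc: string): number {
  const queryWords = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const docWords = doc.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = docWords.filter(w => queryWords.has(w)).length;
  return docWords.length ? hits / docWords.length : 0;
}

// Rerank retrieved chunks by scored relevance, keeping the top K.
function rerank<T extends { content: string }>(
  query: string,
  docs: T[],
  topK = 5
): T[] {
  return docs
    .map(doc => ({ doc, score: scorePair(query, doc.content) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ doc }) => doc);
}
```

Swap `scorePair` for a call to your reranker of choice and the surrounding code stays the same.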

Query Expansion

Expand the user query before retrieval (expandQuery below is a placeholder for, e.g., an LLM call that returns synonyms and related terms):

typescript
const expandedQuery = await expandQuery(userQuery);
// "RAG" → "Retrieval-Augmented Generation, vector search, document retrieval"
const docs = await retrieveRelevantDocs(expandedQuery);

Metadata Filtering

Add filters to narrow the search space. Note that the retrieveRelevantDocs defined earlier doesn't take a filter option; this snippet shows the target API:

typescript
const docs = await retrieveRelevantDocs(query, {
  filter: { source: 'documentation', date: { gte: '2024-01-01' } }
});
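One lightweight way to approximate this without changing the SQL function is to over-fetch and filter client-side on chunk metadata. A sketch (the equality-only semantics of `matchesFilter` are illustrative, not part of the pipeline above):

```typescript
// True if `metadata` contains every key/value pair in `filter`
// (simple equality only; no range operators like gte).
function matchesFilter(
  metadata: Record<string, unknown>,
  filter: Record<string, unknown>
): boolean {
  return Object.entries(filter).every(([key, value]) => metadata[key] === value);
}

// Post-retrieval filter: over-fetch from the vector store, then keep
// only the chunks whose metadata matches.
function filterDocs<T extends { metadata: Record<string, unknown> }>(
  docs: T[],
  filter: Record<string, unknown>
): T[] {
  return docs.filter(doc => matchesFilter(doc.metadata, filter));
}
```

For large corpora, push the filter into SQL instead, e.g. a jsonb containment clause inside match_documents, so it narrows the search itself rather than the returned results.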

What to Build Next

Now that you have a working RAG pipeline, try these extensions:

  1. Document Q&A App — Upload any PDF, ask questions about it
  2. Codebase Chat — Index your codebase, ask architectural questions
  3. Customer Support Bot — Index your help docs, answer user tickets
  4. Research Assistant — Index papers, synthesize findings across documents

RAG is the foundation of most practical AI applications. Master it and you'll be able to build a huge class of genuinely useful AI-powered products.

#RAG · #Vector Databases · #Embeddings · #Claude API · #AI