RAG Pipelines Exercises — Practice Coding

1

Name the process of splitting documents into smaller segments

// Preparing documents for vector storage

// Called document ____
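
As a rough illustration of this step, here is a minimal pure-Python sketch that cuts a string into fixed-size, overlapping segments. The function name `split_text` and its `size` and `overlap` parameters are assumptions made for this example and do not come from any library.

```python
def split_text(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Cut text into segments of at most `size` characters.

    Consecutive segments share `overlap` characters, so content near a
    boundary also appears at the start of the next segment.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    return [text[i:i + size] for i in range(0, len(text), step)]


# Hypothetical input: the lowercase alphabet, split into 10-character windows.
segments = split_text("abcdefghijklmnopqrstuvwxyz", size=10, overlap=2)
```

Library splitters typically prefer to break on separators such as paragraphs or sentences rather than at raw character offsets; the fixed window here only keeps the sketch short.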

2

Complete the text splitter instantiation

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=512)

3

Name the database type used to store embedding vectors

// Stores high-dimensional vectors for similarity search

// Called a ____ database
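
To make the idea concrete, the following toy sketch keeps vectors in a Python list and returns the ids of the closest entries by brute force. The class `ToyVectorStore`, its methods, and the sample ids are all hypothetical, and the dot product is used as the score only for brevity. Real systems typically rely on approximate nearest-neighbor indexes instead of scanning every entry.

```python
class ToyVectorStore:
    """Hypothetical in-memory store: brute-force search over a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, item_id: str, vector: list[float]) -> None:
        self._items.append((item_id, vector))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the ids of the k stored vectors with the highest dot product."""
        def dot(a: list[float], b: list[float]) -> float:
            return sum(x * y for x, y in zip(a, b))

        ranked = sorted(self._items, key=lambda item: dot(query, item[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]


store = ToyVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.0, 1.0])
store.add("doc-c", [0.7, 0.7])
nearest = store.search([0.9, 0.1], k=2)
```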

4

Identify what RAG stands for

// Combines retrieval with language model generation

// RAG = ____ Augmented Generation

5

Name the similarity metric commonly used for text vectors

// Measures angle between vectors (range -1 to 1)

// Called ____ similarity
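
The metric described in the hints can be computed directly from its definition: the dot product of the two vectors divided by the product of their lengths. The sketch below uses only the standard library; the function name `angle_similarity` is an assumption chosen for this example.

```python
import math


def angle_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = angle_similarity([1.0, 2.0], [2.0, 4.0])        # parallel vectors
orthogonal = angle_similarity([1.0, 0.0], [0.0, 1.0])  # perpendicular vectors
opposite = angle_similarity([1.0, 0.0], [-1.0, 0.0])   # opposite directions
```

Parallel vectors score close to 1, perpendicular vectors score 0, and opposite vectors score -1, which matches the range given in the hint. Because the result depends only on direction, scaling a vector does not change its score.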

6

Complete the ChromaDB similarity search call

results = collection.____(query_embeddings=[q_vec], n_results=5)

7

Identify the component that converts text to vectors

// Used both at index time and query time

// Called an ____ model

8

Name the second-pass ranking model type used to improve RAG results

// Scores (query, document) pairs more accurately

// Called a ____-encoder
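
A second-pass ranking step can be sketched as follows: take the candidates returned by the first retrieval pass, score each (query, document) pair, and keep the best few. In a real pipeline the score would come from a trained model that reads both texts together. Here `toy_pair_score` is a hypothetical stand-in based on word overlap, and `rerank` is likewise an illustrative name.

```python
def toy_pair_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a learned scorer.

    Returns the fraction of query words that also occur in the document.
    """
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    if not q_words:
        return 0.0
    return len(q_words & d_words) / len(q_words)


def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    """Score every (query, candidate) pair and return the top_k candidates."""
    scored = [(toy_pair_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [doc for _, doc in scored[:top_k]]


# Hypothetical first-pass results.
candidates = [
    "vector stores index embeddings",
    "how to split documents into segments",
    "split long documents before indexing",
]
best = rerank("split documents", candidates, top_k=2)
```

Scoring pairs one at a time is slower than comparing precomputed vectors, which is why this step is usually applied only to a short list of first-pass candidates.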