RAG Pipelines Exercises
Fill in the blanks to test your knowledge.
Name the process of splitting documents into smaller segments
// Preparing documents for vector storage
// Called document ____
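The splitting step in this exercise can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `split_text` and the `size` and `overlap` parameters are assumptions chosen for this example.

```python
def split_text(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character segments that overlap.

    Hypothetical helper for illustration only; real splitters usually
    also respect sentence or paragraph boundaries.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the previous segment already reaches the end of the text.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


segments = split_text("abcdefghij", size=4, overlap=1)
print(segments)  # ['abcd', 'defg', 'ghij']
```

Each segment repeats the last `overlap` characters of the previous one, so text that straddles a boundary still appears whole in at least one segment.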
Complete the text splitter instantiation
from langchain.text_splitter import ____CharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=512)
Name the database type used to store embedding vectors
// Stores high-dimensional vectors for similarity search
// Called a ____ database
Identify what RAG stands for
// Combines retrieval with language model generation
// RAG = ____-Augmented Generation
Name the similarity metric commonly used for text vectors
// Depends only on the angle between two vectors, not their lengths (range -1 to 1)
// Called ____ similarity
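The metric in this exercise can be computed directly from its definition: the dot product of two vectors divided by the product of their lengths. The sketch below uses only the standard library. The function name `similarity` is an assumption for illustration.

```python
import math


def similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("metric is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
print(similarity([1.0, 2.0], [2.0, 4.0]))   # same direction, different length
print(similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite directions
```

Because the result ignores vector length, a short and a long text that point in the same direction in embedding space score close to 1.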
Complete the ChromaDB similarity search call
results = collection.____(query_embeddings=[q_vec], n_results=5)
Identify the component that converts text to vectors
// Used both at index time and query time
// Called an ____ model
Name the second-pass ranking model type used to improve RAG results
// Scores each (query, document) pair jointly, typically more accurately than the first-pass retriever
// Called a ____-encoder
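Second-pass ranking follows a retrieve-then-rerank pattern: a fast first pass returns candidates, then a slower scorer re-orders them by looking at the query and each document together. The sketch below shows only that control flow. `overlap_score` is a toy stand-in based on shared words, not a trained model, and every name here is hypothetical.

```python
from typing import Callable


def overlap_score(query: str, document: str) -> float:
    """Toy pair scorer: fraction of query words that appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float] = overlap_score,
    top_k: int = 2,
) -> list[str]:
    """Re-order first-pass candidates by a (query, document) pair score."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    # Stable sort: ties keep their first-pass order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Candidates as they might come back from a first-pass vector search.
candidates = [
    "vector stores index embeddings",
    "segment size affects retrieval quality",
    "retrieval quality depends on segment overlap",
]
print(rerank("segment size and retrieval quality", candidates, top_k=2))
```

In a real pipeline, `score_pair` would wrap a trained model that reads the query and document as a single input. The loop around it stays the same.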