# Embeddings with Gemini

Generate text embeddings with Gemini's embedding model for semantic search, clustering, and retrieval-augmented generation (RAG) pipelines.
## Text Embeddings
Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts produce vectors that are close in vector space, enabling semantic search, clustering, and retrieval-augmented generation (RAG).
## Gemini Embedding Models
| Model | Dimensions | Best For |
|---|---|---|
| text-embedding-004 | 768 | General text embedding, semantic search |
| embedding-001 | 768 | Legacy; superseded by text-embedding-004 |
## Task Types
Gemini embeddings support task-type hints that optimize the vector for specific use cases:
| Task Type | Use When |
|---|---|
| RETRIEVAL_DOCUMENT | Embedding documents to be stored in an index |
| RETRIEVAL_QUERY | Embedding queries against a document index |
| SEMANTIC_SIMILARITY | Comparing similarity between pairs of texts |
| CLASSIFICATION | Classifying texts into categories |
| CLUSTERING | Grouping similar texts together |
## Batch Embedding
For large datasets, use batchEmbedContents() to embed multiple texts in one API call — much more efficient than individual calls.
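Batch calls are subject to a per-request cap on the number of items (check the current API limits for the exact figure), so very large corpora must be split into multiple batch calls. A minimal sketch of a chunking helper (`chunk` is an illustrative name, not an SDK function):

```typescript
// Split an array into batches of at most `size` items, so each batch
// can be sent as one batchEmbedContents() call.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be embedded with its own `batchEmbedContents()` call and the resulting vectors concatenated in order.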
## Cosine Similarity

The standard similarity metric for embedding vectors: 1.0 means the vectors point in the same direction (maximally similar), 0 means orthogonal (unrelated), and -1 means opposite.
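Concretely, for vectors **a** and **b** the metric is the dot product normalized by the vector magnitudes:

```latex
\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2} \; \sqrt{\sum_{i} b_i^2}}
```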
## Building a Semantic Search System
The typical flow for embedding-based search:

1. Embed each document with task type RETRIEVAL_DOCUMENT.
2. Store the vectors in a vector database (pgvector, ChromaDB, Pinecone).
3. At query time, embed the query with task type RETRIEVAL_QUERY.
4. Find the k nearest vectors using cosine similarity.
5. Return the matching documents as context for the LLM.
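In production, a vector database handles the nearest-neighbour lookup, but the ranking logic itself is simple. A minimal in-memory sketch (the helper names here are illustrative, not part of the SDK):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored vectors by similarity to the query vector and
// return the indices of the k best matches.
function topK(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((vec, i) => ({ i, score: cosine(query, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(x => x.i);
}
```

The returned indices map back to the original documents, which are then passed to the LLM as context.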
## Example
```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const embedModel = genAI.getGenerativeModel({ model: "text-embedding-004" });

// --- Single embedding ---
const docResult = await embedModel.embedContent({
  content: { parts: [{ text: "Gemini has a 1 million token context window." }], role: "user" },
  taskType: "RETRIEVAL_DOCUMENT",
});
console.log("Vector dimensions:", docResult.embedding.values.length); // 768

// --- Batch embedding for a document corpus ---
const documents = [
  "Gemini 1.5 Pro supports 1M token context",
  "Claude API uses the messages.create() method",
  "GPT-4o is OpenAI's multimodal flagship model",
  "Vector databases store embeddings for similarity search",
];
const batchResult = await embedModel.batchEmbedContents({
  requests: documents.map(text => ({
    content: { parts: [{ text }], role: "user" },
    taskType: "RETRIEVAL_DOCUMENT",
  })),
});
const vectors = batchResult.embeddings.map(e => e.values);

// --- Cosine similarity ---
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (magA * magB);
}

// --- Query ---
const queryResult = await embedModel.embedContent({
  content: { parts: [{ text: "What model has the largest context window?" }], role: "user" },
  taskType: "RETRIEVAL_QUERY",
});
const queryVec = queryResult.embedding.values;

const scores = vectors.map((vec, i) => ({
  document: documents[i],
  score: cosineSimilarity(queryVec, vec),
}));
scores.sort((a, b) => b.score - a.score);
console.log("Top result:", scores[0].document);
// "Gemini 1.5 Pro supports 1M token context"
```