Tools

ChromaDB

Use ChromaDB for local development and prototyping with a simple Python-first API.

Chroma is an open-source embedding database designed for AI applications. It is one of the easiest vector databases to get started with: it runs in-process, so no external services are needed.

Key Features

  • Embedded: Runs in-process (no server needed for dev)
  • Persistent: Save to disk
  • Built-in embeddings: Auto-embed with sentence-transformers or OpenAI
  • Simple API: Designed for Python-first development
  • Filtering: Rich metadata filtering

When to Use Chroma

  • Prototyping and development
  • Small to medium datasets (< 1M vectors)
  • Local applications
  • When you want zero infrastructure overhead

Example

python
# pip install chromadb

import chromadb
from chromadb.utils import embedding_functions

# In-memory client (for testing; data is lost when the process exits)
client = chromadb.Client()

# Or a persistent client that saves to disk (use one or the other)
client = chromadb.PersistentClient(path="./my_vector_db")

# Built-in OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-openai-api-key",
    model_name="text-embedding-3-small"
)

# Or local sentence-transformers (no API key needed)
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Create collection (use get_or_create_collection instead to avoid
# an error if the collection already exists)
collection = client.create_collection(
    name="research_papers",
    embedding_function=openai_ef,
    metadata={"description": "AI research papers"}
)

# Add documents
collection.add(
    ids=["paper1", "paper2", "paper3"],
    documents=[
        "Attention mechanisms revolutionized natural language processing",
        "Diffusion models generate high-quality images through iterative denoising",
        "Reinforcement learning from human feedback aligns LLMs with human values",
    ],
    metadatas=[
        {"year": 2017, "topic": "transformers"},
        {"year": 2020, "topic": "image_generation"},
        {"year": 2022, "topic": "alignment"},
    ]
)

# Basic query
results = collection.query(
    query_texts=["how do large language models work?"],
    n_results=2
)
print(results['documents'])

# Query with metadata filter
results = collection.query(
    query_texts=["neural networks for generation"],
    n_results=3,
    where={"year": {"$gte": 2020}},
    where_document={"$contains": "model"}  # document content filter
)

# Get by ID
result = collection.get(
    ids=["paper1"],
    include=["documents", "metadatas", "embeddings"]
)

# Delete
collection.delete(ids=["paper1"])
collection.delete(where={"year": {"$lt": 2020}})

# Collection info
print(f"Count: {collection.count()}")
print(f"Collections: {[c.name for c in client.list_collections()]}")