ChromaDB
Use ChromaDB for local development and prototyping with a simple Python-first API.
Chroma is an open-source embedding database designed for AI applications. It is one of the easier vector databases to get started with, because it runs in-process and needs no external services.
Key Features
- Embedded: Runs in-process (no server needed for dev)
- Persistent: Save to disk
- Built-in embeddings: Auto-embed with sentence-transformers or OpenAI
- Simple API: Designed for Python-first development
- Filtering: Rich metadata filtering
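To make "rich metadata filtering" concrete, here is a minimal pure-Python sketch of how a `where` filter with Chroma's documented operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$and`, `$or`) can be read. It is only an illustration of the filter semantics, not Chroma's implementation; the `matches` helper and the `_OPS` table are hypothetical names, and the sample metadata mirrors the records used in the example later in this section.

```python
import operator

# Comparison operators from Chroma's `where` syntax mapped to Python functions.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: dict, where: dict) -> bool:
    """Return True if `metadata` satisfies `where` (illustrative helper only)."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            # Simplification in this sketch: a record lacking the key never matches.
            if key not in metadata:
                return False
            # A bare value such as {"topic": "alignment"} is shorthand for $eq.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            for op, expected in condition.items():
                if not _OPS[op](metadata[key], expected):
                    return False
    return True


papers = {
    "paper1": {"year": 2017, "topic": "transformers"},
    "paper2": {"year": 2020, "topic": "image_generation"},
    "paper3": {"year": 2022, "topic": "alignment"},
}

# Same filter shape as a Chroma `where` argument: year >= 2020.
recent = [pid for pid, meta in papers.items() if matches(meta, {"year": {"$gte": 2020}})]
print(recent)  # ['paper2', 'paper3']
```

In Chroma itself the same dictionary is passed as `where=` to `query`, `get`, or `delete`, and the database evaluates it for you.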
When to Use Chroma
- Prototyping and development
- Small to medium datasets (< 1M vectors)
- Local applications
- When you want zero infrastructure overhead
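A rough way to see what the "< 1M vectors" guideline means in practice is to estimate the raw size of the vectors. The sketch below assumes each dimension is stored as a 4-byte float32, which is an assumption about the storage format rather than something stated here, and it ignores index, document, and metadata overhead. The dimensions used are the default output sizes of the two models that appear in the example (384 for `all-MiniLM-L6-v2`, 1536 for `text-embedding-3-small`); `raw_vector_bytes` is a hypothetical helper name.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


for model, dims in [("all-MiniLM-L6-v2", 384), ("text-embedding-3-small", 1536)]:
    gigabytes = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"{model}: {gigabytes:.2f} GB for 1M vectors")
# all-MiniLM-L6-v2: 1.54 GB for 1M vectors
# text-embedding-3-small: 6.14 GB for 1M vectors
```

Under these assumptions a million vectors can fit on a single development machine, which is consistent with using Chroma for small to medium datasets.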
Example
python
# pip install chromadb
import chromadb
from chromadb.utils import embedding_functions
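# In-memory client (for testing); data is lost when the process exits
client = chromadb.Client()
# Persistent client; this reassignment replaces the in-memory client above, so keep only the one you need
client = chromadb.PersistentClient(path="./my_vector_db")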
# Built-in OpenAI embeddings (requires the openai package and an API key)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-openai-api-key",
    model_name="text-embedding-3-small",
)
# Or local sentence-transformers (no API key needed; requires the sentence-transformers package)
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
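# Create the collection, or reuse it if it already exists
# (create_collection raises an error on a second run against the same persistent path)
collection = client.get_or_create_collection(
    name="research_papers",
    embedding_function=openai_ef,
    metadata={"description": "AI research papers"},
)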
# Add documents (ids, documents, and metadatas are parallel lists)
collection.add(
    ids=["paper1", "paper2", "paper3"],
    documents=[
        "Attention mechanisms revolutionized natural language processing",
        "Diffusion models generate high-quality images through iterative denoising",
        "Reinforcement learning from human feedback aligns LLMs with human values",
    ],
    metadatas=[
        {"year": 2017, "topic": "transformers"},
        {"year": 2020, "topic": "image_generation"},
        {"year": 2022, "topic": "alignment"},
    ],
)
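# Basic query: the text is embedded with the collection's embedding function
results = collection.query(
    query_texts=["how do large language models work?"],
    n_results=2,
)
# results["documents"] is a list of lists: one inner list of matches per query text
print(results["documents"])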
# Query with a metadata filter and a document-content filter
results = collection.query(
    query_texts=["neural networks for generation"],
    n_results=3,
    where={"year": {"$gte": 2020}},  # metadata filter: year >= 2020
    where_document={"$contains": "model"},  # document filter: text contains "model"
)
# Get by ID
result = collection.get(
    ids=["paper1"],
    include=["documents", "metadatas", "embeddings"],
)
# Delete by ID
collection.delete(ids=["paper1"])
# Delete every record whose metadata matches a filter
collection.delete(where={"year": {"$lt": 2020}})
# Collection info
print(f"Count: {collection.count()}")
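# Note: some Chroma releases (reportedly 0.6.x) return plain names from list_collections();
# on those versions, print the list directly instead of reading `.name`
print(f"Collections: {[c.name for c in client.list_collections()]}")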