# ChromaDB Plugin

Embeddable vector database supporting local, persistent, and client-server modes. Built on `chromadb`.
## Installation

```bash
pip install chromadb
```
## Quick Start

```python
from daita import SubstrateAgent
from daita.plugins import chroma

# Create plugin
vector_db = chroma(
    path="./chroma_data",
    collection="embeddings"
)

# Agent uses vector database tools autonomously
agent = SubstrateAgent(
    name="Vector Search Agent",
    prompt="You are a semantic search assistant. Help users find relevant information.",
    tools=[vector_db]
)

await agent.start()
result = await agent.run("Find documents similar to 'machine learning'")
```
## Direct Usage

The plugin can also be used directly, without an agent, for programmatic access. For comprehensive API documentation, see the official ChromaDB docs. The main value of this plugin, however, is agent integration: enabling LLMs to perform semantic search operations autonomously.
## Connection Parameters

```python
chroma(
    path: Optional[str] = None,
    host: Optional[str] = None,
    port: int = 8000,
    collection: str = "default",
    **kwargs
)
```
### Parameters

- `path` (str, optional): Path for persistent local storage
- `host` (str, optional): Host for remote Chroma server
- `port` (int): Port for remote Chroma server (default: 8000)
- `collection` (str): Collection name to use (default: `"default"`)
- `**kwargs`: Additional Chroma configuration
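The mode the plugin runs in follows from which of these parameters you pass. A minimal sketch of that dispatch logic (`resolve_mode` is a hypothetical helper for illustration, not part of the plugin API, and assumes `host` takes precedence over `path`):

```python
from typing import Optional

def resolve_mode(path: Optional[str] = None, host: Optional[str] = None) -> str:
    """Illustrative only: which backend a chroma(...) call would likely select."""
    if host is not None:
        return "client-server"  # talk to a remote Chroma server
    if path is not None:
        return "persistent"     # local storage on disk
    return "ephemeral"          # in-memory, lost when the process ends

print(resolve_mode(path="./chroma_data"))  # persistent
```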
## Connection Modes
ChromaDB supports three operational modes:
### In-Memory (Ephemeral)

```python
# No persistence - data lost when process ends
async with chroma(collection="temp") as db:
    await db.upsert(
        ids=["id1"],
        vectors=[[0.1, 0.2, 0.3]]
    )
```
### Persistent Local Storage

```python
# Data persisted to disk
async with chroma(
    path="./chroma_data",
    collection="documents"
) as db:
    await db.upsert(
        ids=["id1"],
        vectors=[[0.1, 0.2, 0.3]],
        metadata=[{"source": "doc1.pdf"}]
    )
```
### Remote Client-Server

```python
# Connect to remote Chroma server
async with chroma(
    host="chroma-server.example.com",
    port=8000,
    collection="production"
) as db:
    results = await db.query(
        vector=[0.1, 0.2, 0.3],
        top_k=10
    )
```
## Using with Agents

The ChromaDB plugin exposes vector operations as tools that agents can use autonomously:

```python
from daita import SubstrateAgent
from daita.plugins import chroma

# Create ChromaDB plugin
vector_db = chroma(
    path="./embeddings",
    collection="knowledge_base"
)

# Pass plugin to agent - agent can now use vector tools autonomously
agent = SubstrateAgent(
    name="Semantic Search Agent",
    prompt="You are a semantic search assistant. Help users find relevant documents.",
    llm_provider="openai",
    model="gpt-4",
    tools=[vector_db]
)

await agent.start()

# Agent autonomously uses ChromaDB tools to answer questions
result = await agent.run("Find documents about machine learning")

# The agent will autonomously:
# 1. Use chroma_search to find relevant vectors
# 2. Analyze and present results in natural language

await agent.stop()
```
## Available Tools

The ChromaDB plugin exposes these tools to LLM agents:

| Tool | Description | Parameters |
|---|---|---|
| `chroma_search` | Search for similar vectors | `vector` (required), `top_k` (int), `filter` (object) |
| `chroma_upsert` | Insert or update vectors | `ids` (required), `vectors` (required), `metadata` (array), `documents` (array) |
| `chroma_delete` | Delete vectors | `ids` (array), `filter` (object) |
| `chroma_collections` | List all collections | None |

- **Tool Categories:** `vector_db`
- **Tool Source:** plugin
- **Filter Format:** Simple dict (e.g., `{"category": "tech"}`)
## Tool Usage Example

```python
from daita import SubstrateAgent
from daita.plugins import chroma

# Setup ChromaDB with tool integration
vector_db = chroma(path="./data", collection="articles")

agent = SubstrateAgent(
    name="Research Assistant",
    prompt="You are a research assistant. Help users find and organize information.",
    llm_provider="openai",
    model="gpt-4",
    tools=[vector_db]
)

await agent.start()

# Natural language command - agent uses tools autonomously
result = await agent.run("""
Search for articles about:
1. Neural networks
2. Deep learning
3. Computer vision
Return the top 5 most relevant results.
""")

# Agent orchestrates ChromaDB tool calls to fulfill the request
print(result)

await agent.stop()
```
## Error Handling

```python
try:
    async with chroma(path="./data", collection="docs") as db:
        results = await db.query(
            vector=[0.1, 0.2, 0.3],
            top_k=5
        )
except ImportError as e:
    if "chromadb" in str(e):
        print("Install chromadb: pip install chromadb")
    else:
        raise  # don't swallow unrelated import errors
except Exception as e:
    print(f"ChromaDB error: {e}")
```
## Best Practices

**Connection Management:**

- Use context managers (`async with`) for automatic cleanup
- Choose the appropriate mode (ephemeral, persistent, or client-server) based on your needs
- Use persistent storage for production workloads
- Use client-server mode for distributed systems

**Performance:**

- Batch upsert operations when inserting multiple vectors
- Use filters to narrow search scope and improve speed
- Limit `top_k` to only what you need
- Store frequently accessed metadata with vectors
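The batching advice can be sketched with a simple chunking helper. This is an illustrative pattern, not part of the plugin; it assumes `upsert` accepts parallel `ids`/`vectors` lists, as shown in the examples above:

```python
from typing import Any, Iterator, Sequence

def batched(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield fixed-size chunks so upserts go out in batches, not one by one."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage sketch (inside an `async with chroma(...)` block):
#     for id_chunk, vec_chunk in zip(batched(ids, 100), batched(vectors, 100)):
#         await db.upsert(ids=list(id_chunk), vectors=list(vec_chunk))
```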
**Data Organization:**

- Use meaningful collection names for different use cases
- Include descriptive metadata for better filtering
- Store original documents when you need context
- Use consistent vector dimensions within a collection

**Security:**

- Protect persistent storage paths with appropriate permissions
- Use authentication when running a Chroma server in production
- Validate vector dimensions before upserting
- Sanitize user input in metadata
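Validating dimensions before an upsert can be as simple as the following check (an illustrative helper, not part of the plugin API):

```python
from typing import Sequence

def check_dimensions(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Fail fast instead of letting a dimension mismatch surface inside ChromaDB."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, expected {expected_dim}"
            )

check_dimensions([[0.1, 0.2, 0.3]], expected_dim=3)  # passes silently
```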
## Filtering

ChromaDB supports simple metadata filtering:

```python
async with chroma(path="./data", collection="docs") as db:
    # Simple equality filter
    results = await db.query(
        vector=[0.1, 0.2, 0.3],
        top_k=10,
        filter={"category": "tech"}
    )

# Note: For advanced filtering (operators like $gt, $in, etc.),
# refer to the ChromaDB documentation
```
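Since the plugin expects a flat equality dict, one way to build a filter safely from user input is to allow-list the metadata fields. `build_filter` and the field names are hypothetical, chosen to match the metadata used in this page's examples:

```python
from typing import Any, Dict

# Assumption: these are the metadata fields your collection actually uses
ALLOWED_FIELDS = {"category", "published", "author"}

def build_filter(user_input: Dict[str, Any]) -> Dict[str, str]:
    """Keep only allow-listed keys and stringify values for a flat equality filter."""
    return {k: str(v) for k, v in user_input.items() if k in ALLOWED_FIELDS}

print(build_filter({"category": "tech", "unexpected_key": 1}))  # {'category': 'tech'}
```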
## Troubleshooting

| Issue | Solution |
|---|---|
| `chromadb` not installed | `pip install chromadb` |
| Collection doesn't exist | It will be auto-created on first upsert |
| Connection refused (client mode) | Check that the Chroma server is running; verify host/port |
| Dimension mismatch | Ensure all vectors have the same dimensions |
| Memory issues | Use persistent storage, adjust `max_stream_length` |
## Common Patterns

**Semantic search with metadata:**

```python
async with chroma(path="./data", collection="docs") as db:
    results = await db.query(
        vector=query_embedding,
        top_k=10,
        filter={"category": "tech", "published": "2024"}
    )
```

**Document storage with embeddings:**

```python
async with chroma(path="./data", collection="docs") as db:
    await db.upsert(
        ids=["doc1"],
        vectors=[embedding],
        documents=["Full text of the document"],
        metadata=[{"title": "ML Guide", "author": "Jane"}]
    )
```

**Collection exploration:**

```python
async with chroma(path="./data") as db:
    collections = await db.list_collections()
    print(f"Available: {collections}")
```
## Next Steps
- Plugin Overview - All available plugins
- Pinecone Plugin - Managed cloud vector database
- Qdrant Plugin - Self-hosted vector database