
ChromaDB Plugin

Embeddable vector database supporting local, persistent, and client-server modes. Built on chromadb.

Installation

pip install chromadb

Quick Start

from daita import SubstrateAgent
from daita.plugins import chroma

# Create plugin
vector_db = chroma(
    path="./chroma_data",
    collection="embeddings"
)

# Agent uses vector database tools autonomously
agent = SubstrateAgent(
    name="Vector Search Agent",
    prompt="You are a semantic search assistant. Help users find relevant information.",
    tools=[vector_db]
)

await agent.start()
result = await agent.run("Find documents similar to 'machine learning'")
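
The snippets on this page use top-level `await`, which only works where an event loop is already running (for example a notebook or an async REPL). In a plain script the calls would need to live in a coroutine driven by `asyncio.run`. A minimal sketch of that pattern follows; `StandInAgent` is a hypothetical placeholder used only so the example runs without daita installed, and in real code it would be the `SubstrateAgent` created above.

```python
import asyncio


class StandInAgent:
    """Hypothetical placeholder mimicking the start/run/stop calls shown on this page."""

    def __init__(self):
        self.started = False

    async def start(self):
        self.started = True

    async def run(self, prompt):
        # A real agent would call the LLM and its tools here.
        return f"echo: {prompt}"

    async def stop(self):
        self.started = False


async def main():
    agent = StandInAgent()
    await agent.start()
    try:
        return await agent.run("Find documents similar to 'machine learning'")
    finally:
        # Release resources even if run() raises.
        await agent.stop()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Wrapping the body in `try`/`finally` is a design choice of this sketch, so that `stop()` is called even when the request fails.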

Direct Usage

The plugin can also be used directly, without an agent, for programmatic access. For the full ChromaDB API, see the official ChromaDB docs. The plugin's main value is agent integration: it lets LLMs perform semantic search operations autonomously.

Connection Parameters

chroma(
    path: Optional[str] = None,
    host: Optional[str] = None,
    port: int = 8000,
    collection: str = "default",
    **kwargs
)

Parameters

  • path (str, optional): Path for persistent local storage
  • host (str, optional): Host for remote Chroma server
  • port (int): Port for remote Chroma server (default: 8000)
  • collection (str): Collection name to use (default: "default")
  • **kwargs: Additional Chroma configuration
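
The parameters above suggest how the connection mode is chosen: a `host` points at a remote server, a `path` at local persistent storage, and neither gives an in-memory instance. The sketch below only illustrates that reading. `resolve_mode` is a hypothetical helper, not part of the plugin, and the precedence when both `host` and `path` are given is an assumption.

```python
from typing import Optional


def resolve_mode(path: Optional[str] = None, host: Optional[str] = None) -> str:
    """Guess the connection mode from the arguments (illustrative only)."""
    if host is not None:
        # Assumption: a host takes precedence over a local path.
        return "client-server"
    if path is not None:
        return "persistent"
    return "ephemeral"


print(resolve_mode())                                  # no path, no host
print(resolve_mode(path="./chroma_data"))              # local storage
print(resolve_mode(host="chroma-server.example.com"))  # remote server
```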

Connection Modes

ChromaDB supports three operational modes:

In-Memory (Ephemeral)

# No persistence - data lost when process ends
async with chroma(collection="temp") as db:
    await db.upsert(
        ids=["id1"],
        vectors=[[0.1, 0.2, 0.3]]
    )

Persistent Local Storage

# Data persisted to disk
async with chroma(
    path="./chroma_data",
    collection="documents"
) as db:
    await db.upsert(
        ids=["id1"],
        vectors=[[0.1, 0.2, 0.3]],
        metadata=[{"source": "doc1.pdf"}]
    )

Remote Client-Server

# Connect to remote Chroma server
async with chroma(
    host="chroma-server.example.com",
    port=8000,
    collection="production"
) as db:
    results = await db.query(
        vector=[0.1, 0.2, 0.3],
        top_k=10
    )
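
Conceptually, a call like `query(vector=..., top_k=10)` asks for the stored vectors closest to the query vector. The sketch below shows that idea as a brute-force scan in plain Python. It uses cosine similarity purely for illustration and assumes non-zero vectors; the distance metric and index Chroma actually uses may differ, and `brute_force_top_k` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, stored, top_k=10):
    """Return the ids of the top_k stored vectors most similar to query."""
    scored = [(cosine_similarity(query, vec), vec_id) for vec_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:top_k]]


stored = {
    "id1": [0.1, 0.2, 0.3],
    "id2": [0.3, 0.2, 0.1],
    "id3": [-0.1, -0.2, -0.3],
}
nearest = brute_force_top_k([0.1, 0.2, 0.3], stored, top_k=2)
print(nearest)
```

A scan like this costs one comparison per stored vector, which is why keeping `top_k` small and narrowing the candidate set with filters matters as collections grow.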

Using with Agents

The ChromaDB plugin exposes vector operations as tools that agents can use autonomously:

from daita import SubstrateAgent
from daita.plugins import chroma

# Create ChromaDB plugin
vector_db = chroma(
    path="./embeddings",
    collection="knowledge_base"
)

# Pass plugin to agent - agent can now use vector tools autonomously
agent = SubstrateAgent(
    name="Semantic Search Agent",
    prompt="You are a semantic search assistant. Help users find relevant documents.",
    llm_provider="openai",
    model="gpt-4",
    tools=[vector_db]
)

await agent.start()

# Agent autonomously uses ChromaDB tools to answer questions
result = await agent.run("Find documents about machine learning")

# The agent will autonomously:
# 1. Use chroma_search to find relevant vectors
# 2. Analyze and present results in natural language

await agent.stop()

Available Tools

The ChromaDB plugin exposes these tools to LLM agents:

| Tool | Description | Parameters |
| --- | --- | --- |
| chroma_search | Search for similar vectors | vector (required), top_k (int), filter (object) |
| chroma_upsert | Insert or update vectors | ids (required), vectors (required), metadata (array), documents (array) |
| chroma_delete | Delete vectors | ids (array), filter (object) |
| chroma_collections | List all collections | None |

Tool Categories: vector_db
Tool Source: plugin
Filter Format: Simple dict (e.g., {"category": "tech"})

Tool Usage Example

from daita import SubstrateAgent
from daita.plugins import chroma

# Setup ChromaDB with tool integration
vector_db = chroma(path="./data", collection="articles")

agent = SubstrateAgent(
    name="Research Assistant",
    prompt="You are a research assistant. Help users find and organize information.",
    llm_provider="openai",
    model="gpt-4",
    tools=[vector_db]
)

await agent.start()

# Natural language command - agent uses tools autonomously
result = await agent.run("""
Search for articles about:
1. Neural networks
2. Deep learning
3. Computer vision
Return the top 5 most relevant results.
""")

# Agent orchestrates ChromaDB tool calls to fulfill the request
print(result)
await agent.stop()

Error Handling

try:
    async with chroma(path="./data", collection="docs") as db:
        results = await db.query(
            vector=[0.1, 0.2, 0.3],
            top_k=5
        )
except ImportError as e:
    if "chromadb" in str(e):
        print("Install chromadb: pip install chromadb")
    else:
        raise  # Unrelated import failure: do not hide it
except Exception as e:
    print(f"ChromaDB error: {e}")
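
Instead of waiting for an `ImportError`, the dependency can be checked before the plugin is created. The sketch below uses the standard library's `importlib.util.find_spec`, which looks a module up without importing it; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("chromadb"):
    print("Install chromadb: pip install chromadb")
```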

Best Practices

Connection Management:

  • Use context managers (async with) for automatic cleanup
  • Choose appropriate mode (ephemeral, persistent, or client-server) based on needs
  • Use persistent storage for production workloads
  • Use client-server mode for distributed systems

Performance:

  • Batch upsert operations when inserting multiple vectors
  • Use filters to narrow search scope and improve speed
  • Limit top_k to only what you need
  • Store frequently accessed metadata with vectors
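
Batching, as recommended above, can be sketched as a small generator that slices the ids and vectors into fixed-size chunks. `make_batches` is a hypothetical helper and the default batch size of 100 is an arbitrary value chosen for illustration, not a documented limit.

```python
def make_batches(ids, vectors, batch_size=100):
    """Yield (ids, vectors) slices of at most batch_size items each."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield ids[start:end], vectors[start:end]


ids = [f"id{i}" for i in range(5)]
vectors = [[float(i), 0.0, 1.0] for i in range(5)]
batches = list(make_batches(ids, vectors, batch_size=2))
print(len(batches))
```

Each yielded pair would then be passed to a single `db.upsert(ids=..., vectors=...)` call, in the form shown elsewhere on this page.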

Data Organization:

  • Use meaningful collection names for different use cases
  • Include descriptive metadata for better filtering
  • Store original documents when you need context
  • Use consistent vector dimensions within a collection

Security:

  • Protect persistent storage paths with appropriate permissions
  • Use authentication when running Chroma server in production
  • Validate vector dimensions before upserting
  • Sanitize user input in metadata
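
The last two points can be sketched as pre-upsert checks in plain Python. Both helpers are hypothetical, and the set of allowed metadata types and the 256-character limit are assumptions made for illustration; check the ChromaDB documentation for the metadata types it actually accepts.

```python
# Assumption: only scalar metadata values are kept.
ALLOWED_TYPES = (str, int, float, bool)


def validate_vectors(vectors, expected_dim):
    """Raise ValueError if any vector does not have expected_dim entries."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {expected_dim}"
            )


def sanitize_metadata(metadata, max_length=256):
    """Keep only scalar values and trim and truncate strings."""
    clean = {}
    for key, value in metadata.items():
        if not isinstance(value, ALLOWED_TYPES):
            continue  # Drop nested or unexpected types
        if isinstance(value, str):
            value = value.strip()[:max_length]
        clean[str(key)] = value
    return clean


validate_vectors([[0.1, 0.2, 0.3]], expected_dim=3)
cleaned = sanitize_metadata({"source": " doc1.pdf ", "nested": {"a": 1}, "pages": 3})
print(cleaned)
```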

Filtering

ChromaDB supports simple metadata filtering:

async with chroma(path="./data", collection="docs") as db:
    # Simple equality filter
    results = await db.query(
        vector=[0.1, 0.2, 0.3],
        top_k=10,
        filter={"category": "tech"}
    )

# Note: For advanced filtering (operators like $gt, $in, etc.),
# refer to ChromaDB documentation
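
What a simple equality filter means can be sketched in plain Python: a record matches when every key in the filter is present in its metadata with an equal value. This is only a model of the semantics, not the plugin's implementation, and treating a multi-key dict as a logical AND is an assumption; `matches` is a hypothetical helper.

```python
def matches(metadata, where):
    """Return True if every key/value pair in where is present in metadata."""
    return all(key in metadata and metadata[key] == value for key, value in where.items())


records = [
    {"id": "doc1", "metadata": {"category": "tech", "published": "2024"}},
    {"id": "doc2", "metadata": {"category": "tech", "published": "2023"}},
    {"id": "doc3", "metadata": {"category": "news", "published": "2024"}},
]
tech_ids = [r["id"] for r in records if matches(r["metadata"], {"category": "tech"})]
print(tech_ids)
```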

Troubleshooting

| Issue | Solution |
| --- | --- |
| chromadb not installed | pip install chromadb |
| Collection doesn't exist | It will be auto-created on first upsert |
| Connection refused (client mode) | Check Chroma server is running, verify host/port |
| Dimension mismatch | Ensure all vectors have same dimensions |
| Memory issues | Use persistent storage, adjust max_stream_length |

Common Patterns

Semantic search with metadata:

async with chroma(path="./data", collection="docs") as db:
    results = await db.query(
        vector=query_embedding,
        top_k=10,
        filter={"category": "tech", "published": "2024"}
    )

Document storage with embeddings:

async with chroma(path="./data", collection="docs") as db:
    await db.upsert(
        ids=["doc1"],
        vectors=[embedding],
        documents=["Full text of the document"],
        metadata=[{"title": "ML Guide", "author": "Jane"}]
    )

Collection exploration:

async with chroma(path="./data") as db:
    collections = await db.list_collections()
    print(f"Available: {collections}")
