# ChromaDB Plugin

Embeddable vector database supporting local, persistent, and client-server modes. Built on `chromadb`.
## Installation

```bash
pip install chromadb
```
## Quick Start

```python
from daita import SubstrateAgent
from daita.plugins import chroma

# Create plugin
vector_db = chroma(
    path="./chroma_data",
    collection="embeddings"
)

# Agent uses vector database tools autonomously
agent = SubstrateAgent(
    name="Vector Search Agent",
    prompt="You are a semantic search assistant. Help users find relevant information.",
    tools=[vector_db]
)

await agent.start()
result = await agent.run("Find documents similar to 'machine learning'")
```
## Direct Usage

The plugin can also be used directly, without an agent, for programmatic access. For comprehensive API documentation, see the official ChromaDB docs. The main value of this plugin, however, is agent integration: enabling LLMs to perform semantic search operations autonomously.
## Connection Parameters

```python
chroma(
    path: Optional[str] = None,
    host: Optional[str] = None,
    port: int = 8000,
    collection: str = "default",
    **kwargs
)
```
### Parameters

- `path` (str, optional): Path for persistent local storage
- `host` (str, optional): Host for remote Chroma server
- `port` (int): Port for remote Chroma server (default: 8000)
- `collection` (str): Collection name to use (default: `"default"`)
- `**kwargs`: Additional Chroma configuration
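The mode the plugin runs in follows from which of these parameters you pass. A minimal sketch of that dispatch logic (`resolve_mode` is a hypothetical helper for illustration, not part of the plugin API, and assumes `host` takes precedence over `path`):

```python
from typing import Optional

def resolve_mode(path: Optional[str] = None, host: Optional[str] = None) -> str:
    """Illustrative only: which backend a chroma(...) call would likely select."""
    if host is not None:
        return "client-server"  # talk to a remote Chroma server
    if path is not None:
        return "persistent"     # local storage on disk
    return "ephemeral"          # in-memory, lost when the process ends

print(resolve_mode(path="./chroma_data"))  # persistent
```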
## Connection Modes
ChromaDB supports three operational modes:
### In-Memory (Ephemeral)

```python
# No persistence - data lost when process ends
async with chroma(collection="temp") as db:
    await db.upsert(
        ids=["id1"],
        vectors=[[0.1, 0.2, 0.3]]
    )
```
### Persistent Local Storage

```python
# Data persisted to disk
async with chroma(
    path="./chroma_data",
    collection="documents"
) as db:
    await db.upsert(
        ids=["id1"],
        vectors=[[0.1, 0.2, 0.3]],
        metadata=[{"source": "doc1.pdf"}]
    )
```
### Remote Client-Server

```python
# Connect to remote Chroma server
async with chroma(
    host="chroma-server.example.com",
    port=8000,
    collection="production"
) as db:
    results = await db.query(
        vector=[0.1, 0.2, 0.3],
        top_k=10
    )
```
## Using with Agents

The ChromaDB plugin exposes vector operations as tools that agents can use autonomously:

```python
from daita import SubstrateAgent
from daita.plugins import chroma

# Create ChromaDB plugin
vector_db = chroma(
    path="./embeddings",
    collection="knowledge_base"
)

# Pass plugin to agent - agent can now use vector tools autonomously
agent = SubstrateAgent(
    name="Semantic Search Agent",
    prompt="You are a semantic search assistant. Help users find relevant documents.",
    llm_provider="openai",
    model="gpt-4",
    tools=[vector_db]
)

await agent.start()

# Agent autonomously uses ChromaDB tools to answer questions
result = await agent.run("Find documents about machine learning")

# The agent will autonomously:
# 1. Use chroma_search to find relevant vectors
# 2. Analyze and present results in natural language

await agent.stop()
```
## Available Tools

The ChromaDB plugin exposes these tools to LLM agents:

| Tool | Description | Parameters |
|---|---|---|
| `chroma_search` | Search for similar vectors | `vector` (required), `top_k` (int), `filter` (object) |
| `chroma_upsert` | Insert or update vectors | `ids` (required), `vectors` (required), `metadata` (array), `documents` (array) |
| `chroma_delete` | Delete vectors | `ids` (array), `filter` (object) |
| `chroma_collections` | List all collections | None |

- **Tool Categories:** `vector_db`
- **Tool Source:** plugin
- **Filter Format:** Simple dict (e.g., `{"category": "tech"}`)
## Tool Usage Example

```python
from daita import SubstrateAgent
from daita.plugins import chroma

# Setup ChromaDB with tool integration
vector_db = chroma(path="./data", collection="articles")

agent = SubstrateAgent(
    name="Research Assistant",
    prompt="You are a research assistant. Help users find and organize information.",
    llm_provider="openai",
    model="gpt-4",
    tools=[vector_db]
)

await agent.start()

# Natural language command - agent uses tools autonomously
result = await agent.run("""
Search for articles about:
1. Neural networks
2. Deep learning
3. Computer vision
Return the top 5 most relevant results.
""")

# Agent orchestrates ChromaDB tool calls to fulfill the request
print(result)

await agent.stop()
```
## Error Handling

```python
try:
    async with chroma(path="./data", collection="docs") as db:
        results = await db.query(
            vector=[0.1, 0.2, 0.3],
            top_k=5
        )
except ImportError as e:
    if "chromadb" in str(e):
        print("Install chromadb: pip install chromadb")
    else:
        raise  # don't swallow unrelated import errors
except Exception as e:
    print(f"ChromaDB error: {e}")
```
## Best Practices

**Connection Management:**

- Use context managers (`async with`) for automatic cleanup
- Choose the appropriate mode (ephemeral, persistent, or client-server) based on your needs
- Use persistent storage for production workloads
- Use client-server mode for distributed systems

**Performance:**

- Batch upsert operations when inserting multiple vectors
- Use filters to narrow search scope and improve speed
- Limit `top_k` to only what you need
- Store frequently accessed metadata with vectors
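The batching advice can be sketched with a simple chunking helper. This is an illustrative pattern, not part of the plugin; it assumes `upsert` accepts parallel `ids`/`vectors` lists, as shown in the examples above:

```python
from typing import Any, Iterator, Sequence

def batched(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield fixed-size chunks so upserts go out in batches, not one by one."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage sketch (inside an `async with chroma(...)` block):
#     for id_chunk, vec_chunk in zip(batched(ids, 100), batched(vectors, 100)):
#         await db.upsert(ids=list(id_chunk), vectors=list(vec_chunk))
```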
**Data Organization:**

- Use meaningful collection names for different use cases
- Include descriptive metadata for better filtering
- Store original documents when you need context
- Use consistent vector dimensions within a collection

**Security:**

- Protect persistent storage paths with appropriate permissions
- Use authentication when running a Chroma server in production
- Validate vector dimensions before upserting
- Sanitize user input in metadata
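Validating dimensions before an upsert can be as simple as the following check (an illustrative helper, not part of the plugin API):

```python
from typing import Sequence

def check_dimensions(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Fail fast instead of letting a dimension mismatch surface inside ChromaDB."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, expected {expected_dim}"
            )

check_dimensions([[0.1, 0.2, 0.3]], expected_dim=3)  # passes silently
```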
## Filtering

ChromaDB supports simple metadata filtering:

```python
async with chroma(path="./data", collection="docs") as db:
    # Simple equality filter
    results = await db.query(
        vector=[0.1, 0.2, 0.3],
        top_k=10,
        filter={"category": "tech"}
    )

# Note: For advanced filtering (operators like $gt, $in, etc.),
# refer to the ChromaDB documentation
```
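Since the plugin expects a flat equality dict, one way to build a filter safely from user input is to allow-list the metadata fields. `build_filter` and the field names are hypothetical, chosen to match the metadata used in this page's examples:

```python
from typing import Any, Dict

# Assumption: these are the metadata fields your collection actually uses
ALLOWED_FIELDS = {"category", "published", "author"}

def build_filter(user_input: Dict[str, Any]) -> Dict[str, str]:
    """Keep only allow-listed keys and stringify values for a flat equality filter."""
    return {k: str(v) for k, v in user_input.items() if k in ALLOWED_FIELDS}

print(build_filter({"category": "tech", "unexpected_key": 1}))  # {'category': 'tech'}
```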
## Troubleshooting

| Issue | Solution |
|---|---|
| `chromadb` not installed | `pip install chromadb` |
| Collection doesn't exist | It will be auto-created on first upsert |
| Connection refused (client mode) | Check that the Chroma server is running; verify host/port |
| Dimension mismatch | Ensure all vectors have the same dimensions |
| Memory issues | Use persistent storage, adjust `max_stream_length` |
## Common Patterns

**Semantic search with metadata:**

```python
async with chroma(path="./data", collection="docs") as db:
    results = await db.query(
        vector=query_embedding,
        top_k=10,
        filter={"category": "tech", "published": "2024"}
    )
```

**Document storage with embeddings:**

```python
async with chroma(path="./data", collection="docs") as db:
    await db.upsert(
        ids=["doc1"],
        vectors=[embedding],
        documents=["Full text of the document"],
        metadata=[{"title": "ML Guide", "author": "Jane"}]
    )
```

**Collection exploration:**

```python
async with chroma(path="./data") as db:
    collections = await db.list_collections()
    print(f"Available: {collections}")
```
## Next Steps
- Plugin Overview - All available plugins
- Pinecone Plugin - Managed cloud vector database
- Qdrant Plugin - Self-hosted vector database