
Agent.from_db()

Build a fully configured analyst agent from a database connection string or plugin. Discovers schema automatically, generates a system prompt, and registers query and analysis tools — ready to use in one call.

#Overview

Agent.from_db() creates a production-ready data analyst agent from a database connection. It connects, discovers the schema, generates a rich system prompt, and returns a configured Agent with query tools pre-registered.

```python
from daita import Agent

agent = await Agent.from_db("postgresql://user:pass@localhost/analytics")

await agent.start()
result = await agent.run("What were the top 5 products by revenue last quarter?")
```

No manual schema setup, no tool registration, no system prompt writing — from_db() handles all of it.


#Supported Databases

Pass a connection string or a plugin instance:

```python
# PostgreSQL
agent = await Agent.from_db("postgresql://user:pass@host:5432/mydb")

# MySQL
agent = await Agent.from_db("mysql://user:pass@host:3306/mydb")

# SQLite (file-based)
agent = await Agent.from_db("sqlite:///./data.db")

# SQLite (in-memory)
agent = await Agent.from_db("sqlite:///:memory:")

# Existing plugin instance
from daita.plugins import postgresql
db = postgresql(host="localhost", database="analytics")
agent = await Agent.from_db(db)
```

#Parameters

```python
await Agent.from_db(
    source,                          # connection string or plugin instance (required)
    *,
    name=None,                       # agent name (default: "{domain} database agent")
    model=None,                      # LLM model override
    api_key=None,                    # LLM API key override
    llm_provider=None,               # LLM provider override
    prompt=None,                     # user context prepended to auto-generated prompt
    db_schema=None,                  # DB schema name (e.g. "public" for PostgreSQL)
    include_sample_values=True,      # include sample numeric values in prompt
    redact_pii_columns=True,         # skip sampling PII-named columns
    lineage=None,                    # True or LineagePlugin instance
    memory=None,                     # True or MemoryPlugin instance
    history=None,                    # True or ConversationHistory instance
    cache_ttl=None,                  # schema cache TTL in seconds
    read_only=True,                  # omit write tools when True
    toolkit="analyst",               # analyst toolkit to register
    **agent_kwargs,                  # forwarded to Agent.__init__
)
```

#Core Parameters

  • source (str | BaseDatabasePlugin): Connection string or plugin instance. Required.
  • name (str): Agent name. Defaults to "{domain} database agent" where domain is inferred from the schema.
  • prompt (str): Additional context prepended to the auto-generated schema prompt. Use this to describe business rules, preferred query patterns, or domain terminology.
  • read_only (bool): When True (default), write tools (sqlite_execute, execute_sql, etc.) are omitted from the agent. Recommended for production.
  • toolkit (str | None): Analyst toolkit to register. "analyst" (default) adds 6 in-process analysis tools. None disables the toolkit.

#LLM Configuration

  • model, api_key, llm_provider: Override the LLM settings. When omitted, auto-detected from environment variables.
  • **agent_kwargs: Forwarded to Agent.__init__ — use this to pass display_reasoning=True, relay=..., or any other Agent parameter.

#Schema Discovery Parameters

  • db_schema (str): Target DB schema (e.g. "public" for PostgreSQL, "main" for SQLite). Discovered automatically when None.
  • include_sample_values (bool): When True, numeric columns are sampled and sample values appear in the system prompt. Helps the LLM reason about scale and units.
  • redact_pii_columns (bool): When True, columns with names matching common PII patterns (email, password, ssn, etc.) are excluded from sampling.
  • cache_ttl (int | None): Cache schema discovery results for this many seconds. None disables caching. Useful for large schemas or slow connections.
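
To make the redact_pii_columns behavior concrete, here is a minimal sketch of name-based PII filtering. The pattern list below is an assumption for illustration, not daita's actual pattern set:

```python
import re

# Assumed PII name patterns for illustration only; daita's real list is internal.
PII_PATTERNS = re.compile(
    r"(email|password|ssn|phone|address|dob|credit_card)", re.IGNORECASE
)

def sampleable_columns(columns):
    """Return column names that are safe to sample for the system prompt."""
    return [c for c in columns if not PII_PATTERNS.search(c)]

print(sampleable_columns(["id", "email", "order_total", "user_password_hash"]))
# → ['id', 'order_total']
```

Note that matching is on column *names*, not values, so a PII column with an unusual name would still be sampled.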

#How Schema Discovery Works

from_db() connects to the database and uses dialect-specific discovery:

  • PostgreSQL/MySQL: Queries information_schema.columns and information_schema.table_constraints
  • SQLite: Uses PRAGMA table_info() and sqlite_master
  • Other dialects: Falls back to the plugin's tables() and describe() methods

The discovered schema is embedded in the agent's system prompt, so the LLM understands the full table structure without needing to call discovery tools. Schema discovery tools (sqlite_list_tables, postgres_inspect, etc.) are removed from the tool registry to save ~250–300 tokens per LLM call.
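
For the SQLite path, the discovery queries can be sketched with the standard library alone. This is an illustration of the mechanism described above, not daita's internal implementation:

```python
import sqlite3

def discover_sqlite_schema(conn):
    """List user tables via sqlite_master, then columns via PRAGMA table_info()."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        schema[table] = [
            (col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({table})")
        ]
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
print(discover_sqlite_schema(conn))
# → {'orders': [('id', 'INTEGER'), ('total', 'REAL')]}
```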

#Caching and Drift Detection

```python
# Cache schema for 1 hour
agent = await Agent.from_db(
    "postgresql://user:pass@host/db",
    cache_ttl=3600
)
```

When cache_ttl is set:

  • On a cache hit, discovery is skipped entirely
  • On expiry, the schema is re-discovered and the cache is updated
  • If re-discovery fails, the expired cache is used with a warning
  • If the new schema differs from the cached one, drift is logged and the system prompt is updated
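
The four rules above amount to a small state machine. A minimal sketch of that logic (not daita's code; the class and method names are hypothetical):

```python
import time

class SchemaCache:
    """Toy model of the TTL / stale-fallback / drift behavior described above."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.schema = None
        self.fetched_at = 0.0

    def get(self, discover):
        now = time.monotonic()
        if self.schema is not None and now - self.fetched_at < self.ttl:
            return self.schema          # cache hit: skip discovery entirely
        try:
            fresh = discover()
        except Exception:
            if self.schema is not None:
                return self.schema      # re-discovery failed: reuse stale cache with a warning
            raise
        if self.schema is not None and fresh != self.schema:
            pass                        # drift: log it and refresh the system prompt
        self.schema, self.fetched_at = fresh, now
        return fresh
```

The key design point is that an expired-but-present cache is treated as better than a hard failure, so transient connection loss does not take the agent down.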

#Analyst Toolkit

With toolkit="analyst" (the default), six in-process analysis tools are registered:

| Tool | Description |
| --- | --- |
| pivot_table | Pivot rows into a cross-tabulation |
| correlate | Compute column correlation matrix |
| detect_anomalies | Statistical outlier detection |
| compare_entities | Compare two entities row-by-row |
| find_similar | Find rows most similar to a reference row |
| forecast_trend | Simple trend forecasting on time-series data |

These tools run in-process (no extra LLM call) and operate on data already fetched by the query tools.
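
To show what "in-process, operating on already-fetched rows" means in practice, here is a toy version of a pivot over query results. The signature is an assumption for illustration; daita's actual pivot_table tool may accept different arguments:

```python
from collections import defaultdict

def pivot_table(rows, index, column, value):
    """Cross-tabulate rows: one cell per (index, column) pair, summing `value`.
    Illustrative sketch only; not daita's actual tool implementation."""
    table = defaultdict(dict)
    for row in rows:
        cell = table[row[index]]
        cell[row[column]] = cell.get(row[column], 0) + row[value]
    return dict(table)

rows = [
    {"region": "EU", "quarter": "Q1", "revenue": 100},
    {"region": "EU", "quarter": "Q2", "revenue": 150},
    {"region": "US", "quarter": "Q1", "revenue": 200},
]
print(pivot_table(rows, "region", "quarter", "revenue"))
# → {'EU': {'Q1': 100, 'Q2': 150}, 'US': {'Q1': 200}}
```

Because the computation happens in Python rather than via another LLM round trip, these tools add no token cost beyond their (small) result payload.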


#Optional Integrations

#Lineage

Automatically creates a LineagePlugin and registers foreign-key relationships discovered during schema discovery:

```python
agent = await Agent.from_db(
    "postgresql://user:pass@host/db",
    lineage=True
)
# agent._db_lineage is the LineagePlugin instance

# Or pass your own:
from daita.plugins import lineage
lin = lineage()
agent = await Agent.from_db("postgresql://...", lineage=lin)
```

#Memory

Auto-creates a MemoryPlugin and runs numeric column calibration on first use:

```python
agent = await Agent.from_db(
    "postgresql://user:pass@host/db",
    memory=True
)
```

Calibration infers column units (e.g. "amount is in USD cents") and stores them in memory. Subsequent agent runs use this context automatically.
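
Daita's calibration heuristics are internal, but the flavor of the idea can be sketched with a toy unit-inference rule (everything below is an illustrative assumption, including the threshold):

```python
def infer_currency_unit(values, label="amount"):
    """Toy heuristic: money stored as large integers is often in cents.
    Not daita's actual calibration logic."""
    if values and all(isinstance(v, int) for v in values) and min(values) >= 100:
        return f"{label} is likely in USD cents"
    return f"{label} is likely in USD dollars"

print(infer_currency_unit([1999, 2500, 109900]))
# → amount is likely in USD cents
```

A note like this, stored in memory, is what lets the agent answer "total revenue" in dollars instead of reporting a number 100x too large.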

#Conversation History

Enables conversational drilldown across multiple run() calls:

```python
agent = await Agent.from_db(
    "postgresql://user:pass@host/db",
    history=True
)

await agent.start()
await agent.run("How many orders last week?")
await agent.run("Break that down by region")  # references prior context
```

#Error Handling

```python
from daita import Agent
from daita.core.exceptions import AgentError

try:
    agent = await Agent.from_db("postgresql://user:wrong@host/db")
except AgentError as e:
    if "Failed to connect" in str(e):
        print("Check database credentials and host")
    elif "Schema discovery failed" in str(e):
        print("Connected but schema discovery failed — check the db_schema parameter")
except ValueError as e:
    print(f"Invalid connection string: {e}")
```

#Best Practices

  • Use read_only=True (the default) in production to prevent accidental writes.
  • Provide a prompt with business context (e.g. column naming conventions, currency units, time zone) to improve answer accuracy.
  • Enable cache_ttl for large schemas or when starting multiple agents from the same database — schema discovery can be slow on wide schemas.
  • Use history=True for interactive Q&A workflows where users follow up on prior questions.
  • Set name explicitly to a meaningful identifier — it appears in logs, traces, and the system prompt.
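
Pulling the practices above together, a production setup might look like the following. The connection string, name, and business context are placeholders:

```python
# Hypothetical production configuration applying the best practices above.
production_config = dict(
    name="analytics-reporting-agent",  # meaningful identifier for logs and traces
    prompt="Revenue columns are in USD cents; all timestamps are UTC.",
    read_only=True,                    # the default, stated explicitly
    cache_ttl=3600,                    # avoid re-discovering a wide schema on restart
    history=True,                      # allow follow-up questions
)
# agent = await Agent.from_db("postgresql://user:pass@host/analytics", **production_config)
```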

#Next Steps