# DataQuality Plugin
Analytical data quality for agents — statistical profiling, anomaly detection, freshness checks, and consolidated quality reporting on top of any database plugin.
## Installation
No additional packages required beyond your database plugin. For anomaly detection with scipy z-scores, optionally install:
```bash
pip install scipy
```

## Quick Start
```python
from daita import Agent
from daita.plugins import postgresql, data_quality

db = postgresql(host="localhost", database="analytics")
dq = data_quality(db=db)

agent = Agent(
    name="Quality Checker",
    prompt="You are a data quality analyst. Profile tables and flag issues.",
    tools=[db, dq]
)

await agent.start()
result = await agent.run("Profile the orders table and flag any anomalies")
```

## Configuration
```python
data_quality(
    db=None,         # Any BaseDatabasePlugin instance — required at execution time
    backend=None,    # Optional graph backend for persisting reports (auto-selected if None)
    thresholds=None, # Anomaly detection sensitivity (see below)
)
```

### Parameters

- `db` (BaseDatabasePlugin): The database plugin to run quality checks against. Required when tools are called — can be omitted at construction and provided later.
- `backend`: Optional graph backend for persisting quality reports as stable nodes. Auto-selected at agent start if not provided.
- `thresholds` (dict): Anomaly detection thresholds. Defaults: `{"z_score": 3.0, "iqr_multiplier": 1.5}`
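As an illustration of how the two default thresholds interact, here is a minimal pure-Python sketch of combined z-score and IQR outlier detection. `find_outliers` is a hypothetical name, not the plugin's internal API, and the stdlib `statistics` module stands in for the plugin's numpy/scipy implementation:

```python
# Illustrative sketch only: the plugin's real detection uses numpy
# (and scipy z-scores when installed); names here are hypothetical.
import statistics

def find_outliers(values, z_score=3.0, iqr_multiplier=1.5):
    """Flag values failing either the z-score test or the IQR fence test."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr
    return [
        v for v in values
        if (stdev and abs(v - mean) / stdev > z_score) or v < lo or v > hi
    ]

print(find_outliers([10, 11, 9, 10, 12, 10, 11, 200]))  # the 200 is flagged
```

Lowering `z_score` or `iqr_multiplier` makes detection more sensitive; a value is reported if it fails either rule.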
## Usage
### Column Profiling
Profile null rates, cardinality, and min/max/avg per column:
```python
from daita.plugins import sqlite, data_quality

async with sqlite(path="app.db") as db:
    dq = data_quality(db=db)
    report = await dq.profile(db, "orders")
    # Returns per-column stats: null_rate, cardinality, min, max, avg
```

### Anomaly Detection
Detect statistical outliers in a numeric column:
```python
from daita.plugins import sqlite, data_quality

async with sqlite(path="app.db") as db:
    dq = data_quality(db=db)
    result = await dq.detect_anomaly(db, "transactions", "amount")
    # Returns rows where amount is a statistical outlier
```

Uses numpy by default; scipy z-scores if installed. Thresholds are configurable:

```python
dq = data_quality(db=db, thresholds={"z_score": 2.5, "iqr_multiplier": 1.5})
```

### Freshness Checks
Validate that a timestamp column is within an expected recency window:
```python
from daita.plugins import sqlite, data_quality

async with sqlite(path="app.db") as db:
    dq = data_quality(db=db)
    result = await dq.check_freshness(
        db, "events", "created_at",
        expected_interval_hours=24,
    )
    # Returns staleness info; is_fresh=False if data is older than 24 hours
```

### Quality Report
Generate a consolidated quality report across all columns and persist it:
```python
from daita.plugins import sqlite, data_quality

async with sqlite(path="app.db") as db:
    dq = data_quality(db=db)
    report = await dq.report(db, "orders")
    # Returns profiling + completeness score, persisted as a stable graph node
```

## Using with Agents
### Tool-Based Integration (Recommended)
```python
from daita import Agent
from daita.plugins import postgresql, data_quality
import os

db = postgresql(
    host=os.getenv("DB_HOST"),
    database=os.getenv("DB_NAME"),
    username=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)
dq = data_quality(db=db)

agent = Agent(
    name="Quality Monitor",
    prompt="You are a data quality monitor. Profile tables, detect anomalies, and report issues.",
    tools=[db, dq]
)

await agent.start()
result = await agent.run("""
1. Profile the transactions table
2. Check freshness of the events table (expect data within last 6 hours)
3. Detect anomalies in the revenue column
4. Generate a full quality report
""")
await agent.stop()
```

## Available Tools
| Tool | Description | Key Parameters |
|---|---|---|
| `dq_profile` | Column-level profiling (null rates, cardinality, min/max/avg) | `table` (required) |
| `dq_detect_anomaly` | Statistical outlier detection on a numeric column | `table`, `column` (required) |
| `dq_check_freshness` | Validates a timestamp column is within a recency window | `table`, `timestamp_column` (required); `expected_interval_hours` (optional, default 24) |
| `dq_report` | Consolidated quality report, persisted as a graph node | `table` (required) |
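The freshness tool boils down to comparing the newest timestamp against the expected window. A minimal sketch of that comparison, assuming UTC timestamps; `staleness` and its return keys (other than the documented `is_fresh` flag) are hypothetical names, not the plugin's internals:

```python
# Illustrative sketch of the staleness comparison behind a freshness check.
# Function and key names are hypothetical, not the plugin's internals.
from datetime import datetime, timedelta, timezone

def staleness(latest: datetime, expected_interval_hours: float = 24) -> dict:
    """Compare the newest timestamp against the expected recency window."""
    age = datetime.now(timezone.utc) - latest
    return {
        "age_hours": age.total_seconds() / 3600,
        "is_fresh": age <= timedelta(hours=expected_interval_hours),
    }

two_hours_ago = datetime.now(timezone.utc) - timedelta(hours=2)
print(staleness(two_hours_ago, expected_interval_hours=24))  # is_fresh: True
```

Passing `expected_interval_hours=6`, as in the agent prompt above, would tighten the window accordingly.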
## Dialect Support
DataQuality works with all SQL database plugins:
| Plugin | Supported |
|---|---|
| SQLite | Yes |
| PostgreSQL | Yes |
| MySQL | Yes |
| Snowflake | Yes |
Column discovery uses `pragma_table_info` for SQLite and `information_schema.columns` for all other dialects.
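The dialect split can be sketched with the two discovery queries side by side. `column_discovery_sql` is a hypothetical helper for illustration; the exact SQL the plugin issues may differ:

```python
# Sketch of per-dialect column discovery. Helper name and exact SQL
# are illustrative, not the plugin's internals.
def column_discovery_sql(dialect: str, table: str) -> str:
    if dialect == "sqlite":
        # SQLite exposes column metadata via the pragma_table_info
        # table-valued function.
        return f"SELECT name, type FROM pragma_table_info('{table}')"
    # PostgreSQL, MySQL, and Snowflake all expose information_schema.columns.
    return (
        "SELECT column_name, data_type FROM information_schema.columns "
        f"WHERE table_name = '{table}'"
    )

print(column_discovery_sql("sqlite", "orders"))
```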
## Combining with ItemAssertion
For enforcement at query time (rather than analytical profiling), use `ItemAssertion` with `query_checked()` directly on the database plugin:
```python
from daita.plugins import postgresql
from daita import ItemAssertion

async with postgresql(host="localhost", database="app") as db:
    rows = await db.query_checked(
        "SELECT * FROM orders",
        assertions=[
            ItemAssertion(lambda r: r["total"] > 0, "Order total must be positive"),
            ItemAssertion(lambda r: r["status"] in ("pending", "shipped", "delivered"), "Invalid status"),
        ],
    )
```

`DataQualityPlugin` is best for analytical quality checks run by an agent. `ItemAssertion` is best for enforcement — asserting guarantees at the point of data consumption.
## Error Handling
```python
from daita.plugins import postgresql, data_quality
from daita import DataQualityError

db = postgresql(host="localhost", database="app")
dq = data_quality(db=db)

try:
    report = await dq.profile(db, "orders")
except ValueError as e:
    # No db configured
    print(f"Configuration error: {e}")
except DataQualityError as e:
    print(f"Quality violations: {e.violations}")
```

## Next Steps
- SQLite Plugin — Lightweight database to pair with DataQuality
- Transformer Plugin — Versioned SQL transformations
- ItemAssertion — Row-level enforcement at query time
- Plugin Overview — Learn about other plugins