
# Data Assertions

ItemAssertion enforces row-level data quality rules at query time. Any violation raises a DataQualityError — a permanent error that agents will not retry.

## Overview

ItemAssertion lets you declare expectations about every row returned by a query. When you pass assertions to query_checked(), the framework evaluates each rule against every row and raises DataQualityError if any rule fails — before the result ever reaches your agent or application code.

```python
from daita.plugins import sqlite
from daita import ItemAssertion

async with sqlite(path="transactions.db") as db:
    rows = await db.query_checked(
        "SELECT * FROM transactions",
        assertions=[
            ItemAssertion(lambda r: r["amount"] > 0, "All amounts must be positive"),
            ItemAssertion(lambda r: r["customer_id"] is not None, "customer_id required"),
        ],
    )
```

## ItemAssertion

```python
from daita import ItemAssertion

ItemAssertion(
    check: Callable[[dict], bool],
    description: str,
)
```

### Parameters

  • check (callable): A function that receives a single row as a dict and returns True if the row passes, False if it violates the assertion.
  • description (str): Human-readable description of the rule. Included in error messages and the violations list on DataQualityError.
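
The check contract is simply dict → bool, so a named function works as well as a lambda. A minimal sketch of that contract in plain Python (no daita imports needed to follow it; the ItemAssertion usage is shown as a comment):

```python
def amount_is_positive(row: dict) -> bool:
    """Check callable: receives one row as a dict, returns True if it passes."""
    return row.get("amount", 0) > 0

# The same callable you would hand to ItemAssertion:
# ItemAssertion(amount_is_positive, "All amounts must be positive")

print(amount_is_positive({"amount": 12.5}))  # True
print(amount_is_positive({"amount": -3}))    # False
```

Using `row.get(...)` rather than `row[...]` keeps the check total: a missing field counts as a failure instead of raising.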

### Examples

```python
from daita import ItemAssertion

# Field presence
ItemAssertion(lambda r: r.get("email") is not None, "Email required")

# Range check
ItemAssertion(lambda r: 0 < r["amount"] <= 1_000_000, "Amount out of range")

# Enum check
ItemAssertion(
    lambda r: r["status"] in ("pending", "shipped", "delivered"),
    "Invalid order status"
)

# Type check
ItemAssertion(lambda r: isinstance(r["user_id"], int), "user_id must be an integer")

# Cross-field rule
ItemAssertion(
    lambda r: r["shipped_at"] is None or r["shipped_at"] >= r["created_at"],
    "shipped_at cannot be before created_at"
)
```

## query_checked()

query_checked() is available on all database plugins. It runs the query and evaluates assertions against every row before returning results.

```python
async def query_checked(
    sql: str,
    params=None,
    assertions: list[ItemAssertion] = None,
) -> list[dict]
```

  • When assertions is None or empty, it behaves identically to query().
  • All assertions are evaluated against all rows before any error is raised — so the DataQualityError carries the full list of failures, not just the first.
  • Raises DataQualityError if any assertion has one or more violations.
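
Those three rules can be sketched with the stdlib sqlite3 module. This is an illustrative stand-in, not daita's implementation: the real plugin is async, handles params, and attaches richer violation records. The DataQualityError class and assertion tuples here are stand-ins for the real types:

```python
import sqlite3

class DataQualityError(Exception):
    """Stand-in for daita's DataQualityError (illustrative only)."""
    def __init__(self, violations):
        super().__init__(f"Data quality violations: {violations}")
        self.violations = violations

def query_checked_sketch(conn, sql, assertions=None):
    conn.row_factory = sqlite3.Row
    rows = [dict(r) for r in conn.execute(sql)]
    if not assertions:  # None or empty: behave exactly like query()
        return rows
    violations = []
    for check, description in assertions:  # every assertion runs...
        failed = [r for r in rows if not check(r)]  # ...against every row
        if failed:
            violations.append({"description": description,
                               "violation_count": len(failed)})
    if violations:  # raised only after all assertions have been evaluated
        raise DataQualityError(violations)
    return rows
```

Because the raise happens only after the loop, a query that violates two assertions produces one error with two violation entries.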

### Supported plugins

query_checked() is available on every BaseDatabasePlugin:

  • SQLite
  • PostgreSQL
  • MySQL
  • Snowflake

## DataQualityError

DataQualityError is raised when one or more assertions fail. It is a PermanentError — agents will not retry the query, because retrying the same query against the same data won't fix bad data.

```python
from daita import DataQualityError

try:
    rows = await db.query_checked("SELECT * FROM orders", assertions=[...])
except DataQualityError as e:
    print(e.message)      # "Data quality violations: ..."
    print(e.violations)   # list of violation dicts
    print(e.table)        # table name if provided
```

### Violation structure

Each entry in e.violations is a dict:

```python
{
    "description": "All amounts must be positive",  # assertion description
    "violation_count": 3,                           # rows that failed
    "total_items": 100,                             # total rows checked
    "sample": [                                     # up to 3 failing rows
        {"id": 42, "amount": -10, ...},
        ...
    ]
}
```
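
How such an entry is assembled can be sketched in a few lines. The field names match the documented structure; the construction logic and the `build_violation` helper are illustrative assumptions, not daita internals:

```python
def build_violation(description, rows, check):
    """Build one violation entry for a single assertion over all rows."""
    failing = [r for r in rows if not check(r)]
    return {
        "description": description,
        "violation_count": len(failing),
        "total_items": len(rows),
        "sample": failing[:3],  # at most 3 failing rows are retained
    }

rows = [{"id": i, "amount": a} for i, a in enumerate([10, -1, -2, -3, -4])]
v = build_violation("All amounts must be positive", rows,
                    lambda r: r["amount"] > 0)
# 4 of 5 rows fail, but the sample is capped at 3
```

The cap keeps the exception small even when thousands of rows violate a rule; violation_count still reports the true total.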

## Behaviour Details

All violations are collected before raising. If multiple assertions fail, DataQualityError.violations contains an entry for each — you see the full picture in one error, not just the first failure.

Assertions that raise internally are skipped, not propagated. If a check callable throws an exception (e.g. KeyError on a missing field), that assertion is logged at DEBUG level and skipped. Other assertions still run.
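
One practical consequence: a check that throws on a missing field is skipped rather than reported, so the bad rows pass silently. Writing checks defensively with dict.get turns the same condition into a real violation. A small illustrative comparison:

```python
row = {"id": 1}  # no "email" field at all

strict = lambda r: r["email"] is not None         # raises KeyError
defensive = lambda r: r.get("email") is not None  # returns False

try:
    strict(row)
except KeyError:
    pass  # daita would log this at DEBUG and skip the assertion entirely

print(defensive(row))  # False — reported as a genuine violation
```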

DataQualityError is permanent. Agent retry policies will not re-run a tool that raises DataQualityError. The violation must be fixed at the data level.


## Usage with Agents

Assertions are most useful in agent tools that need to guarantee data quality before acting on results:

```python
from daita import Agent, tool, ItemAssertion
from daita.plugins import postgresql
import os

db = postgresql(
    host=os.getenv("DB_HOST"),
    database=os.getenv("DB_NAME"),
    username=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)

@tool
async def get_active_orders() -> list:
    """Fetch active orders, guaranteed clean."""
    return await db.query_checked(
        "SELECT * FROM orders WHERE status = 'active'",
        assertions=[
            ItemAssertion(lambda r: r["total"] > 0, "Order total must be positive"),
            ItemAssertion(lambda r: r["customer_id"] is not None, "customer_id required"),
        ],
    )

agent = Agent(
    name="Order Processor",
    prompt="Process active orders. Do not proceed if data quality fails.",
    tools=[get_active_orders],
)
```

## Assertions vs DataQuality Plugin

|              | ItemAssertion + query_checked()                    | DataQualityPlugin                           |
|--------------|----------------------------------------------------|---------------------------------------------|
| Purpose      | Enforcement — guarantee rules at consumption time  | Analysis — profile, detect anomalies, report |
| When it runs | At the point of the query call                     | On-demand, typically by an agent            |
| On failure   | Raises DataQualityError (blocks execution)         | Returns a report (agent decides what to do) |
| Best for     | Critical data guarantees in tools and pipelines    | Exploratory quality checks and monitoring   |

Use ItemAssertion when bad data should stop execution. Use DataQualityPlugin when you want the agent to reason about data quality and decide.


## Next Steps