
# Eval Commands

Run local and cloud agent evaluation suites from the Daita CLI.

## eval

Run an agent evaluation suite from an eval config in your project.

```bash
daita eval [config]
```

By default, `daita eval` runs locally in your current Python environment. Local evals require `daita-agents` because the CLI imports your project code and the `daita.evals` runtime.

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--local` | local mode | Run the eval in the local process |
| `--cloud` | | Run the registered suite in Daita cloud |
| `--suite <name>` | | Select a cloud suite by name |
| `--case <id>` | | Run one case; may be provided multiple times |
| `--failed` | | Rerun cases that failed in the latest artifact report |
| `--runs <count>` | config value | Override the run count for selected cases |
| `--format <format>` | `pretty` | Output format: `pretty`, `json`, `markdown`, or `junit` |
| `--output-dir <path>` | config value | Artifact output directory |
| `--record-baseline` | | Record this run as the baseline |
| `--compare-baseline` | | Compare this run against the configured or default baseline |
| `--baseline <path>` | | Baseline path to compare against |
| `--include-tool-outputs` | | Include tool outputs in artifacts and judge inputs |
| `--no-artifacts` | | Do not write local eval artifacts |
| `--judge-provider <name>` | config value | Override configured judge providers |
| `--judge-model <name>` | config value | Override configured judge models |
| `--judge-api-key <key>` | config value | Override configured judge API keys |
| `--no-judges` | | Disable all LLM judge expectations for the run |
| `--timeout <seconds>` | `900` | Cloud eval polling timeout |

Examples:

```bash
# Run the starter eval locally
daita eval evals/starter-agent.yaml

# Run a single case twice and print JSON
daita eval evals/starter-agent.yaml --case greeting --runs 2 --format json

# Rerun the last failed cases
daita eval evals/starter-agent.yaml --failed

# Run a deployed cloud suite by config path
daita eval evals/starter-agent.yaml --cloud

# Run a deployed cloud suite by name
daita eval --cloud --suite starter-agent-evals
```

## Eval configs

`daita init` creates an `evals/` directory and a starter eval config. Eval configs are designed to live with your project source so they can be reviewed, versioned, and deployed alongside the agents they test.

```text
my-project/
├── agents/
├── workflows/
├── skills/
├── evals/
│   └── starter-agent.yaml
└── daita-project.yaml
```
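The schema of an eval config is defined by `daita-agents`, and the generated `evals/starter-agent.yaml` is the authoritative reference. Purely as an illustration of the shape such a suite might take, here is a hypothetical sketch; every field name below is an assumption, not the documented schema:

```yaml
# Hypothetical eval config sketch. Field names are assumptions —
# check the starter config generated by `daita init` for the real schema.
name: starter-agent-evals
agent: starter-agent          # target agent under test (assumed key)
cases:
  - id: greeting              # case id, selectable via `daita eval --case greeting`
    input: "Hello!"
    expectations:
      - contains: "Hi"        # assumed expectation type
runs: 1                       # per-case run count, overridable with --runs
```

The only grounded parts here are the file location and the `--case`/`--runs` flags from the options table above; treat everything else as placeholder structure.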

When you run `daita push`, eval configs are included in the project package and registered with the hosted eval API. The dashboard and cloud commands can then list those suites and run them against the active deployment.

## Cloud evals

Cloud eval commands use the production Daita API and require `DAITA_API_KEY`. There is no `--environment` flag; hosted evals run against the production environment.

Cloud execution is useful when you want the same infrastructure, secrets, package, and deployment context that your hosted agents use in production. Local execution is better for fast iteration while editing prompts, tools, and expectations.

## eval runs

List and inspect cloud eval run history.

### List runs

```bash
daita eval runs
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--limit` | `20` | Number of runs to return |
| `--eval-suite-id <id>` | | Filter by eval suite id |
| `--project-name <name>` | current project | Filter by project name |
| `--status <status>` | | Filter by run status |

### Show a run

```bash
daita eval runs show <eval_run_id>
```

Returns the run summary, including suite name, status, score, duration, artifact location, and timestamps.

### Fetch a report

```bash
daita eval runs report <eval_run_id>
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--format <format>` | `pretty` | Output format: `pretty`, `json`, `markdown`, or `junit` |

The report command fetches the canonical `report.json` for a run. Use JSON mode for automation or JUnit mode for CI systems that understand test reports.
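As one way to wire this into CI, a pipeline could run a cloud suite with `--format junit` and feed the result to a standard JUnit report step. The sketch below assumes GitHub Actions, a repository secret named `DAITA_API_KEY`, and the third-party `dorny/test-reporter` action; only the `daita` commands and flags come from this page, and the suite name is a placeholder.

```yaml
# Hypothetical GitHub Actions workflow — layout, secret name, and
# reporter action are assumptions, not part of the Daita docs.
name: agent-evals
on: [push]
jobs:
  cloud-evals:
    runs-on: ubuntu-latest
    env:
      DAITA_API_KEY: ${{ secrets.DAITA_API_KEY }}   # required for cloud evals
    steps:
      - uses: actions/checkout@v4
      - name: Run cloud suite and emit JUnit XML
        run: |
          daita eval --cloud --suite starter-agent-evals \
            --format junit > eval-results.xml
      - name: Publish eval report
        if: always()                                 # publish even on failures
        uses: dorny/test-reporter@v1
        with:
          name: agent-evals
          path: eval-results.xml
          reporter: java-junit
```

This assumes the JUnit output goes to stdout and can be redirected to a file; verify that against your CLI version before relying on it.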

## eval suites

List and inspect registered cloud eval suites.

### List suites

```bash
daita eval suites
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--limit` | `20` | Number of suites to return |
| `--project-name <name>` | current project | Filter by project name |
| `--agent-name <name>` | | Filter by target agent |
| `--status <status>` | `active` | Filter by suite status |

### Show a suite

```bash
daita eval suites show <eval_suite_id>
```

Returns the registered suite snapshot, including config path, config hash, target agent or workflow, status, and deployment association.