Eval Commands
Run local and cloud agent evaluation suites from the Daita CLI.
#eval
Run an agent evaluation suite from an eval config in your project.
daita eval [config]
By default, daita eval runs locally in your current Python environment. Local
evals require daita-agents because the CLI imports your project code and the
daita.evals runtime.
Options:
| Flag | Default | Description |
|---|---|---|
| --local | local mode | Run the eval in the local process |
| --cloud | — | Run the registered suite in Daita cloud |
| --suite <name> | — | Select a cloud suite by name |
| --case <id> | — | Run one case. May be provided multiple times |
| --failed | — | Rerun cases that failed in the latest artifact report |
| --runs <count> | config value | Override the run count for selected cases |
| --format <format> | pretty | Output format: pretty, json, markdown, or junit |
| --output-dir <path> | config value | Artifact output directory |
| --record-baseline | — | Record this run as the baseline |
| --compare-baseline | — | Compare this run against the configured or default baseline |
| --baseline <path> | — | Baseline path to compare against |
| --include-tool-outputs | — | Include tool outputs in artifacts and judge inputs |
| --no-artifacts | — | Do not write local eval artifacts |
| --judge-provider <name> | config value | Override the configured judge provider |
| --judge-model <name> | config value | Override the configured judge model |
| --judge-api-key <key> | config value | Override the configured judge API key |
| --no-judges | — | Disable all LLM judge expectations for the run |
| --timeout <seconds> | 900 | Cloud eval polling timeout |
Examples:
# Run the starter eval locally
daita eval evals/starter-agent.yaml
# Run a single case twice and print JSON
daita eval evals/starter-agent.yaml --case greeting --runs 2 --format json
# Rerun the last failed cases
daita eval evals/starter-agent.yaml --failed
# Run a deployed cloud suite by config path
daita eval evals/starter-agent.yaml --cloud
# Run a deployed cloud suite by name
daita eval --cloud --suite starter-agent-evals
#Eval configs
daita init creates an evals/ directory and a starter eval config. Eval
configs are designed to live with your project source so they can be reviewed,
versioned, and deployed alongside the agents they test.
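As a rough illustration only, a starter config might look like the sketch below. The field names and structure here are assumptions for illustration, not the documented schema; consult the starter config that daita init generates for the real shape.

```yaml
# evals/starter-agent.yaml — illustrative sketch only; every field name here
# is an assumption, not the documented Daita eval schema.
name: starter-agent-evals   # suite name used with --suite
agent: starter-agent        # target agent under test
cases:
  - id: greeting            # case id used with --case
    input: "Say hello"
    expect:
      contains: "hello"
```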
my-project/
├── agents/
├── workflows/
├── skills/
├── evals/
│ └── starter-agent.yaml
└── daita-project.yaml
When you run daita push, eval configs are included in the project package and
registered with the hosted eval API. The dashboard and cloud commands can then
list those suites and run them against the active deployment.
#Cloud evals
Cloud eval commands use the production Daita API and require DAITA_API_KEY.
There is no --environment flag; hosted evals run against the production
environment.
Cloud execution is useful when you want the same infrastructure, secrets, package, and deployment context that your hosted agents use in production. Local execution is better for fast iteration while editing prompts, tools, and expectations.
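A common CI pattern is to fall back to local execution when no API key is available. The sketch below shows the guard only; the daita invocation at the end is illustrative and commented out so the snippet is self-contained.

```shell
# Choose eval mode based on whether DAITA_API_KEY is set (sketch).
if [ -z "${DAITA_API_KEY:-}" ]; then
  echo "DAITA_API_KEY not set; running evals locally"
  MODE="--local"
else
  MODE="--cloud"
fi
echo "mode: ${MODE}"
# daita eval evals/starter-agent.yaml "${MODE}"
```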
#eval runs
List and inspect cloud eval run history.
#List runs
daita eval runs
Options:
| Flag | Default | Description |
|---|---|---|
| --limit | 20 | Number of runs to return |
| --eval-suite-id <id> | — | Filter by eval suite id |
| --project-name <name> | current project | Filter by project name |
| --status <status> | — | Filter by run status |
#Show a run
daita eval runs show <eval_run_id>
Returns the run summary, including suite name, status, score, duration, artifact location, and timestamps.
#Fetch a report
daita eval runs report <eval_run_id>
Options:
| Flag | Default | Description |
|---|---|---|
| --format <format> | pretty | Output format: pretty, json, markdown, or junit |
The report command fetches the canonical report.json for a run. Use JSON mode
for automation or JUnit mode for CI systems that understand test reports.
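Because JUnit is a standard format, any generic JUnit consumer can gate a CI job on the report. A minimal sketch: the sample report below is illustrative (including the hypothetical refund-policy case), not Daita's exact output.

```shell
# CI gating sketch: fail the job when a JUnit-format eval report contains
# failures. The sample report is illustrative; a real one would come from
# daita eval runs report <eval_run_id> --format junit.
cat > report.xml <<'EOF'
<testsuite name="starter-agent-evals" tests="2" failures="1">
  <testcase name="greeting"/>
  <testcase name="refund-policy"><failure message="judge expectation failed"/></testcase>
</testsuite>
EOF
FAILURES=$(grep -c '<failure' report.xml)
echo "failures: ${FAILURES}"
if [ "${FAILURES}" -gt 0 ]; then
  echo "eval regressions detected"
fi
```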
#eval suites
List and inspect registered cloud eval suites.
#List suites
daita eval suites
Options:
| Flag | Default | Description |
|---|---|---|
| --limit | 20 | Number of suites to return |
| --project-name <name> | current project | Filter by project name |
| --agent-name <name> | — | Filter by target agent |
| --status <status> | active | Filter by suite status |
#Show a suite
daita eval suites show <eval_suite_id>
Returns the registered suite snapshot, including config path, config hash, target agent or workflow, status, and deployment association.