
# Eval Commands

Run local and cloud agent evaluation suites from the Daita CLI.

## eval

Run an agent evaluation suite from an eval config in your project.

```bash
daita eval [config]
```

By default, `daita eval` runs locally in your current Python environment. Local evals require `daita-agents` because the CLI imports your project code and the `daita.evals` runtime.

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--local` | local mode | Run the eval in the local process |
| `--cloud` | | Run the registered suite in Daita cloud |
| `--suite <name>` | | Select a cloud suite by name |
| `--case <id>` | | Run one case; may be provided multiple times |
| `--failed` | | Rerun cases that failed in the latest artifact report |
| `--runs <count>` | config value | Override the run count for selected cases |
| `--format <format>` | `pretty` | Output format: `pretty`, `json`, `markdown`, or `junit` |
| `--output-dir <path>` | config value | Artifact output directory |
| `--record-baseline` | | Record this run as the baseline |
| `--compare-baseline` | | Compare this run against the configured or default baseline |
| `--baseline <path>` | | Baseline path to compare against |
| `--include-tool-outputs` | | Include tool outputs in artifacts and judge inputs |
| `--no-artifacts` | | Do not write local eval artifacts |
| `--judge-provider <name>` | config value | Override configured judge providers |
| `--judge-model <name>` | config value | Override configured judge models |
| `--judge-api-key <key>` | config value | Override configured judge API keys |
| `--no-judges` | | Disable all LLM judge expectations for the run |
| `--timeout <seconds>` | `900` | Cloud eval polling timeout |

Examples:

```bash
# Run the starter eval locally
daita eval evals/starter-agent.yaml

# Run a single case twice and print JSON
daita eval evals/starter-agent.yaml --case greeting --runs 2 --format json

# Rerun the last failed cases
daita eval evals/starter-agent.yaml --failed

# Run a deployed cloud suite by config path
daita eval evals/starter-agent.yaml --cloud

# Run a deployed cloud suite by name
daita eval --cloud --suite starter-agent-evals
```

## Eval configs

`daita init` creates an `evals/` directory and a starter eval config. Eval configs are designed to live with your project source so they can be reviewed, versioned, and deployed alongside the agents they test.

```text
my-project/
├── agents/
├── workflows/
├── skills/
├── evals/
│   └── starter-agent.yaml
└── daita-project.yaml
```
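The schema of an eval config is defined by `daita-agents`, and the generated `evals/starter-agent.yaml` is the authoritative reference. Purely as an illustration of the shape such a suite might take, here is a hypothetical sketch; every field name below is an assumption, not the documented schema:

```yaml
# Hypothetical eval config sketch. Field names are assumptions —
# check the starter config generated by `daita init` for the real schema.
name: starter-agent-evals
agent: starter-agent          # target agent under test (assumed key)
cases:
  - id: greeting              # case id, selectable via `daita eval --case greeting`
    input: "Hello!"
    expectations:
      - contains: "Hi"        # assumed expectation type
runs: 1                       # per-case run count, overridable with --runs
```

The only grounded parts here are the file location and the `--case`/`--runs` flags from the options table above; treat everything else as placeholder structure.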

When you run `daita push`, eval configs are included in the project package and registered with the hosted eval API. The dashboard and cloud commands can then list those suites and run them against the active deployment.

## Cloud evals

Cloud eval commands use the production Daita API and require `DAITA_API_KEY`. There is no `--environment` flag; hosted evals run against the production environment.

Cloud execution is useful when you want the same infrastructure, secrets, package, and deployment context that your hosted agents use in production. Local execution is better for fast iteration while editing prompts, tools, and expectations.

## eval runs

List and inspect cloud eval run history.

### List runs

```bash
daita eval runs
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--limit` | `20` | Number of runs to return |
| `--eval-suite-id <id>` | | Filter by eval suite id |
| `--project-name <name>` | current project | Filter by project name |
| `--status <status>` | | Filter by run status |

### Show a run

```bash
daita eval runs show <eval_run_id>
```

Returns the run summary, including suite name, status, score, duration, artifact location, and timestamps.

### Fetch a report

```bash
daita eval runs report <eval_run_id>
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--format <format>` | `pretty` | Output format: `pretty`, `json`, `markdown`, or `junit` |

The report command fetches the canonical `report.json` for a run. Use JSON mode for automation or JUnit mode for CI systems that understand test reports.
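As one way to wire this into CI, a pipeline could run a cloud suite with `--format junit` and feed the result to a standard JUnit report step. The sketch below assumes GitHub Actions, a repository secret named `DAITA_API_KEY`, and the third-party `dorny/test-reporter` action; only the `daita` commands and flags come from this page, and the suite name is a placeholder.

```yaml
# Hypothetical GitHub Actions workflow — layout, secret name, and
# reporter action are assumptions, not part of the Daita docs.
name: agent-evals
on: [push]
jobs:
  cloud-evals:
    runs-on: ubuntu-latest
    env:
      DAITA_API_KEY: ${{ secrets.DAITA_API_KEY }}   # required for cloud evals
    steps:
      - uses: actions/checkout@v4
      - name: Run cloud suite and emit JUnit XML
        run: |
          daita eval --cloud --suite starter-agent-evals \
            --format junit > eval-results.xml
      - name: Publish eval report
        if: always()                                 # publish even on failures
        uses: dorny/test-reporter@v1
        with:
          name: agent-evals
          path: eval-results.xml
          reporter: java-junit
```

This assumes the JUnit output goes to stdout and can be redirected to a file; verify that against your CLI version before relying on it.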

## eval suites

List and inspect registered cloud eval suites.

### List suites

```bash
daita eval suites
```

Options:

| Flag | Default | Description |
| --- | --- | --- |
| `--limit` | `20` | Number of suites to return |
| `--project-name <name>` | current project | Filter by project name |
| `--agent-name <name>` | | Filter by target agent |
| `--status <status>` | `active` | Filter by suite status |

### Show a suite

```bash
daita eval suites show <eval_suite_id>
```

Returns the registered suite snapshot, including config path, config hash, target agent or workflow, status, and deployment association.