evaluar

Reports

Listing, opening, exporting, cleaning up, and comparing saved runs from the CLI — the `evaluar report` subcommands.

Every Evaluar run writes a single JSON file at <results-dir>/<run_id>.json (default evaluar/results/; see Run storage). The evaluar report subcommands browse, open, export, clean up, and diff those files from the shell.

evaluar report list

Lists recent runs as a table.

evaluar report list
evaluar report list --limit 20
evaluar report list --results-dir /custom/path
FlagDefaultPurpose
--results-direvaluar/resultsWhere to look for runs.
--limit / -n10Maximum number of runs to display.

Implementation: src/evaluar/cli/commands/report.py.

evaluar report show [run_id]

Opens a saved run. Without a run_id, opens the most recent run.

evaluar report show
evaluar report show <run_id>
evaluar report show <run_id> --model my_model
evaluar report show <run_id> --headless
evaluar report show <run_id> --json
FlagDefaultPurpose
--model / -m(none)Show detail for a specific pipeline only.
--results-direvaluar/resultsWhere the run was saved.
--headlessfalsePrint to stdout instead of launching the TUI.
--jsonfalsePrint the saved JSON to stdout (implies --headless).

Implementation: src/evaluar/cli/commands/report.py.

By default this opens the results view for the run. With --headless, it prints a summary table instead — useful for quick checks over SSH.

evaluar report export [run_id]

Writes a saved run to a file. Without a run_id, exports the most recent run.

evaluar report export <run_id> --format json --out run.json
evaluar report export <run_id> --format summary-json --out summary.json
evaluar report export <run_id> --format csv --out metrics.csv
evaluar report export <run_id> --format markdown --out summary.md
evaluar report export --out latest.json
evaluar report export <run_id> --model layout_detector --out layout.json
FlagDefaultPurpose
--format / -fjsonExport the full saved run (json), a compact summary (summary-json), flat metric rows (csv), or Markdown (markdown).
--out / -orequiredFile to write. Parent directories are created when needed.
--model / -m(none)Export a specific pipeline result. With summary-json, keeps only that pipeline in the summary.
--results-direvaluar/resultsWhere the run was saved.

The json format preserves the saved run shape. The summary-json format keeps the rollup verdict, weighted score, failed pipelines, metadata, and one compact entry per pipeline. The csv format writes one row per scalar metric with run_id, scope, model_id, sample_id, task_type, verdict, metric, and value columns. The markdown format writes a compact run summary for CI artifacts, pull request comments, or handoff notes.

Implementation: src/evaluar/cli/commands/report.py.

evaluar report delete <run_id>

Deletes one saved run JSON file.

evaluar report delete <run_id>
evaluar report delete <run_id> --yes
evaluar report delete <run_id> --results-dir /custom/path
FlagDefaultPurpose
--results-direvaluar/resultsWhere the run was saved.
--yes / -yfalseDelete without an interactive confirmation prompt.

Implementation: src/evaluar/cli/commands/report.py.

evaluar report clear

Deletes all saved run JSON files in a results directory.

evaluar report clear
evaluar report clear --yes
evaluar report clear --results-dir /custom/path
FlagDefaultPurpose
--results-direvaluar/resultsDirectory to clear.
--yes / -yfalseClear without an interactive confirmation prompt.

Implementation: src/evaluar/cli/commands/report.py.

evaluar report prune

Deletes saved run JSON files by explicit retention criteria.

evaluar report prune --keep 20
evaluar report prune --older-than 30d
evaluar report prune --keep 20 --older-than 30d --yes
FlagDefaultPurpose
--keep(none)Keep the newest N saved runs and prune older files.
--older-than(none)Prune runs older than a duration such as 30d, 12h, 90m, or 3600s.
--results-direvaluar/resultsDirectory to prune.
--yes / -yfalsePrune without an interactive confirmation prompt.

When both --keep and --older-than are passed, Evaluar deletes the union of matching saved run files.

Implementation: src/evaluar/cli/commands/report.py.

evaluar report archive [run_id]

Creates a local zip archive for a saved run. Without a run_id, archives the most recent run.

evaluar report archive <run_id> --out run.zip
evaluar report archive <run_id> --include-local-artifacts --out run.zip
FlagDefaultPurpose
--out / -orequiredZip file to write.
--include-local-artifactsfalseInclude existing local artifact files referenced by scorecards, such as gt_path or cached image paths.
--results-direvaluar/resultsWhere the run was saved.

Archives always include <run_id>.json and an archive_manifest.json. Remote URLs and missing local files are recorded as skipped instead of downloaded.

Implementation: src/evaluar/cli/commands/report.py.

evaluar report compare <run_a> <run_b>

Opens the compare view for two saved runs.

evaluar report compare <run_a> <run_b>
evaluar report compare <run_a> <run_b> --headless
evaluar report compare <run_a> <run_b> --json --out compare.json
evaluar report compare <run_a> <run_b> --format markdown --out compare.md
FlagDefaultPurpose
--results-direvaluar/resultsWhere the runs were saved.
--headlessfalsePrint the comparison to stdout.
--jsonfalsePrint structured scalar metric deltas as JSON.
--format / -ftableHeadless comparison format: table or markdown.
--out / -o(none)Optional output path for JSON or Markdown comparison output.

Structured compare output includes rollup verdict/weighted-score changes and per-pipeline scalar metric deltas. Nested metric blocks remain in the saved run JSON.

Implementation: src/evaluar/cli/commands/report.py.

What the saved file looks like

Every run is a single JSON file. Top-level shape (mirrors RunnerResult):

{
  "run_id": "...",
  "elapsed_seconds": 0.0,
  "failed_pipelines": [],
  "rollup_scorecard": { "verdict": "pass", "metrics": {}, "thresholds": {}, ... },
  "pipeline_results": {
    "<model_id>": {
      "final_scorecard": { ... },
      "per_sample_scorecards": [ ... ],
      "failed_samples": []
    }
  },
  "metadata": {
    "source": "CLI",
    "suite_name": "...",
    "definition_path": "eval_layout_detector.py",
    "sample_ids": [...],
    "pipeline_ids": [...]
  }
}

See Run storage for the full schema.

On this page