Evaluar

A tour of the Evaluar TUI — what each view does, the bindings that exist, and the OpenCV bbox editor it hands off to.

The Evaluar TUI is built on Textual. It's intentionally small: six views, a couple of modals, and a hand-off to a separate OpenCV process whenever you need to look at predictions on the source image. This page is a tour; for bindings, the canonical keymap reference is the source of truth.

Launching the TUI

evaluar

Running the binary with no arguments launches the TUI on the home view (src/evaluar/cli/main.py:45). For a saved run, you can also use:

evaluar report show <run_id>

which opens directly into the results view for that run.
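Each entry point picks a starting view: no arguments, `report show`, and `report compare` (covered under Compare). The sketch below models that routing with the standard library's `argparse`. It is an illustration, not the contents of `cli/main.py`, and the returned view names are hypothetical labels.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors only the commands named on this page.
    parser = argparse.ArgumentParser(prog="evaluar")
    commands = parser.add_subparsers(dest="command")

    report = commands.add_parser("report", help="work with saved runs")
    report_commands = report.add_subparsers(dest="report_command", required=True)

    show = report_commands.add_parser("show", help="open one run in the results view")
    show.add_argument("run_id")

    compare = report_commands.add_parser("compare", help="compare two saved runs")
    compare.add_argument("a")
    compare.add_argument("b")
    return parser


def initial_view(argv: List[str]) -> Tuple[str, Tuple[str, ...]]:
    """Map command-line arguments to (view name, run ids) for the TUI to open."""
    args = build_parser().parse_args(argv)
    if args.command is None:
        return ("home", ())  # bare `evaluar`
    if args.report_command == "show":
        return ("results", (args.run_id,))
    return ("compare", (args.a, args.b))
```

For example, `initial_view(["report", "show", "run-42"])` returns `("results", ("run-42",))`, where `run-42` is a made-up run id.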

The views

The TUI is composed of widget classes under src/evaluar/tui/views/:

View	File	Purpose
Splash	`splash.py`	Brief intro on first boot; fades into Home.
Home	`home.py`	Recent-runs list, primary entry point.
Results	`results.py`	Detail view for a single run — metrics, samples, charts.
Failure inspector	`failure_inspector.py`	Side-by-side diff + samples list for failed records.
Dashboard	`dashboard.py`	Live monitor while a run is in flight.
Compare	`compare.py`	Side-by-side comparison of two saved runs.

Image work is handed off to the bbox editor subprocess; the TUI itself stays focused on run navigation, scorecards, charts, and failure inspection.

Home view

The home view lists every run in the current --results-dir (default evaluar/results/). Each row shows the run id, suite name, verdict, and timestamp.

These global bindings are active in every view (src/evaluar/tui/app.py:99):

`ctrl`+`h`	Return to the home view
`ctrl`+`l`	Toggle the log pane
`ctrl`+`p`	Open the command palette
`?`	Open the command reference modal
`ctrl`+`c`	Quit
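Textual resolves a key press by checking the focused widget's bindings first, then its ancestors up through the screen, and finally the App. This page does not say where the global bindings are declared, but the `app.py` reference suggests the App class. If so, that order explains why they work everywhere while a key such as `g` only means something inside the results view. Below is a minimal pure-Python model of the lookup. It ignores Textual's priority bindings, and the action names are hypothetical.

```python
from typing import Dict, Optional, Sequence

Keymap = Dict[str, str]

# Keys come from the tables on this page; action names are made up.
GLOBAL_BINDINGS: Keymap = {
    "ctrl+h": "go_home",
    "ctrl+l": "toggle_log",
    "ctrl+p": "command_palette",
    "?": "show_keymap",
    "ctrl+c": "quit",
}

RESULTS_BINDINGS: Keymap = {
    "e": "focus_evaluation",
    "g": "toggle_charts",
    "i": "open_failure_inspector",
    "o": "bbox_overlay",
    "v": "bbox_edit",
}


def resolve(key: str, chain: Sequence[Keymap]) -> Optional[str]:
    """Return the first action bound to `key`, most specific keymap first."""
    for keymap in chain:
        if key in keymap:
            return keymap[key]
    return None  # unbound: the key press is ignored


results_chain = [RESULTS_BINDINGS, GLOBAL_BINDINGS]  # results view, then app
home_chain = [GLOBAL_BINDINGS]  # home view: only the global keymap modeled here
```

With these chains, `g` resolves to `toggle_charts` in the results view and to nothing on the home view, while `ctrl+h` resolves to `go_home` in both.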

Results view

The results view is what you see after opening a run. It has three panes: models on the left, evaluation in the middle, and samples plus charts on the right.

Bindings (src/evaluar/tui/views/results.py:30):

`e`	Focus the evaluation pane
`g`	Toggle the charts widget
`i`	Open the failure inspector
`o`	Open the bbox editor (overlay, read-only)
`v`	Open the bbox editor (edit ground truth)
`left` / `right`	Move focus between panes

g toggles the charts widget, which is rendered with textual-plotext (src/evaluar/tui/widgets/charts.py). It shows AP across IoU thresholds, per-class AP, and per-class confusion matrices.

Results view with the charts pane open. Toggle it with `g`.

Reading Detection Charts

These charts are meant to answer practical model-quality questions: did the model find the right objects, put boxes in the right place, and use the right labels?

IoU (intersection over union) measures how much the predicted box overlaps the ground-truth box: the area the two boxes share divided by the area they cover together. A high IoU means the box landed tightly on the expected object. A low IoU means the box is shifted, too large, too small, or on the wrong object.
AP (Average Precision) is the area under the precision-recall curve. It condenses into one score how well the model finds real objects while avoiding extra false detections. Higher is better. In detection work, AP drops when the model misses objects, predicts too many extras, uses the wrong label, or draws boxes that do not overlap enough.
AP across IoU thresholds shows how strictness changes the score. If AP is good at low IoU but falls sharply at high IoU, the model usually knows where objects are roughly located but needs tighter boxes.
Per-class AP helps isolate which labels are weak. A model may be solid on Door but poor on Window, even if the overall score hides that difference.
A confusion matrix shows label mistakes. Rows are ground truth; columns are predictions. The diagonal is where labels match. Off-diagonal cells show what the model confused, and missed entries show objects it failed to detect.
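To make the IoU and AP definitions concrete, here is a minimal single-image, single-class sketch in Python. It makes several assumptions:

- Boxes are `(x1, y1, x2, y2)` corner coordinates.
- Predictions are matched greedily in descending score order, and each ground-truth box can be claimed once.
- AP uses all-point interpolation.
- The threshold sweep uses the COCO-style 0.50 to 0.95 range in 0.05 steps.

Evaluar's own evaluator may differ on any of these choices. COCO itself, for example, uses 101-point interpolation and accumulates over all images.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed x2 > x1 and y2 > y1
Scored = Tuple[float, Box]                # (confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: shared area divided by combined area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Scored], gts: Sequence[Box], thr: float) -> float:
    """AP at one IoU threshold for a single image and class."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits: List[bool] = []
    # Highest-confidence predictions claim ground-truth boxes first.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
        hits.append(best_j >= 0)

    # Precision and recall after each ranked prediction.
    precision: List[float] = []
    recall: List[float] = []
    tp = 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precision.append(tp / rank)
        recall.append(tp / len(gts))

    # All-point interpolation: make precision non-increasing from the right,
    # then sum precision times each step in recall (area under the PR curve).
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap_across_thresholds(preds: Sequence[Scored], gts: Sequence[Box]) -> Dict[float, float]:
    """AP at IoU 0.50, 0.55, ..., 0.95 (COCO-style range, assumed here)."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return {t: average_precision(preds, gts, t) for t in thresholds}
```

As a worked example, take ground truth `[(0, 0, 10, 10), (20, 20, 30, 30)]` and three predictions:

- score 0.9: an exact match on the first box;
- score 0.8: the second box shifted by one pixel, IoU about 0.68;
- score 0.3: a box on empty background.

`ap_across_thresholds` returns 1.0 for thresholds 0.50 to 0.65 and 0.5 from 0.70 upward. That is the "good at low IoU, falls at high IoU" pattern described above.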

Use these visuals as a triage map. Start with the overall trend, check which classes are weak, then open the failure inspector or bbox overlay to inspect specific samples behind the numbers.

AP across IoU thresholds. A steep drop at stricter thresholds usually points to loose or shifted boxes.

Per-class confusion matrix. The diagonal is correct classification; off-diagonal cells show class mix-ups.
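The confusion matrix described under Reading Detection Charts can be sketched in a few lines of Python. This version is hypothetical. It matches each ground-truth box to the best unused prediction of any label at IoU 0.5 or above, ignoring scores. It then counts the `(ground truth, prediction)` label pair and records unmatched ground truth under a `(missed)` column. This page does not describe Evaluar's actual matching rule, threshold, or treatment of extra predictions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
MISSED = "(missed)"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confusion_counts(
    gts: Sequence[Tuple[str, Box]], preds: Sequence[Tuple[str, Box]], thr: float = 0.5
) -> Counter:
    """Count (ground-truth label, predicted label) pairs; rows are ground truth."""
    counts: Counter = Counter()
    used = [False] * len(preds)
    for gt_label, gt_box in gts:
        best_i, best_iou = -1, thr
        for i, (_, box) in enumerate(preds):
            overlap = iou(gt_box, box)
            if not used[i] and overlap >= best_iou:
                best_i, best_iou = i, overlap
        if best_i >= 0:
            used[best_i] = True
            counts[(gt_label, preds[best_i][0])] += 1
        else:
            counts[(gt_label, MISSED)] += 1  # object the model failed to detect
    return counts


def render(counts: Counter, labels: List[str]) -> str:
    """Tab-separated matrix: one row per ground-truth label; the diagonal is correct."""
    columns = labels + [MISSED]
    lines = ["gt \\ pred\t" + "\t".join(columns)]
    for gt_label in labels:
        lines.append(gt_label + "\t" + "\t".join(str(counts[(gt_label, c)]) for c in columns))
    return "\n".join(lines)
```

For example, if a Window box is predicted as Door, the count lands in the Window row under the Door column, off the diagonal. A Window with no overlapping prediction is counted under `(missed)`.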

Failure inspector

Press i from the results view to enter the failure inspector. Two panes: a structured diff for the focused sample, and the list of samples in the run.

Bindings (src/evaluar/tui/views/failure_inspector.py:27):

`d`	Focus the diff pane
`s`	Focus the samples pane
`o`	Open the bbox editor (overlay)
`v`	Open the bbox editor (edit)
`tab`	Cycle focus forward
`left`	Cycle focus backward
`b`	Go back to the previous view
`escape`	Go back to the previous view

See Failure inspection for the supported inspection workflow.

Dashboard

While a run is in flight, the dashboard view streams its progress. This is the same data the run writes to disk on completion, so you do not need to watch it live for the results to be saved.

Compare

evaluar report compare <a> <b> opens the compare view (src/evaluar/tui/views/compare.py). It's a paired layout for diffing two runs at the rollup-scorecard level.
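At the rollup level, comparing two runs amounts to lining up the same metrics side by side and taking the difference. Below is a minimal Python sketch. It assumes each run's scorecard can be read as a flat mapping from metric name to value; the metric names and this data shape are hypothetical, not Evaluar's on-disk format.

```python
from typing import Dict, Optional, Tuple

Row = Tuple[Optional[float], Optional[float], Optional[float]]  # (run a, run b, b - a)


def diff_scorecards(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, Row]:
    """Pair metrics from two runs; the delta is None when either run lacks the metric."""
    rows: Dict[str, Row] = {}
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name), b.get(name)
        delta = vb - va if va is not None and vb is not None else None
        rows[name] = (va, vb, delta)
    return rows
```

Metrics that only one run reports are kept, with `None` on the other side. That makes differences between the two scorecards visible instead of silently dropping rows.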

The bbox-editor hand-off

Pressing o or v from the results view or the failure inspector launches the bbox editor as a separate Python process (src/evaluar/tui/handoff.py:138). The TUI itself does not render images.

Bbox editor in overlay mode. Overlay mode is read-only: predictions in turquoise, ground truth muted.

This separation exists for a practical reason: the TUI is a Textual app and OpenCV's UI is a native window. The bbox editor's keyboard surface is documented on the bbox editor page; it is not a TUI pane and shares no bindings with the rest of the app.
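The hand-off itself only needs the standard library. This sketch starts the editor as a Python module via `sys.executable`, so the child runs in the same interpreter and virtual environment as the TUI. The module name `evaluar.bbox_editor` and the `--image`, `--run`, and `--mode` flags are hypothetical placeholders, not the contents of `handoff.py`.

```python
import subprocess
import sys
from typing import List

# Hypothetical entry point and flags, for illustration only.
EDITOR_MODULE = "evaluar.bbox_editor"
KEY_TO_MODE = {"o": "overlay", "v": "edit"}  # read-only overlay vs. ground-truth editing


def build_editor_argv(image_path: str, run_id: str, key: str) -> List[str]:
    """Build the child-process command line for the `o` or `v` binding."""
    mode = KEY_TO_MODE[key]  # raises KeyError for any other key
    return [
        sys.executable, "-m", EDITOR_MODULE,
        "--image", image_path,
        "--run", run_id,
        "--mode", mode,
    ]


def launch_editor(image_path: str, run_id: str, key: str) -> subprocess.Popen:
    """Start the editor without waiting for it to exit.

    Popen returns immediately, so the TUI's event loop keeps running while
    the OpenCV window is open in the child process. The child's stdin and
    output are detached from the terminal so it cannot draw over the TUI.
    """
    return subprocess.Popen(
        build_editor_argv(image_path, run_id, key),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

A separate process also isolates failures: if the OpenCV window crashes, the TUI keeps running.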
