TUI guide

A tour of the Evaluar TUI — what each view does, the bindings that exist, and the OpenCV bbox editor it hands off to.

The Evaluar TUI is built on Textual. It's intentionally small: six views, a couple of modals, and a hand-off to a separate OpenCV process for whenever you need to see predictions on the source image. This page is a tour; the canonical keymap reference is the source of truth for bindings.

Launching the TUI

evaluar

Running the binary with no arguments launches the TUI with the home view (src/evaluar/cli/main.py:45). For a saved run you can also do:

evaluar report show <run_id>

which opens directly into the results view for that run.

The views

The TUI is composed of widget classes under src/evaluar/tui/views/:

View               File                  Purpose
Splash             splash.py             Brief intro on first boot; fades into Home.
Home               home.py               Recent-runs list, primary entry point.
Results            results.py            Detail view for a single run: metrics, samples, charts.
Failure inspector  failure_inspector.py  Side-by-side diff + samples list for failed records.
Dashboard          dashboard.py          Live monitor while a run is in flight.
Compare            compare.py            Side-by-side comparison of two saved runs.

Image work is handed off to the bbox editor subprocess; the TUI itself stays focused on run navigation, scorecards, charts, and failure inspection.

Home view

The home view lists every run in the current --results-dir (default evaluar/results/). Each row shows the run id, suite name, verdict, and timestamp.

Figure: The home view alongside an opened run. The home view is keyboard-navigable; pressing Enter on a run opens the results view.

The global bindings are always live (src/evaluar/tui/app.py:99):

Key     Action
ctrl+h  Return to the home view
ctrl+l  Toggle the log pane
ctrl+p  Open the command palette
?       Open the command reference modal
ctrl+c  Quit

Results view

The results view is what you see after opening a run. Three panes — models on the left, evaluation in the middle, samples + charts on the right.

Figure: An opened run in the results view. Verdicts roll up from per-sample to per-pipeline to per-suite.

Bindings (src/evaluar/tui/views/results.py:30):

Key           Action
e             Focus the evaluation pane
g             Toggle the charts widget
i             Open the failure inspector
o             Open the bbox editor (overlay, read-only)
v             Open the bbox editor (edit ground truth)
left / right  Move focus between panes

g toggles the charts widget, a metric-over-samples plot rendered with textual-plotext (src/evaluar/tui/widgets/charts.py). It shows AP across IoU thresholds, per-class AP, and per-class confusion matrices.

Figure: Results view with the charts pane open. Toggle with g.

Reading detection charts

These charts are meant to answer practical model-quality questions: did the model find the right objects, put boxes in the right place, and use the right labels?

  • IoU means "how much does the predicted box overlap the ground-truth box?" A high IoU means the box landed tightly on the expected object. A low IoU means the box is shifted, too large, too small, or on the wrong object.
  • AP (Average Precision) is a compact score for how well the model balances finding real objects while avoiding extra false detections. Higher is better. In detection work, AP drops when the model misses objects, predicts too many extras, uses the wrong label, or draws boxes that do not overlap enough.
  • AP across IoU thresholds shows how strictness changes the score. If AP is good at low IoU but falls sharply at high IoU, the model usually knows where objects are roughly located but needs tighter boxes.
  • Per-class AP helps isolate which labels are weak. A model may be solid on Door but poor on Window, even if the overall score hides that difference.
  • A confusion matrix shows label mistakes. Rows are ground truth; columns are predictions. The diagonal is where labels match. Off-diagonal cells show what the model confused, and missed entries show objects it failed to detect.
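The IoU definition and the effect of stricter thresholds can be illustrated with a small standalone sketch. It is not taken from Evaluar's code: the (x_min, y_min, x_max, y_max) box layout, the function names, and the threshold values are all assumptions chosen for the example.

```python
from typing import Dict, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at(pred: Box, truth: Box,
               thresholds=(0.5, 0.75, 0.9)) -> Dict[float, bool]:
    """Report whether a prediction counts as a match at each IoU threshold."""
    score = iou(pred, truth)
    return {t: score >= t for t in thresholds}


truth = (0.0, 0.0, 10.0, 10.0)
pred = (2.0, 0.0, 12.0, 10.0)  # same size as truth, shifted 2 units right

print(round(iou(pred, truth), 3))  # 0.667
print(matches_at(pred, truth))     # {0.5: True, 0.75: False, 0.9: False}
```

The shifted box still counts as a match at 0.5 but not at 0.75 or 0.9, which is the pattern behind an AP curve that is good at low IoU and falls at high IoU.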

Use these visuals as a triage map. Start with the overall trend, check which classes are weak, then open the failure inspector or bbox overlay to inspect specific samples behind the numbers.

Figure: AP across IoU thresholds. A steep drop at stricter thresholds usually points to loose or shifted boxes.
Figure: Per-class confusion matrix. The diagonal is correct classification; off-diagonal cells show class mix-ups.
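To make the row and column convention concrete, here is a minimal sketch that tallies a confusion matrix from matched (ground truth, prediction) label pairs. The pair format, the "(missed)" column name, and the sample data are assumptions for illustration; Evaluar's own matching and storage may differ.

```python
from collections import Counter

# Hypothetical column name for ground-truth objects with no matching prediction.
MISSED = "(missed)"


def confusion_matrix(pairs, classes):
    """Tally (truth, prediction) pairs; rows are ground truth, columns are predictions."""
    counts = Counter(
        (truth, pred if pred is not None else MISSED) for truth, pred in pairs
    )
    columns = list(classes) + [MISSED]
    # Counter returns 0 for pairs that never occurred, so every cell is filled.
    return {t: {p: counts[(t, p)] for p in columns} for t in classes}


# Illustrative pairs: a prediction of None means the object was not detected.
pairs = [
    ("Door", "Door"),
    ("Door", "Door"),
    ("Door", "Window"),   # label mix-up: off-diagonal cell
    ("Window", "Window"),
    ("Window", None),     # missed detection
]

matrix = confusion_matrix(pairs, ["Door", "Window"])
for truth_label, row in matrix.items():
    print(truth_label, row)
```

Reading across the Door row, two objects were labelled correctly and one was confused with Window; the Window row shows one correct label and one miss.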

Failure inspector

Press i from the results view to enter the failure inspector. Two panes: a structured diff for the focused sample, and the list of samples in the run.

Figure: Failure inspector panes. Pressing tab cycles focus; o and v hand off to the bbox editor.

Bindings (src/evaluar/tui/views/failure_inspector.py:27):

Key     Action
d       Focus the diff pane
s       Focus the samples pane
o       Open the bbox editor (overlay)
v       Open the bbox editor (edit)
tab     Cycle focus forward
left    Cycle focus backward
b       Go back to the previous view
escape  Go back to the previous view

See Failure inspection for the supported inspection workflow.

Dashboard

While a run is in flight, the dashboard view streams progress. This is the same data the run will write to disk on completion — viewing it live is not required for the run to save.

Figure: The dashboard while a run is in flight. Progress and verdicts update incrementally.

Compare

evaluar report compare <a> <b> opens the compare view (src/evaluar/tui/views/compare.py). It's a paired layout for diffing two runs at the rollup-scorecard level.

The bbox-editor hand-off

Pressing o or v from the results view or the failure inspector launches the bbox editor as a separate Python process (src/evaluar/tui/handoff.py:138). The TUI itself does not render images.
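The general shape of such a hand-off can be sketched with the standard library's subprocess module. This is not the code in handoff.py: the key-to-mode mapping mirrors the bindings described on this page, but the function names are hypothetical and the child process is a one-line stub standing in for the real editor.

```python
import subprocess
import sys

# Mirrors the bindings described on this page: o = overlay, v = edit.
KEY_TO_MODE = {"o": "overlay", "v": "edit"}


def build_editor_command(key: str, image_path: str) -> list:
    """Build an argv list for a separate Python process (stub child, not the real editor)."""
    mode = KEY_TO_MODE[key]
    # Stand-in for the editor: a child script that only echoes its arguments.
    child_script = "import sys; print('editor stub:', sys.argv[1], sys.argv[2])"
    return [sys.executable, "-c", child_script, mode, image_path]


def hand_off(key: str, image_path: str) -> subprocess.Popen:
    """Start the child and return at once, so the caller is not blocked while it runs."""
    return subprocess.Popen(
        build_editor_command(key, image_path),
        stdout=subprocess.PIPE,
        text=True,
    )


proc = hand_off("o", "sample_001.png")
# The sketch waits only so it can show the stub's output; a UI would keep running instead.
out, _ = proc.communicate()
print(out.strip())  # editor stub: overlay sample_001.png
```

Running the editor in its own process keeps any native window and its event loop out of the terminal application's process.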

Figure: Bbox editor in overlay mode. Overlay mode is read-only. Predictions in turquoise, ground truth muted.

This separation exists for a practical reason: the TUI is a Textual app and OpenCV's UI is a native window. The bbox editor's keyboard surface is documented on the bbox editor page; it is not a TUI pane and shares no bindings with the rest of the app.

What about onboarding?

A first boot with no saved runs lands on the splash + home views (src/evaluar/tui/views/splash.py, home.py). The recommended first step from there is evaluar init <task> from a separate shell — see Install & init.
