Introduction
Evaluar is an evaluation framework for vision and document systems — code-first pipelines, threshold-driven scorers, and a Textual TUI for inspecting runs.
Evaluar is an evaluation framework for the kind of systems where a metric drop is rarely the whole story. It pairs a code-first Python API with a terminal UI for opening individual runs and a hand-off to an OpenCV bbox editor when you need to look at predictions on the source image.

Evaluar is an early preview (v0.1). The framework runs end-to-end for detection, OCR, and table tasks and is in active use internally. Surface area outside what's documented here may change between minor versions.
What's actually in the box
Code-first suites
Define an evaluation as an eval_*.py file that exposes build_suite(...). Run one file directly or run all of them with `evaluar test`.
Threshold-driven scorers
Detection, OCR, table, and merged scorers each apply MetricThreshold(pass_floor, warn_floor) to produce a pass / warn / fail verdict.
A small, sharp TUI
Open a saved run, focus the failing samples, then hand off to the bbox editor for visual inspection. Built on Textual.
The two surfaces
Evaluar has one Python API and one terminal binary. Both lead to the same on-disk artifacts.
- Python.
evaluar.apiexposesPipelineBuilder,EvaluarSuite, task helpers (detection(...),ocr(...),table(...),merged(...)),suite(...), and the@normalizerdecorator. A pipeline is built with thePipelineBuilderchain; a suite collects one or more pipelines and runs them. - CLI. One binary, all defined in
src/evaluar/cli/—evaluaropens the TUI,evaluar eval_file.pyruns one eval file,evaluar testdiscovers alleval_*.pyfiles, andevaluar reportworks with saved runs.
Every run produces a single JSON file at evaluar/results/<run_id>.json (configurable via --results-dir). That file is what the TUI opens, what evaluar report compare diffs, and what CI gates can parse.
Where to go next
Install & init
Install Evaluar and run `evaluar init detection` to scaffold a working pipeline in your repo.
Quick start
From a fresh clone to your first opened run in under five minutes.
Core concepts
The five primitives Evaluar is actually built on — Suite, PipelineBuilder, Connector, Normalizer, Scorer.
TUI guide
Every view, every binding, every hand-off — verified against `src/evaluar/tui/`.
What Evaluar isn't
Evaluar is intentionally narrow. It does not train models, host datasets, or replace a general annotation tool. The bbox editor exists to make small ground-truth corrections survivable inside an inspection workflow — not to label a fresh dataset from scratch.