evaluar

Introduction

Evaluar is an evaluation framework for vision and document systems — code-first pipelines, threshold-driven scorers, and a Textual TUI for inspecting runs.

Evaluar is an evaluation framework for the kind of systems where a metric drop is rarely the whole story. It pairs a code-first Python API with a terminal UI for opening individual runs and a hand-off to an OpenCV bbox editor when you need to look at predictions on the source image.

Figure: Evaluar TUI showing a saved run.
The TUI launches when you run `evaluar` with no arguments. The home view lists recent runs; opening one drops you into the results view.

Evaluar is an early preview (v0.1). The framework runs end-to-end for detection, OCR, and table tasks and is in active use internally. Surface area outside what's documented here may change between minor versions.

What's actually in the box

The two surfaces

Evaluar has one Python API and one terminal binary. Both lead to the same on-disk artifacts.

  • Python. evaluar.api exposes PipelineBuilder, EvaluarSuite, task helpers (detection(...), ocr(...), table(...), merged(...)), suite(...), and the @normalizer decorator. A pipeline is built with the PipelineBuilder chain; a suite collects one or more pipelines and runs them.
  • CLI. One binary, with every command defined in src/evaluar/cli/: evaluar opens the TUI, evaluar eval_file.py runs one eval file, evaluar test discovers all eval_*.py files, and evaluar report works with saved runs.
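
To make the eval_*.py naming convention concrete, here is a minimal sketch of how files matching that pattern could be collected. It is an illustration only, not Evaluar's implementation: the function name discover_eval_files is hypothetical, and the recursive search through subdirectories is an assumption, since the text above does not say how deep evaluar test looks.

```python
from pathlib import Path
from typing import List, Union


def discover_eval_files(root: Union[str, Path] = ".") -> List[Path]:
    """Return every eval_*.py file under root, sorted for a stable order.

    Hypothetical helper: it mirrors the eval_*.py naming convention only.
    Searching subdirectories recursively is an assumption.
    """
    root = Path(root)
    # rglob walks the tree; the pattern is matched against file names.
    return sorted(p for p in root.rglob("eval_*.py") if p.is_file())


if __name__ == "__main__":
    for path in discover_eval_files("."):
        print(path)
```

Under this convention a file named eval_ocr.py would be picked up, while helpers.py or eval_notes.txt would not.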

Every run produces a single JSON file at evaluar/results/<run_id>.json (configurable via --results-dir). That file is what the TUI opens, what evaluar report compare diffs, and what CI gates can parse.
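
As a rough sketch of what a CI gate over that file might look like: the code below loads evaluar/results/<run_id>.json and reports which metrics fall below a floor. The JSON schema is not described on this page, so the top-level "metrics" key, the metric names, and both function names are assumptions made for illustration. Adjust them to match the actual contents of a saved run.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_run(run_id: str, results_dir: str = "evaluar/results") -> Dict[str, Any]:
    """Load a saved run from <results_dir>/<run_id>.json."""
    path = Path(results_dir) / f"{run_id}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def failing_metrics(run: Dict[str, Any], minimums: Dict[str, float]) -> List[str]:
    """Return the names of metrics that are missing or below their minimum.

    Assumes a hypothetical {"metrics": {name: value}} layout and
    higher-is-better metrics.
    """
    metrics = run.get("metrics", {})
    return [
        name
        for name, floor in minimums.items()
        # A missing metric is treated as a failure.
        if metrics.get(name, float("-inf")) < floor
    ]
```

A CI step could call failing_metrics(load_run(run_id), {"precision": 0.9}) and exit non-zero when the returned list is not empty. Treating a missing metric as a failure is a deliberate choice here, so that a renamed or dropped scorer does not pass the gate silently.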

What Evaluar isn't

Evaluar is intentionally narrow. It does not train models, host datasets, or replace a general annotation tool. The bbox editor exists to make small ground-truth corrections survivable inside an inspection workflow — not to label a fresh dataset from scratch.
