Evaluar

Evaluating text recognition with `OCRScorer`: character- and word-level metrics composed from `compute_cer`, `compute_wer`, and `compute_exact_match`.

OCR evaluation in Evaluar measures recognized text against ground-truth strings. The scorer is OCRScorer (src/evaluar/scoring/ocr.py:79); the metric functions live in src/evaluar/metrics/.

A minimal OCR suite

eval_ocr.py

```python
from evaluar.api import ocr, suite

def my_ocr(image_url: str) -> dict:
    # Call your OCR model here and return the recognized text.
    return {"text": "..."}

def build_suite(sample_ids=None, config=None):
    pipeline = (
        ocr("my_ocr")             # OCR-configured builder for model id "my_ocr"
        .callable(my_ocr)         # the function under evaluation
        # Per-sample inputs and expected text, both keyed by sample id.
        .inputs({"sample_001": {"image_url": "..."}})
        .ground_truth({"sample_001": {"text": "expected output"}})
        .default_mapping()
        .build()
    )
    s = suite(sample_ids=sample_ids or ["sample_001"], suite_name="ocr_eval")
    s.add_pipeline("my_ocr", pipeline)
    return s
```

Use ocr(model_id) to get an OCR-configured builder. It is equivalent to PipelineBuilder.for_task("ocr", model_id).
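In the suite above, both `.inputs(...)` and `.ground_truth(...)` are keyed by sample id, so every id the suite runs needs an entry in each. The sketch below is a conceptual model of that pairing in plain Python, not Evaluar's implementation. `run_samples` is a hypothetical name, and passing each sample's input dict to the callable as keyword arguments is an assumption inferred from the `my_ocr(image_url=...)` signature.

```python
# Conceptual sketch only: a hypothetical loop showing how per-sample inputs,
# the model callable, and ground truth line up by sample id.
# This is NOT Evaluar's internal implementation.
from typing import Callable, Dict, List, Tuple


def run_samples(
    sample_ids: List[str],
    model: Callable[..., dict],
    inputs: Dict[str, dict],
    ground_truth: Dict[str, dict],
) -> Dict[str, Tuple[str, str]]:
    """Return {sample_id: (predicted_text, expected_text)} for each sample id."""
    pairs = {}
    for sid in sample_ids:
        # A sample id must appear in both mappings to be scorable.
        if sid not in inputs or sid not in ground_truth:
            raise KeyError(f"sample id {sid!r} is missing inputs or ground truth")
        # Assumption: the input dict is unpacked as keyword arguments.
        prediction = model(**inputs[sid])
        pairs[sid] = (prediction["text"], ground_truth[sid]["text"])
    return pairs


def my_ocr(image_url: str) -> dict:
    # Stand-in model that ignores the image and returns a fixed string.
    return {"text": "expected output"}


pairs = run_samples(
    ["sample_001"],
    my_ocr,
    inputs={"sample_001": {"image_url": "https://example.com/page.png"}},
    ground_truth={"sample_001": {"text": "expected output"}},
)
print(pairs)  # {'sample_001': ('expected output', 'expected output')}
```

Failing loudly on a missing id is a design choice for the sketch; it surfaces mismatched keys before any metric is computed.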

Canonical prediction shape

After normalization:

```python
{"text": "the recognized string"}
```

For region-level OCR (one box → one string), use the multi-region shape from the schemas in src/evaluar/schemas/predictions.py.
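The normalization step itself is not spelled out on this page. As an illustration only, here is a hypothetical `normalize_ocr_prediction` helper (not part of Evaluar) for the single-string case: it accepts a bare string or a dict with a string `"text"` key, drops any extra fields, and rejects anything else. Which raw forms Evaluar actually accepts is not specified here.

```python
# Hypothetical helper (not part of Evaluar's API): coerce simple raw OCR
# outputs into the canonical single-string shape {"text": ...}.
from typing import Any


def normalize_ocr_prediction(raw: Any) -> dict:
    """Map a raw model output to {"text": str}."""
    if isinstance(raw, str):
        # A bare string is taken as the recognized text.
        return {"text": raw}
    if isinstance(raw, dict) and isinstance(raw.get("text"), str):
        # Keep only the text; extra fields such as confidences are dropped.
        return {"text": raw["text"]}
    raise TypeError(f"cannot normalize OCR output of type {type(raw).__name__}")


print(normalize_ocr_prediction({"text": "hello", "confidence": 0.93}))
# {'text': 'hello'}
```

Region-level outputs are deliberately not handled; they belong to the multi-region shape rather than being flattened into one string.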

Metrics

The OCR scorer composes:

| Metric | Function |
| --- | --- |
| Character error rate | `compute_cer` |
| Word error rate | `compute_wer` |
| Exact match | `compute_exact_match` |
| Sequence match rate | `compute_sequence_match_rate` |
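For reference, CER and WER are conventionally defined as the Levenshtein edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the reference length in characters or words. The sketch below implements those textbook definitions from scratch. It is not the source of `compute_cer`, `compute_wer`, or `compute_exact_match`, which may apply text normalization or treat empty references differently; the empty-reference behavior here is an assumption.

```python
# From-scratch sketch of the textbook CER/WER/exact-match definitions.
# Evaluar's own metric functions may differ (normalization, edge cases).
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character edit distance divided by reference length in characters."""
    if not reference:
        return 0.0 if not hypothesis else 1.0  # assumption for empty reference
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word edit distance divided by the number of whitespace-split reference words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0  # assumption for empty reference
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def exact_match(reference: str, hypothesis: str) -> bool:
    """True only if the strings are identical (no normalization)."""
    return reference == hypothesis


print(cer("expected output", "expectd output"))  # one deletion over 15 characters
print(wer("expected output", "expectd output"))  # 0.5 (1 of 2 words substituted)
```

Because insertions count against the reference length, both rates can exceed 1.0 when the hypothesis is much longer than the reference. A single dropped character barely moves CER but costs a whole word in WER, which is why the two are usually reported together.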

The exact set of metrics the scorer enables, along with their per-metric thresholds, is declared on OCRScorerConfig. Adjust them in your project's evaluar/configs/<model>.yaml (same shape as the detection example in Scorers).
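The field names OCRScorerConfig uses for thresholds are not reproduced here. To show the idea, the sketch below applies hypothetical thresholds with the direction each metric implies: error rates pass at or below their limit, match rates at or above it. The metric keys, the threshold values, and the `<=`/`>=` rule are all assumptions for illustration.

```python
# Hypothetical threshold check; metric names, values, and the pass rule are
# illustrative assumptions, not OCRScorerConfig's actual fields.
from typing import Dict

# Error rates: lower is better. Everything else (e.g. exact match rate)
# is treated as higher-is-better.
LOWER_IS_BETTER = {"cer", "wer"}


def check_thresholds(scores: Dict[str, float], thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Return {metric: passed} for every metric that has a threshold."""
    results = {}
    for metric, limit in thresholds.items():
        value = scores[metric]
        if metric in LOWER_IS_BETTER:
            results[metric] = value <= limit
        else:
            results[metric] = value >= limit
    return results


scores = {"cer": 0.04, "wer": 0.12, "exact_match": 0.80}
thresholds = {"cer": 0.05, "wer": 0.10, "exact_match": 0.75}
print(check_thresholds(scores, thresholds))
# {'cer': True, 'wer': False, 'exact_match': True}
```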

Pairing with detection

For documents, OCR often runs alongside layout detection as a separate pipeline in the same suite. Evaluar aggregates those pipeline verdicts through suite rollup; it does not model stage dependencies between them.
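Because the pipelines are scored independently, a rollup only needs each pipeline's final verdict. The sketch below assumes an all-must-pass rule and hypothetical pipeline names (`layout_detection`, `my_ocr`); Evaluar's actual rollup policy may differ.

```python
# Hypothetical suite rollup under an assumed "every pipeline must pass" rule.
# Evaluar's real rollup policy may differ.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SuiteResult:
    passed: bool
    failed_pipelines: List[str]


def rollup(pipeline_verdicts: Dict[str, bool]) -> SuiteResult:
    """Combine independent per-pipeline verdicts into one suite verdict.

    Nothing here feeds one pipeline's output into another: each verdict
    was computed on its own, mirroring the "no stage dependencies" note.
    """
    failed = sorted(name for name, ok in pipeline_verdicts.items() if not ok)
    return SuiteResult(passed=not failed, failed_pipelines=failed)


result = rollup({"layout_detection": False, "my_ocr": True})
print(result)  # SuiteResult(passed=False, failed_pipelines=['layout_detection'])
```

Under this model a layout-detection failure fails the suite but leaves the OCR verdict untouched, since the OCR pipeline was scored against its own ground truth.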
