OCR
Evaluating text recognition with `OCRScorer` — character and word-level metrics composed from `compute_cer` / `compute_wer` / `compute_exact_match`.
OCR evaluation in Evaluar measures recognized text against ground-truth strings. The scorer is OCRScorer (src/evaluar/scoring/ocr.py:79); the metric functions live in src/evaluar/metrics/.
A minimal OCR suite
from evaluar.api import ocr, suite
def my_ocr(image_url: str) -> dict:
return {"text": "..."}
def build_suite(sample_ids=None, config=None):
pipeline = (
ocr("my_ocr")
.callable(my_ocr)
.inputs({"sample_001": {"image_url": "..."}})
.ground_truth({"sample_001": {"text": "expected output"}})
.default_mapping()
.build()
)
s = suite(sample_ids=sample_ids or ["sample_001"], suite_name="ocr_eval")
s.add_pipeline("my_ocr", pipeline)
return sUse ocr(model_id) to get an OCR-configured builder. It is equivalent to PipelineBuilder.for_task("ocr", model_id).
Canonical prediction shape
After normalization:
{"text": "the recognized string"}For region-level OCR (one box → one string), use the multi-region shape from the schemas in src/evaluar/schemas/predictions.py.
Metrics
The OCR scorer composes:
| Metric | Function |
|---|---|
| Character error rate | compute_cer |
| Word error rate | compute_wer |
| Exact match | compute_exact_match |
| Sequence match rate | compute_sequence_match_rate |
The exact set the scorer enables — and the per-metric thresholds — is declared on OCRScorerConfig. Adjust them in your project's evaluar/configs/<model>.yaml (same shape as the detection example in Scorers).
Pairing with detection
For documents, OCR often runs alongside layout detection as a separate pipeline in the same suite. Evaluar aggregates those pipeline verdicts through suite rollup; it does not model stage dependencies between them.