Evaluar

Evaluating structured table extraction with `TableScorer` — header match rate, key-field completeness, and structural comparison.

Table evaluation in Evaluar scores predicted tables against ground truth along three axes: structure, headers, and key-field completeness. The scorer is `TableScorer` (src/evaluar/scoring/table.py:81).

A minimal table suite

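eval_table.py

```python
from evaluar.api import suite, table


def my_table_extractor(image_url: str) -> dict:
    # Placeholder: run your model on the image and return a table
    # in the canonical prediction shape (headers plus rows).
    return {"headers": [...], "rows": [[...]]}


def build_suite(sample_ids=None, config=None):
    # Wrap the extractor in a table pipeline whose inputs and ground truth
    # are keyed by sample ID. (`config` is unused in this minimal example.)
    pipeline = (
        table("my_table_extractor")
        .callable(my_table_extractor)
        .inputs({"sample_001": {"image_url": "..."}})
        .ground_truth({"sample_001": {
            "headers": ["column_1", "column_2"],
            "rows": [["value_1", "value_2"]],
        }})
        .default_mapping()
        .build()
    )
    # Fall back to the single sample defined above when no IDs are passed.
    s = suite(sample_ids=sample_ids or ["sample_001"], suite_name="table_eval")
    s.add_pipeline("my_table_extractor", pipeline)
    return s
```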

Canonical prediction shape

```
{
    "headers": ["col_a", "col_b", ...],
    "rows": [
        ["row1_val_a", "row1_val_b", ...],
        ...
    ]
}
```

The full schema is in src/evaluar/schemas/predictions.py.
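Before scoring, it can help to confirm that an extractor's output actually has this shape. Below is a minimal sketch. It assumes every row must have exactly one cell per header; this page doesn't state that rule, so check predictions.py for the real schema. `validate_table_prediction` is a hypothetical helper, not part of Evaluar.

```python
from typing import Any


def validate_table_prediction(pred: Any) -> list[str]:
    """Hypothetical helper (not part of Evaluar): list shape problems, empty if none.

    Assumes every row must have exactly one cell per header.
    """
    if not isinstance(pred, dict):
        return ["prediction must be a dict"]
    problems: list[str] = []
    headers, rows = pred.get("headers"), pred.get("rows")
    if not isinstance(headers, list) or not all(isinstance(h, str) for h in headers):
        problems.append("'headers' must be a list of strings")
    if not isinstance(rows, list) or not all(isinstance(r, list) for r in rows):
        problems.append("'rows' must be a list of lists")
        return problems  # row widths can't be checked without valid rows
    if isinstance(headers, list):
        for i, row in enumerate(rows):
            if len(row) != len(headers):
                problems.append(f"row {i} has {len(row)} cell(s), expected {len(headers)}")
    return problems


print(validate_table_prediction({"headers": ["a", "b"], "rows": [["1", "2"], ["3"]]}))
# ['row 1 has 1 cell(s), expected 2']
```

An empty list means the prediction passed every check in the sketch.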

Metrics

The table scorer combines three metrics:

| Metric | Implemented by |
| --- | --- |
| Header match rate | `compute_header_match_rate` |
| Key-field completeness | `compute_key_field_completeness` |
| Structural comparison | `compare_table_structure`, `TableStructureResult` |
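This page doesn't define how `compute_header_match_rate` and `compute_key_field_completeness` calculate their scores. Purely as an illustration, the sketch below makes two assumptions. Header match rate is taken to be the fraction of ground-truth headers found among the predicted headers, compared case- and whitespace-insensitively. Key-field completeness is taken to be the fraction of (ground-truth row, key field) slots that hold a non-empty predicted value, with rows aligned by index. Both functions and the `key_fields` parameter are hypothetical.

```python
def _norm(text: str) -> str:
    # Case- and whitespace-insensitive comparison (an assumption).
    return " ".join(text.split()).lower()


def header_match_rate_sketch(pred_headers: list[str], gt_headers: list[str]) -> float:
    """Hypothetical: fraction of ground-truth headers that also appear in the prediction."""
    if not gt_headers:
        return 1.0
    predicted = {_norm(h) for h in pred_headers}
    hits = sum(1 for h in gt_headers if _norm(h) in predicted)
    return hits / len(gt_headers)


def key_field_completeness_sketch(pred: dict, gt: dict, key_fields: list[str]) -> float:
    """Hypothetical: fraction of (ground-truth row, key field) slots filled in the prediction.

    Rows are aligned by index. A key field missing from the predicted headers
    leaves every slot in that column unfilled.
    """
    total = len(gt["rows"]) * len(key_fields)
    if total == 0:
        return 1.0
    col_index = {_norm(h): i for i, h in enumerate(pred["headers"])}
    filled = 0
    for row_idx in range(min(len(gt["rows"]), len(pred["rows"]))):
        row = pred["rows"][row_idx]
        for field in key_fields:
            col = col_index.get(_norm(field))
            if col is not None and col < len(row) and str(row[col]).strip():
                filled += 1
    return filled / total
```

Normalizing headers before matching means a prediction of `" Column_1 "` still counts as a hit for `column_1`. Whether the real scorer is that lenient is not stated here.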

`compare_table_structure` compares the predicted and ground-truth tables structurally and returns a `TableStructureResult` carrying per-cell agreement information. Per-metric thresholds are set on `TableScorerConfig` and can be overridden in evaluar/configs/<model>.yaml.
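This page doesn't list the fields of `TableStructureResult`. As a rough illustration of what "per-cell agreement" can mean, the sketch below compares cells position by position. It reports the fraction of ground-truth cells reproduced exactly after trimming whitespace, alongside row-count and column-count checks. `StructureSketch` and `compare_structure_sketch` are hypothetical stand-ins, not Evaluar's `TableStructureResult` or `compare_table_structure`.

```python
from dataclasses import dataclass


@dataclass
class StructureSketch:
    # Hypothetical stand-in, not Evaluar's TableStructureResult.
    row_count_match: bool
    col_count_match: bool
    cell_agreement: float  # fraction of ground-truth cells matched at the same position


def compare_structure_sketch(pred: dict, gt: dict) -> StructureSketch:
    """Positional, whitespace-trimmed cell comparison (an illustrative assumption)."""
    gt_rows, pred_rows = gt["rows"], pred["rows"]
    total = sum(len(row) for row in gt_rows)
    matched = 0
    for i, gt_row in enumerate(gt_rows):
        # A missing predicted row matches no cells.
        pred_row = pred_rows[i] if i < len(pred_rows) else []
        for j, expected in enumerate(gt_row):
            if j < len(pred_row) and str(pred_row[j]).strip() == str(expected).strip():
                matched += 1
    return StructureSketch(
        row_count_match=len(pred_rows) == len(gt_rows),
        col_count_match=len(pred["headers"]) == len(gt["headers"]),
        # An empty ground-truth table counts as full agreement.
        cell_agreement=matched / total if total else 1.0,
    )
```

Purely positional matching is the simplest choice. It also means a single dropped row shifts every later row out of alignment, so a real structural comparison may align rows differently.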

Only the metrics listed above exist today, and the table scorer gates only on metrics that the current table evaluator actually emits.
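One way to read the gating rule is that thresholds apply only to metrics that appear in the evaluator's output. Under that reading, a threshold for a metric that isn't emitted is skipped rather than counted as a failure. Below is a minimal sketch of that behavior, assuming a metric passes when its score is at least its threshold. The function and the metric names are hypothetical, not `TableScorerConfig` fields.

```python
def gate_sketch(scores: dict[str, float], thresholds: dict[str, float]) -> dict[str, bool]:
    # Hypothetical helper (not Evaluar's API): one pass/fail entry per gated metric.
    # Thresholds whose metric is absent from `scores` are skipped, not failed.
    return {
        name: scores[name] >= minimum  # assumption: pass means score >= threshold
        for name, minimum in thresholds.items()
        if name in scores
    }


scores = {"header_match_rate": 0.9, "key_field_completeness": 0.7}
thresholds = {"header_match_rate": 0.8, "key_field_completeness": 0.75, "unemitted_metric": 0.9}
print(gate_sketch(scores, thresholds))
# {'header_match_rate': True, 'key_field_completeness': False}
```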

Inspecting table failures

Tables don't render in the bbox editor, because that subprocess is specific to detection. For table failures, you'll spend most of your time in the failure inspector's diff pane, which surfaces the structural diff produced by `compare_table_structure`. See Failure inspection.
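This page doesn't document the diff pane's exact format. If you want a comparable view outside the inspector (for example, in a notebook), a minimal sketch that lists header and cell mismatches position by position could look like the following. `cell_diff_sketch` is hypothetical. It uses exact string comparison and reports absent cells as `None`.

```python
from typing import Any


def _at(seq: list, idx: int) -> Any:
    # Value at position idx, or None when that position doesn't exist.
    return seq[idx] if idx < len(seq) else None


def cell_diff_sketch(pred: dict, gt: dict) -> list[str]:
    """Hypothetical helper (not Evaluar's diff pane): one line per differing header or cell."""
    lines = []
    # Headers are compared position by position, exact match.
    for j in range(max(len(pred["headers"]), len(gt["headers"]))):
        expected, got = _at(gt["headers"], j), _at(pred["headers"], j)
        if expected != got:
            lines.append(f"header[{j}]: expected {expected!r}, got {got!r}")
    # Cells are walked over the larger of the two shapes, so extra predicted
    # rows or cells are reported as well as missing ones.
    for i in range(max(len(pred["rows"]), len(gt["rows"]))):
        gt_row, pred_row = _at(gt["rows"], i) or [], _at(pred["rows"], i) or []
        for j in range(max(len(gt_row), len(pred_row))):
            expected, got = _at(gt_row, j), _at(pred_row, j)
            if expected != got:
                lines.append(f"cell[{i}][{j}]: expected {expected!r}, got {got!r}")
    return lines
```

Suppose a prediction has the wrong second header and one extra two-cell row. The sketch returns one `header[1]` line followed by `cell[1][0]` and `cell[1][1]` lines.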
