evaluar

Table

Evaluating structured table extraction with `TableScorer` — header match rate, key-field completeness, and structural comparison.

Table evaluation in Evaluar measures predicted tables against ground truth along three axes: structure, headers, and key-field completeness. The scorer is TableScorer (src/evaluar/scoring/table.py:81).

A minimal table suite

eval_table.py
from evaluar.api import suite, table

def my_table_extractor(image_url: str) -> dict:
    return {"headers": [...], "rows": [[...]]}

def build_suite(sample_ids=None, config=None):
    pipeline = (
        table("my_table_extractor")
        .callable(my_table_extractor)
        .inputs({"sample_001": {"image_url": "..."}})
        .ground_truth({"sample_001": {
            "headers": ["column_1", "column_2"],
            "rows": [["value_1", "value_2"]],
        }})
        .default_mapping()
        .build()
    )
    s = suite(sample_ids=sample_ids or ["sample_001"], suite_name="table_eval")
    s.add_pipeline("my_table_extractor", pipeline)
    return s

Canonical prediction shape

{
    "headers": ["col_a", "col_b", ...],
    "rows": [
        ["row1_val_a", "row1_val_b", ...],
        ...
    ]
}

The full schema is in src/evaluar/schemas/predictions.py.
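Before wiring an extractor into a suite, it can help to check that its output has the shape shown above. The following is a minimal sketch, not part of Evaluar: `check_table_shape` is a hypothetical helper, and the rule that every row must have as many cells as there are headers is an assumption rather than something the schema is known to enforce.

```python
from typing import Any


def check_table_shape(prediction: Any) -> list[str]:
    """Return a list of shape problems; an empty list means the shape looks canonical.

    Hypothetical helper for illustration only; it is not Evaluar's schema validation.
    """
    problems: list[str] = []
    if not isinstance(prediction, dict):
        return ["prediction is not a dict"]

    headers = prediction.get("headers")
    rows = prediction.get("rows")

    headers_ok = isinstance(headers, list) and all(isinstance(h, str) for h in headers)
    if not headers_ok:
        problems.append("'headers' must be a list of strings")

    if not isinstance(rows, list) or not all(isinstance(r, list) for r in rows):
        problems.append("'rows' must be a list of lists")
        return problems

    # Assumption: each row is expected to align one-to-one with the headers.
    if headers_ok:
        for i, row in enumerate(rows):
            if len(row) != len(headers):
                problems.append(f"row {i} has {len(row)} cells, expected {len(headers)}")
    return problems
```

Calling it on the ground-truth entry from the minimal suite returns an empty list, while a row with a missing cell is reported by index.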

Metrics

The table scorer composes:

| Metric | Function |
| --- | --- |
| Header match rate | compute_header_match_rate |
| Key field completeness | compute_key_field_completeness |
| Structural comparison | compare_table_structure, TableStructureResult |

compare_table_structure compares predicted and ground-truth tables structurally and returns a TableStructureResult carrying per-cell agreement information. Per-metric thresholds are configured on TableScorerConfig and overridable in evaluar/configs/<model>.yaml.
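To make the first two metrics concrete, here is one plausible reading of what they measure. This is a sketch under stated assumptions, not the code behind compute_header_match_rate or compute_key_field_completeness: the function names, the case-insensitive header normalization, and the `key_fields` parameter are all hypothetical.

```python
def header_match_rate_sketch(predicted: list[str], expected: list[str]) -> float:
    """Fraction of expected headers found in the prediction.

    Assumption: headers are compared after stripping whitespace and lowercasing.
    """
    if not expected:
        return 1.0
    normalized = {h.strip().lower() for h in predicted}
    hits = sum(1 for h in expected if h.strip().lower() in normalized)
    return hits / len(expected)


def key_field_completeness_sketch(table: dict, key_fields: list[str]) -> float:
    """Fraction of (row, key field) cells that are present and non-empty.

    Assumption: a key field counts as complete when its column exists and the
    cell contains non-whitespace text.
    """
    headers = [h.strip().lower() for h in table["headers"]]
    rows = table["rows"]
    total = len(rows) * len(key_fields)
    if total == 0:
        return 1.0

    filled = 0
    for row in rows:
        for field_name in key_fields:
            key = field_name.strip().lower()
            if key in headers:
                idx = headers.index(key)
                if idx < len(row) and str(row[idx]).strip():
                    filled += 1
    return filled / total
```

Under these assumptions, a prediction that recovers one of two expected headers scores 0.5 on header match rate, and a table with one empty key cell out of four scores 0.75 on completeness.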

The metrics listed above are the only ones that exist today. The table scorer gates only on metrics that the current table evaluator emits.
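The gating rule can be pictured as a filter over the emitted metrics. The sketch below is illustrative only: `gate_metrics`, the metric keys, and the "value must be at least the threshold" comparison are assumptions, and TableScorerConfig may represent thresholds differently.

```python
def gate_metrics(emitted: dict[str, float], thresholds: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail for each emitted metric that has a configured threshold.

    Thresholds for metrics the evaluator did not emit are ignored rather than
    treated as failures (assumed behavior, hypothetical helper).
    """
    return {
        name: value >= thresholds[name]
        for name, value in emitted.items()
        if name in thresholds
    }


# Hypothetical metric names; "cell_accuracy" is configured but never emitted.
emitted = {"header_match_rate": 0.9, "key_field_completeness": 0.5}
thresholds = {"header_match_rate": 0.8, "key_field_completeness": 0.7, "cell_accuracy": 0.9}
results = gate_metrics(emitted, thresholds)
```

Here `results` contains entries only for the two emitted metrics, so a threshold for a metric that does not exist yet cannot fail a run.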

Inspecting table failures

Tables don't render in the bbox editor because that subprocess is detection-specific. For table failures, you'll spend most of your time in the failure inspector's diff pane, which surfaces the structural diff produced by compare_table_structure. See Failure inspection.
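When the inspector is not at hand, a rough per-cell diff can be reproduced in a few lines. This is a simplified sketch and not the compare_table_structure implementation: `StructureDiffSketch` and `diff_tables_sketch` are hypothetical names, and the fields of the real TableStructureResult may differ.

```python
from dataclasses import dataclass, field


@dataclass
class StructureDiffSketch:
    """Hypothetical stand-in for a structural comparison result."""
    row_count_match: bool
    column_count_match: bool
    # Each entry is (row index, column index, predicted cell, expected cell).
    mismatched_cells: list[tuple[int, int, str, str]] = field(default_factory=list)


def diff_tables_sketch(predicted: dict, expected: dict) -> StructureDiffSketch:
    """Compare two tables in the canonical shape, cell by cell."""
    pred_rows, exp_rows = predicted["rows"], expected["rows"]
    result = StructureDiffSketch(
        row_count_match=len(pred_rows) == len(exp_rows),
        column_count_match=len(predicted["headers"]) == len(expected["headers"]),
    )
    # zip stops at the shorter table, so extra rows or cells show up only
    # through the count flags, not as cell mismatches.
    for r, (pred_row, exp_row) in enumerate(zip(pred_rows, exp_rows)):
        for c, (pred_cell, exp_cell) in enumerate(zip(pred_row, exp_row)):
            if pred_cell != exp_cell:
                result.mismatched_cells.append((r, c, pred_cell, exp_cell))
    return result
```

Printing `mismatched_cells` for a failing sample gives a quick list of coordinates to look for in the diff pane.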
