Table
Evaluating structured table extraction with `TableScorer` — header match rate, key-field completeness, and structural comparison.
Table evaluation in Evaluar measures predicted tables against ground truth along three axes: structure, headers, and key-field completeness. The scorer is `TableScorer` (`src/evaluar/scoring/table.py:81`).
A minimal table suite
```python
from evaluar.api import suite, table


def my_table_extractor(image_url: str) -> dict:
    return {"headers": [...], "rows": [[...]]}


def build_suite(sample_ids=None, config=None):
    pipeline = (
        table("my_table_extractor")
        .callable(my_table_extractor)
        .inputs({"sample_001": {"image_url": "..."}})
        .ground_truth({"sample_001": {
            "headers": ["column_1", "column_2"],
            "rows": [["value_1", "value_2"]],
        }})
        .default_mapping()
        .build()
    )
    s = suite(sample_ids=sample_ids or ["sample_001"], suite_name="table_eval")
    s.add_pipeline("my_table_extractor", pipeline)
    return s
```
Canonical prediction shape
```python
{
    "headers": ["col_a", "col_b", ...],
    "rows": [
        ["row1_val_a", "row1_val_b", ...],
        ...
    ]
}
```
The full schema is in `src/evaluar/schemas/predictions.py`.
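
As a rough illustration of the shape above, the following sketch checks a prediction dict before it is handed to a scorer. `check_table_prediction` is a hypothetical helper, not part of Evaluar, and the rule that every row has one cell per header is an assumption made for this example; the authoritative schema remains the one in the predictions module.

```python
from typing import Any


def check_table_prediction(pred: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a table prediction (empty if none).

    Hypothetical helper for illustration only; not part of Evaluar.
    """
    problems: list[str] = []
    headers = pred.get("headers")
    rows = pred.get("rows")

    if not isinstance(headers, list) or not all(isinstance(h, str) for h in headers):
        problems.append("'headers' must be a list of strings")
        headers = []
    if not isinstance(rows, list):
        problems.append("'rows' must be a list of lists")
        rows = []

    for i, row in enumerate(rows):
        if not isinstance(row, list):
            problems.append(f"row {i} is not a list")
        # Assumption: each row is expected to have one cell per header.
        elif headers and len(row) != len(headers):
            problems.append(f"row {i} has {len(row)} cells, expected {len(headers)}")
    return problems


print(check_table_prediction({"headers": ["col_a", "col_b"], "rows": [["1", "2"], ["3"]]}))
# ['row 1 has 1 cells, expected 2']
```

Running a check like this inside the extractor, before returning, tends to surface malformed rows earlier than a low score would.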
Metrics
The table scorer composes:
| Metric | Function |
|---|---|
| Header match rate | compute_header_match_rate |
| Key field completeness | compute_key_field_completeness |
| Structural comparison | compare_table_structure, TableStructureResult |
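
To make the first two rows of the table above concrete, here is a minimal sketch of what a header match rate and a key-field completeness score could look like. This page does not define the exact formulas, so the definitions below (case- and whitespace-insensitive header matching, and the share of non-empty key cells) are assumptions. The `_sketch` functions are hypothetical stand-ins, not the implementations of `compute_header_match_rate` or `compute_key_field_completeness`.

```python
def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences don't count."""
    return " ".join(text.lower().split())


def header_match_rate_sketch(predicted: list[str], expected: list[str]) -> float:
    """Assumed definition: fraction of expected headers present in the prediction."""
    if not expected:
        return 1.0
    predicted_set = {_norm(h) for h in predicted}
    hits = sum(1 for h in expected if _norm(h) in predicted_set)
    return hits / len(expected)


def key_field_completeness_sketch(table: dict, key_fields: list[str]) -> float:
    """Assumed definition: fraction of (row, key field) cells that are non-empty."""
    index = {_norm(h): i for i, h in enumerate(table["headers"])}
    total = len(table["rows"]) * len(key_fields)
    if total == 0:
        return 1.0
    filled = 0
    for row in table["rows"]:
        for name in key_fields:
            col = index.get(_norm(name))
            if col is not None and col < len(row) and str(row[col]).strip():
                filled += 1
    return filled / total


pred = {
    "headers": ["Column_1", "column_2"],
    "rows": [["value_1", ""], ["value_3", "value_4"]],
}
hmr = header_match_rate_sketch(pred["headers"], ["column_1", "column_2", "column_3"])
kfc = key_field_completeness_sketch(pred, ["column_1", "column_2"])
print(round(hmr, 3), kfc)  # 0.667 0.75
```
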
`compare_table_structure` compares predicted and ground-truth tables structurally and returns a `TableStructureResult` carrying per-cell agreement information. Per-metric thresholds are configured on `TableScorerConfig` and overridable in `evaluar/configs/<model>.yaml`.
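
The idea of a structural comparison with per-cell agreement can be sketched as follows. `StructureSketch` and `compare_structure_sketch` are hypothetical names invented for this example. The fields of the real `TableStructureResult` are not documented on this page, so treat the shape check and exact-match cell comparison below as assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class StructureSketch:
    """Hypothetical result type: shape agreement plus a per-cell match grid."""
    same_shape: bool
    cell_matches: list[list[bool]] = field(default_factory=list)

    @property
    def cell_agreement(self) -> float:
        cells = [m for row in self.cell_matches for m in row]
        return sum(cells) / len(cells) if cells else 1.0


def compare_structure_sketch(predicted: dict, expected: dict) -> StructureSketch:
    pred_rows, exp_rows = predicted["rows"], expected["rows"]
    same_shape = (
        len(predicted["headers"]) == len(expected["headers"])
        and len(pred_rows) == len(exp_rows)
        and all(len(p) == len(e) for p, e in zip(pred_rows, exp_rows))
    )
    matches = []
    for r, exp_row in enumerate(exp_rows):
        # A missing predicted row counts as all-mismatched cells.
        pred_row = pred_rows[r] if r < len(pred_rows) else []
        matches.append([
            c < len(pred_row) and pred_row[c] == exp_cell
            for c, exp_cell in enumerate(exp_row)
        ])
    return StructureSketch(same_shape=same_shape, cell_matches=matches)


expected = {"headers": ["a", "b"], "rows": [["1", "2"], ["3", "4"]]}
predicted = {"headers": ["a", "b"], "rows": [["1", "2"], ["3", "x"]]}
result = compare_structure_sketch(predicted, expected)
print(result.same_shape, result.cell_agreement)  # True 0.75
```
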
The table above lists every metric that exists today. The table scorer gates only on the metrics that the current table evaluator emits.
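
One way to read the gating rule is sketched below: a threshold is applied only when the corresponding metric was actually emitted, and a configured threshold with no emitted metric is skipped. `gate_metrics`, the metric keys, and the threshold values are all illustrative assumptions; they are not the fields of `TableScorerConfig`.

```python
def gate_metrics(emitted: dict[str, float], thresholds: dict[str, float]) -> dict[str, bool]:
    """Pass/fail per metric, considering only metrics that were actually emitted."""
    return {
        name: value >= thresholds[name]
        for name, value in emitted.items()
        if name in thresholds
    }


# Illustrative values only; real thresholds come from the scorer configuration.
thresholds = {"header_match_rate": 0.9, "key_field_completeness": 0.95, "cell_agreement": 0.8}
emitted = {"header_match_rate": 1.0, "key_field_completeness": 0.75}

results = gate_metrics(emitted, thresholds)
print(results)  # {'header_match_rate': True, 'key_field_completeness': False}
```

In this sketch `cell_agreement` has a threshold but was not emitted, so it does not appear in the result at all rather than counting as a failure.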
Inspecting table failures
Tables don't render in the bbox editor, because that subprocess is detection-specific. For table failures, the failure inspector's diff pane is where you'll spend time: it surfaces the structural diff produced by `compare_table_structure`. See Failure inspection.