Normalization
Turning a raw model response into Evaluar's canonical prediction schema — JMESPath mappings, function normalizers, and LLM-backed extraction.
Models return whatever shape they want. Evaluar's scorers expect a canonical prediction schema (src/evaluar/schemas/predictions.py). The normalizer is the bridge.
There are three ways to specify one. They live behind the PipelineBuilder chain.
JMESPath mapping
The default for most pipelines. Declarative, no Python required for simple shapes.
pipeline = (
detection("my_model")
.callable(my_model)
.mapping({
"objects": "prediction[*].{label: label_name, bbox: box, score: score, class_id: label_id}"
})
...
)Per-task default mappings cover the common case:
.default_mapping() # Use the canonical mapping for this task type.This is what evaluar init puts in the scaffold for detection projects.
.mapping(...) and .default_mapping() are mutually exclusive. The mapping dict's keys are output fields on the canonical prediction; the values are JMESPath expressions evaluated against the raw response.
Detection mappings also accept a lightweight label_map for canonicalizing
model labels before scoring:
pipeline = (
detection("door_window_model")
.http(base_url="http://localhost:8000", endpoint="/predict")
.default_mapping(label_map={"door": "Door", "window": "Window"})
...
)The same option works with custom mappings:
.mapping(
{"objects": "detections[*].{label: type, bbox: bbox, score: confidence}"},
label_map={"door": "Door", "window": "Window"},
)Use a full function normalizer when canonicalization needs more than a direct label-to-label lookup.
Function normalizer
When the shape is too irregular for JMESPath, use a Python function.
from evaluar.api import normalizer
@normalizer
def normalize_my_model(raw: dict) -> dict:
return {
"objects": [
{"label": x["name"], "bbox": x["bounds"], "score": x["confidence"]}
for x in raw["detections"]
]
}
pipeline = (
detection("my_model")
.callable(my_model)
.normalizer(normalize_my_model)
...
)The decorator (src/evaluar/api.py:431) accepts:
| Argument | Default | Purpose |
|---|---|---|
supported_models | None | Restrict the normalizer to specific model ids. |
run_in_thread | False | Run in a worker thread. Set this when the function blocks on I/O — e.g. an LLM call — so the Textual loop is not blocked when the suite runs from the TUI. |
Functions can return either a typed prediction model (e.g. an instance from evaluar.schemas) or a canonical dict; dict outputs are validated against the task's prediction schema.
LLM normalizer
When the model's output is unstructured text, you can use a hosted LLM to extract a canonical prediction.
pipeline = (
detection("my_model")
.http(base_url="...", endpoint="/predict")
.llm_normalizer(provider="openai", model="gpt-4o-mini")
...
)| Argument | Default | Purpose |
|---|---|---|
provider | required | One of openai, google. |
model | provider default | Model name. |
system_prompt | task default | Override the extraction prompt. |
api_key | env-resolved | Override the API key. |
api_key defaults are read from the standard provider env vars (OPENAI_API_KEY, GOOGLE_API_KEY).
Transformation normalizer
A variant of LLM normalization where the LLM transforms an already-structured response rather than extracting from raw text.
pipeline = (
detection("my_model")
...
.transformation_normalizer(
transformation="rephrase_label_names",
provider="openai",
model="gpt-4o-mini",
)
)Use this when post-processing benefits from a model — e.g. canonicalizing free-form labels into a fixed taxonomy.
Normalizers in YAML
Mapping and LLM normalizers are expressible in evaluar.yaml:
models:
my_model:
type: detection
normalizer:
type: mapping
mapping:
objects: "prediction[*].{label: label_name, bbox: box, score: score, class_id: label_id}"
label_map:
door: Door
window: WindowSee YAML manifests.
What the canonical schema looks like
Detection (the most common case) expects:
{
"objects": [
{"label": str, "bbox": [x_min, y_min, x_max, y_max], "score": float, "class_id": int | None},
...
]
}OCR and table schemas are documented in src/evaluar/schemas/predictions.py. Validation happens after normalization but before scoring; misshapen output surfaces as a sample-level error rather than crashing the suite.