evaluar

Install & init

Install Evaluar and scaffold a working project with `evaluar init`.

The fastest way to a working Evaluar setup is evaluar init <task_type>. It writes a runnable eval_<name>.py, a manifest, and a default scorer config, so you can start iterating on a real pipeline right away instead of configuring one from scratch.

Install

Evaluar currently installs from the private GitHub repository and requires Python 3.10 or later. A plain SSH clone works only once your GitHub account has access to the repository and an SSH key is configured. For most downstream projects, use an authenticated HTTPS install or add an existing local checkout.

uv project

Use this when the downstream repository already has a pyproject.toml:

uv add "evaluar @ git+https://github.com/Koiiichi/evaluar.git"
uv run evaluar --version

Existing virtualenv / requirements repo

Use this for repositories that do not have a pyproject.toml, such as older requirements.txt projects:

uv pip install --python .venv/bin/python "evaluar @ git+https://github.com/Koiiichi/evaluar.git"
.venv/bin/evaluar --version

Or activate the environment and use pip:

source .venv/bin/activate
pip install "evaluar @ git+https://github.com/Koiiichi/evaluar.git"
evaluar --version

uv add edits project metadata, so it requires pyproject.toml. uv pip install and pip install install into an environment, so they are the right choice for legacy or service repos that already manage dependencies another way.
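The rule above can be sketched as a small helper that picks the command from the repository layout. This is an illustrative sketch only: install_command is a hypothetical name, not part of Evaluar, and it assumes the virtualenv interpreter lives at .venv/bin/python. It builds the argument list without running anything.

```python
import shlex
from pathlib import Path

# Install spec copied from the commands above.
EVALUAR_SPEC = "evaluar @ git+https://github.com/Koiiichi/evaluar.git"


def install_command(repo_dir: Path, venv_python: str = ".venv/bin/python") -> list[str]:
    """Return the uv command suited to repo_dir (hypothetical helper)."""
    if (repo_dir / "pyproject.toml").is_file():
        # uv add edits project metadata, so it needs a pyproject.toml.
        return ["uv", "add", EVALUAR_SPEC]
    # No pyproject.toml: install directly into the existing environment.
    return ["uv", "pip", "install", "--python", venv_python, EVALUAR_SPEC]


if __name__ == "__main__":
    # Print the command for the current directory instead of executing it.
    print(shlex.join(install_command(Path.cwd())))
```

Pass the returned list to subprocess.run(..., check=True) if you want the helper to perform the install rather than just print it.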

Because the repository is private, authenticate first with gh auth login, or configure a GitHub token that has repository access. If you already have a local checkout, uv add ../evaluar remains supported. The CLI is registered as evaluar (pyproject.toml [project.scripts]).

Scaffold a project

evaluar init <task_type>

where <task_type> is one of detection, ocr, table, or merged (src/evaluar/cli/commands/init.py:19).

Flags

| Flag | Default | Description |
| --- | --- | --- |
| --name / -n | current directory name | Project / model name. |
| --dir / -d | evaluar | Directory to write project files into. |
| --force | false | Overwrite existing files. |
| --github-actions | false | Also generate .github/workflows/evaluar.yml. |
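If you scaffold projects from a script, the task types and flags above can be assembled programmatically. The following is a minimal sketch, not an Evaluar API: build_init_command is a hypothetical wrapper that only validates the task type and builds the argument list for the CLI.

```python
from __future__ import annotations

import shlex

# Task types accepted by `evaluar init`, as listed above.
TASK_TYPES = ("detection", "ocr", "table", "merged")


def build_init_command(
    task_type: str,
    name: str | None = None,
    directory: str | None = None,
    force: bool = False,
    github_actions: bool = False,
) -> list[str]:
    """Build the argv for `evaluar init` (hypothetical helper)."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"task_type must be one of {TASK_TYPES}, got {task_type!r}")
    cmd = ["evaluar", "init", task_type]
    if name is not None:
        cmd += ["--name", name]  # otherwise the CLI uses the current directory name
    if directory is not None:
        cmd += ["--dir", directory]  # otherwise the CLI writes into "evaluar"
    if force:
        cmd.append("--force")
    if github_actions:
        cmd.append("--github-actions")
    return cmd


if __name__ == "__main__":
    # Show the command only; run it with subprocess.run(cmd, check=True).
    print(shlex.join(build_init_command("detection", name="layout_detector")))
```

Validating the task type before invoking the CLI gives a clear error in your own tooling rather than relying on the command's exit status.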

Example

$ evaluar init detection --name layout_detector
✓ Created eval_layout_detector.py  ← development + CI/CD entry point
✓ Created evaluar/configs/layout_detector.yaml
✓ Created evaluar/ground_truth/layout_detector_gt.json
✓ Created evaluar.yaml

After init you'll have:

.
├── eval_layout_detector.py
├── evaluar.yaml
└── evaluar/
    ├── configs/
    │   └── layout_detector.yaml
    └── ground_truth/
        └── layout_detector_gt.json

  • eval_layout_detector.py: the eval file. It exposes build_suite(sample_ids, config) and is the file that evaluar eval_layout_detector.py executes.
  • evaluar.yaml: the manifest, which declares the project name, results directory, and per-model config (see YAML manifests).
  • evaluar/configs/<model>.yaml: scorer thresholds. Edit these as your model improves.
  • evaluar/ground_truth/<model>_gt.json: a stub ground-truth file. Replace it with real labels.
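To confirm that a scaffold is complete, for example in a setup script, you can check for the files listed above. This sketch uses hypothetical helper names and assumes that --dir relocates only the configs and ground_truth folders while the eval file and manifest stay at the project root; verify that against your own init output.

```python
from pathlib import Path


def expected_files(name: str, directory: str = "evaluar") -> list[Path]:
    """Relative paths that `evaluar init --name <name>` is expected to create."""
    return [
        Path(f"eval_{name}.py"),
        Path("evaluar.yaml"),
        # Assumption: only these two live under the --dir directory.
        Path(directory) / "configs" / f"{name}.yaml",
        Path(directory) / "ground_truth" / f"{name}_gt.json",
    ]


def missing_files(root: Path, name: str, directory: str = "evaluar") -> list[Path]:
    """Return the expected files that do not exist under root."""
    return [p for p in expected_files(name, directory) if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path.cwd(), "layout_detector")
    if missing:
        print("Missing:", ", ".join(str(p) for p in missing))
    else:
        print("Scaffold looks complete.")
```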

First run

The scaffolded eval file is runnable as-is — it ships a stub model that returns a hard-coded prediction matching the stub ground truth, so the first run produces a deterministic pass.

python eval_layout_detector.py
# → Run run_…: pass

Or via the CLI:

evaluar eval_layout_detector.py

Both paths save the result to evaluar/results/<run_id>.json (see Run storage).
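To inspect a saved run outside the TUI, you can read the newest file in the results directory. This is a rough sketch: the helper names are hypothetical, the default path assumes the default evaluar/results location, and the JSON schema is not described on this page, so the code only loads the file and lists its top-level keys.

```python
from __future__ import annotations

import json
from pathlib import Path


def latest_result(results_dir: Path = Path("evaluar/results")) -> Path | None:
    """Return the most recently modified result file, or None if there is none."""
    if not results_dir.is_dir():
        return None
    files = sorted(results_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def load_result(path: Path):
    """Parse a result file; no assumptions are made about its schema."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    path = latest_result()
    if path is None:
        print("No results found; run the eval file first.")
    else:
        data = load_result(path)
        keys = sorted(data) if isinstance(data, dict) else []
        print(f"{path.name}: top-level keys = {keys}")
```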

To open it in the TUI:

evaluar

What --github-actions adds

When you pass --github-actions, Evaluar also writes a minimal workflow at .github/workflows/evaluar.yml:

name: Evaluar
on:
  push:
    branches: [main]
  pull_request:

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          path: project
      - uses: actions/checkout@v4
        with:
          repository: Koiiichi/evaluar
          token: ${{ secrets.EVALUAR_REPO_TOKEN }}
          path: evaluar-src
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install evaluar
        run: pip install ./evaluar-src
      - name: Run evaluations
        working-directory: project
        run: evaluar test --headless

This is the literal template (src/evaluar/cli/commands/init.py:336). Adapt it as needed — see Headless / CI for how to gate on results.
