Install & init
Install Evaluar and scaffold a working project with `evaluar init`.
The fastest way to a real Evaluar setup is evaluar init <task_type>. It writes a runnable eval_<name>.py, a manifest, a default scorer config, and a stub ground-truth file, so you start by iterating on a real pipeline rather than configuring one from scratch.
Install
Evaluar currently installs from the private GitHub repository and requires Python 3.10 or later. Plain SSH clone works only after your GitHub account has repository access and a configured SSH key. For most downstream projects, use an authenticated HTTPS install or add an existing local checkout.
uv project
Use this when the downstream repository already has a pyproject.toml:
uv add "evaluar @ git+https://github.com/Koiiichi/evaluar.git"
uv run evaluar --version
Existing virtualenv / requirements repo
Use this for repositories that do not have a pyproject.toml, such as older
requirements.txt projects:
uv pip install --python .venv/bin/python "evaluar @ git+https://github.com/Koiiichi/evaluar.git"
.venv/bin/evaluar --version
Or activate the environment and use pip:
source .venv/bin/activate
pip install "evaluar @ git+https://github.com/Koiiichi/evaluar.git"
evaluar --version
uv add edits project metadata, so it requires pyproject.toml. uv pip install and pip install install into an environment, so they are the right choice for legacy or service repos that already manage dependencies another way.
If the repository is private in your environment, authenticate first with
gh auth login, or configure a GitHub token that has repository access. If you
already have a checkout, uv add ../evaluar remains supported. The CLI is
registered as evaluar (pyproject.toml [project.scripts]).
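The two install prerequisites stated above (Python 3.10 or later, and an evaluar executable on PATH) can be checked before running anything else. The following is a minimal sketch using only the standard library; the helper name check_environment is hypothetical and is not part of Evaluar.

```python
import shutil
import sys

# Minimum interpreter version stated in the install notes.
MIN_PYTHON = (3, 10)


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means the basics are in place.

    `version_info` and `which` are injectable so the check can be exercised
    without changing the real interpreter or PATH.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.10 or later is required")
    if which("evaluar") is None:
        problems.append("evaluar CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty result only shows that the interpreter is recent enough and that some executable named evaluar is reachable; evaluar --version remains the direct confirmation.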
Scaffold a project
evaluar init <task_type>
where <task_type> is one of detection, ocr, table, merged (src/evaluar/cli/commands/init.py:19).
Flags
| Flag | Default | Description |
|---|---|---|
| --name / -n | current directory name | Project / model name. |
| --dir / -d | evaluar | Directory to write project files into. |
| --force | false | Overwrite existing files. |
| --github-actions | false | Also generate .github/workflows/evaluar.yml. |
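When init is driven from a setup script, it can help to assemble the command line from the task types and flags listed above. This is a sketch only: build_init_command is a hypothetical helper, not an Evaluar API, and it does nothing beyond producing an argument list that could be passed to subprocess.run.

```python
import shlex

# Task types accepted by `evaluar init`, as listed above.
TASK_TYPES = ("detection", "ocr", "table", "merged")


def build_init_command(task_type, name=None, directory=None,
                       force=False, github_actions=False):
    """Build the argv for `evaluar init` using only the documented flags."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"task_type must be one of {TASK_TYPES}, got {task_type!r}")
    argv = ["evaluar", "init", task_type]
    if name is not None:
        argv += ["--name", name]        # omitted: defaults to the current directory name
    if directory is not None:
        argv += ["--dir", directory]    # omitted: defaults to "evaluar"
    if force:
        argv.append("--force")
    if github_actions:
        argv.append("--github-actions")
    return argv


print(shlex.join(build_init_command("detection", name="layout_detector",
                                    github_actions=True)))
# prints: evaluar init detection --name layout_detector --github-actions
```

Leaving name and directory as None keeps the CLI's own defaults in charge rather than duplicating them in the script.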
Example
✓ Created eval_layout_detector.py ← development + CI/CD entry point
✓ Created evaluar/configs/layout_detector.yaml
✓ Created evaluar/ground_truth/layout_detector_gt.json
✓ Created evaluar.yaml
After init you'll have:
.
├── eval_layout_detector.py
├── evaluar.yaml
└── evaluar/
    ├── configs/
    │   └── layout_detector.yaml
    └── ground_truth/
        └── layout_detector_gt.json
- eval_layout_detector.py: the eval file. Exposes build_suite(sample_ids, config). This is the file evaluar eval_layout_detector.py will execute.
- evaluar.yaml: the manifest, declaring the project name, results directory, and per-model config (see YAML manifests).
- evaluar/configs/<model>.yaml: scorer thresholds. Edit these as your model improves.
- evaluar/ground_truth/<model>_gt.json: a stub ground-truth file. Replace it with real labels.
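The layout above can be checked from a script, for example as a sanity step after running init in automation. The sketch below derives the four paths from the model name; it assumes that --dir simply replaces the evaluar/ prefix for the configs and ground-truth files, which the example output suggests but this page does not state. Both function names are hypothetical.

```python
from pathlib import Path


def expected_scaffold(name: str, project_dir: str = "evaluar") -> list[Path]:
    """Paths `evaluar init` is expected to create, relative to the project root."""
    return [
        Path(f"eval_{name}.py"),                                  # eval file
        Path("evaluar.yaml"),                                     # manifest
        Path(project_dir) / "configs" / f"{name}.yaml",           # scorer thresholds
        Path(project_dir) / "ground_truth" / f"{name}_gt.json",   # stub ground truth
    ]


def missing_files(name: str, root: str = ".", project_dir: str = "evaluar") -> list[Path]:
    """Return the expected scaffold paths that do not exist under `root`."""
    return [p for p in expected_scaffold(name, project_dir)
            if not (Path(root) / p).exists()]
```

Calling missing_files("layout_detector") from the project root returns an empty list when all four files from the example are present.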
First run
The scaffolded eval file is runnable as-is — it ships a stub model that returns a hard-coded prediction matching the stub ground truth, so the first run produces a deterministic pass.
python eval_layout_detector.py
# → Run run_…: pass
Or via the CLI:
evaluar eval_layout_detector.py
Both paths save the result to evaluar/results/<run_id>.json (see Run storage).
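Since each run is saved as a JSON file under evaluar/results/, the most recent one can be picked up from a script without opening the TUI. This is a minimal sketch: it assumes nothing about the result schema beyond it being valid JSON, and it orders files by modification time because this page does not say whether run IDs sort chronologically. The function names are hypothetical.

```python
import json
from pathlib import Path


def latest_result(results_dir="evaluar/results"):
    """Return the most recently modified *.json file in results_dir, or None."""
    files = sorted(Path(results_dir).glob("*.json"),
                   key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def load_result(path):
    """Parse a saved run file; the structure of the payload is not assumed."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A typical use is path = latest_result() followed by load_result(path) when path is not None; see Run storage for what the payload contains.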
To open it in the TUI:
evaluar
What --github-actions adds
When you pass --github-actions, Evaluar also writes a minimal workflow at .github/workflows/evaluar.yml:
name: Evaluar
on:
  push:
    branches: [main]
  pull_request:
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          path: project
      - uses: actions/checkout@v4
        with:
          repository: Koiiichi/evaluar
          token: ${{ secrets.EVALUAR_REPO_TOKEN }}
          path: evaluar-src
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install evaluar
        run: pip install ./evaluar-src
      - name: Run evaluations
        working-directory: project
        run: evaluar test --headless
This is the literal template (src/evaluar/cli/commands/init.py:336). Adapt it as needed; see Headless / CI for how to gate on results.