The validator

The validator is the single source of truth for “is this tool well-formed and working?” It is read-only: it never modifies the tool, only inspects it.

You invoke it three ways:

  • Python: validator.validate_tool(tool_path) -> ValidationReport
  • CLI: uv run pixie validate <tool_id>
  • HTTP: GET /api/tools/<id>/validate

All three return the same ValidationReport shape.
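
For example, from Python (a minimal sketch; the `pixie.validator` import path is an assumption — only `validate_tool` and the report fields documented later on this page are given):

```python
from pixie import validator  # import path is an assumption

report = validator.validate_tool("tools/lorenz-ode-solver")
print(report.overall)  # "pass", "warn", or "fail"
for check in report.checks:
    if check.status != "pass":
        print(f"{check.name}: {check.status} - {check.message}")
```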

The 12 checks

The validator runs checks in order. Schema checks (1–4) collect all problems before stopping, so one report surfaces every schema error at once. Spawn checks (6–11) stop at the first failure because later checks depend on earlier ones (you can’t validate /schema output if the process didn’t start). Check 12 is opt-in and runs only when reference fixtures exist. The resulting control flow is sketched below the table.

| # | Check | What it verifies |
|---|-------|------------------|
| 1 | folder_structure | tool.json, pyproject.toml, main.py exist; no weird permissions. |
| 2 | tool_json_parses | Valid JSON; matches the Pydantic ToolManifest model; no unknown top-level fields. |
| 3 | schemas_coherent | Unique keys; valid types; type-specific required fields; show_if resolves. |
| 4 | pyproject_parses | Parses; declares fastapi, uvicorn, python-dotenv at minimum. |
| 5 | venv_functional | .venv/ exists; python --version succeeds. (The validator never runs uv sync.) |
| 6 | tool_spawns | Process starts; /healthz returns 200 within 30 s. |
| 7 | schema_matches_disk | /schema response equals parsed tool.json (drift fails). |
| 8 | sample_input_run | Sample inputs built from defaults; /run returns 200 within max_runtime_seconds. |
| 9 | output_conforms | Every declared output key present; value shape matches type; no extra keys. |
| 10 | streaming_check | If any output declares streaming: true, /stream emits ≥1 event within 10 s. |
| 11 | clean_shutdown | SIGTERM → exit within 5 s, else SIGKILL (recorded as warn). |
| 12 | reference_fixtures | Opt-in. Reference outputs compared with type-aware comparators. |
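
That ordering implies roughly the following control flow (a sketch, not the actual internals; whether check 5 belongs to the static or the spawn phase is an assumption, and ValidationCheck is the model defined further down this page):

```python
def run_checks(tool_path: str,
               static_checks: list,   # checks 1-5: inspect files on disk
               spawn_checks: list,    # checks 6-11: need a live process
               ) -> list["ValidationCheck"]:
    # Static checks run to completion so one report lists every problem.
    results = [check(tool_path) for check in static_checks]
    if any(r.status == "fail" for r in results):
        return results  # don't bother spawning a malformed tool

    # Spawn checks stop at the first failure; each depends on the last.
    for check in spawn_checks:
        result = check(tool_path)
        results.append(result)
        if result.status == "fail":
            break
    return results
```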

How sample inputs are generated

For check 8, the validator builds a single set of sample inputs from the declared schema. The rules:

  • default field present → use it.
  • Otherwise:
    • text, textarea → empty string ""
    • number, slider (non-range) → (min + max) / 2, or 0 if no min/max
    • slider with range: true → [min, max]
    • select, radio → first option’s value
    • multiselect → empty list []
    • checkbox, toggle → false
    • date → today
    • time → "00:00"
    • datetime → now
    • date_range → [today, today]
    • file, image, audio → a 1×1 transparent PNG / 1-second silent WAV encoded as a data URL
    • colour → "#000000"
    • json → {}
    • code → "" (empty source)
    • markdown → ""
    • tags → []
    • table → empty list []
    • map_point → centre coords if declared; else (0, 0)
    • map_bbox → small box around the centre
    • map_polygon, map_multipoint → empty lists
    • hidden → default (must be present, otherwise check 3 already failed)

Inputs marked required: false with no default are omitted from the sample body entirely; the validator exercises the zero-effort happy path. A condensed sketch of these rules follows.
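
A sketch of the rules above (the field dict shape follows the schema described on this page; the helper names are illustrative and only some types are shown):

```python
def sample_value(field: dict):
    t = field["type"]
    if t in ("text", "textarea", "code", "markdown"):
        return ""
    if t == "slider" and field.get("range"):
        return [field["min"], field["max"]]
    if t in ("number", "slider"):
        lo, hi = field.get("min"), field.get("max")
        return (lo + hi) / 2 if lo is not None and hi is not None else 0
    if t in ("select", "radio"):
        return field["options"][0]["value"]  # assumes {label, value} options
    if t in ("multiselect", "tags", "table", "map_polygon", "map_multipoint"):
        return []
    if t in ("checkbox", "toggle"):
        return False
    ...  # remaining types follow the list above


def build_sample_inputs(input_schema: list[dict]) -> dict:
    body = {}
    for field in input_schema:
        if "default" in field:
            body[field["key"]] = field["default"]
        elif field.get("required", True):
            body[field["key"]] = sample_value(field)
        # required: false with no default -> omitted from the body
    return body
```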

Output conformance

Check 9 walks the response against an internal _OUTPUT_OBJECT_REQUIREMENTS map that knows the required shape per output type. For example:

| Output type | Required keys in value |
|-------------|------------------------|
| number | value: number (optional format, precision, unit) |
| table | columns: list, rows: list |
| chart_line | x: list[number], series: list[{name, y: list[number]}] |
| chart_scatter | series: list[{name, points: list[{x, y, label?}]}] |
| map_points | points: list[{lat, lng, label?, colour?}] |
| image | value: str (data URL or http(s) URL) |
| log | lines: list[{level, message, t}] |
| file | filename: str, data: str, mime_type: str |

Missing required keys → fail. Extra keys → warn (your output schema may have drifted; the renderer ignores them but it’s worth knowing).
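
The walk reduces to set arithmetic over the required-key map (a sketch; _OUTPUT_OBJECT_REQUIREMENTS is named above, but the entries here are abridged and optional-key handling is simplified — ValidationCheck is the model defined below):

```python
_OUTPUT_OBJECT_REQUIREMENTS = {
    "table": {"columns", "rows"},
    "file": {"filename", "data", "mime_type"},
    # ... one entry per output type, as in the table above
}

def check_output_value(output_type: str, value: dict) -> "ValidationCheck":
    required = _OUTPUT_OBJECT_REQUIREMENTS[output_type]
    missing = required - value.keys()
    extra = value.keys() - required
    if missing:
        return ValidationCheck(name="output_conforms", status="fail",
                               message=f"{output_type}: missing {sorted(missing)}")
    if extra:
        return ValidationCheck(name="output_conforms", status="warn",
                               message=f"{output_type}: extra keys {sorted(extra)}")
    return ValidationCheck(name="output_conforms", status="pass", message="ok")
```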

The report shape

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel


class ValidationCheck(BaseModel):
    name: str                   # e.g. "schema_matches_disk"
    status: Literal["pass", "fail", "warn", "skip"]
    message: str                # one line, human-readable
    details: str | None = None  # multi-line stderr, diffs, traces


class ValidationReport(BaseModel):
    tool_id: str
    tool_path: str
    timestamp: datetime
    overall: Literal["pass", "fail", "warn"]
    checks: list[ValidationCheck]
    sample_inputs: dict | None = None
    sample_output: dict | None = None
    spawn_log: str | None = None
```

The overall status is computed:

  • Any fail → overall fail
  • Any warn (no fails) → overall warn
  • All pass/skip → overall pass
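
Equivalently, as a fold over the statuses (sketch):

```python
def compute_overall(checks: list[ValidationCheck]) -> str:
    statuses = {c.status for c in checks}
    if "fail" in statuses:
        return "fail"
    if "warn" in statuses:
        return "warn"
    return "pass"  # only "pass" and "skip" remain
```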

How the dashboard uses it

Discovery consults the latest cached report from the validation_reports table:

  • pass — sidebar entry is green, click spawns the tool normally.
  • warn — sidebar entry is yellow, click still spawns; the warnings are shown in a “details” disclosure.
  • fail — sidebar entry is red, click shows the report, not the tool’s UI. There’s a “Re-validate” button.
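
The lookup is a latest-row query plus a colour map (illustrative only; this assumes a SQLite table with tool_id, timestamp, and a JSON report column, which may not match the real schema):

```python
import json
import sqlite3

COLOUR = {"pass": "green", "warn": "yellow", "fail": "red"}

def sidebar_colour(db: sqlite3.Connection, tool_id: str) -> str | None:
    row = db.execute(
        "SELECT report FROM validation_reports "
        "WHERE tool_id = ? ORDER BY timestamp DESC LIMIT 1",
        (tool_id,),
    ).fetchone()
    return COLOUR[json.loads(row[0])["overall"]] if row else None
```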

The settings page for each tool has a “Re-validate” button that runs the validator on demand and shows the new report.

When skills must surface the report

Every skill that creates or modifies a tool ends with a validator call. The skill must not claim success if overall == "fail". Specifically:

  • add-tool-from-description, add-tool-from-repo, wrap-local-script, add-tool-from-notebook, convert-streamlit-app, convert-gradio-app, add-tool-from-cli-command, add-tool-from-openapi-spec, add-tool-from-excel-model, add-tool-from-paper, implement-model-from-spec — all end with a validator call and surface the full report.
  • update-tool, debug-tool, migrate-tool-format, organise-tool, fork-tool, rename-tool — re-run the validator after their change.
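
The guard at the end of each of these skills amounts to roughly this (a sketch; the import path and the surfacing call are assumptions):

```python
from pixie import validator  # import path is an assumption

report = validator.validate_tool(tool_path)
print(report.model_dump_json(indent=2))  # surface the full report
if report.overall == "fail":
    raise RuntimeError("validation failed; the skill must not claim success")
```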

Read more: Skills overview.

What the validator does not do

  • It does not run user-specified tests. You can keep your own pytest files inside the tool folder; the validator ignores them as outside its responsibility.
  • It does not test correctness of logic. Check 8 verifies the contract is honoured, not that the answer is right. Use reference fixtures (check 12) for that.
  • It does not retry on transient failures. A flaky tool is a broken tool.
  • It does not modify the tool. If a check fails, the report explains what’s wrong; the skill (or you) does the fix.

Reading the report

In the terminal, uv run pixie validate <id> gives you a rich table with coloured status icons and one-line messages. For machine consumption, --json returns the full structured report.

--summary keeps only the non-pass rows, which is the right default in agent contexts where every token matters.

```sh
uv run pixie validate lorenz-ode-solver --summary
uv run pixie validate lorenz-ode-solver --json | jq '.checks[]'
uv run pixie validate lorenz-ode-solver --reference-only   # only check 12
```

See Validator checks reference for the exhaustive table.