Workloft Substrate Score
Most AI agent demos don't survive a procurement security review. The reason is rarely the model — it's everything around the model. The Workloft Substrate Score is a public benchmark for the everything-else: identity, audit, sovereignty, evaluation, replicability. Vendors submit, Walt scores against the published rubric, the leaderboard is open. Substrate before spectacle.
▸ How it works
Submit a runtime
Provide a name, a vendor, a public URL (docs, repo, or product page), and one paragraph of plain English. The limit is three submissions per IP per day.
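The per-IP limit could be enforced along these lines. This is a minimal sketch, not Workloft's implementation: the `SubmissionLimiter` class, the in-memory counter, and the choice to count by calendar day (rather than a rolling 24-hour window) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 3  # from the text: three submissions per IP per day


class SubmissionLimiter:
    """Hypothetical in-memory counter of submissions per (IP, calendar day)."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self._counts = defaultdict(int)  # (ip, day) -> submissions seen so far

    def allow(self, ip, day):
        """Count one submission; return False once the daily limit is reached."""
        key = (ip, day)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


limiter = SubmissionLimiter()
today = date(2025, 1, 1)
results = [limiter.allow("203.0.113.7", today) for _ in range(4)]
print(results)
```

A fourth submission from the same address on the same day is rejected, while a different address or a new day starts from zero.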
Walt scores
Gemini 2.5 Flash with Google Search grounding reads the URL and any linked documentation, scores the runtime from 0 to 10 on each of the nine axes, and returns its reasoning. The scorer runs hourly.
Leaderboard goes public
Total score is the mean across the nine axes. Per-axis breakdowns and reasoning are open. Re-submissions are allowed when you ship something material.
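As a worked illustration of the total, here is a minimal sketch that averages nine per-axis scores. The function name `total_score` and the placeholder axis names (`axis_1` to `axis_9`) are hypothetical. Only the 0 to 10 range, the count of nine axes, and the use of the mean come from the description above.

```python
from typing import Mapping

AXIS_COUNT = 9  # from the text: nine axes, each scored 0 to 10


def total_score(axis_scores: Mapping[str, float]) -> float:
    """Return the arithmetic mean of the nine per-axis scores."""
    if len(axis_scores) != AXIS_COUNT:
        raise ValueError(f"expected {AXIS_COUNT} axis scores, got {len(axis_scores)}")
    for axis, score in axis_scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{axis}: score {score} is outside 0-10")
    return sum(axis_scores.values()) / AXIS_COUNT


# Placeholder axis names and made-up scores, for illustration only.
example = {f"axis_{i}": s for i, s in enumerate([8, 7, 9, 6, 5, 10, 4, 7, 7], start=1)}
print(total_score(example))  # the scores sum to 63, so the mean is 7.0
```

Because the total is an unweighted mean, a single weak axis lowers it by at most 10/9, or about 1.1 points.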
▸ The 9 axes
This is the same rubric Walt uses for daily arXiv picks, defined publicly so the score is replicable.
▸ Leaderboard
Top 25 scored runtimes. Refresh this page after submitting; the scorer runs hourly.
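The top-25 cut can be sketched as a sort by total score followed by a slice. The `Entry` type and `leaderboard` function are hypothetical names, and how ties are broken is an assumption: here, tied entries keep their input order because Python's sort is stable.

```python
from typing import List, NamedTuple

LEADERBOARD_SIZE = 25  # from the text: top 25 scored runtimes


class Entry(NamedTuple):
    runtime: str
    total: float  # mean across the nine axes


def leaderboard(entries: List[Entry], size: int = LEADERBOARD_SIZE) -> List[Entry]:
    """Return the highest-scoring entries, best first, capped at `size`."""
    ranked = sorted(entries, key=lambda e: e.total, reverse=True)
    return ranked[:size]


# Thirty made-up entries; only the best 25 appear on the board.
sample = [Entry(f"runtime-{i}", i / 3) for i in range(30)]
top = leaderboard(sample)
print(len(top), top[0].runtime)
```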
