Weekly Devlog #2: Publishing the Eval Baseline Before GA
This week we focused on one thing that matters for trust: making our AI quality bar legible before GA.
Instead of waiting to publish a polished scorecard later, we published the eval methodology up front and made the baseline page public.
What shipped
- Public **/eval** page with methodology, scoring rubric, and test categories.
- Marketing-site footer link to the eval page.
- Cross-document copy alignment so launch comms match the verified eval scope: **23 prompts across 5 categories**.
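One way to keep launch copy honest is to derive the scope numbers from the eval set itself instead of typing them by hand. The sketch below is illustrative only: it assumes the eval set can be loaded as a list of records with a prompt ID and a category. `EvalPrompt`, `ScopeMismatch`, `scope_summary`, and `check_claimed_scope` are hypothetical names, and the category labels in the usage line are placeholders, not our real categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalPrompt:
    """Hypothetical record: one prompt in the eval set and its category."""
    prompt_id: str
    category: str


class ScopeMismatch(Exception):
    """Raised when launch copy disagrees with the eval set it describes."""


def scope_summary(prompts: list[EvalPrompt]) -> tuple[int, int]:
    """Derive (prompt_count, category_count) from the eval set itself."""
    ids = [p.prompt_id for p in prompts]
    if len(ids) != len(set(ids)):
        # A duplicated prompt would silently inflate the count we publish.
        raise ValueError("duplicate prompt_id in eval set")
    categories = {p.category for p in prompts}
    return len(prompts), len(categories)


def check_claimed_scope(
    prompts: list[EvalPrompt], claimed_prompts: int, claimed_categories: int
) -> None:
    """Fail loudly if the numbers in launch copy drift from the eval set."""
    actual_prompts, actual_categories = scope_summary(prompts)
    if (actual_prompts, actual_categories) != (claimed_prompts, claimed_categories):
        raise ScopeMismatch(
            f"copy claims {claimed_prompts} prompts across {claimed_categories} "
            f"categories; eval set has {actual_prompts} across {actual_categories}"
        )


# Usage with a toy eval set (placeholder category names):
prompts = [EvalPrompt(f"p{i}", f"category-{i % 5}") for i in range(23)]
check_claimed_scope(prompts, claimed_prompts=23, claimed_categories=5)
```

Run in CI, a check like this turns "the copy says 23 prompts across 5 categories" into something that breaks the build when it stops being true.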
Why this matters
Most AI products ask for trust without showing how quality is measured. We are doing the opposite: publish the method first, then publish every scored baseline run against that method.
For us, this is a product principle, not a one-time launch asset:
- if quality regresses, the release is blocked
- if a claim can’t be verified, we soften or remove it
- if we learn something new, we update the public artifact
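"If quality regresses, the release is blocked" can be expressed as a small gate that compares a candidate run against the published baseline. This is a minimal sketch, not our actual pipeline: it assumes each run yields one score per category where higher is better, and that a per-category `tolerance` absorbs run-to-run noise. `find_regressions` and `release_allowed` are hypothetical names.

```python
from typing import Dict, Mapping, Optional, Tuple


def find_regressions(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    tolerance: float = 0.0,
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return {category: (baseline_score, candidate_score)} for every category
    whose candidate score fell more than `tolerance` below the baseline.
    A category missing from the candidate run is reported with None."""
    regressions: Dict[str, Tuple[float, Optional[float]]] = {}
    for category, base_score in baseline.items():
        cand_score = candidate.get(category)
        if cand_score is None:
            # An unscored category is treated as a regression, not a pass.
            regressions[category] = (base_score, None)
        elif cand_score < base_score - tolerance:
            regressions[category] = (base_score, cand_score)
    return regressions


def release_allowed(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Release gate: allowed only when no category regresses."""
    return not find_regressions(baseline, candidate, tolerance)
```

Checking each category separately, rather than one aggregate score, means a gain in one category cannot mask a drop in another.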
What’s next
- Populate the first scored baseline run and publish the per-run numbers on /eval.
- Link the final eval URL in the GA blog post before the CEO's final read.
- Keep publishing weekly devlogs that cover shipped artifacts and lessons learned, not just announcements.
---
If you’re building AI products, our advice from this week is simple: publish your evaluation standard before you publish your metrics.