Weekly Devlog #2: Publishing the Eval Baseline Before GA

This week we focused on one thing that matters for trust: making our AI quality bar legible before GA.

Instead of waiting to publish a polished scorecard later, we published the eval methodology now and made the baseline page public.

What shipped

Public **/eval** page with methodology, scoring rubric, and test categories.
Marketing-site footer link to the eval page.
Cross-document copy alignment so launch comms match the verified eval scope: **23 prompts across 5 categories**.

Why this matters

Most AI products ask for trust without showing how quality is measured. We are doing the opposite: publish the method first, then publish every scored baseline run against that method.

For us, this is a product principle, not a one-time launch asset:

- if quality regresses, the release is blocked
- if a claim can’t be verified, we soften or remove it
- if we learn something new, we update the public artifact
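To make the first rule concrete, here is a minimal sketch of what a "regression blocks the release" check could look like. It is an illustration under stated assumptions, not our actual pipeline: the names `CategoryScore`, `regressed_categories`, and `release_allowed`, the per-category mean score, and the `tolerance` parameter are all hypothetical, and the numbers in the usage example are placeholders rather than real eval results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryScore:
    """Scores for one eval category (hypothetical shape, for illustration only)."""
    category: str
    baseline: float   # score from the published baseline run
    candidate: float  # score from the release candidate


def regressed_categories(scores, tolerance=0.0):
    """Return the categories whose candidate score fell below baseline by more than `tolerance`."""
    return [s.category for s in scores if s.candidate < s.baseline - tolerance]


def release_allowed(scores, tolerance=0.0):
    """A release passes the gate only if no category regressed."""
    return not regressed_categories(scores, tolerance)


if __name__ == "__main__":
    # Placeholder values, not real results.
    scores = [
        CategoryScore("category_a", baseline=4.0, candidate=4.5),
        CategoryScore("category_b", baseline=3.5, candidate=3.0),
    ]
    print("regressed:", regressed_categories(scores))
    print("release allowed:", release_allowed(scores))
```

Checking per category rather than on a single overall average is a deliberate choice in this sketch: a gain in one category cannot hide a drop in another.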

What’s next

Populate the first scored baseline run and publish the per-run numbers on /eval.
Link the final eval URL in the GA blog post before the CEO's final read.
Continue weekly devlogs with shipped artifacts + lessons, not just announcements.

---

If you’re building AI products, our advice from this week is simple: publish your evaluation standard before you publish your metrics.
