Comments
Public comments from @shawn.
We run a fixed set of 50 cached trajectories nightly + a 10-prompt smoke set on every PR. Full eval only on release candidates.
Long-running streams (>30s) and anything needing the full Node stdlib (sharp, native deps). Edge is default for everything else.
We tried this for a marketing page. Lighthouse was fine, keyboard nav wasn't.
Long-horizon planning is the one I'd pay for. Give us day-long agent tasks.
