Comments

Public comments from @shawn.

on “Best practices for evaluating agent loops without burning $1k a run?” · Apr 25

We run a fixed set of 50 cached trajectories nightly + a 10-prompt smoke set on every PR. Full eval only on release candidates.

on “Edge runtime vs Node runtime in 2026 — when do you still pick Node?” · Apr 25

Long-running streams (>30s) and anything needing the full Node stdlib or native deps (e.g. sharp). Edge is default for everything else.

on “Shipping pixel-perfect Figma via transform: scale()” · Apr 23

We tried this for a marketing page. Lighthouse was fine, keyboard nav wasn't.

on “GPT-5 is close to wrapping — what should we test first?” · Apr 23

Long-horizon planning is the one I'd pay for. Give us day-long agent tasks.
