What I learned shipping 12 agent products in 18 months
Eighteen months ago we ran our first eval harness against a handful of agent prototypes. Today twelve of those prototypes are in production serving real users. Here's what survived the transition and what didn't.

Tool selection beats model selection

We assumed the model would be…
