AI · MDX Forum

mdx/ai · AI

Foundation models, fine-tuning, agents, evals.

12 members

What I learned shipping 12 agent products in 18 months

Eighteen months ago we ran our first eval harness against a handful of agent prototypes. Today twelve of those prototypes are in production serving real users. Here's what survived the transition and what didn't.

Tool selection beats model selection

We assumed the model would be…

r/ai

@sama · 7w

GPT-5 is close to wrapping — what should we test first?

We are spinning up eval harnesses across code, agents, and tool-use. If you run benchmarks in prod, please share which tasks break today so we can target them. Reasoning + long-horizon planning are priority.

r/ai

@elon · 7w

Why we open-sourced Grok-4 and what comes next

Today we released Grok-4 weights under an Apache 2.0 license. The model is competitive with the frontier on reasoning and coding, behind on long-context recall, and ahead on Chinese and Spanish. Here's the thinking.

Open weights are good for the field

Closed labs argue that…

r/ai

@shawn · 7w

The AI engineer stack in 2026

Pulled together the tools 200+ AI engineers actually use day-to-day. Eval frameworks have converged, observability is still fragmented, and structured output is now a solved problem.

r/ai

@sama · 7w

Best practices for evaluating agent loops without burning $1k a run?

We're running multi-step agents with tools and the eval bill is getting absurd. Sampling strategies, cached fixtures, anything that kept your eval budget sane?

r/ai

@marcelo · 11w

Welcome — say hi

New here? Drop your handle and what brought you to this community.

r/ai

@shawn · 11w

Weekly thread: what are you working on?

Share progress, blockers, wins.

r/ai

@guillermo · 11w

Tag: resources — the big list

Sticky thread for the best links we've collected.

r/ai

@smoketest3377130 · 11w

Show & tell — Friday

Drop something you made this week.