Sam Altman

@sama

GPT-5 is close to wrapping — what should we test first?

We are spinning up eval harnesses across code, agents, and tool use. If you run benchmarks in prod, please share which tasks break today so we can target them. Reasoning and long-horizon planning are the priority.

0

Notes from a quiet week

Less noise on the timeline, more time with the team. The product always gets better when the people building it are less stressed. Obvious but easy to forget.

0

Best practices for evaluating agent loops without burning $1k a run?

We're running multi-step agents with tools and the eval bill is getting absurd. Sampling strategies, cached fixtures, anything that kept your eval budget sane?

0

Weekly thread: what are you working on?

Share progress, blockers, wins.

0

Tag: resources — the big list

Sticky thread for the best links we've collected.

0

Show & tell — Friday

Drop something you made this week.

0

Sam Altman @sama

Building AGI at OpenAI.

1 follower · 1 following
