Anthropic · closed · 2026-05

Claude Opus 4.8

Item: Claude Opus 4.8
Rating: 86.2

#7 index 86.2

the take
The one I actually reach for when the task is real code. Top of this list on agentic engineering, and the honesty upgrade is the part that matters: it is far less likely to quietly ship a broken patch and call it done. You pay for that on output.

benchmarks

88.6

93.6

—

—

LiveCodeBench (coding)

69.2

specs