Anthropic · closed · 2026-05
Claude Opus 4.8
#7 · index 86.2
The take: the one I actually reach for when the task is real code. It tops this list on agentic engineering, and the honesty upgrade is what matters: it is far less likely to quietly ship a broken patch and call it done. You pay for that on output tokens.
SWE-bench Verified: 88.6
GPQA Diamond: 93.6
MMLU-Pro: n/a
- Context: 1M tokens
- Input: $5/M tokens
- Output: $25/M tokens
- Speed: n/a
- Modality: text, image
Pros:
- Best-in-class agentic coding for its window
- Much better at flagging its own mistakes
- 1M context, effort control, fast mode

Cons:
- $25 output is steep
- Anthropic no longer reports GPQA/MMLU/SWE-Verified directly
- High-effort settings burn tokens
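To put the pricing in concrete terms, here is a minimal Python sketch that estimates the cost of one request from the card's list prices ($5 per 1M input tokens, $25 per 1M output tokens). The function name `estimate_cost` and the example token counts are hypothetical, and the sketch assumes flat list pricing with no discounts (e.g. caching or batching).

```python
# Hypothetical helper: estimates per-request cost from the card's list prices.
# Assumes prices are USD per 1M tokens (the "/M" on the card) and flat
# list pricing; discounts such as caching or batching are not modeled.
INPUT_PRICE_PER_M = 5.00    # $5/M input, from the card
OUTPUT_PRICE_PER_M = 25.00  # $25/M output, from the card


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# A hypothetical agentic coding turn: a large prompt, a long answer.
cost = estimate_cost(input_tokens=200_000, output_tokens=20_000)
print(f"${cost:.2f}")  # $1.50
```

In this example output is about 9% of the tokens but a third of the bill ($0.50 of $1.50), which is why the steep output price and token-hungry high-effort settings compound.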