Google DeepMind · closed · 2026-02
Gemini 3.1 Pro
#4 index 87.7
The take: The reasoning ceiling of this whole list. Highest GPQA here, a genuinely multimodal million-token context, and it holds up on real coding. The long-context surcharge is the tax you pay for the ceiling.
SWE-bench Verified: 80.6
GPQA Diamond: 94.3
MMLU-Pro: 91.0
- Context: 1,048,576 tokens (1M)
- Input: $2/M tokens
- Output: $12/M tokens
- Speed: 136.2 tok/s
- Modality: text, image, audio, video
- Top GPQA (94.3) and ARC-AGI-2
- Strong SWE-bench (80.6)
- Native text/image/audio/video, 1M context
- 2x long-context surcharge past 200K tokens
- Slower output than Flash tiers
- Card omits AIME/MMLU-Pro
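
A rough sketch of what the surcharge means for a bill, using the card's $2/M input and $12/M output rates. The card does not say how the 2x is applied; this assumes it doubles both rates for the whole request once the prompt exceeds 200K tokens. `estimate_cost_usd` and the constant names are hypothetical, not part of any vendor API.

```python
# Rates come from the card; the surcharge rule below is an ASSUMPTION.
INPUT_USD_PER_M = 2.0            # $ per million input tokens
OUTPUT_USD_PER_M = 12.0          # $ per million output tokens
LONG_CONTEXT_THRESHOLD = 200_000 # prompt size where the surcharge starts
LONG_CONTEXT_MULTIPLIER = 2.0    # "2x" from the card


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumes the 2x multiplier applies to both input and output rates,
    for the whole request, whenever the prompt exceeds the threshold.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        multiplier = LONG_CONTEXT_MULTIPLIER
    else:
        multiplier = 1.0
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return multiplier * (input_cost + output_cost)


if __name__ == "__main__":
    for prompt_tokens in (100_000, 500_000):
        cost = estimate_cost_usd(prompt_tokens, 10_000)
        print(f"{prompt_tokens:>7} prompt tokens: ${cost:.2f}")
```

Under that assumption, a 500K-token prompt with 10K tokens of output comes to about $2.24, versus about $0.32 for a 100K-token prompt with the same output.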