xAI · closed · 2025-07
Grok 4
#6 index 86.3
the take
xAI's best-documented model, which is faint praise given how little they publish now. Solid math and coding for its era, but the newer Groks are cheaper and this one's 256K context is small by 2026 standards.
GPQA Diamond
87.5
MMLU-Pro
—
AIME (math)
91.7
- Context: 256K
- Input: $3/M tokens
- Output: $15/M tokens
- Speed: —
- Modality: text, image
- Strong reasoning and math (AIME 91.7)
- Decent coding (LiveCodeBench 79)
- Mature, widely integrated
- Small 256K context, high price
- Superseded by Grok 4.20 / 4.3
- No official MMLU-Pro or SWE-bench
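
To make the listed pricing concrete, here is a minimal sketch of the per-request arithmetic at $3/M input and $15/M output. The function names are hypothetical, "256K" is assumed to mean 256,000 tokens, and the context check assumes input and output tokens share one window. It ignores anything the card does not state, such as caching discounts, tiered rates, or separately billed reasoning tokens.

```python
# Rates taken from the card above, in USD per million tokens.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0

# Assumption: "256K" is read as 256,000 tokens.
CONTEXT_LIMIT_TOKENS = 256_000


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed flat rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check the request against the window.

    Assumption: input and output tokens count against the same limit.
    """
    return input_tokens + output_tokens <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    # A long-document request: 200,000 tokens in, 4,000 tokens out.
    cost = request_cost(200_000, 4_000)
    print(f"${cost:.2f}")  # prints $0.66
    print(fits_context(200_000, 4_000))  # prints True
```

At these rates a request that nearly fills the window costs well under a dollar, so the "high price" point matters mostly for output-heavy or high-volume workloads, where the $15/M output rate dominates.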