xAI · closed · 2025-07

Grok 4

Item: Grok 4
Rating: 86.3

Rank: #6 (index 86.3)

the take
xAI's best-documented model, which is faint praise given how little they publish now. Solid math and coding for its era (AIME 91.7, LiveCodeBench 79.0), but the newer Groks are cheaper, and its 256K context is small by 2026 standards.

benchmarks

—
87.5
—
AIME: 91.7
LiveCodeBench (coding): 79.0

specs

Context: 256K
Input: $3/M
Output: $15/M
Speed: —
Modality: text, image
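
For a rough sense of what these prices mean per request, here is a minimal sketch. It assumes "$3/M" and "$15/M" mean US dollars per million tokens and that billing is a flat per-token rate with no caching or tiered discounts. The function name request_cost_usd is hypothetical and is not part of any xAI SDK.

```python
# List prices copied from the specs above.
# Assumption: "/M" means per one million tokens, billed at a flat rate.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request in US dollars.

    Hypothetical helper for illustration; ignores caching, batching,
    or any other discount the provider may apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # A long prompt that uses most of the 256K window, plus a short answer.
    cost = request_cost_usd(input_tokens=200_000, output_tokens=2_000)
    print(f"${cost:.2f}")
```

Under these assumptions, a 200,000-token prompt with a 2,000-token reply costs about $0.63, nearly all of it input, which is why the price stands out for long-context workloads.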

strengths

Strong reasoning and math (AIME 91.7)
Decent coding (LiveCodeBench 79)
Mature, widely integrated

weaknesses

Small 256K context; high price ($3/M input, $15/M output)
Superseded by Grok 4.20 / 4.3
No official MMLU-Pro or SWE-bench scores

Sources: [1][2]
