About // methodology

One opinion, shown its working

I'm Alex Busse. I build software for a living and I use these models every day, so I got tired of leaderboards that rank everything and commit to nothing. This one commits.

How the index works

The llmbusse index is a weighted blend of the benchmarks that best predict real work, tilted toward agentic coding and hard reasoning over trivia. Weights are renormalised over whichever scores a model reports, so a missing number does not tank a model unfairly (though a model with gaps is, fairly, harder to trust).

  • SWE-bench Verified: 30%
  • GPQA Diamond: 25%
  • LiveCodeBench (coding): 15%
  • AIME (math): 15%
  • MMLU-Pro: 15%

Disagree with the weighting? Good. The numbers are all on the model pages; re-rank them yourself.
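
For anyone who wants to re-rank, here is a minimal sketch of the renormalised blend in Python. It is one reading of the description above, not the site's actual code: the dictionary keys, the function name `index_score`, and the 0 to 100 score scale are all assumptions made for illustration.

```python
from typing import Mapping, Optional

# Weights as listed above; the key names are hypothetical labels.
DEFAULT_WEIGHTS = {
    "swe_bench_verified": 0.30,
    "gpqa_diamond": 0.25,
    "livecodebench": 0.15,
    "aime": 0.15,
    "mmlu_pro": 0.15,
}


def index_score(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> Optional[float]:
    """Weighted mean over the benchmarks a model actually reports.

    Missing (absent or None) scores are dropped, and the remaining
    weights are rescaled so they sum to 1. Assumes every score is on
    the same 0-100 scale.
    """
    reported = {
        name: value
        for name, value in scores.items()
        if name in weights and value is not None
    }
    total_weight = sum(weights[name] for name in reported)
    if total_weight == 0:
        return None  # nothing reported, so nothing to rank
    weighted_sum = sum(weights[name] * value for name, value in reported.items())
    return weighted_sum / total_weight


# Made-up scores, purely illustrative: only two benchmarks reported.
partial = {"swe_bench_verified": 60.0, "gpqa_diamond": 80.0}
print(round(index_score(partial), 1))  # (0.30*60 + 0.25*80) / 0.55, about 69.1
```

Passing a different `weights` mapping re-ranks with your own priorities. Because the function divides by the total weight of the reported benchmarks, custom weights do not need to sum to 1.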

Dated and sourced

Every figure is a snapshot as of July 2026, with a source on each model page. Models move fast and so do these numbers. If one is stale or wrong, it is wrong out loud, and I would rather fix it than pretend the ranking is eternal.

The rest of me

The paid work is at busse.com.au. The code and the write-ups are at alexbusse.com.