Compare Models

Side-by-side comparison of the selected models. All were tested on a Mac Studio M2 Ultra.

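| Metric | MiniMax M2.5 | Qwen3 Coder Next | Gemma 4 31B | Qwen2.5 Coder 14B | Llama 3.3 70B |
|---|---|---|---|---|---|
| Parameters | 456B MoE | 32B | 31B | 14B | 70B |
| Provider | MiniMax | Alibaba | Google | Alibaba | Meta |
| Quantization | Q5_K_M | Q5_K_XL | Q4_K_M | Q8_0 | Q4_K_M |
| VRAM Used | 171 GB | 58 GB | 20 GB | 16 GB | 42 GB |
| Context Length (tokens) | 33K | 33K | 262K | 33K | 131K |
| Tokens/sec | 28.4 | 42.1 | 52.3 | 68.5 | 24.2 |
| Time to First Token | 1.2 s | 0.8 s | 0.6 s | 0.4 s | 1.8 s |

Benchmark Scores (out of 10)

| Category | MiniMax M2.5 | Qwen3 Coder Next | Gemma 4 31B | Qwen2.5 Coder 14B | Llama 3.3 70B |
|---|---|---|---|---|---|
| Coding | 8.5 | 9.2 | 8.7 | 8.9 | 8.3 |
| Reasoning | 8.8 | 8.4 | 8.9 | 7.8 | 8.6 |
| Creative | 8.2 | 7.5 | 8.4 | 7.2 | 8.8 |
| Agents | 9.0 | 8.6 | 8.8 | 8.0 | 8.2 |
| Context Handling | 8.7 | 8.3 | 9.2 | 7.9 | 8.5 |

Best For

| Model | Best For |
|---|---|
| MiniMax M2.5 | agents, general |
| Qwen3 Coder Next | coding, refactoring |
| Gemma 4 31B | long-context, agents |
| Qwen2.5 Coder 14B | coding, autocomplete |
| Llama 3.3 70B | creative, general |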

Best for Coding

Qwen3 Coder Next leads with a 9.2 coding score, the highest of the five models, at 42.1 tokens/sec.


Best for Agents

MiniMax M2.5 scores 9.0 on agent tasks with excellent tool calling.


Best for Long Context

Gemma 4 31B handles a 262K-token context (262,144 tokens, i.e. 256 × 1,024) with a 9.2 context handling score.

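The "Best for" picks above follow directly from the benchmark scores. As a minimal sketch (the `MODELS`, `SCORES`, and `best_for` names are illustrative and not part of the original page), the highest-scoring model in each category can be derived like this:

```python
# Benchmark scores (out of 10) copied from the comparison table.
# MODELS, SCORES and best_for are hypothetical names used for illustration.
MODELS = [
    "MiniMax M2.5",
    "Qwen3 Coder Next",
    "Gemma 4 31B",
    "Qwen2.5 Coder 14B",
    "Llama 3.3 70B",
]

SCORES = {
    "Coding": [8.5, 9.2, 8.7, 8.9, 8.3],
    "Reasoning": [8.8, 8.4, 8.9, 7.8, 8.6],
    "Creative": [8.2, 7.5, 8.4, 7.2, 8.8],
    "Agents": [9.0, 8.6, 8.8, 8.0, 8.2],
    "Context Handling": [8.7, 8.3, 9.2, 7.9, 8.5],
}


def best_for(category):
    """Return (model, score) for the highest score in the given category."""
    scores = SCORES[category]
    top = max(range(len(MODELS)), key=lambda i: scores[i])
    return MODELS[top], scores[top]


if __name__ == "__main__":
    for category in SCORES:
        model, score = best_for(category)
        print(f"{category}: {model} ({score})")
```

This only ranks by raw score; it does not weigh throughput, memory use, or context length, which may matter more for a given workload.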