Compare Models

Side-by-side comparison of five models, all tested on a Mac Studio with an M2 Ultra.

Metric	MiniMax M2.5 (456B MoE)	Qwen3 Coder Next (32B)	Gemma 4 (31B)	Qwen2.5 Coder (14B)	Llama 3.3 (70B)
Provider	MiniMax	Alibaba	Google	Alibaba	Meta
Quantization	Q5_K_M	Q5_K_XL	Q4_K_M	Q8_0	Q4_K_M
Memory Used (unified)	171GB	58GB	20GB	16GB	42GB
Context Length	33K	33K	262K	33K	131K
Tokens/sec	28.4	42.1	52.3	68.5	24.2
Time to First Token	1.2s	0.8s	0.6s	0.4s	1.8s
Benchmark Scores (out of 10)
Coding	8.5	9.2	8.7	8.9	8.3
Reasoning	8.8	8.4	8.9	7.8	8.6
Creative	8.2	7.5	8.4	7.2	8.8
Agents	9.0	8.6	8.8	8.0	8.2
Context Handling	8.7	8.3	9.2	7.9	8.5
Best For	agents, general	coding, refactoring	long-context, agents	coding, autocomplete	creative, general
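Time to first token and tokens/sec combine into a rough end-to-end latency for a single reply: about TTFT + N / (tokens/sec) for an N-token answer. The minimal Python sketch below applies that approximation to the measured figures. It assumes decode speed stays constant for the whole reply (in practice it usually drops as the context fills), and the 500-token reply length and the `estimate_latency` helper are hypothetical, chosen only for illustration. The table does not state the prompt length behind its TTFT numbers, so treat them as a fixed offset only for similar prompts.

```python
# Measured figures from the table above (Mac Studio, M2 Ultra).
# Each entry: (time to first token in seconds, generation speed in tokens/sec)
MEASURED = {
    "MiniMax M2.5": (1.2, 28.4),
    "Qwen3 Coder Next": (0.8, 42.1),
    "Gemma 4": (0.6, 52.3),
    "Qwen2.5 Coder": (0.4, 68.5),
    "Llama 3.3": (1.8, 24.2),
}


def estimate_latency(ttft_s: float, tokens_per_sec: float, output_tokens: int) -> float:
    """Hypothetical helper: rough wall-clock seconds for one reply.

    Prefill wait (TTFT) plus steady-state decoding. Assumes the decode
    rate is constant for the whole reply, which is an idealization.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return ttft_s + output_tokens / tokens_per_sec


if __name__ == "__main__":
    reply_tokens = 500  # arbitrary example reply length
    for name, (ttft, tps) in MEASURED.items():
        print(f"{name:<18} ~{estimate_latency(ttft, tps, reply_tokens):.1f}s")
```

For a 500-token reply this works out to roughly 7.7 s for Qwen2.5 Coder 14B versus about 22.5 s for Llama 3.3 70B, so at this length decode speed matters far more than the difference in TTFT.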

Qwen3 Coder Next leads in coding with a score of 9.2 while still generating a solid 42.1 tok/s.

MiniMax M2.5 leads agent tasks with a score of 9.0 and strong tool calling.

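Gemma 4 offers the longest context window in this comparison, 262K tokens (256 × 1024 = 262,144, often quoted as "256K"), and posts the top context-handling score of 9.2.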
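Each highlight names the leader of one score category. The small Python sketch below transcribes the benchmark rows from the table and shows how those leaders fall out of the data. Model names are shortened for readability, and `category_leaders` is a hypothetical helper; ties (none occur here) go to whichever model is listed first, because Python's `max` returns the first maximal item.

```python
# Benchmark scores (out of 10), transcribed from the table above.
CATEGORIES = ["Coding", "Reasoning", "Creative", "Agents", "Context Handling"]
SCORES = {
    "MiniMax M2.5":     [8.5, 8.8, 8.2, 9.0, 8.7],
    "Qwen3 Coder Next": [9.2, 8.4, 7.5, 8.6, 8.3],
    "Gemma 4":          [8.7, 8.9, 8.4, 8.8, 9.2],
    "Qwen2.5 Coder":    [8.9, 7.8, 7.2, 8.0, 7.9],
    "Llama 3.3":        [8.3, 8.6, 8.8, 8.2, 8.5],
}


def category_leaders(scores: dict[str, list[float]]) -> dict[str, str]:
    """Hypothetical helper: top-scoring model per category.

    On a tie, max() keeps the first model in dict insertion order.
    """
    leaders = {}
    for i, category in enumerate(CATEGORIES):
        # max() runs immediately, so the lambda sees the current i.
        leaders[category] = max(scores, key=lambda model: scores[model][i])
    return leaders


if __name__ == "__main__":
    for category, model in category_leaders(SCORES).items():
        print(f"{category:<17} {model}")
```

Running it also surfaces two leaders the highlights do not mention: Gemma 4 tops Reasoning (8.9) and Llama 3.3 tops Creative (8.8).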