Benchmarks Qwen3 Coder Next

Qwen3 Coder Next

Alibaba • 32B • Q5_K_XL
42.1
tokens/sec

Excellent for code generation. Fast inference, strong on structured outputs.

codingrefactoringdebugging

Specifications

Parameters
32B
Quantization
Q5_K_XL
VRAM Used
58GB
Context Length
32,768 tokens

Performance

Tokens/Second
42.1
Time to First Token
0.8s

Benchmark Scores

coding 9.2/10
reasoning 8.4/10
creative 7.5/10
agents 8.6/10
context 8.3/10

Quick Setup

Ollama
ollama run qwen3-coder
MLX
mlx_lm.generate --model mlx-community/Qwen3-Coder-Next-4bit
llama.cpp
llama-server -m Qwen3-Coder-Next-Q5_K_XL.gguf -ngl 99