Qwen3 Coder Next
Alibaba • 32B • Q5_K_XL
42.1
tokens/sec
Excellent for code generation. Fast inference, strong on structured outputs.
codingrefactoringdebugging
Specifications
- Parameters
- 32B
- Quantization
- Q5_K_XL
- VRAM Used
- 58GB
- Context Length
- 32,768 tokens
Performance
- Tokens/Second
- 42.1
- Time to First Token
- 0.8s
Benchmark Scores
coding 9.2/10
reasoning 8.4/10
creative 7.5/10
agents 8.6/10
context 8.3/10
Quick Setup
Ollama
ollama run qwen3-coder MLX
mlx_lm.generate --model mlx-community/Qwen3-Coder-Next-4bit llama.cpp
llama-server -m Qwen3-Coder-Next-Q5_K_XL.gguf -ngl 99