Qwen2.5 Coder 14B
Alibaba • 14B • Q8_0
68.5 tokens/sec
Fast and efficient for coding tasks. At about 16GB in use, it fits comfortably on a 32GB Mac.
coding, autocomplete, small-context
Specifications
- Parameters: 14B
- Quantization: Q8_0
- VRAM Used: 16GB
- Context Length: 32,768 tokens
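As a rough sanity check on the memory figure, the weight footprint can be estimated from the parameter count and quantization listed above. The sketch below assumes the nominal 14B parameter count and llama.cpp's Q8_0 layout (blocks of 32 int8 weights plus one 16-bit scale, so about 8.5 bits per weight). The helper name `estimate_weight_memory_gb` is hypothetical, and the result covers weights only; the KV cache and runtime buffers come on top, which presumably accounts for the gap to the 16GB listed.

```python
def estimate_weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9


# Q8_0 stores 32 int8 weights plus one fp16 scale per block:
# 34 bytes per 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = 8.5

# Assumption: the nominal 14B figure from the spec list, not an exact count.
weights_gb = estimate_weight_memory_gb(14e9, Q8_0_BITS_PER_WEIGHT)
print(f"Estimated weight memory: {weights_gb:.1f} GB")
```

The estimate lands a little under 15 GB, which leaves room for context on a machine with 32GB of unified memory.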
Performance
- Tokens/Second: 68.5
- Time to First Token: 0.4s
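The two performance figures combine into a simple estimate of how long a full response takes. This is a minimal sketch that assumes a constant decode rate after the first token, which real runs only approximate (throughput usually drops as the context fills). The helper name `estimate_response_seconds` is hypothetical; the defaults are the numbers from this card.

```python
def estimate_response_seconds(
    n_tokens: int,
    tokens_per_sec: float = 68.5,  # decode throughput from the card
    ttft_sec: float = 0.4,         # time to first token from the card
) -> float:
    """Estimate wall-clock time for a response of n_tokens.

    Assumes generation proceeds at a steady rate once the first token arrives.
    """
    return ttft_sec + n_tokens / tokens_per_sec


for n in (100, 500, 2000):
    print(f"{n:>5} tokens: ~{estimate_response_seconds(n):.1f} s")
```

Under these assumptions a 500-token completion takes a little under 8 seconds, so short autocomplete suggestions should feel close to immediate.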
Benchmark Scores
coding 8.9/10
reasoning 7.8/10
creative 7.2/10
agents 8/10
context 7.9/10
Quick Setup
Ollama
ollama run qwen2.5-coder:14b
MLX
mlx_lm.generate --model mlx-community/Qwen2.5-Coder-14B-8bit
llama.cpp
llama-server -m qwen2.5-coder-14b-Q8_0.gguf -ngl 99