Gemma 4 31B
Google • 31B • Q4_K_M
52.3
tokens/sec
Apache 2.0 licensed. 256K context. Native function calling. Strong all-rounder.
long-contextagentsgeneral
Specifications
- Parameters
- 31B
- Quantization
- Q4_K_M
- VRAM Used
- 20GB
- Context Length
- 262,144 tokens
Performance
- Tokens/Second
- 52.3
- Time to First Token
- 0.6s
Benchmark Scores
coding 8.7/10
reasoning 8.9/10
creative 8.4/10
agents 8.8/10
context 9.2/10
Quick Setup
Ollama
ollama run gemma4:31b LM Studio
Search 'Gemma 4 31B' in Discover llama.cpp
llama-server -m gemma-4-31b-Q4_K_M.gguf -ngl 99