Gemma 4 31B

Google • 31B • Q4_K_M
52.3 tokens/sec

Apache 2.0 licensed. 256K-token context window. Native function calling. Strong all-rounder across coding, reasoning, and agent tasks.

long-context • agents • general

Specifications

Parameters: 31B
Quantization: Q4_K_M
VRAM Used: 20GB
Context Length: 262,144 tokens
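
As a rough sanity check on the VRAM figure, weight memory can be approximated as parameters × bits per weight ÷ 8. The sketch below assumes an average of about 4.85 bits per weight for Q4_K_M; that value is an approximation, not something stated on this page, and the real average varies with model architecture. It counts weights only and ignores the KV cache and runtime buffers, which grow with context length.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: Q4_K_M averages roughly 4.85 bits per weight.
ASSUMED_Q4_K_M_BPW = 4.85

estimate = weight_memory_gb(31, ASSUMED_Q4_K_M_BPW)
print(f"Estimated weights: {estimate:.1f} GB")  # prints "Estimated weights: 18.8 GB"
```

Under this assumption the weights alone come to just under 19 GB, which is in line with the 20GB listed here once cache and buffers are added. Filling the full context window would likely need more memory than this estimate covers.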

Performance

Tokens/Second: 52.3
Time to First Token: 0.6s
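
The two figures above can be combined into a back-of-the-envelope latency estimate: time to first token plus output length divided by throughput. The sketch below assumes the decode rate stays constant for the whole response and that the prompt is similar in size to the one used for the measurement; neither is guaranteed, particularly with long prompts. The function name is illustrative.

```python
def estimated_response_seconds(
    output_tokens: int,
    ttft_s: float = 0.6,          # Time to First Token from this page
    tokens_per_s: float = 52.3,   # Tokens/Second from this page
) -> float:
    """Estimate wall-clock time to generate `output_tokens` tokens."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


for n in (100, 500, 2000):
    print(f"{n:>5} tokens: ~{estimated_response_seconds(n):.1f}s")
```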

Benchmark Scores

coding 8.7/10
reasoning 8.9/10
creative 8.4/10
agents 8.8/10
context 9.2/10

Quick Setup

Ollama
ollama run gemma4:31b
LM Studio
Search 'Gemma 4 31B' in Discover
llama.cpp
llama-server -m gemma-4-31b-Q4_K_M.gguf -ngl 99
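
To check throughput on your own hardware, one option is to read the timing fields that Ollama returns from its local HTTP API. The sketch below assumes Ollama is running on its default port (11434) and that the model has been pulled under the tag shown above. The prompt and helper names are illustrative, and results will differ with hardware, prompt length, and settings.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(response: dict) -> float:
    """Decode throughput from Ollama's eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(prompt: str, model: str = "gemma4:31b") -> dict:
    """Send one non-streaming generation request and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def main() -> None:
    # Illustrative prompt; any prompt works.
    result = generate("Explain KV caching in two sentences.")
    print(result["response"])
    print(f"{tokens_per_second(result):.1f} tokens/sec")
```

Call `main()` once the Ollama server is running. A single short request is a noisy measurement, so averaging several runs gives a steadier number.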