Benchmarks Llama 3.3 70B

Llama 3.3 70B

Meta • 70B • Q4_K_M
24.2
tokens/sec

Meta's flagship open model. Strong creative writing. Good all-rounder but slower.

creativegeneralchat

Specifications

Parameters
70B
Quantization
Q4_K_M
VRAM Used
42GB
Context Length
131,072 tokens

Performance

Tokens/Second
24.2
Time to First Token
1.8s

Benchmark Scores

coding 8.3/10
reasoning 8.6/10
creative 8.8/10
agents 8.2/10
context 8.5/10

Quick Setup

Ollama
ollama run llama3.3:70b
LM Studio
Search 'Llama 3.3 70B' in Discover
llama.cpp
llama-server -m llama-3.3-70b-Q4_K_M.gguf -ngl 99