Llama 3.3 70B
Meta • 70B • Q4_K_M
24.2
tokens/sec
Meta's flagship open-weights model. Strong at creative writing and a solid all-rounder, though slower than smaller models.
creative · general · chat
Specifications
- Parameters: 70B
- Quantization: Q4_K_M
- VRAM Used: 42GB
- Context Length: 131,072 tokens
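The 42GB figure is consistent with a back-of-the-envelope weights-only estimate, assuming roughly 4.8 bits per weight for Q4_K_M (the exact average varies by model); KV cache and runtime overhead come on top:

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only VRAM estimate in decimal GB (excludes KV cache and overhead)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 70B parameters at ~4.8 bits/weight (a commonly quoted Q4_K_M average)
print(round(estimate_weights_gb(70, 4.8), 1))  # → 42.0
```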
Performance
- Tokens/Second: 24.2
- Time to First Token: 1.8s
Benchmark Scores
- Coding: 8.3/10
- Reasoning: 8.6/10
- Creative: 8.8/10
- Agents: 8.2/10
- Context: 8.5/10
Quick Setup
Ollama
ollama run llama3.3:70b
LM Studio
Search 'Llama 3.3 70B' in Discover
llama.cpp
llama-server -m llama-3.3-70b-Q4_K_M.gguf -ngl 99
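Once the model is pulled via Ollama, it can be queried over Ollama's local REST API (default port 11434); the prompt below is purely illustrative:

```shell
# Query the locally served model via Ollama's REST API.
# Assumes `ollama run llama3.3:70b` has already pulled the model
# and the Ollama daemon is listening on the default port 11434.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3:70b",
  "prompt": "Write a haiku about autumn.",
  "stream": false
}'
```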