Hardware Guide

What can you run on your Mac? Here's what fits at each memory tier.

🖥️

Our Test Machine

All benchmarks run on this hardware
  • Model: Mac Studio M2 Ultra
  • Memory: 512GB unified
  • Chip: M2 Ultra (24-core CPU, 76-core GPU)
  • OS: macOS 15.4

What Fits Your Mac?

32GB Unified Memory

Entry Level

MacBook Air M3, MacBook Pro base configs. Good for smaller models.

Recommended Models

  • Qwen2.5 Coder 14B Q8 — 16GB, best coding option at this tier
  • Gemma 4 31B Q4 — 20GB, squeezes in with some headroom
  • Llama 3.1 8B — Lightweight, fast, general purpose

64GB Unified Memory

Sweet Spot

Upgraded MacBook Pro, Mac mini M4 Pro. The productivity sweet spot.

Recommended Models

  • Llama 3.3 70B Q4 — 42GB, excellent all-rounder
  • Qwen3 Coder Next Q5 — 58GB, top coding model
  • DeepSeek Coder V2 — Strong reasoning + code

128GB Unified Memory

Pro Tier

Mac Studio M2 Max, maxed-out MacBook Pro. Run most models comfortably.

Recommended Models

  • Llama 3.3 70B Q6 — Higher quality quant
  • Mixtral 8x22B — MoE architecture, fast inference
  • Multiple 30B models — Run 2-3 models concurrently

192GB–512GB Unified Memory

Ultra Tier

Mac Studio M2 Ultra / M4 Ultra. Run anything, including massive MoE models.

What Opens Up

  • MiniMax M2.5 — 171GB, 456B MoE, API-quality local
  • DeepSeek V3 — Massive reasoning model
  • 3+ models hot — Route between specialized models
  • Full precision 70B — No quantization needed

Performance Tips

🎯 Leave Headroom

Don't devote 100% of your RAM to the model. Leave 10-20% free for the system and for context (the KV cache grows with context length). If you have 64GB, target models under about 50GB loaded.
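That headroom rule is simple enough to express as a one-line calculation (a sketch; `headroom` is an illustrative parameter name, and 20% is just the upper end of the guideline above):

```python
def max_model_size_gb(total_ram_gb, headroom=0.20):
    """Largest loaded model size that still leaves `headroom`
    (a fraction of total RAM) free for the OS and KV cache."""
    return total_ram_gb * (1.0 - headroom)

print(max_model_size_gb(64))  # 51.2 -- matches the ~50GB target above
print(max_model_size_gb(32))  # 25.6
```

With a long context window you may want to push `headroom` higher than 20%, since the KV cache competes with the model weights for the same unified memory.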

⚡ GPU Layers

In llama.cpp, pass -ngl 99 (short for --n-gpu-layers), or your runtime's equivalent, to offload all layers to the GPU. Apple Silicon's unified memory makes this seamless.

📊 Quantization Trade-offs

Q4 saves memory but loses quality; Q6 and Q8 stay much closer to full precision. For coding, higher quants often matter more than they do for chat.
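A rough rule of thumb makes the trade-off concrete: file size ≈ parameter count × bits per weight ÷ 8. The bits-per-weight figures below are approximate averages for common GGUF quants (assumptions on my part; real files vary with tensor mix and format overhead):

```python
# Approximate average bits per weight for common GGUF quant levels.
# These are illustrative estimates, not exact format specs.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.6, "Q8": 8.5, "F16": 16.0}

def approx_size_gb(params_billions, quant):
    """Estimated weight size in GB for a dense model at a given quant."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8

for q in ("Q4", "Q6", "F16"):
    print(f"70B at {q}: ~{approx_size_gb(70, q):.0f} GB")
```

This lines up with the tiers above: a 70B model is roughly 39GB at Q4 (fits in 64GB with headroom), about 58GB at Q6 (wants 128GB), and 140GB at F16 (Ultra territory).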

🔄 Hot Swapping

With enough RAM, keep multiple models loaded and route between them. No cold start penalty — instant switching.

Buying Advice

If you're buying a Mac for local LLMs, memory is the most important spec. You can't upgrade it later.

  • Casual use: 32GB works, but you'll hit limits quickly
  • Serious work: 64GB minimum, 128GB recommended
  • Production/multi-model: 192GB+ (Mac Studio territory)
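The tiers above boil down to a simple lookup (names and thresholds are just the ones from this guide):

```python
def tier(ram_gb):
    """Map unified memory size to the usage tiers in this guide."""
    if ram_gb >= 192:
        return "production / multi-model"
    if ram_gb >= 128:
        return "pro"
    if ram_gb >= 64:
        return "serious work"
    if ram_gb >= 32:
        return "casual use"
    return "below the tiers covered here"

print(tier(64))  # serious work
```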

The difference between a frustrating experience and a great one is often just RAM.