SYSTEM REQUIREMENTS
Before we dive in, let's make sure your hardware can handle local AI. The good news: it's more accessible than you might think.
The Golden Rule: RAM is Everything
Running AI locally is about one thing: memory. The model has to fit in RAM (or VRAM) to run. No exceptions.
Here's the simple math:
Model parameters × bytes per parameter (set by quantization) = RAM needed
22B parameters × Q6 (~0.75 bytes/param) = ~16.5GB
22B parameters × Q4 (~0.5 bytes/param) = ~11GB
12B parameters × Q6 (~0.75 bytes/param) = ~9GB
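The arithmetic above can be sketched in a couple of lines of shell. The bytes-per-parameter figures are the rough approximations used in this guide, not exact values:

```shell
# Back-of-envelope RAM estimate: billions of params x bytes per param.
# Q4 is roughly 0.5 bytes/param, Q6 roughly 0.75 bytes/param.
ram_needed() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f GB\n", p * b }'
}

ram_needed 22 0.75   # 22B at Q6 -> 16.5 GB
ram_needed 22 0.5    # 22B at Q4 -> 11.0 GB
ram_needed 12 0.75   # 12B at Q6 -> 9.0 GB
```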
But you also need headroom for:
- The operating system (~4-8GB)
- Context window (conversation history)
- Other applications
So the PRACTICAL requirements are:
RAM Requirements by Model Size
Model Size   Min RAM   Comfortable   Ideal
-------------------------------------------------
7B  (Q4)     8GB       16GB          16GB+
12B (Q6)     16GB      24GB          32GB
22B (Q4)     16GB      32GB          48GB+
22B (Q6)     24GB      48GB          64GB+
70B (Q4)     48GB      64GB          96GB+
For tonight's RPMax 22B at Q6:
- Minimum: 24GB (tight, might swap)
- Comfortable: 32GB
- Ideal: 48GB+
If you have 16GB RAM:
- Use a 12B model (Mistral-Nemo-12B-ArliAI-RPMax)
- Or use a Q4 quantization of the 22B (accepting some quality loss)
- Or expect slower performance due to swapping
Supported Operating Systems
Ollama runs on:
macOS: 10.15+ (Catalina or newer)
- Native Apple Silicon support (M1/M2/M3/M4)
- Intel Macs work, but more slowly
Linux: most distributions
- NVIDIA GPU support via CUDA
- AMD GPU support via ROCm
- CPU-only works fine
Windows: Windows 10/11
- NVIDIA GPU support
- Native Windows binary available
- WSL2 also works well
Best experience: Apple Silicon Mac, or Linux with an NVIDIA GPU
Still great: Windows with an NVIDIA GPU
Works fine: any modern computer with enough RAM
Mac Unified Memory Advantage
Here's why Macs are secretly amazing for local AI:
Traditional computers split memory:
- System RAM (for CPU) - maybe 16GB
- VRAM (for GPU) - maybe 8GB on a decent card
- Model has to fit in ONE of these
Apple Silicon is different:
- "Unified Memory" shared between CPU and GPU
- Your 64GB Mac = 64GB available for the model
- No artificial split
What this means in practice:
Gaming PC with 32GB RAM + RTX 3080 (10GB VRAM):
→ Model limited to 10GB (VRAM) for fast inference
→ Or 32GB (RAM) but slower CPU inference
Mac Studio with 64GB unified memory:
→ Full 64GB available
→ GPU-accelerated inference on entire model
→ Runs 70B models that would require expensive multi-GPU setups on a PC
The M1/M2/M3/M4 chips also have excellent memory bandwidth. AI inference is memory-bound, so this matters a lot.
Practical Mac recommendations:
Mac Mini M4 (16GB): 7B-12B models
Mac Mini M4 (24GB): 12B-22B models (Q4)
Mac Mini M4 Pro (48GB): 22B models comfortably
Mac Studio M2 Max (64GB+): 22B-70B models
Mac Studio M2 Ultra (128GB+): Multiple large models
My setup: Mac Studio M2 Max with 96GB runs 22B Q6 with tons of headroom. Response times are fast, no swapping.
Windows Considerations
Windows works great, especially with NVIDIA GPUs:
NVIDIA GPU (recommended):
- Install CUDA toolkit
- Ollama uses GPU automatically
- RTX 3090/4090 (24GB VRAM) = sweet spot
- RTX 3080/4080 (10-16GB) = good for smaller models
AMD GPU:
- Less mature support
- ROCm works on Linux better than Windows
- CPU fallback is fine
CPU only:
- Works, just slower
- More RAM = better
- 32GB+ recommended for 22B models
For Windows users without big GPUs:
- Stick to 7B-12B models
- Q4 quantization stretches further
- Still great for companions
Linux Considerations
Linux is the power user's choice:
NVIDIA GPU:
- Best supported
- Install NVIDIA drivers + CUDA
- Ollama auto-detects
AMD GPU:
- ROCm support
- Some models work better than others
- Check Ollama docs for compatibility
Server/headless:
- Ollama runs great as a service
- Access via API from other machines
- Perfect for home lab setups
Linux tip: If running headless, start Ollama as a service:
sudo systemctl enable ollama
sudo systemctl start ollama
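For the home-lab case above, note that the Ollama service listens only on localhost by default, so other machines can't reach the API until you change its bind address. A sketch, assuming the standard systemd install (the IP address below is illustrative):

```shell
# Override the bind address so the API is reachable on the LAN:
sudo systemctl edit ollama
# ...and in the editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama

# Then, from another machine (replace with your server's IP;
# 11434 is Ollama's default port):
curl http://192.168.1.50:11434/api/tags
```

The /api/tags endpoint lists installed models, which makes it a handy connectivity check.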
Checking Your Hardware
Mac:
Apple menu → About This Mac → Memory
Or in terminal:
system_profiler SPHardwareDataType | grep Memory
Windows:
Settings → System → About → Installed RAM
Or in PowerShell:
(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB
Linux:
free -h
Or for detailed info:
grep MemTotal /proc/meminfo
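To compare what's actually free right now against the table earlier, a quick Linux one-liner (MemAvailable is reported in kB):

```shell
# Print currently available memory in GB
awk '/MemAvailable/ { printf "Available: %.1f GB\n", $2 / 1048576 }' /proc/meminfo
```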
Context Window and Memory
The context window (conversation history) also uses memory.
16K context ≈ 500MB - 1GB additional
32K context ≈ 1GB - 2GB additional
128K context ≈ 4GB - 8GB additional
(These are ballpark figures; exact usage also scales with the model's size.)
This is why we recommend 16K context for companions:
- Enough for long conversations
- Doesn't eat too much RAM
- Keeps character card influential
If you're tight on RAM, reduce context window first. Better to have a working model with shorter memory than a model that swaps to disk.
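If you do trim the context window, it's set via the num_ctx parameter in an Ollama Modelfile. A minimal sketch, where the FROM tag is illustrative (substitute the model you actually pulled):

```
FROM mistral-nemo
PARAMETER num_ctx 16384
```

Save that as Modelfile, then build a variant with ollama create companion-16k -f Modelfile (the name is arbitrary).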
Performance Expectations
What to expect at different RAM levels (22B Q6 model):
24GB RAM:
- Works but tight
- First response: 5-10 seconds
- May swap occasionally
- Close other apps
32GB RAM:
- Comfortable
- First response: 2-5 seconds
- Stable performance
48GB+ RAM:
- Fast
- First response: 1-3 seconds
- Room for multiple models
64GB+ RAM:
- Excellent
- Sub-second first token
- Can run larger models
These are rough estimates. Actual performance depends on:
- CPU/GPU speed
- Memory bandwidth
- Quantization level
- Context window size
- What else is running
Quick Recommendations
Budget build (under $1000):
- Used Mac Mini M1 (16GB) - runs 12B models well
- Or Linux PC with 32GB RAM + used RTX 3080
Mid-range ($1000-2000):
- Mac Mini M4 Pro (24-48GB)
- Or Windows PC with 32GB RAM + RTX 4070
High-end ($2000+):
- Mac Studio M2 Max (64GB+)
- Or Windows/Linux with 64GB RAM + RTX 4090
Enthusiast ($5000+):
- Mac Studio M2 Ultra (128GB+)
- Or multi-GPU Linux workstation
For tonight's workshop:
- If you have 16GB+, you're fine with 12B models
- If you have 32GB+, you can run 22B
- If you have less, follow along and try smaller models
Troubleshooting Memory Issues
If Ollama is slow or crashing:
1. Check what's using RAM:
- Mac: Activity Monitor → Memory
- Windows: Task Manager → Memory
- Linux: htop or top
2. Close unnecessary apps
- Browsers are memory hogs
- Docker containers add up
3. Reduce context window:
- PARAMETER num_ctx 8192 (instead of 16384)
4. Try smaller quantization:
- Q4 instead of Q6
- Loses some quality, gains headroom
5. Try smaller model:
- 12B instead of 22B
- Still great for companions
6. Check for swap usage:
- Mac: Activity Monitor → Memory → Swap Used
- Heavy swap = need more RAM or smaller model
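For step 6 on Linux, the same free command from earlier shows swap directly; the macOS equivalent is a sysctl query:

```shell
# Linux: show the header plus the swap line
# (heavy swap use means the model doesn't fit comfortably in RAM)
free -h | awk 'NR == 1 || /Swap/'

# macOS equivalent (run in Terminal):
#   sysctl vm.swapusage
```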
Next: Let's look at the model we're using.