Techalicious Academy / 2026-01-22-ai-companion


SYSTEM REQUIREMENTS

Before we dive in, let's make sure your hardware can handle local AI. The good news: it's more accessible than you might think.

The Golden Rule: RAM is Everything

Running AI locally is about one thing: memory. The model has to fit in RAM (or VRAM) to run. No exceptions.

Here's the simple math:

Model parameters × bytes per parameter (set by the quantization) ≈ RAM needed

22B parameters × Q6 (~0.75 bytes/param) = ~16.5GB
22B parameters × Q4 (~0.5 bytes/param)  = ~11GB
12B parameters × Q6 (~0.75 bytes/param) = ~9GB
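The same arithmetic works for any model you're eyeing; here's a quick sketch using the bytes-per-parameter rules of thumb above:

```shell
# Rough weight-memory estimate: parameters (in billions) x bytes per parameter.
# Rules of thumb from above: Q4 ~ 0.5 bytes/param, Q6 ~ 0.75 bytes/param.
est() { awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1fGB\n", p * b }'; }

est 22 0.75   # → 16.5GB
est 22 0.5    # → 11.0GB
est 12 0.75   # → 9.0GB
```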

But you also need headroom for:

  - The operating system and background apps (typically a few GB)
  - The context window (conversation history), covered below
  - Ollama's own runtime overhead

So the PRACTICAL requirements are:

RAM Requirements by Model Size

Model Size    Min RAM    Comfortable    Ideal
-------------------------------------------------
7B  (Q4)      8GB        16GB           16GB+
12B (Q6)      16GB       24GB           32GB
22B (Q4)      16GB       32GB           48GB+
22B (Q6)      24GB       48GB           64GB+
70B (Q4)      48GB       64GB           96GB+
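The table reads naturally as a lookup. Here's a throwaway sketch using the same thresholds (the Min RAM column); treat it as guidance, not a hard rule:

```shell
# Map installed RAM (GB) to the largest model tier from the table above.
# Thresholds follow the table's "Min RAM" column; a sketch, not a hard rule.
largest_model() {
  ram=$1
  if   [ "$ram" -ge 48 ]; then echo "70B (Q4)"
  elif [ "$ram" -ge 24 ]; then echo "22B (Q6)"
  elif [ "$ram" -ge 16 ]; then echo "22B (Q4) / 12B (Q6)"
  elif [ "$ram" -ge 8 ];  then echo "7B (Q4)"
  else echo "not enough RAM for local models"
  fi
}

largest_model 32   # → 22B (Q6)
```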

For tonight's RPMax 22B at Q6:

  - ~16.5GB for the model weights alone
  - 24GB RAM minimum; 32GB+ is comfortable (see the table above)

If you have 16GB RAM:

  - Run the Q4 version instead (~11GB), or
  - Drop to a 12B model; the rest of the workshop is the same
Supported Operating Systems

Ollama runs on:

macOS:    10.15+ (Catalina or newer)
          Native Apple Silicon support (M1/M2/M3/M4)
          Intel Macs work but slower
          
Linux:    Most distributions
          NVIDIA GPU support via CUDA
          AMD GPU support via ROCm
          CPU-only works fine
          
Windows:  Windows 10/11
          NVIDIA GPU support
          WSL2 works great
          Native Windows binary available

Best experience: Apple Silicon Mac, or Linux with an NVIDIA GPU
Still great:     Windows with an NVIDIA GPU
Works fine:      any modern computer with enough RAM

Mac Unified Memory Advantage

Here's why Macs are secretly amazing for local AI:

Traditional computers split memory:

  - System RAM for the CPU
  - Separate VRAM on the graphics card for the GPU
  - For fast inference, the whole model has to fit in VRAM

Apple Silicon is different:

  - One pool of unified memory shared by the CPU and GPU
  - Nearly all of it is available for GPU-accelerated inference

What this means in practice:

Gaming PC with 32GB RAM + RTX 3080 (10GB VRAM):
  → Model limited to 10GB (VRAM) for fast inference
  → Or 32GB (RAM) but slower CPU inference
  
Mac Studio with 64GB unified memory:
  → Full 64GB available
  → GPU-accelerated inference on entire model
  → Runs 70B models that would need multiple expensive GPUs on a PC

The M1/M2/M3/M4 chips also have excellent memory bandwidth. AI inference is memory-bound, so this matters a lot.

Practical Mac recommendations:

Mac Mini M4 (16GB):     7B-12B models
Mac Mini M4 (24GB):     12B-22B models (Q4)
Mac Mini M4 Pro (48GB): 22B models comfortably
Mac Studio M2 Max (64GB+): 22B-70B models
Mac Studio M2 Ultra (128GB+): Multiple large models

My setup: Mac Studio M2 Max with 96GB runs 22B Q6 with tons of headroom. Response times are fast, no swapping.

Windows Considerations

Windows works great, especially with NVIDIA GPUs:

NVIDIA GPU (recommended):
  - Install CUDA toolkit
  - Ollama uses GPU automatically
  - RTX 3090/4090 (24GB VRAM) = sweet spot
  - RTX 3080/4080 (10-16GB) = good for smaller models
  
AMD GPU:
  - Less mature support
  - ROCm works on Linux better than Windows
  - CPU fallback is fine
  
CPU only:
  - Works, just slower
  - More RAM = better
  - 32GB+ recommended for 22B models

For Windows users without big GPUs:

  - Stick to 7B-12B models at Q4, which fit in modest VRAM or RAM
  - Or run the 22B on CPU with 32GB+ RAM and accept slower responses

Linux Considerations

Linux is the power user's choice:

NVIDIA GPU:
  - Best supported
  - Install NVIDIA drivers + CUDA
  - Ollama auto-detects
  
AMD GPU:
  - ROCm support
  - Some models work better than others
  - Check Ollama docs for compatibility
  
Server/headless:
  - Ollama runs great as a service
  - Access via API from other machines
  - Perfect for home lab setups

Linux tip: If running headless, start Ollama as a service:

sudo systemctl enable ollama
sudo systemctl start ollama
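For the home-lab case above, you can sanity-check the service over HTTP; a stock Ollama install listens on localhost port 11434, and /api/tags lists the installed models:

```shell
# List locally installed models via Ollama's HTTP API (default port 11434)
curl -s http://localhost:11434/api/tags
```

To reach it from another machine, point curl at the server's address instead of localhost; you may need to set OLLAMA_HOST on the server so the service binds to more than the loopback interface.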

Checking Your Hardware

Mac:

Apple menu → About This Mac → Memory

Or in terminal:
  system_profiler SPHardwareDataType | grep Memory

Windows:

Settings → System → About → Installed RAM

Or in PowerShell:
  (Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB

Linux:

free -h

Or for detailed info:
  grep MemTotal /proc/meminfo
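If you want the Linux number directly in GB: MemTotal is reported in kB, so one awk line does the conversion:

```shell
# MemTotal in /proc/meminfo is reported in kB; convert to GB (1GB = 1048576 kB).
awk '/MemTotal/ { printf "%.1f GB\n", $2 / 1048576 }' /proc/meminfo
```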

Context Window and Memory

The context window (conversation history) also uses memory.

16K context ≈ 500MB - 1GB additional
32K context ≈ 1GB - 2GB additional
128K context ≈ 4GB - 8GB additional
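Putting the two tables together for tonight's setup, the total footprint is roughly weights plus context cache:

```shell
# Total footprint ≈ model weights + context cache (rules of thumb from above):
# 22B at Q6 ≈ 16.5GB weights; 16K context adds roughly 0.5-1GB on top.
awk 'BEGIN { printf "~%.1fGB plus OS overhead\n", 16.5 + 1.0 }'
# → ~17.5GB plus OS overhead
```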

This is why we recommend 16K context for companions:

  - Plenty of history for a long conversation
  - Only ~0.5-1GB of extra RAM on top of the model
  - Leaves headroom so nothing swaps to disk

If you're tight on RAM, reduce context window first. Better to have a working model with shorter memory than a model that swaps to disk.

Performance Expectations

What to expect at different RAM levels (22B Q6 model):

24GB RAM:
  - Works but tight
  - First response: 5-10 seconds
  - May swap occasionally
  - Close other apps
  
32GB RAM:
  - Comfortable
  - First response: 2-5 seconds
  - Stable performance
  
48GB+ RAM:
  - Fast
  - First response: 1-3 seconds
  - Room for multiple models
  
64GB+ RAM:
  - Excellent
  - Sub-second first token
  - Can run larger models

These are rough estimates. Actual performance depends on:

  - Memory bandwidth (inference is memory-bound, so chip generation matters)
  - GPU-accelerated vs CPU-only inference
  - Quantization level and context window size
  - What else is running on the machine

Quick Recommendations

Budget build (under $1000):

  - Mac Mini M4 (16GB), or any PC with 16GB+ RAM
  - Comfortable with 7B-12B models

Mid-range ($1000-2000):

  - Mac Mini M4 (24GB) or M4 Pro (48GB), or a PC with an RTX 3090 (24GB VRAM)
  - Handles 22B models, at Q4 or Q6 depending on RAM

High-end ($2000+):

  - Mac Studio M2 Max (64GB+), or a PC with an RTX 4090
  - Runs 22B at Q6 with headroom; 70B at Q4 is within reach

Enthusiast ($5000+):

  - Mac Studio M2 Ultra (128GB+)
  - Runs 70B models, or several large models at once

For tonight's workshop:

  - 24GB+ RAM is the comfortable target for RPMax 22B at Q6
  - On 16GB, use the Q4 version or a 12B model instead

Troubleshooting Memory Issues

If Ollama is slow or crashing:

1. Check what's using RAM:
   - Mac: Activity Monitor → Memory tab
   - Windows: Task Manager → Performance → Memory
   - Linux: free -h or htop

2. Close unnecessary apps (browsers with many tabs are the usual culprits)

3. Reduce context window:
   - Dropping from 16K to 8K can free up to ~1GB

4. Try smaller quantization:
   - Q4 instead of Q6 takes the 22B model from ~16.5GB to ~11GB

5. Try smaller model:
   - A 12B at Q6 needs only ~9GB

6. Check for swap usage:
   - Mac: Activity Monitor shows swap used and memory pressure
   - Linux: free -h (the Swap row)
   - Heavy swapping means the model doesn't fit; revisit steps 3-5
Next: Let's look at the model we're using.