THE SOFTWARE STACK
We need three things: something to run the model, something to chat with it, and the model itself. Let's set them up.
Ollama - The Engine
Ollama is a program that runs LLMs locally. Think of it as a server that speaks AI. You send it a prompt, it sends back a response.
Why Ollama:
- Dead simple to install
- Handles model downloading automatically
- Runs on Mac, Linux, and Windows
- Optimized for Apple Silicon (M1/M2/M3/M4 chips)
- Free and open source
Installing Ollama (Mac/Linux):
curl -fsSL https://ollama.com/install.sh | sh
Installing Ollama (Windows):
Download from https://ollama.com/download
After installation, Ollama runs as a background service. You can interact with it via the command line or through other apps.
Verify it's working:
ollama --version
You should see a version number like "ollama version 0.5.x".
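You can also confirm the background service itself is up (not just the binary) by hitting Ollama's API on its default port, 11434:

```shell
# Check that the Ollama server is listening on its default port.
# Prints a short status message if the service is up.
curl -s http://localhost:11434 || echo "Ollama service not reachable"
```

If this fails, start the app (Mac/Windows) or the service (Linux) and try again.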
OpenWebUI - The Interface
OpenWebUI gives you a ChatGPT-like interface for local models. It's prettier and more feature-rich than the command line.
Why OpenWebUI:
- Familiar chat interface
- Conversation history
- Multiple chat sessions
- Character/persona management built in
- Parameter controls
- Free and open source
Installing OpenWebUI (requires Docker):
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser.
If you don't have Docker, you can also install via pip:
pip install open-webui
open-webui serve
OpenWebUI automatically detects an Ollama server running on the same machine at the default address (http://localhost:11434).
Bolt AI - The Mac Alternative
If you're on Mac and want something native (no Docker), Bolt AI is a slick option. It's a paid app but worth it for the polish.
Why Bolt AI:
- Native Mac app (fast, clean)
- Works with Ollama out of the box
- Nice conversation management
- Custom personas/characters
- Keyboard shortcuts
- One-time purchase, no subscription
Get it from the Mac App Store or https://boltai.com
For tonight's tutorial, we'll show examples in OpenWebUI since it's free and cross-platform, but the concepts work anywhere.
Command Line - The Raw Way
You can also talk to Ollama directly from the terminal. This is useful for scripting and understanding what's happening under the hood.
Basic chat:
ollama run mistral-small
An interactive session starts: type your message, press Enter, and the model responds. Type /bye to exit.
One-shot query:
echo "Tell me a joke" | ollama run mistral-small
API request (for scripts):
curl http://localhost:11434/api/generate \
-d '{
"model": "mistral-small",
"prompt": "Tell me a joke",
"stream": false
}'
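If you find yourself scripting against this endpoint a lot, a small wrapper helps. Here's a minimal sketch, assuming Ollama is on its default port and jq is installed for JSON extraction; build_payload and ask are illustrative names, not part of Ollama:

```shell
#!/bin/sh
# Build the JSON body for /api/generate. (No escaping of quotes in the
# prompt, so this is only good enough for simple one-liners.)
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send a one-shot prompt and print just the response text.
ask() {
  curl -s http://localhost:11434/api/generate \
    -d "$(build_payload "$1" "$2")" | jq -r '.response'
}

# Usage: ask mistral-small "Tell me a joke"
```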
Modelfiles - Custom Model Configurations
A Modelfile is like a Dockerfile but for LLMs. It lets you create a custom model with specific settings baked in.
This is powerful for companions. You can create a model that:
- Has your character's system prompt built in
- Uses your preferred parameters
- Has a custom name you can call
Example Modelfile (save as "companion.modelfile"):
FROM hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER num_ctx 16384
PARAMETER repeat_penalty 1.0
SYSTEM """
[Luna - warm, witty companion who loves deep conversations,
asks thoughtful questions, remembers details, uses gentle humor,
speaks naturally without being overly formal]
"""
Create the model:
ollama create luna -f companion.modelfile
Now you can run:
ollama run luna
And it loads with all your settings pre-configured. No need to set parameters every time.
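If you end up with several companions, you can template the Modelfile instead of hand-editing copies. A sketch, where make_modelfile is a hypothetical helper and the parameters mirror the example above:

```shell
#!/bin/sh
# Emit a Modelfile for a given persona on stdout.
make_modelfile() {
  persona="$1"
  cat <<EOF
FROM hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER num_ctx 16384
PARAMETER repeat_penalty 1.0
SYSTEM """
$persona
"""
EOF
}

make_modelfile "Luna - warm, witty companion" > luna.modelfile
# then: ollama create luna -f luna.modelfile
```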
List your custom models:
ollama list
Delete a custom model:
ollama rm luna
Downloading Models
Ollama pulls models from its library automatically. Just run:
ollama pull <model-name>
For example:
ollama pull llama3.2
ollama pull mistral-small
For our RPMax model, it's on Hugging Face, so the syntax is:
ollama pull hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
This downloads the 22B-parameter RPMax model at Q6_K_L quantization. It's a large download (on the order of 15-20GB), so be patient.
Checking Available Models
See what you have installed:
ollama list
See what's running:
ollama ps
Stop a running model:
ollama stop <model-name>
Hardware Requirements
The RPMax 22B model needs:
- Minimum: 16GB RAM (will be slow and rely on swap)
- Good: 32GB RAM
- Great: 64GB+ RAM, or a GPU with 24GB+ VRAM
On Apple Silicon Macs, the unified memory is shared between CPU and GPU, making them excellent for local AI. An M1 Mac with 32GB RAM runs this model comfortably.
If you have less RAM, consider smaller models:
- Mistral-Nemo-12B-ArliAI-RPMax (needs ~10GB)
- Smaller Llama variants
Quality drops, but they still work for companions.
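The tiers above can be sketched as a quick helper. suggest_model is a hypothetical function, and the thresholds simply restate the rough guidelines in this section, not official requirements:

```shell
#!/bin/sh
# Map available RAM (in GB) to a rough model recommendation.
suggest_model() {
  ram_gb="$1"
  if [ "$ram_gb" -ge 32 ]; then
    echo "Mistral-Small-22B-ArliAI-RPMax (comfortable)"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "Mistral-Small-22B-ArliAI-RPMax (slow, will swap)"
  elif [ "$ram_gb" -ge 10 ]; then
    echo "Mistral-Nemo-12B-ArliAI-RPMax"
  else
    echo "a smaller Llama variant, e.g. llama3.2"
  fi
}

suggest_model 32
```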
Two Platforms, Two Formats
Here's something that trips people up: Ollama and OpenWebUI use different character card formats.
Ollama Modelfiles:
- PList + Ali:Chat format
- Supports {{char}} and {{user}} variables
- Character baked into the model itself
OpenWebUI Models:
- XML-structured prompts
- Uses literal character names (no variables)
- Character defined in the web interface
Both work great. The difference is syntax, not capability. Pick based on which interface you prefer:
- Like the command line? Use Ollama Modelfiles.
- Like a web interface? Use OpenWebUI Models.
- Want both? Create for Ollama, then access through OpenWebUI.
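To make the variable substitution concrete: Ollama expands the placeholders internally when it renders a prompt, but you can illustrate what that expansion looks like with plain sed (this snippet is illustration only, not part of either tool):

```shell
#!/bin/sh
# Show what {{char}} and {{user}} expand to once names are filled in.
template='{{char}} smiles at {{user}}.'
printf '%s\n' "$template" | sed -e 's/{{char}}/Luna/g' -e 's/{{user}}/Alex/g'
```

In OpenWebUI's format you'd simply write "Luna" and "Alex" literally from the start.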
We'll cover the concept of character cards first, then show you the specific syntax for each platform.
Next up: why we're using RPMax specifically.