THE SOFTWARE STACK
We need three things: something to run the model, something to chat with it, and the model itself. Let's set them up.
Ollama - The Engine
Ollama is a program that runs LLMs locally. Think of it as a server that speaks AI. You send it a prompt, it sends back a response.
Why Ollama:
- Dead simple to install
- Handles model downloading automatically
- Runs on Mac, Linux, and Windows
- Optimized for Apple Silicon (M1/M2/M3/M4 chips)
- Free and open source
Installing Ollama (Mac/Linux):
curl -fsSL https://ollama.com/install.sh | sh
Installing Ollama (Windows):
Download from https://ollama.com/download
After installation, Ollama runs as a background service. You can interact with it via the command line or through other apps.
Verify it's working:
ollama --version
You should see a version number like "ollama version 0.5.x".
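You can also confirm the background service itself is up (not just the binary) by hitting Ollama's API on its default port, 11434:

```shell
# Check that the Ollama server is listening on its default port.
# Prints a short status message if the service is up.
curl -s http://localhost:11434 || echo "Ollama service not reachable"
```

If this fails, start the app (Mac/Windows) or the service (Linux) and try again.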
OpenWebUI - The Interface
OpenWebUI gives you a ChatGPT-like interface for local models. It's prettier and more feature-rich than the command line.
Why OpenWebUI:
- Familiar chat interface
- Conversation history
- Multiple chat sessions
- Character/persona management built in
- Parameter controls
- Free and open source
Installing OpenWebUI (requires Docker):
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser.
If you don't have Docker, you can also install via pip:
pip install open-webui
open-webui serve
OpenWebUI automatically detects an Ollama server running on the same machine at the default address (http://localhost:11434).
Bolt AI - The Mac Alternative
If you're on Mac and want something native (no Docker), Bolt AI is a slick option. It's a paid app but worth it for the polish.
Why Bolt AI:
- Native Mac app (fast, clean)
- Works with Ollama out of the box
- Nice conversation management
- Custom personas/characters
- Keyboard shortcuts
- One-time purchase, no subscription
Get it from the Mac App Store or https://boltai.com
For tonight's tutorial, we'll show examples in OpenWebUI since it's free and cross-platform, but the concepts work anywhere.
Command Line - The Raw Way
You can also talk to Ollama directly from the terminal. This is useful for scripting and understanding what's happening under the hood.
Basic chat:
ollama run mistral-small
An interactive session starts: type your message, press Enter, and the model responds. Type /bye to exit.
One-shot query:
echo "Tell me a joke" | ollama run mistral-small
API request (for scripts):
curl http://localhost:11434/api/generate \
-d '{
"model": "mistral-small",
"prompt": "Tell me a joke",
"stream": false
}'
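If you find yourself scripting against this endpoint a lot, a small wrapper helps. Here's a minimal sketch, assuming Ollama is on its default port and jq is installed for JSON extraction; build_payload and ask are illustrative names, not part of Ollama:

```shell
#!/bin/sh
# Build the JSON body for /api/generate. (No escaping of quotes in the
# prompt, so this is only good enough for simple one-liners.)
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send a one-shot prompt and print just the response text.
ask() {
  curl -s http://localhost:11434/api/generate \
    -d "$(build_payload "$1" "$2")" | jq -r '.response'
}

# Usage: ask mistral-small "Tell me a joke"
```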
Modelfiles - Custom Model Configurations
A Modelfile is like a Dockerfile but for LLMs. It lets you create a custom model with specific settings baked in.
This is powerful for companions. You can create a model that:
- Has your character's system prompt built in
- Uses your preferred parameters
- Has a custom name you can call
Example Modelfile (save as "companion.modelfile"):
FROM hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER num_ctx 16384
PARAMETER repeat_penalty 1.0
SYSTEM """
[Luna - warm, witty companion who loves deep conversations,
asks thoughtful questions, remembers details, uses gentle humor,
speaks naturally without being overly formal]
"""
Create the model:
ollama create luna -f companion.modelfile
Now you can run:
ollama run luna
And it loads with all your settings pre-configured. No need to set parameters every time.
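If you end up with several companions, you can template the Modelfile instead of hand-editing copies. A sketch, where make_modelfile is a hypothetical helper and the parameters mirror the example above:

```shell
#!/bin/sh
# Emit a Modelfile for a given persona on stdout.
make_modelfile() {
  persona="$1"
  cat <<EOF
FROM hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
PARAMETER temperature 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER num_ctx 16384
PARAMETER repeat_penalty 1.0
SYSTEM """
$persona
"""
EOF
}

make_modelfile "Luna - warm, witty companion" > luna.modelfile
# then: ollama create luna -f luna.modelfile
```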
List your custom models:
ollama list
Delete a custom model:
ollama rm luna
Downloading Models
Ollama pulls models from its library automatically. Just run:
ollama pull <model-name>
For example:
ollama pull llama3.2
ollama pull mistral-small
For our RPMax model, it's on Hugging Face, so the syntax is:
ollama pull hf.co/bartowski/Mistral-Small-22B-ArliAI-RPMax-v1.1-GGUF:Q6_K_L
This downloads the 22B-parameter RPMax model at Q6_K_L quantization. It's a large download (on the order of 15-20GB), so be patient.
Checking Available Models
See what you have installed:
ollama list
See what's running:
ollama ps
Stop a running model:
ollama stop <model-name>
Hardware Requirements
The RPMax 22B model needs:
- Minimum: 16GB RAM (will be slow and rely on swap)
- Good: 32GB RAM
- Great: 64GB+ RAM, or a GPU with 24GB+ VRAM
On Apple Silicon Macs, the unified memory is shared between CPU and GPU, making them excellent for local AI. An M1 Mac with 32GB RAM runs this model comfortably.
If you have less RAM, consider smaller models:
- Mistral-Nemo-12B-ArliAI-RPMax (needs ~10GB)
- Smaller Llama variants
Quality drops, but they still work for companions.
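The tiers above can be sketched as a quick helper. suggest_model is a hypothetical function, and the thresholds simply restate the rough guidelines in this section, not official requirements:

```shell
#!/bin/sh
# Map available RAM (in GB) to a rough model recommendation.
suggest_model() {
  ram_gb="$1"
  if [ "$ram_gb" -ge 32 ]; then
    echo "Mistral-Small-22B-ArliAI-RPMax (comfortable)"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "Mistral-Small-22B-ArliAI-RPMax (slow, will swap)"
  elif [ "$ram_gb" -ge 10 ]; then
    echo "Mistral-Nemo-12B-ArliAI-RPMax"
  else
    echo "a smaller Llama variant, e.g. llama3.2"
  fi
}

suggest_model 32
```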
Two Platforms, Two Formats
Here's something that trips people up: Ollama and OpenWebUI use different character card formats.
Ollama Modelfiles:
- PList + Ali:Chat format
- Supports {{char}} and {{user}} variables
- Character baked into the model itself
OpenWebUI Models:
- XML-structured prompts
- Uses literal character names (no variables)
- Character defined in the web interface
Both work great. The difference is syntax, not capability. Pick based on which interface you prefer:
- Like the command line? Use Ollama Modelfiles.
- Like a web interface? Use OpenWebUI Models.
- Want both? Create for Ollama, then access through OpenWebUI.
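To make the variable substitution concrete: Ollama expands the placeholders internally when it renders a prompt, but you can illustrate what that expansion looks like with plain sed (this snippet is illustration only, not part of either tool):

```shell
#!/bin/sh
# Show what {{char}} and {{user}} expand to once names are filled in.
template='{{char}} smiles at {{user}}.'
printf '%s\n' "$template" | sed -e 's/{{char}}/Luna/g' -e 's/{{user}}/Alex/g'
```

In OpenWebUI's format you'd simply write "Luna" and "Alex" literally from the start.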
We'll cover the concept of character cards first, then show you the specific syntax for each platform.
Next up: why we're using RPMax specifically.