QUICK REFERENCE
Everything from tonight on one page. Print this. Bookmark it. Tape it to your monitor.
Installation Commands
Install Ollama:
brew install --cask ollama
Install OpenWebUI (Docker Compose):
Create docker-compose.yml:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
    restart: always

volumes:
  open-webui:
Then run:
docker compose up -d
Access: http://localhost:3000. The first account you create becomes the admin.
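Once both are running, a quick sanity check from Terminal confirms they're reachable (11434 is Ollama's default API port, 3000 is the port mapped in the compose file above):

```shell
# Ollama API: prints a small JSON version object if the server is up
curl -s http://localhost:11434/api/version

# OpenWebUI: prints the HTTP status code; 200 means it's serving
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000
```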
Model Commands
Pull models (run in Terminal):
ollama pull qwen3:30b-a3b        19GB    General, code, reasoning
ollama pull qwen3-next           50GB    Heavyweight, long context
ollama pull mistral-small3.2     15GB    Text + vision, multilingual
ollama pull qwen3-vl             6GB     Vision specialist
ollama pull qwen3-vl:4b          3.3GB   Light vision
Check installed models:
ollama list
Check running models:
ollama ps
RAM Guide
8GB Mac: qwen3-vl:4b only
16GB Mac: qwen3-vl (8B), one small model at a time
32GB Mac: qwen3:30b-a3b, mistral-small3.2, qwen3-vl
64GB+ Mac: All models including qwen3-next
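Not sure which tier you're in? On macOS, hw.memsize reports installed RAM in bytes:

```shell
# Print installed RAM in GB (macOS only; hw.memsize is in bytes)
bytes=$(sysctl -n hw.memsize)
echo "$((bytes / 1024 / 1024 / 1024)) GB"
```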
Creating Custom Models
Workspace > Models > Create a Model
- Name it
- Pick a base model
- Write a system prompt
- Add starter questions (optional)
- Set parameters (optional)
- Toggle capabilities (vision, web search, etc.)
- Save
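If you prefer Terminal to the workspace UI, the same idea works with an Ollama Modelfile. A sketch (the model name my-coder and the system prompt are just examples):

```shell
# Build a custom model from a Modelfile instead of the OpenWebUI workspace
cat > Modelfile <<'EOF'
FROM qwen3:30b-a3b
SYSTEM "You are a concise coding assistant. Prefer short, working examples."
PARAMETER temperature 0.2
EOF

ollama create my-coder -f Modelfile
# Then: ollama run my-coder
```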
Parameter Cheat Sheet
Temperature: Creativity dial
Low (0.2) = precise, High (1.0) = creative
Default: 0.7
Top P: Probability filter
Low (0.3) = focused, High (0.95) = varied
Default: 0.9
Top K: Fixed candidate count
Lower = more focused, Higher = more varied
Default: 40
Repeat Penalty: Prevents loops
1.0 = off, 1.1 = mild, 1.3+ = aggressive
Default: 1.1
Context Length: How much conversation the model remembers
Higher = more memory, more RAM used
Default: varies by model
Max Tokens: Response length limit
-1 = unlimited
Default: -1
Seed: Fixed output for testing
0 = random each time
Default: 0
Suggested Settings by Task
Coding: Temperature 0.2, Top P 0.9
Conversation: Temperature 0.7, Top P 0.9
Creative Writing: Temperature 0.9, Top P 0.95
Factual Q&A: Temperature 0.3, Top P 0.85
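These settings map directly onto Ollama's API options, so you can try a preset outside the UI too. A sketch of the coding preset against Ollama's /api/generate endpoint (the prompt is arbitrary; assumes Ollama on its default port):

```shell
# Test the "coding" preset (temperature 0.2, top_p 0.9) directly against Ollama
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen3:30b-a3b",
  "prompt": "Write a function that reverses a string.",
  "stream": false,
  "options": { "temperature": 0.2, "top_p": 0.9 }
}'
```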
Dynamic Variables for System Prompts
{{CURRENT_DATE}} Today's date
{{CURRENT_TIME}} Current time
{{USER_NAME}} Logged-in user's name
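For example, a system prompt that uses all three:

```
You are a scheduling assistant for {{USER_NAME}}.
Today is {{CURRENT_DATE}} and the current time is {{CURRENT_TIME}}.
Always state dates explicitly when proposing meeting times.
```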
Useful Keyboard Shortcuts
Enter: Send message
Shift+Enter: New line without sending
Ctrl+Shift+O: New chat
Ctrl+Shift+S: Toggle sidebar
Adding Tools
Workspace > Tools > Create a Tool
- Paste Python code for the tool
- Configure settings (API keys, etc.)
- Assign to models via Tool Bindings or enable globally
Perplexity Web Search: Gives models live web search
Needs API key from perplexity.ai
Uses sonar-pro model by default
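Under the hood, the tool calls Perplexity's chat-completions API. A hedged sketch of an equivalent request from Terminal (payload shape per Perplexity's public API docs; the key comes from your perplexity.ai account):

```shell
# Roughly what the web-search tool sends (sonar-pro, as noted above)
curl -s https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "sonar-pro", "messages": [{"role": "user", "content": "What happened in tech news today?"}]}'
```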
Docker Compose Management
Run these from the directory with your docker-compose.yml:
Start: docker compose up -d
Stop: docker compose down
Logs: docker compose logs -f
Status: docker compose ps
Update: docker compose pull && docker compose up -d
Reset: docker compose down -v (deletes the containers and the data volume, including all chats and settings)
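Before a reset, you can snapshot the data volume with a throwaway container. A sketch (note: Compose usually prefixes the volume with your project folder name, so check docker volume ls for the exact name; the backup filename is arbitrary):

```shell
# Back up the OpenWebUI data volume to a tarball in the current directory.
# Replace open-webui with the actual volume name from: docker volume ls
docker run --rm \
  -v open-webui:/data \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/open-webui-backup.tar.gz -C /data .
```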
Troubleshooting Checklist
Nothing loads? docker compose ps (is container running?)
No models? ollama list (did you pull any?)
Can't connect? Check Ollama URL in Admin > Connections
Slow responses? Activity Monitor > check memory pressure
Gibberish output? Lower temperature, check parameters
Forgetting context? Start fresh chat or increase num_ctx
Links
OpenWebUI: docs.openwebui.com
Ollama: ollama.com
Ollama Models: ollama.com/library
Tutorial: techalicious.academy