QUICK REFERENCE
Everything in one place for when you forget.
CRITICAL: The Modelfile Step
DO NOT SKIP THIS. Ollama defaults to 4K context. You need 32K+.
Create ~/Modelfile-qwen-claude:
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
Create the configured model:
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
Use "qwen3-coder-32k" everywhere, NOT "qwen3-coder-next:latest".
Installation Summary
Ollama Server:
# Install/update Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull base model
ollama pull qwen3-coder-next:latest
# Create Modelfile
echo 'FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768' > ~/Modelfile-qwen-claude
# Create configured model
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
# Enable network access (if remote)
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.zshrc
source ~/.zshrc
pkill ollama
ollama serve
Claude Code Workstation:
# Install
curl -fsSL https://claude.ai/install.sh | bash
# Method 1: Add to ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"
export COLORTERM=truecolor
# Claude Code + Local Ollama
export ANTHROPIC_BASE_URL="http://YOUR_IP:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
# Method 2: Create ~/.claude/settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
Replace YOUR_IP with Ollama server IP or "localhost". Replace model name with YOUR configured model. Do BOTH methods for reliability.
Then: source ~/.zshrc
First Launch
cd ~/your-project
claude
# Select option 3 (3rd-party platform)
/init
Context Window Settings
Context is set in OLLAMA via Modelfile, NOT in Claude Code.
Create Modelfile:
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
Create model:
ollama create qwen3-coder-32k -f Modelfile
Verify:
ollama run qwen3-coder-32k
# In another terminal:
ollama ps
# Check CONTEXT column shows 32768
RAM Requirements by Context Size
Context Extra RAM Total w/Model Recommended For
---------------------------------------------------------
4K (bad) ~1GB ~21GB Don't use this
8K ~2GB ~22GB 16GB machines
16K ~4GB ~24GB 24GB machines
32K ~8GB ~28GB 32GB machines
64K ~16GB ~36GB 64GB machines
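The table above is roughly linear: about 1GB of extra RAM per 4K of context. A minimal shell sketch of that rule of thumb (the ratio is an assumption taken from this table and is model-dependent, not a guarantee):

```shell
# Rough KV-cache estimate: ~1GB extra RAM per 4096 tokens of context.
# Assumption from the table above; actual usage varies by model.
ctx_extra_gb() {
  echo $(( $1 / 4096 ))
}
ctx_extra_gb 32768   # -> 8
```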
Why Modelfile Is Required
Claude Code uses Anthropic API format (/v1/messages). Anthropic API has no num_ctx parameter. Claude Code CANNOT request a specific context size. The only way to control context is via the model definition.
Base model: qwen3-coder-next:latest = 4K context (useless)
Your model: qwen3-coder-32k = 32K context (works)
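The request shapes make this concrete. A sketch of the two payloads (bodies only, trimmed): the Anthropic-format body Claude Code sends has no field for context size, while Ollama's native /api/generate does accept options.num_ctx, but Claude Code never calls that endpoint.

```
# Anthropic format (what Claude Code sends) - no context-size field exists:
{"model": "qwen3-coder-32k", "max_tokens": 10, "messages": [{"role": "user", "content": "Hi"}]}

# Ollama native /api/generate - num_ctx is accepted here, but Claude Code never uses this endpoint:
{"model": "qwen3-coder-next:latest", "prompt": "Hi", "options": {"num_ctx": 32768}}
```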
Slash Commands
Built-in:
/help All commands
/context Check context usage
/compact Compress context
/clear Clear history
/init Generate CLAUDE.md
/permissions Manage permissions
Custom commands:
Global: ~/.claude/commands/name.md
Project: .claude/commands/name.md
Use $ARGUMENTS, $1, $2 for parameters
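A custom command is just a markdown file in one of those directories. A minimal sketch (the command name "review" and its wording are illustrative, not prescribed):

```shell
# Hypothetical /review command: file name and prompt text are examples only.
mkdir -p ~/.claude/commands
cat > ~/.claude/commands/review.md << 'EOF'
Review the file $1 for bugs and style issues.
Focus on: $ARGUMENTS
EOF
```

Inside a session you would then invoke it as, e.g., /review src/main.py "error handling".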
Session Management
claude Start new session
claude --continue Resume last session
claude --resume Pick from history
Ctrl+D Exit
Ctrl+C Cancel current response
CLAUDE.md Locations
~/.claude/CLAUDE.md Global (all projects)
./CLAUDE.md Project root
./subdir/CLAUDE.md Subdirectory
Quick add during session:
# your note here
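A CLAUDE.md file is plain markdown. An illustrative example (the contents are assumptions; adapt to your project):

```
# Project notes
- Python 3.12, dependencies managed with uv
- Run tests with: pytest -q
- Never edit files under vendor/
```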
Testing the Connection
# Ollama version
ollama --version
# Model with correct context?
ollama ps
# Look for CONTEXT: 32768 (not 4096)
# API test with YOUR model
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-32k","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
Common Settings
Timeout (10 min):
"API_TIMEOUT_MS": "600000"
Auto-compact at 75%:
"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
Full permissions (yolo mode):
"permissions": {
"allow": ["Bash", "Read", "Write", "Edit", "MultiEdit"],
"deny": []
}
Speed Expectations
Local inference (Apple Silicon):
First query: 60-180 seconds (model loading)
Simple question: 20-40 seconds
Code generation: 60-120 seconds
Large refactor: 2-5 minutes
Troubleshooting Checklist
[ ] Ollama 0.14.0+ installed
[ ] Ollama running (ps aux | grep ollama)
[ ] Base model pulled (ollama list shows qwen3-coder-next)
[ ] Modelfile created with num_ctx 32768
[ ] Configured model created (ollama list shows qwen3-coder-32k)
[ ] Context verified (ollama ps shows 32768)
[ ] Network binding (lsof shows *:11434)
[ ] settings.json valid JSON
[ ] Model name in settings matches configured model
[ ] Selected option 3 on first launch
[ ] API_TIMEOUT_MS set high enough
File Locations
~/.claude/settings.json Your config
~/.claude/CLAUDE.md Global memory
~/.claude/commands/ Global slash commands
~/.local/bin/claude The binary
.claude/settings.json Project config
./CLAUDE.md Project memory
.claude/commands/ Project slash commands
Environment Variables
ANTHROPIC_BASE_URL Ollama server URL
ANTHROPIC_AUTH_TOKEN Any non-empty string
ANTHROPIC_MODEL Your configured model name
API_TIMEOUT_MS Timeout in milliseconds
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC   Set to "1" to disable nonessential requests
Useful Aliases
Add to ~/.zshrc:
alias c='claude'
alias cc='claude --continue'
alias cr='claude --resume'
One-Liner Setup Reference
# Server (Ollama) - Run these in order:
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3-coder-next:latest
printf 'FROM qwen3-coder-next:latest\nPARAMETER num_ctx 32768\n' > ~/Modelfile-qwen-claude
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
# Enable network (if remote):
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.zshrc
source ~/.zshrc
pkill ollama
ollama serve
# Workstation (Claude Code):
curl -fsSL https://claude.ai/install.sh | bash
# Method 1: Add to ~/.zshrc (edit YOUR_IP first)
cat >> ~/.zshrc << 'EOF'
# Claude Code + Local Ollama
export PATH="$HOME/.local/bin:$PATH"
export COLORTERM=truecolor
export ANTHROPIC_BASE_URL="http://YOUR_IP:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
EOF
# Method 2: Create settings.json (edit YOUR_IP first)
mkdir -p ~/.claude
cat > ~/.claude/settings.json << 'EOF'
{
"env": {
"ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
EOF
# Edit both files to fix YOUR_IP, then:
source ~/.zshrc
cd ~/your-project
claude
# Select option 3
That's Everything
You now have:
- Local AI-powered coding
- Proper context window (not the broken 4K default)
- No cloud dependency
- Zero ongoing costs
- Full privacy
Happy vibe coding.