QUICK REFERENCE
Everything in one place for when you forget.
CRITICAL: The Modelfile Step
DO NOT SKIP THIS. Ollama defaults to 4K context. You need 32K+.
Create ~/Modelfile-qwen-claude:
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
Create the configured model:
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
Use "qwen3-coder-32k" everywhere, NOT "qwen3-coder-next:latest".
Installation Summary
Ollama Server:
# Install/update Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull base model
ollama pull qwen3-coder-next:latest
# Create Modelfile
echo 'FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768' > ~/Modelfile-qwen-claude
# Create configured model
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
# Enable network access (if remote)
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.zshrc
source ~/.zshrc
pkill ollama
ollama serve
Claude Code Workstation:
# Install
curl -fsSL https://claude.ai/install.sh | bash
# Method 1: Add to ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"
export COLORTERM=truecolor
# Claude Code + Local Ollama
export ANTHROPIC_BASE_URL="http://YOUR_IP:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
# Method 2: Create ~/.claude/settings.json
{
"env": {
"ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
Replace YOUR_IP with Ollama server IP or "localhost". Replace model name with YOUR configured model. Do BOTH methods for reliability.
Then: source ~/.zshrc
First Launch
cd ~/your-project
claude
# Select option 3 (3rd-party platform)
/init
Context Window Settings
Context is set in OLLAMA via Modelfile, NOT in Claude Code.
Create Modelfile:
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
Create model:
ollama create qwen3-coder-32k -f Modelfile
Verify:
ollama run qwen3-coder-32k
# In another terminal:
ollama ps
# Check CONTEXT column shows 32768
RAM Requirements by Context Size
Context Extra RAM Total w/Model Recommended For
---------------------------------------------------------
4K (bad) ~1GB ~21GB Don't use this
8K ~2GB ~22GB 16GB machines
16K ~4GB ~24GB 24GB machines
32K ~8GB ~28GB 32GB machines
64K ~16GB ~36GB 64GB machines
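The table above is roughly linear: about 1GB of extra RAM per 4K of context. A minimal shell sketch of that rule of thumb (the ratio is an assumption taken from this table and is model-dependent, not a guarantee):

```shell
# Rough KV-cache estimate: ~1GB extra RAM per 4096 tokens of context.
# Assumption from the table above; actual usage varies by model.
ctx_extra_gb() {
  echo $(( $1 / 4096 ))
}
ctx_extra_gb 32768   # -> 8
```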
Why Modelfile Is Required
Claude Code uses Anthropic API format (/v1/messages). Anthropic API has no num_ctx parameter. Claude Code CANNOT request a specific context size. The only way to control context is via the model definition.
Base model: qwen3-coder-next:latest = 4K context (useless)
Your model: qwen3-coder-32k = 32K context (works)
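The request shapes make this concrete. A sketch of the two payloads (bodies only, trimmed): the Anthropic-format body Claude Code sends has no field for context size, while Ollama's native /api/generate does accept options.num_ctx, but Claude Code never calls that endpoint.

```
# Anthropic format (what Claude Code sends) - no context-size field exists:
{"model": "qwen3-coder-32k", "max_tokens": 10, "messages": [{"role": "user", "content": "Hi"}]}

# Ollama native /api/generate - num_ctx is accepted here, but Claude Code never uses this endpoint:
{"model": "qwen3-coder-next:latest", "prompt": "Hi", "options": {"num_ctx": 32768}}
```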
Slash Commands
Built-in:
/help All commands
/context Check context usage
/compact Compress context
/clear Clear history
/init Generate CLAUDE.md
/permissions Manage permissions
Custom commands:
Global: ~/.claude/commands/name.md
Project: .claude/commands/name.md
Use $ARGUMENTS, $1, $2 for parameters
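A custom command is just a markdown file in one of those directories. A minimal sketch (the command name "review" and its wording are illustrative, not prescribed):

```shell
# Hypothetical /review command: file name and prompt text are examples only.
mkdir -p ~/.claude/commands
cat > ~/.claude/commands/review.md << 'EOF'
Review the file $1 for bugs and style issues.
Focus on: $ARGUMENTS
EOF
```

Inside a session you would then invoke it as, e.g., /review src/main.py "error handling".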
Session Management
claude Start new session
claude --continue Resume last session
claude --resume Pick from history
Ctrl+D Exit
Ctrl+C Cancel current response
CLAUDE.md Locations
~/.claude/CLAUDE.md Global (all projects)
./CLAUDE.md Project root
./subdir/CLAUDE.md Subdirectory
Quick add during session:
# your note here
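A CLAUDE.md file is plain markdown. An illustrative example (the contents are assumptions; adapt to your project):

```
# Project notes
- Python 3.12, dependencies managed with uv
- Run tests with: pytest -q
- Never edit files under vendor/
```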
Testing the Connection
# Ollama version
ollama --version
# Model with correct context?
ollama ps
# Look for CONTEXT: 32768 (not 4096)
# API test with YOUR model
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-32k","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
Common Settings
Timeout (10 min):
"API_TIMEOUT_MS": "600000"
Auto-compact at 75%:
"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75"
Full permissions (yolo mode):
"permissions": {
"allow": ["Bash", "Read", "Write", "Edit", "MultiEdit"],
"deny": []
}
Speed Expectations
Local inference (Apple Silicon):
First query: 60-180 seconds (model loading)
Simple question: 20-40 seconds
Code generation: 60-120 seconds
Large refactor: 2-5 minutes
Troubleshooting Checklist
[ ] Ollama 0.14.0+ installed
[ ] Ollama running (ps aux | grep ollama)
[ ] Base model pulled (ollama list shows qwen3-coder-next)
[ ] Modelfile created with num_ctx 32768
[ ] Configured model created (ollama list shows qwen3-coder-32k)
[ ] Context verified (ollama ps shows 32768)
[ ] Network binding (lsof shows *:11434)
[ ] settings.json valid JSON
[ ] Model name in settings matches configured model
[ ] Selected option 3 on first launch
[ ] API_TIMEOUT_MS set high enough
File Locations
~/.claude/settings.json Your config
~/.claude/CLAUDE.md Global memory
~/.claude/commands/ Global slash commands
~/.local/bin/claude The binary
.claude/settings.json Project config
./CLAUDE.md Project memory
.claude/commands/ Project slash commands
Environment Variables
ANTHROPIC_BASE_URL Ollama server URL
ANTHROPIC_AUTH_TOKEN Any non-empty string
ANTHROPIC_MODEL Your configured model name
API_TIMEOUT_MS Timeout in milliseconds
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC   Set to "1" to disable nonessential requests
Useful Aliases
Add to ~/.zshrc:
alias c='claude'
alias cc='claude --continue'
alias cr='claude --resume'
One-Liner Setup Reference
# Server (Ollama) - Run these in order:
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen3-coder-next:latest
printf 'FROM qwen3-coder-next:latest\nPARAMETER num_ctx 32768\n' > ~/Modelfile-qwen-claude
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
# Enable network (if remote):
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.zshrc
source ~/.zshrc
pkill ollama
ollama serve
# Workstation (Claude Code):
curl -fsSL https://claude.ai/install.sh | bash
# Method 1: Add to ~/.zshrc (edit YOUR_IP first)
cat >> ~/.zshrc << 'EOF'
# Claude Code + Local Ollama
export PATH="$HOME/.local/bin:$PATH"
export COLORTERM=truecolor
export ANTHROPIC_BASE_URL="http://YOUR_IP:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
EOF
# Method 2: Create settings.json (edit YOUR_IP first)
mkdir -p ~/.claude
cat > ~/.claude/settings.json << 'EOF'
{
"env": {
"ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
EOF
# Edit both files to fix YOUR_IP, then:
source ~/.zshrc
cd ~/your-project
claude
# Select option 3
That's Everything
You now have:
- Local AI-powered coding
- Proper context window (not the broken 4K default)
- No cloud dependency
- Zero ongoing costs
- Full privacy
Happy vibe coding.