INSTALLING AND CONFIGURING OLLAMA
Ollama is the engine that runs your local models. We need to install it, pull a model, and create a properly configured version of that model. That last step is critical and often missed.
Version Requirement
+-------------------------------------------------------+
| IMPORTANT: Ollama 0.14.0 or higher required           |
|                                                       |
| The Anthropic API compatibility was added January     |
| 2026. Older versions return 404 errors.               |
+-------------------------------------------------------+
Check your version:
ollama --version
If below 0.14.0, you need to update.
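If you'd rather script the check, sort -V can compare version strings. A minimal sketch (the installed value is hard-coded here for illustration; in practice you'd extract it from the ollama --version output):

```shell
#!/bin/sh
# Minimum version for the Anthropic-compatible endpoint.
required="0.14.0"
# Hard-coded for illustration; in practice extract it from `ollama --version`,
# e.g.: ollama --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+'
installed="0.13.2"
# sort -V orders version strings numerically; if the required version sorts
# first (or ties), the installed one is new enough.
if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "OK: Ollama $installed is new enough"
else
  echo "Update needed: Ollama $installed is below $required"
fi
```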
Installing Ollama (Fresh Install)
macOS:
Download from ollama.com, or:
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Updating Ollama
If you already have Ollama but it's too old:
macOS (Homebrew):
brew upgrade ollama
macOS (direct install):
Download the latest from ollama.com and install over the old version.
Linux:
curl -fsSL https://ollama.com/install.sh | sh
(The script handles upgrades)
Verify Installation
ollama --version
Should show 0.14.0 or higher.
Start Ollama
macOS:
If you installed the app, just open Ollama from Applications.
It runs in the menu bar.
Or start manually:
ollama serve
Linux:
The install script sets up a systemd service:
sudo systemctl start ollama
sudo systemctl enable ollama # Start on boot
Pull the Base Model
We recommend qwen3-coder-next for this setup:
ollama pull qwen3-coder-next:latest
This downloads about 18GB. Go get coffee.
THE CRITICAL STEP: Create a Configured Model
+-------------------------------------------------------+
| DO NOT SKIP THIS SECTION                              |
|                                                       |
| The base model defaults to 4K context.                |
| Claude Code will barely function with 4K.             |
| You MUST create a model with proper context size.     |
+-------------------------------------------------------+
Why is this necessary?
qwen3-coder-next supports: 256,000 tokens of context
Ollama default: 4,096 tokens of context
Ollama uses a conservative default regardless of model capability. With only 4K tokens, Claude Code can't even fit your files plus conversation history. It will seem broken.
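To see how cramped 4K really is, apply the rough rule of thumb that one token is about four characters of text (a common heuristic, not the model's actual tokenizer):

```shell
#!/bin/sh
# Generate a stand-in 2,000-line source file and estimate its token cost
# with the ~4 characters-per-token heuristic.
printf 'line of code\n%.0s' $(seq 1 2000) > /tmp/demo_source.py
chars=$(wc -c < /tmp/demo_source.py)
echo "~$((chars / 4)) tokens for $chars characters"
```

A single file like this already overflows a 4,096-token window before any system prompt or conversation history is counted.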
Step 1: Create a Modelfile
nano ~/Modelfile-qwen-claude
Enter this content:
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
What this does:
- FROM: Uses qwen3-coder-next as the base
- num_ctx: Sets context window to 32K tokens
Save and exit (Ctrl+O, Enter, Ctrl+X).
Step 2: Create Your Configured Model
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
This creates a new model called "qwen3-coder-32k" that's identical to the base model but with 32K context instead of 4K.
It doesn't re-download anything. It just creates a configuration layer.
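If you prefer to skip the editor, Step 1 can also be done with a heredoc (same file, same content):

```shell
#!/bin/sh
# Write the Modelfile from Step 1 without opening an editor.
cat > "$HOME/Modelfile-qwen-claude" <<'EOF'
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
EOF
# Show what was written:
cat "$HOME/Modelfile-qwen-claude"
```

Then run the ollama create command from Step 2 exactly as before.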
Step 3: Verify the Model Exists
ollama list
Should show both:
NAME                       SIZE
qwen3-coder-next:latest    18GB
qwen3-coder-32k:latest     18GB
The "32k" version is what you'll use with Claude Code.
Step 4: Test the Context Setting
Start the model:
ollama run qwen3-coder-32k
In another terminal:
ollama ps
Should show:
NAME               SIZE     PROCESSOR    CONTEXT
qwen3-coder-32k    20 GB    100% GPU     32768
That CONTEXT column confirms it's using 32K, not 4K.
Exit the test: type /bye
Choosing Your Context Size
32K is a good default, but adjust based on your RAM:
16GB unified memory:
PARAMETER num_ctx 8192
Model name: qwen3-coder-8k
32GB unified memory:
PARAMETER num_ctx 32768
Model name: qwen3-coder-32k
64GB unified memory:
PARAMETER num_ctx 65536
Model name: qwen3-coder-64k
Larger context = more RAM used. Don't exceed your available memory or performance will tank as it swaps to disk.
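Why does a bigger window eat RAM? The model keeps a KV cache holding keys and values for every token in the window, and that cache grows linearly with num_ctx. A back-of-the-envelope sketch (the layer count, KV-head count, and head size below are hypothetical round numbers, not qwen3-coder-next's actual architecture):

```shell
#!/bin/sh
# Hypothetical transformer dimensions (illustration only; real values vary).
layers=48; kv_heads=8; head_dim=128; bytes_per_value=2   # fp16
ctx=32768
# K and V caches each store layers * kv_heads * head_dim values per token.
per_token=$((2 * layers * kv_heads * head_dim * bytes_per_value))
total=$((per_token * ctx))
echo "~$((total / 1024 / 1024)) MiB of KV cache at $ctx tokens"
```

Doubling num_ctx doubles this figure, which is why the 64K setting is only comfortable on 64GB machines.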
Enable Network Access
By default, Ollama only listens on localhost. If Claude Code is on a different machine, we need Ollama to accept network connections.
+-------------------------------------------------------+
| Skip this section if running everything on one        |
| machine. Localhost works without network config.      |
+-------------------------------------------------------+
macOS:
Add to your ~/.zshrc:
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.zshrc
Then reload and restart Ollama:
source ~/.zshrc
pkill ollama
ollama serve
This runs Ollama in the foreground. Keep the terminal open.
Linux:
Edit the systemd service:
sudo systemctl edit ollama.service
Add these lines:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Verify Network Binding
Check what Ollama is listening on:
lsof -i :11434
For network access, you should see:
ollama 12345 user tcp *:11434 (LISTEN)
That asterisk (*) means all interfaces. If it says localhost:11434, the network setting didn't take effect.
Find Your Server IP
If this is your AI server, note its IP address:
macOS:
ipconfig getifaddr en0
Linux:
hostname -I | awk '{print $1}'
Write this down. You'll need it for Claude Code configuration.
Test the Anthropic Endpoint
This is the endpoint Claude Code will use. Test it:
curl http://YOUR_IP:11434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder-32k",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
Note: Use your configured model name (qwen3-coder-32k), not the base.
Should return a JSON response with the model's greeting.
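A successful reply follows the Anthropic Messages response shape, with the reply text nested under a content array. The sketch below parses a canned example with python3 (the JSON here is illustrative, not captured from a real server):

```shell
#!/bin/sh
# A canned response in the Anthropic Messages shape (illustrative only).
response='{"id":"msg_123","type":"message","role":"assistant","model":"qwen3-coder-32k","content":[{"type":"text","text":"Hello! How can I help you today?"}],"stop_reason":"end_turn"}'
# The reply text lives at content[0].text:
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["content"][0]["text"])'
```

You can pipe the real curl output through the same python3 one-liner to pull out just the greeting.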
Troubleshooting Ollama
"Connection refused":
- Is Ollama running? Check: ps aux | grep ollama
- Is a firewall blocking port 11434? Allow it through
- Did the OLLAMA_HOST setting take effect? Check: lsof -i :11434
"404 page not found" on /v1/messages:
- Ollama version is too old. Update to 0.14.0+
"Model not found":
- Did you create the configured model? ollama list
- Typo in model name?
Model loads but Claude Code fails:
- Are you using the configured model (32k) or base model?
- Check with ollama ps: the CONTEXT column should show 32768
Summary Checklist
At this point you should have:
[ ] Ollama 0.14.0+ installed and running
[ ] qwen3-coder-next base model pulled
[ ] Custom Modelfile created with num_ctx set
[ ] Configured model created (e.g., qwen3-coder-32k)
[ ] Verified context with ollama ps
[ ] Network access enabled (if remote setup)
[ ] Server IP noted for later
The model name you'll use everywhere is your CONFIGURED model (qwen3-coder-32k), not the base model (qwen3-coder-next:latest).
Next chapter: Installing Claude Code on your workstation.