INSTALLING AND CONFIGURING OLLAMA
Ollama is the engine that runs your local models. We need to install it, pull a model, and create a properly configured version of that model. That last step is critical and often missed.
Version Requirement
+-------------------------------------------------------+
| IMPORTANT: Ollama 0.14.0 or higher required           |
|                                                       |
| The Anthropic API compatibility was added January     |
| 2026. Older versions return 404 errors.               |
+-------------------------------------------------------+
Check your version:
ollama --version
If below 0.14.0, you need to update.
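If you'd rather script the check, sort -V can compare version strings. A minimal sketch (the installed value is hard-coded here for illustration; in practice you'd extract it from the ollama --version output):

```shell
#!/bin/sh
# Minimum version for the Anthropic-compatible endpoint.
required="0.14.0"
# Hard-coded for illustration; in practice extract it from `ollama --version`,
# e.g.: ollama --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+'
installed="0.13.2"
# sort -V orders version strings numerically; if the required version sorts
# first (or ties), the installed one is new enough.
if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
  echo "OK: Ollama $installed is new enough"
else
  echo "Update needed: Ollama $installed is below $required"
fi
```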
Installing Ollama (Fresh Install)
macOS:
Download from ollama.com, or:
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Updating Ollama
If you already have Ollama but it's too old:
macOS (Homebrew):
brew upgrade ollama
macOS (direct install):
Download the latest from ollama.com and install over the old version.
Linux:
curl -fsSL https://ollama.com/install.sh | sh
(The script handles upgrades)
Verify Installation
ollama --version
Should show 0.14.0 or higher.
Start Ollama
macOS:
If you installed the app, just open Ollama from Applications.
It runs in the menu bar.
Or start manually:
ollama serve
Linux:
The install script sets up a systemd service:
sudo systemctl start ollama
sudo systemctl enable ollama # Start on boot
Pull the Base Model
We recommend qwen3-coder-next for this setup:
ollama pull qwen3-coder-next:latest
This downloads about 18GB. Go get coffee.
THE CRITICAL STEP: Create a Configured Model
+-------------------------------------------------------+
| DO NOT SKIP THIS SECTION                              |
|                                                       |
| The base model defaults to 4K context.                |
| Claude Code will barely function with 4K.             |
| You MUST create a model with proper context size.     |
+-------------------------------------------------------+
Why is this necessary?
qwen3-coder-next supports: 256,000 tokens of context
Ollama default: 4,096 tokens of context
Ollama uses a conservative default regardless of model capability. With only 4K tokens, Claude Code can't even fit your files plus conversation history. It will seem broken.
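To see how cramped 4K really is, apply the rough rule of thumb that one token is about four characters of text (a common heuristic, not the model's actual tokenizer):

```shell
#!/bin/sh
# Generate a stand-in 2,000-line source file and estimate its token cost
# with the ~4 characters-per-token heuristic.
printf 'line of code\n%.0s' $(seq 1 2000) > /tmp/demo_source.py
chars=$(wc -c < /tmp/demo_source.py)
echo "~$((chars / 4)) tokens for $chars characters"
```

A single file like this already overflows a 4,096-token window before any system prompt or conversation history is counted.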
Step 1: Create a Modelfile
nano ~/Modelfile-qwen-claude
Enter this content:
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
What this does:
- FROM: Uses qwen3-coder-next as the base
- num_ctx: Sets context window to 32K tokens
Save and exit (Ctrl+O, Enter, Ctrl+X).
Step 2: Create Your Configured Model
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
This creates a new model called "qwen3-coder-32k" that's identical to the base model but with 32K context instead of 4K.
It doesn't re-download anything. It just creates a configuration layer.
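If you prefer to skip the editor, Step 1 can also be done with a heredoc (same file, same content):

```shell
#!/bin/sh
# Write the Modelfile from Step 1 without opening an editor.
cat > "$HOME/Modelfile-qwen-claude" <<'EOF'
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
EOF
# Show what was written:
cat "$HOME/Modelfile-qwen-claude"
```

Then run the ollama create command from Step 2 exactly as before.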
Step 3: Verify the Model Exists
ollama list
Should show both:
NAME                       SIZE
qwen3-coder-next:latest    18GB
qwen3-coder-32k:latest     18GB
The "32k" version is what you'll use with Claude Code.
Step 4: Test the Context Setting
Start the model:
ollama run qwen3-coder-32k
In another terminal:
ollama ps
Should show:
NAME               SIZE     PROCESSOR    CONTEXT
qwen3-coder-32k    20 GB    100% GPU     32768
That CONTEXT column confirms it's using 32K, not 4K.
Exit the test: type /bye
Choosing Your Context Size
32K is a good default, but adjust based on your RAM:
16GB unified memory:
PARAMETER num_ctx 8192
Model name: qwen3-coder-8k
32GB unified memory:
PARAMETER num_ctx 32768
Model name: qwen3-coder-32k
64GB unified memory:
PARAMETER num_ctx 65536
Model name: qwen3-coder-64k
Larger context = more RAM used. Don't exceed your available memory or performance will tank as it swaps to disk.
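Why does a bigger window eat RAM? The model keeps a KV cache holding keys and values for every token in the window, and that cache grows linearly with num_ctx. A back-of-the-envelope sketch (the layer count, KV-head count, and head size below are hypothetical round numbers, not qwen3-coder-next's actual architecture):

```shell
#!/bin/sh
# Hypothetical transformer dimensions (illustration only; real values vary).
layers=48; kv_heads=8; head_dim=128; bytes_per_value=2   # fp16
ctx=32768
# K and V caches each store layers * kv_heads * head_dim values per token.
per_token=$((2 * layers * kv_heads * head_dim * bytes_per_value))
total=$((per_token * ctx))
echo "~$((total / 1024 / 1024)) MiB of KV cache at $ctx tokens"
```

Doubling num_ctx doubles this figure, which is why the 64K setting is only comfortable on 64GB machines.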
Enable Network Access
By default, Ollama only listens on localhost. If Claude Code is on a different machine, we need Ollama to accept network connections.
+-------------------------------------------------------+
| Skip this section if running everything on one        |
| machine. Localhost works without network config.      |
+-------------------------------------------------------+
macOS:
Add to your ~/.zshrc:
echo 'export OLLAMA_HOST="0.0.0.0:11434"' >> ~/.zshrc
Then reload and restart Ollama:
source ~/.zshrc
pkill ollama
ollama serve
This runs Ollama in the foreground. Keep the terminal open.
Linux:
Edit the systemd service:
sudo systemctl edit ollama.service
Add these lines:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Verify Network Binding
Check what Ollama is listening on:
lsof -i :11434
For network access, you should see:
ollama 12345 user tcp *:11434 (LISTEN)
That asterisk (*) means all interfaces. If it says localhost:11434, the network setting didn't take effect.
Find Your Server IP
If this is your AI server, note its IP address:
macOS:
ipconfig getifaddr en0
Linux:
hostname -I | awk '{print $1}'
Write this down. You'll need it for Claude Code configuration.
Test the Anthropic Endpoint
This is the endpoint Claude Code will use. Test it:
curl http://YOUR_IP:11434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-coder-32k",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
Note: Use your configured model name (qwen3-coder-32k), not the base.
Should return a JSON response with the model's greeting.
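A successful reply follows the Anthropic Messages response shape, with the reply text nested under a content array. The sketch below parses a canned example with python3 (the JSON here is illustrative, not captured from a real server):

```shell
#!/bin/sh
# A canned response in the Anthropic Messages shape (illustrative only).
response='{"id":"msg_123","type":"message","role":"assistant","model":"qwen3-coder-32k","content":[{"type":"text","text":"Hello! How can I help you today?"}],"stop_reason":"end_turn"}'
# The reply text lives at content[0].text:
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["content"][0]["text"])'
```

You can pipe the real curl output through the same python3 one-liner to pull out just the greeting.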
Troubleshooting Ollama
"Connection refused":
- Is Ollama running? Check: ps aux | grep ollama
- Is a firewall blocking port 11434? Allow it through
- Did the OLLAMA_HOST setting take effect? Check: lsof -i :11434
"404 page not found" on /v1/messages:
- Ollama version is too old. Update to 0.14.0+
"Model not found":
- Did you create the configured model? ollama list
- Typo in model name?
Model loads but Claude Code fails:
- Are you using the configured model (32k) or base model?
- Check with ollama ps: the CONTEXT column should show 32768
Summary Checklist
At this point you should have:
[ ] Ollama 0.14.0+ installed and running
[ ] qwen3-coder-next base model pulled
[ ] Custom Modelfile created with num_ctx set
[ ] Configured model created (e.g., qwen3-coder-32k)
[ ] Verified context with ollama ps
[ ] Network access enabled (if remote setup)
[ ] Server IP noted for later
The model name you'll use everywhere is your CONFIGURED model (qwen3-coder-32k), not the base model (qwen3-coder-next:latest).
Next chapter: Installing Claude Code on your workstation.