CONFIGURATION
This is where we tell Claude Code to talk to Ollama instead of Anthropic's cloud. We're connecting two things: Claude Code on your workstation, and your configured model on the Ollama server.
What Gets Configured Where
Before we start, understand what's controlled where:
OLLAMA CONTROLS (via Modelfile):
- Context window size (num_ctx)
- Model parameters (temperature, etc.)
- Which base model to use
CLAUDE CODE CONTROLS (env vars or settings.json):
- Where to send requests (Ollama server URL)
- Which model to use (your configured model name)
- Timeouts and permissions
- Claude Code behavior
You CANNOT set context window in Claude Code config. That's baked into your Ollama model. If you skipped creating a Modelfile in chapter 3, go back and do it now.
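As a reminder, the context window lives in the Modelfile on the Ollama server. A minimal sketch (the base model name and num_ctx value here follow this guide's running example; substitute your own):

```
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
```

Build it with: ollama create qwen3-coder-32k -f Modelfile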
Two Configuration Methods
We recommend setting up BOTH for reliability:
- Environment variables in ~/.zshrc (take priority)
- Settings file at ~/.claude/settings.json (backup)
If both exist, environment variables win.
Method 1: Environment Variables
Add these to your ~/.zshrc:
# Claude Code + Local Ollama
export ANTHROPIC_BASE_URL="http://10.0.0.79:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
Then reload:
source ~/.zshrc
Method 2: Settings File
Create ~/.claude/settings.json:
mkdir -p ~/.claude
nano ~/.claude/settings.json
Paste this configuration:
{
"env": {
"ANTHROPIC_BASE_URL": "http://10.0.0.79:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
Save and exit (Ctrl+O, Enter, Ctrl+X in nano).
Customize For Your Setup
1. Replace the IP address:
Change 10.0.0.79 to YOUR Ollama server's IP.
If running on the same machine, use localhost:
export ANTHROPIC_BASE_URL="http://localhost:11434"
2. Replace the model name:
Change qwen3-coder-32k to whatever you named your configured model.
This must EXACTLY match what ollama list shows.
Common mistake: using the base model name (qwen3-coder-next:latest)
instead of your configured model (qwen3-coder-32k). If you do this,
you'll get 4K context and wonder why everything is broken.
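Before going further, it's worth verifying the name you plan to use actually appears in ollama list. A small helper sketch (model_exists is a name invented here; it reads ollama list output on stdin and checks column 1):

```shell
# Hypothetical helper: does the model name (first arg) appear in
# `ollama list` output read from stdin? Column 1 is NAME; row 1 is the header.
model_exists() {
  awk -v m="$1" 'NR>1 && $1==m {found=1} END {exit !found}'
}

# Usage against a live server:
#   ollama list | model_exists "qwen3-coder-32k" && echo "ok" || echo "missing"
```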
What Each Setting Does
ANTHROPIC_BASE_URL
Where to send API requests. Normally this is Anthropic's cloud.
We're pointing it at Ollama's Anthropic-compatible endpoint.
Format: http://HOST:PORT
Default Ollama port: 11434
Examples:
"http://localhost:11434" Same machine
"http://10.0.0.79:11434" Remote server
"http://ai-server.local:11434" Hostname (if DNS works)
ANTHROPIC_AUTH_TOKEN
Normally your Anthropic API key. Ollama doesn't need authentication,
but Claude Code won't start without SOMETHING here.
"ollama" is a dummy value. Could be any non-empty string.
This does NOT affect security. Ollama ignores it.
ANTHROPIC_MODEL
Which model to use. This must EXACTLY match a model in Ollama.
Run "ollama list" on your server. Use the name from that output.
WRONG: "qwen3-coder-next:latest" (base model, 4K context)
RIGHT: "qwen3-coder-32k" (your configured model)
If you created models with different context sizes, you can switch
between them by changing this value:
"qwen3-coder-8k" for light/fast work
"qwen3-coder-32k" for normal coding
"qwen3-coder-64k" for large codebases
ANTHROPIC_SMALL_FAST_MODEL
Claude Code uses a "small" model for quick tasks like summarization.
With local models, just use the same model as ANTHROPIC_MODEL.
If you have a lighter model for quick tasks, you could use it here.
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC
Stops Claude Code from trying to contact Anthropic's servers for
telemetry, updates, and other non-essential stuff.
Set to "1" (as a string) to enable.
Without this, Claude Code may hang trying to reach Anthropic.
API_TIMEOUT_MS
How long to wait for a response before timing out. In milliseconds.
The default is far too short for local models, where inference can
take 30-120 seconds per response.
Recommended values:
600000 = 10 minutes
900000 = 15 minutes
If you get timeout errors, increase this.
Full Autonomous Mode (Optional)
By default, Claude Code asks permission before editing files or running commands. You can pre-approve these actions:
{
"env": {
"ANTHROPIC_BASE_URL": "http://10.0.0.79:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
},
"permissions": {
"allow": [
"Bash",
"Read",
"Write",
"Edit",
"MultiEdit",
"NotebookEdit"
],
"deny": []
}
}
What these permissions mean:
Bash Execute any shell command
Read Read any file
Write Create or overwrite files
Edit Modify existing files
MultiEdit Edit multiple files at once
NotebookEdit Modify Jupyter notebooks
This is "yolo mode." Claude does whatever it wants without asking. Great for productivity. Risky if you're not paying attention.
Selective Permissions
You can allow specific commands instead of everything:
"permissions": {
"allow": [
"Bash(npm run *)",
"Bash(git *)",
"Bash(prove *)",
"Read",
"Write",
"Edit"
],
"deny": [
"Bash(rm -rf *)",
"Bash(sudo *)"
]
}
This allows npm, git, and prove commands, plus file operations. It blocks dangerous commands like rm -rf and sudo.
First Launch
Now we can launch:
cd ~/some-project-directory
claude
You'll see a login prompt:
Select login method:
1. Claude account with subscription
2. Anthropic Console account
> 3. 3rd-party platform
SELECT OPTION 3 (3rd-party platform).
This tells Claude Code to use your configured base URL instead of trying to authenticate with Anthropic.
If It Works
You'll see the Claude Code interface. Type something:
Hello, can you see me?
Wait 30-60 seconds. Local inference is slow, especially on the first query (the model has to load).
If you get a response, congratulations. It's working.
If It Doesn't Work
Common errors:
"Cannot connect to server"
- Check ANTHROPIC_BASE_URL is correct
- Can you reach the Ollama server? curl http://YOUR_IP:11434/api/version
- Is Ollama running?
"Model not found"
- Does the model name match EXACTLY?
- Run "ollama list" on the server to check
- Are you using your configured model (32k) not the base model?
"Timeout exceeded"
- Increase API_TIMEOUT_MS
- First request is always slowest (model loading)
Stuck on login screen:
- Make sure you selected option 3 (3rd-party platform)
- Check settings.json syntax (valid JSON?)
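A quick way to check the JSON syntax, sketched as a helper (valid_json is a name invented here; it assumes python3 is installed, which ships a JSON validator in the standard library):

```shell
# Hypothetical helper: exit 0 if the file parses as JSON, nonzero otherwise.
valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Usage:
#   valid_json ~/.claude/settings.json && echo "valid" || echo "syntax error"
```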
Works but seems broken/forgetful:
- You're probably using the base model with 4K context
- Check: ollama ps (look at CONTEXT column)
- Fix: Use your configured model name in settings.json
Testing the Full Pipeline
Before blaming Claude Code, test each piece:
1. Test Ollama is running:
curl http://YOUR_IP:11434/api/version
2. Test the model loads with proper context:
ollama run qwen3-coder-32k
# In another terminal:
ollama ps
# Should show CONTEXT: 32768
3. Test the Anthropic endpoint:
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-32k",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Say hello in Perl"}]
}'
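A successful response follows the Anthropic Messages API shape, with the reply text at content[0].text. To pull just the text out, one sketch (extract_reply is a name invented here; it assumes python3 is installed):

```shell
# Hypothetical helper: print the reply text from an Anthropic-style
# Messages API response read on stdin (text lives at content[0].text).
extract_reply() {
  python3 -c 'import json,sys; print(json.load(sys.stdin)["content"][0]["text"])'
}

# Usage against the live endpoint:
#   curl -s http://YOUR_IP:11434/v1/messages ... | extract_reply
```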
If step 3 works but Claude Code doesn't, the problem is settings.json.
Alternative: Per-Session Environment Variables
Instead of editing ~/.zshrc, you can export the variables directly in your current shell. This is useful for testing a change before making it permanent:
export ANTHROPIC_BASE_URL="http://10.0.0.79:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
claude
Environment variables override settings.json. If things aren't working, check for conflicting env vars:
env | grep ANTHROPIC
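If a stale variable is shadowing your settings.json, clear it for the current session and relaunch claude:

```shell
# Remove the Claude Code-related variables from the current shell only;
# anything exported from ~/.zshrc will come back in a new shell.
unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL \
      ANTHROPIC_SMALL_FAST_MODEL CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC \
      API_TIMEOUT_MS
```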
Summary
Your ~/.claude/settings.json should have:
{
"env": {
"ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
Key points:
- Use YOUR Ollama server IP (or localhost)
- Use YOUR configured model name (with proper context)
- Context size is set in Ollama, not here
- Select option 3 on first launch
Next chapter: Understanding context windows in depth.