CONFIGURATION
This is where we tell Claude Code to talk to Ollama instead of Anthropic's cloud. We're connecting two things: Claude Code on your workstation, and your configured model on the Ollama server.
What Gets Configured Where
Before we start, understand what's controlled where:
OLLAMA CONTROLS (via Modelfile):
- Context window size (num_ctx)
- Model parameters (temperature, etc.)
- Which base model to use
CLAUDE CODE CONTROLS (env vars or settings.json):
- Where to send requests (Ollama server URL)
- Which model to use (your configured model name)
- Timeouts and permissions
- Claude Code behavior
You CANNOT set context window in Claude Code config. That's baked into your Ollama model. If you skipped creating a Modelfile in chapter 3, go back and do it now.
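As a reminder, the context window lives in the Modelfile on the Ollama server. A minimal sketch (the base model name and num_ctx value here follow this guide's running example; substitute your own):

```
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
```

Build it with: ollama create qwen3-coder-32k -f Modelfile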
Two Configuration Methods
We recommend setting up BOTH for reliability:
- Environment variables in ~/.zshrc (take priority)
- Settings file at ~/.claude/settings.json (backup)
If both exist, environment variables win.
Method 1: Environment Variables
Add these to your ~/.zshrc:
# Claude Code + Local Ollama
export ANTHROPIC_BASE_URL="http://10.0.0.79:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
Then reload:
source ~/.zshrc
Method 2: Settings File
Create ~/.claude/settings.json:
mkdir -p ~/.claude
nano ~/.claude/settings.json
Paste this configuration:
{
"env": {
"ANTHROPIC_BASE_URL": "http://10.0.0.79:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
Save and exit (Ctrl+O, Enter, Ctrl+X in nano).
Customize For Your Setup
1. Replace the IP address:
Change 10.0.0.79 to YOUR Ollama server's IP.
If running on the same machine, use localhost:
export ANTHROPIC_BASE_URL="http://localhost:11434"
2. Replace the model name:
Change qwen3-coder-32k to whatever you named your configured model.
This must EXACTLY match what ollama list shows.
Common mistake: using the base model name (qwen3-coder-next:latest)
instead of your configured model (qwen3-coder-32k). If you do this,
you'll get 4K context and wonder why everything is broken.
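Before going further, it's worth verifying the name you plan to use actually appears in ollama list. A small helper sketch (model_exists is a name invented here; it reads ollama list output on stdin and checks column 1):

```shell
# Hypothetical helper: does the model name (first arg) appear in
# `ollama list` output read from stdin? Column 1 is NAME; row 1 is the header.
model_exists() {
  awk -v m="$1" 'NR>1 && $1==m {found=1} END {exit !found}'
}

# Usage against a live server:
#   ollama list | model_exists "qwen3-coder-32k" && echo "ok" || echo "missing"
```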
What Each Setting Does
ANTHROPIC_BASE_URL
Where to send API requests. Normally this is Anthropic's cloud.
We're pointing it at Ollama's Anthropic-compatible endpoint.
Format: http://HOST:PORT
Default Ollama port: 11434
Examples:
"http://localhost:11434" Same machine
"http://10.0.0.79:11434" Remote server
"http://ai-server.local:11434" Hostname (if DNS works)
ANTHROPIC_AUTH_TOKEN
Normally your Anthropic API key. Ollama doesn't need authentication,
but Claude Code won't start without SOMETHING here.
"ollama" is a dummy value. Could be any non-empty string.
This does NOT affect security. Ollama ignores it.
ANTHROPIC_MODEL
Which model to use. This must EXACTLY match a model in Ollama.
Run "ollama list" on your server. Use the name from that output.
WRONG: "qwen3-coder-next:latest" (base model, 4K context)
RIGHT: "qwen3-coder-32k" (your configured model)
If you created models with different context sizes, you can switch
between them by changing this value:
"qwen3-coder-8k" for light/fast work
"qwen3-coder-32k" for normal coding
"qwen3-coder-64k" for large codebases
ANTHROPIC_SMALL_FAST_MODEL
Claude Code uses a "small" model for quick tasks like summarization.
With local models, just use the same model as ANTHROPIC_MODEL.
If you have a lighter model for quick tasks, you could use it here.
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC
Stops Claude Code from trying to contact Anthropic's servers for
telemetry, updates, and other non-essential stuff.
Set to "1" (as a string) to enable.
Without this, Claude Code may hang trying to reach Anthropic.
API_TIMEOUT_MS
How long to wait for a response before timing out. In milliseconds.
The default is far too short for local models, where inference can
take 30-120 seconds per response.
Recommended values:
600000 = 10 minutes
900000 = 15 minutes
If you get timeout errors, increase this.
Full Autonomous Mode (Optional)
By default, Claude Code asks permission before editing files or running commands. You can pre-approve these actions:
{
"env": {
"ANTHROPIC_BASE_URL": "http://10.0.0.79:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
},
"permissions": {
"allow": [
"Bash",
"Read",
"Write",
"Edit",
"MultiEdit",
"NotebookEdit"
],
"deny": []
}
}
What these permissions mean:
Bash Execute any shell command
Read Read any file
Write Create or overwrite files
Edit Modify existing files
MultiEdit Edit multiple files at once
NotebookEdit Modify Jupyter notebooks
This is "yolo mode." Claude does whatever it wants without asking. Great for productivity. Risky if you're not paying attention.
Selective Permissions
You can allow specific commands instead of everything:
"permissions": {
"allow": [
"Bash(npm run *)",
"Bash(git *)",
"Bash(prove *)",
"Read",
"Write",
"Edit"
],
"deny": [
"Bash(rm -rf *)",
"Bash(sudo *)"
]
}
This allows npm, git, and prove commands, plus file operations. It blocks dangerous commands like rm -rf and sudo.
First Launch
Now we can launch:
cd ~/some-project-directory
claude
You'll see a login prompt:
Select login method:
1. Claude account with subscription
2. Anthropic Console account
> 3. 3rd-party platform
SELECT OPTION 3 (3rd-party platform).
This tells Claude Code to use your configured base URL instead of trying to authenticate with Anthropic.
If It Works
You'll see the Claude Code interface. Type something:
Hello, can you see me?
Wait 30-60 seconds. Local inference is slow, especially on the first query (the model has to load).
If you get a response, congratulations. It's working.
If It Doesn't Work
Common errors:
"Cannot connect to server"
- Check ANTHROPIC_BASE_URL is correct
- Can you reach the Ollama server? curl http://YOUR_IP:11434/api/version
- Is Ollama running?
"Model not found"
- Does the model name match EXACTLY?
- Run "ollama list" on the server to check
- Are you using your configured model (32k) not the base model?
"Timeout exceeded"
- Increase API_TIMEOUT_MS
- First request is always slowest (model loading)
Stuck on login screen:
- Make sure you selected option 3 (3rd-party platform)
- Check settings.json syntax (valid JSON?)
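A quick way to check the JSON syntax, sketched as a helper (valid_json is a name invented here; it assumes python3 is installed, which ships a JSON validator in the standard library):

```shell
# Hypothetical helper: exit 0 if the file parses as JSON, nonzero otherwise.
valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Usage:
#   valid_json ~/.claude/settings.json && echo "valid" || echo "syntax error"
```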
Works but seems broken/forgetful:
- You're probably using the base model with 4K context
- Check: ollama ps (look at CONTEXT column)
- Fix: Use your configured model name in settings.json
Testing the Full Pipeline
Before blaming Claude Code, test each piece:
1. Test Ollama is running:
curl http://YOUR_IP:11434/api/version
2. Test the model loads with proper context:
ollama run qwen3-coder-32k
# In another terminal:
ollama ps
# Should show CONTEXT: 32768
3. Test the Anthropic endpoint:
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-coder-32k",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Say hello in Perl"}]
}'
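A successful response follows the Anthropic Messages API shape, with the reply text at content[0].text. To pull just the text out, one sketch (extract_reply is a name invented here; it assumes python3 is installed):

```shell
# Hypothetical helper: print the reply text from an Anthropic-style
# Messages API response read on stdin (text lives at content[0].text).
extract_reply() {
  python3 -c 'import json,sys; print(json.load(sys.stdin)["content"][0]["text"])'
}

# Usage against the live endpoint:
#   curl -s http://YOUR_IP:11434/v1/messages ... | extract_reply
```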
If step 3 works but Claude Code doesn't, the problem is settings.json.
Alternative: Per-Session Environment Variables
Instead of editing ~/.zshrc, you can export the variables directly in your current shell. This is useful for testing a change before making it permanent:
export ANTHROPIC_BASE_URL="http://10.0.0.79:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder-32k"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC="1"
export API_TIMEOUT_MS="600000"
claude
Environment variables override settings.json. If things aren't working, check for conflicting env vars:
env | grep ANTHROPIC
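If a stale variable is shadowing your settings.json, clear it for the current session and relaunch claude:

```shell
# Remove the Claude Code-related variables from the current shell only;
# anything exported from ~/.zshrc will come back in a new shell.
unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_MODEL \
      ANTHROPIC_SMALL_FAST_MODEL CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC \
      API_TIMEOUT_MS
```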
Summary
Your ~/.claude/settings.json should have:
{
"env": {
"ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
"ANTHROPIC_AUTH_TOKEN": "ollama",
"ANTHROPIC_MODEL": "qwen3-coder-32k",
"ANTHROPIC_SMALL_FAST_MODEL": "qwen3-coder-32k",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"API_TIMEOUT_MS": "600000"
}
}
Key points:
- Use YOUR Ollama server IP (or localhost)
- Use YOUR configured model name (with proper context)
- Context size is set in Ollama, not here
- Select option 3 on first launch
Next chapter: Understanding context windows in depth.