TROUBLESHOOTING
When things don't work, here's how to figure out why.
The Most Common Problem
SYMPTOM: Claude Code seems "stupid" - forgets things, ignores files, gives incomplete answers, seems confused.
CAUSE: You're using 4K context (Ollama's default).
CHECK:
ollama ps
Look at the CONTEXT column. If it says 4096, that's your problem.
FIX:
You didn't create a properly configured model. Go back to chapter 3 and create a Modelfile with num_ctx 32768, then create a new model.
echo 'FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768' > ~/Modelfile-qwen-claude
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
Then update settings.json to use "qwen3-coder-32k" instead of the base model name.
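As a sketch, the relevant part of settings.json might look like this (ANTHROPIC_BASE_URL and ANTHROPIC_MODEL are the standard Claude Code env vars from your setup chapter; YOUR_IP is a placeholder for your server's address):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_MODEL": "qwen3-coder-32k"
  }
}
```

The key point is the last line: it must name YOUR configured model, not the base model.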
Context Problems
SYMPTOM: Claude forgets things mid-conversation
1. Check context usage:
/context
If over 80%, you're running out of room.
2. Check you're using the right model:
ollama ps
CONTEXT should show 32768 (or your configured size), NOT 4096.
3. If context is full, compact:
/compact
4. For long sessions, start fresh:
Ctrl+D, then claude again
SYMPTOM: Claude can't see files you asked it to read
Context is probably full. Earlier file contents got truncated.
/context
/compact
Or start a new session.
SYMPTOM: "I don't have access to that file" when file exists
The file was read but then truncated from context. Happens in long sessions. Run /compact or start fresh.
Connection Problems
SYMPTOM: "Cannot connect to server" or "Connection refused"
1. Is Ollama running?
On server:
ps aux | grep ollama
If not running, start it:
macOS: Open Ollama app
Linux: sudo systemctl start ollama
2. Is Ollama listening on the right interface?
lsof -i :11434
Should show *:11434 for network access
If it shows localhost:11434, the OLLAMA_HOST setting didn't take effect
3. Can you reach the server?
From workstation:
curl http://YOUR_IP:11434/api/version
If this fails, it's network/firewall, not Claude Code
4. Is firewall blocking?
Allow port 11434 on the Ollama server
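Steps 1-4 can be rolled into a single probe. A rough sketch (the probe_ollama function name and its defaults are made up for illustration):

```shell
# Rough connectivity probe -- function name and defaults are illustrative.
probe_ollama() {
    host="${1:-127.0.0.1}"; port="${2:-11434}"
    if curl -sf --max-time 5 "http://$host:$port/api/version" >/dev/null; then
        echo "REACHABLE: Ollama answered on $host:$port"
    else
        echo "UNREACHABLE: check that Ollama is running, that OLLAMA_HOST binds the network interface, and that port $port is open in the firewall"
    fi
}

probe_ollama 127.0.0.1 11434   # substitute your server's IP
```

If this prints UNREACHABLE from the workstation but REACHABLE on the server itself, the problem is network/firewall, not Ollama.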
SYMPTOM: "404 page not found" on API calls
Ollama version is too old. The /v1/messages endpoint requires 0.14.0+.
ollama --version
# If old, update:
curl -fsSL https://ollama.ai/install.sh | sh
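Comparing version strings by eye is error-prone. A hedged sketch using sort -V (GNU and most modern BSD sorts), with a hard-coded value standing in for the real ollama --version output:

```shell
# Check a version string against the 0.14.0 minimum.
# "current" is hard-coded here; in practice extract it from `ollama --version`.
required="0.14.0"
current="0.13.2"
lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)
if [ "$lowest" = "$current" ] && [ "$current" != "$required" ]; then
    echo "TOO OLD: $current is below $required -- update Ollama"
else
    echo "OK: $current meets the minimum"
fi
```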
Authentication Problems
SYMPTOM: Claude Code keeps asking for login
1. Did you select option 3 (3rd-party platform)?
On first launch, you MUST select the third option.
2. Is settings.json valid JSON?
Syntax errors break everything. Validate:
python3 -c "import json; json.load(open('$HOME/.claude/settings.json'))"
If it errors, you have a typo (missing comma, wrong quotes, etc.)
3. Is ANTHROPIC_AUTH_TOKEN set?
Must be non-empty. Even "ollama" or "dummy" works.
SYMPTOM: Stuck at "Connecting to Anthropic"
The CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC setting isn't taking effect.
Check settings.json has exactly:
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
Note: It's "1" as a STRING, not the number 1.
Model Problems
SYMPTOM: "Model not found" error
1. Check the model exists:
ollama list
Model name must match EXACTLY (including any :tag)
2. Are you using your CONFIGURED model name?
WRONG: "qwen3-coder-next:latest" (base model)
RIGHT: "qwen3-coder-32k" (your configured model)
3. Is the model fully downloaded?
ollama pull qwen3-coder-next:latest
Watch for completion
SYMPTOM: Model responses are gibberish
1. Context too small?
ollama ps
If CONTEXT shows 4096, you're using the base model, not your
configured model. Fix settings.json.
2. Context overflow?
Check /context. If near 100%, the model is confused.
Run /compact.
3. Wrong model for coding?
Make sure you're using a coding-specific model like qwen3-coder,
not a general chat model.
Timeout Problems
SYMPTOM: "Timeout exceeded" or "Request timed out"
Local models are SLOW. Default timeout is too short.
1. Increase timeout in settings.json:
"API_TIMEOUT_MS": "900000"
That's 15 minutes. Adjust as needed.
2. First request is always slowest
The model needs to load into memory. First query might take 2-3
minutes. Subsequent queries are faster.
3. Is your machine overloaded?
Check Activity Monitor / top. If CPU/memory is maxed, the model
can't run efficiently.
4. Is context too large for your RAM?
If you configured 64K context but only have 32GB RAM, the model
will swap to disk and become extremely slow. Use a smaller context.
Performance Problems
SYMPTOM: Everything is extremely slow
1. Check model size vs your RAM
Model weights: ~20GB
32K context: ~8GB extra
Total needed: ~28GB
If you have 32GB, you're tight. 16GB = definitely swapping.
2. Check nothing else is using resources
Activity Monitor > Memory tab
Close other heavy apps
3. Try a smaller context
Create a model with num_ctx 16384 or 8192 instead of 32768.
4. Verify you're on GPU, not CPU
ollama ps
PROCESSOR should show "100% GPU" not "CPU"
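The memory math in step 1 can be sketched as rough shell arithmetic. The sizes below are the estimates from above, not measurements -- substitute your own:

```shell
# Back-of-envelope RAM check. Numbers are rough estimates, not measurements.
model_gb=20    # ~20GB of model weights
ctx_gb=8       # ~8GB extra for a 32K context
ram_gb=32      # your machine's RAM
need=$((model_gb + ctx_gb))
headroom=$((ram_gb - need))
if [ "$headroom" -lt 0 ]; then
    echo "SWAPPING: need ~${need}GB but only have ${ram_gb}GB -- expect disk thrashing"
elif [ "$headroom" -lt 8 ]; then
    echo "TIGHT: ~${need}GB of ${ram_gb}GB leaves little for the OS and Claude Code"
else
    echo "OK: ~${need}GB of ${ram_gb}GB"
fi
```

With the numbers above you land in TIGHT territory, which matches the "32GB is tight" warning.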
SYMPTOM: Response cuts off mid-sentence
Max tokens is too low. Claude Code usually handles this itself, but if you're using custom model parameters, set num_predict in your Modelfile:
PARAMETER num_predict 4096
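If you take this route, it likely belongs in the same Modelfile as your context setting from chapter 3, e.g.:

```
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
PARAMETER num_predict 4096
```

Then recreate your model with ollama create, the same way you did originally.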
Configuration File Issues
SYMPTOM: Settings changes have no effect
1. Are you editing the right file?
~/.claude/settings.json (user settings)
.claude/settings.json (project settings)
User settings apply everywhere. Project settings override.
2. Did you restart Claude Code?
Settings load at startup. Exit and restart.
3. Is JSON valid?
Even a missing comma breaks everything:
{
"env": {
"KEY1": "value", <-- trailing comma breaks it
}
}
4. Environment variables conflicting?
Check: env | grep ANTHROPIC
Shell variables override settings.json. Unset them if testing.
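The one-line validator tells you pass/fail; a slightly longer sketch also points at the offending line. It's demonstrated here on a throwaway /tmp file containing the trailing-comma bug from step 3:

```shell
# Validate a JSON file and report the exact line of the first error.
# Demonstrated on a throwaway /tmp file with a trailing-comma typo.
cat > /tmp/settings-broken.json <<'EOF'
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "ollama",
  }
}
EOF
python3 - /tmp/settings-broken.json <<'PY'
import json, sys
try:
    json.load(open(sys.argv[1]))
    print("OK: valid JSON")
except json.JSONDecodeError as e:
    print(f"INVALID: line {e.lineno}, column {e.colno}: {e.msg}")
PY
```

Once it behaves on the sample, point it at ~/.claude/settings.json instead.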
SYMPTOM: Model works in ollama but not Claude Code
You're probably using different model names.
In settings.json, you might have "qwen3-coder-next:latest". But you created "qwen3-coder-32k".
They're different models with different context settings.
Make sure settings.json uses YOUR configured model name.
Testing the Pipeline
Isolate where the problem is:
1. Test Ollama is running:
curl http://YOUR_IP:11434/api/version
If this fails: Ollama not running or network issue
2. Test model loads with correct context:
ollama run qwen3-coder-32k
In another terminal:
ollama ps
Check CONTEXT column shows 32768
If this fails: Model doesn't exist or Modelfile issue
3. Test the Anthropic endpoint:
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-32k","max_tokens":50,"messages":[{"role":"user","content":"Hello"}]}'
If this fails: Ollama version issue or wrong model name
4. Test Claude Code:
ANTHROPIC_LOG=debug claude
Shows what's happening under the hood
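The server-side checks above can be scripted so you see exactly where the pipeline first breaks. A sketch, assuming you run it on the Ollama server itself (YOUR_IP and the model name are placeholders -- substitute your own):

```shell
# Sketch: run the server-side pipeline checks in order, stop at first failure.
# Assumes it runs on the Ollama server; host/port/model are placeholders.
run_checks() {
    host="$1"; port="$2"; model="$3"

    # Step 1: is Ollama reachable at all?
    curl -sf --max-time 5 "http://$host:$port/api/version" >/dev/null \
        || { echo "FAIL at step 1: server unreachable (Ollama down, or network/firewall)"; return 1; }

    # Step 2: does the model exist under that exact name?
    ollama list | grep -q "$model" \
        || { echo "FAIL at step 2: '$model' not in ollama list"; return 1; }

    # Step 3: does the Anthropic-style endpoint answer?
    curl -sf "http://$host:$port/v1/messages" \
        -H "Content-Type: application/json" \
        -d "{\"model\":\"$model\",\"max_tokens\":10,\"messages\":[{\"role\":\"user\",\"content\":\"Hi\"}]}" >/dev/null \
        || { echo "FAIL at step 3: old Ollama (need 0.14.0+) or wrong model name"; return 1; }

    echo "PASS: server side looks good -- next run ANTHROPIC_LOG=debug claude"
}

run_checks YOUR_IP 11434 qwen3-coder-32k || true   # prints where it first breaks
```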
Getting Help
If still stuck:
1. Verify Ollama version is 0.14.0+ (ollama --version)
2. Verify the model exists with the right name (ollama list)
3. Verify context is set correctly (ollama ps while the model is running)
4. Verify settings.json is valid JSON
5. Verify the model name in settings matches ollama list
6. Try localhost if remote isn't working
7. Check Ollama logs for errors
Quick Diagnostic Commands
# Ollama status
ollama --version
ollama list
lsof -i :11434
# Model with correct context?
ollama run qwen3-coder-32k &
sleep 5
ollama ps
# Should show CONTEXT: 32768
# Network test
curl http://YOUR_IP:11434/api/version
# API test with YOUR model
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-32k","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
# Settings check
cat ~/.claude/settings.json
python3 -c "import json; print(json.load(open('$HOME/.claude/settings.json')))"
# Context check (inside claude)
/context
Summary
Most problems are:
- Using base model with 4K context (MUST create Modelfile)
- Ollama not running or wrong version
- Model name mismatch between ollama list and settings.json
- Timeout too short
- Invalid JSON in settings.json
- Network/firewall blocking
Test each layer separately. Find where it breaks.
The #1 issue: People skip the Modelfile step and wonder why Claude Code seems broken. It's not broken - it just has no context to work with. Create your configured model.