TROUBLESHOOTING
When things don't work, here's how to figure out why.
The Most Common Problem
SYMPTOM: Claude Code seems "stupid" - forgets things, ignores files, gives incomplete answers, seems confused.
CAUSE: You're using 4K context (Ollama's default).
CHECK:
ollama ps
Look at the CONTEXT column. If it says 4096, that's your problem.
FIX:
You didn't create a properly configured model. Go back to chapter 3 and create a Modelfile with num_ctx 32768, then create a new model.
echo 'FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768' > ~/Modelfile-qwen-claude
ollama create qwen3-coder-32k -f ~/Modelfile-qwen-claude
Then update settings.json to use "qwen3-coder-32k" instead of the base model name.
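As a sketch, the relevant part of settings.json might look like this (ANTHROPIC_BASE_URL and ANTHROPIC_MODEL are the standard Claude Code env vars from your setup chapter; YOUR_IP is a placeholder for your server's address):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://YOUR_IP:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_MODEL": "qwen3-coder-32k"
  }
}
```

The key point is the last line: it must name YOUR configured model, not the base model.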
Context Problems
SYMPTOM: Claude forgets things mid-conversation
1. Check context usage:
/context
If over 80%, you're running out of room.
2. Check you're using the right model:
ollama ps
CONTEXT should show 32768 (or your configured size), NOT 4096.
3. If context is full, compact:
/compact
4. For long sessions, start fresh:
Ctrl+D, then claude again
SYMPTOM: Claude can't see files you asked it to read
Context is probably full. Earlier file contents got truncated.
/context
/compact
Or start a new session.
SYMPTOM: "I don't have access to that file" when file exists
The file was read but then truncated from context. Happens in long sessions. Run /compact or start fresh.
Connection Problems
SYMPTOM: "Cannot connect to server" or "Connection refused"
1. Is Ollama running?
On server:
ps aux | grep ollama
If not running, start it:
macOS: Open Ollama app
Linux: sudo systemctl start ollama
2. Is Ollama listening on the right interface?
lsof -i :11434
Should show *:11434 for network access
If it shows localhost:11434, the OLLAMA_HOST setting didn't take effect
3. Can you reach the server?
From workstation:
curl http://YOUR_IP:11434/api/version
If this fails, it's network/firewall, not Claude Code
4. Is firewall blocking?
Allow port 11434 on the Ollama server
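Steps 1-4 can be rolled into a single probe. A rough sketch (the probe_ollama function name and its defaults are made up for illustration):

```shell
# Rough connectivity probe -- function name and defaults are illustrative.
probe_ollama() {
    host="${1:-127.0.0.1}"; port="${2:-11434}"
    if curl -sf --max-time 5 "http://$host:$port/api/version" >/dev/null; then
        echo "REACHABLE: Ollama answered on $host:$port"
    else
        echo "UNREACHABLE: check that Ollama is running, that OLLAMA_HOST binds the network interface, and that port $port is open in the firewall"
    fi
}

probe_ollama 127.0.0.1 11434   # substitute your server's IP
```

If this prints UNREACHABLE from the workstation but REACHABLE on the server itself, the problem is network/firewall, not Ollama.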
SYMPTOM: "404 page not found" on API calls
Ollama version is too old. The /v1/messages endpoint requires 0.14.0+.
ollama --version
# If old, update:
curl -fsSL https://ollama.ai/install.sh | sh
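Comparing version strings by eye is error-prone. A hedged sketch using sort -V (GNU and most modern BSD sorts), with a hard-coded value standing in for the real ollama --version output:

```shell
# Check a version string against the 0.14.0 minimum.
# "current" is hard-coded here; in practice extract it from `ollama --version`.
required="0.14.0"
current="0.13.2"
lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)
if [ "$lowest" = "$current" ] && [ "$current" != "$required" ]; then
    echo "TOO OLD: $current is below $required -- update Ollama"
else
    echo "OK: $current meets the minimum"
fi
```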
Authentication Problems
SYMPTOM: Claude Code keeps asking for login
1. Did you select option 3 (3rd-party platform)?
On first launch, you MUST select the third option.
2. Is settings.json valid JSON?
Syntax errors break everything. Validate:
python3 -c "import json; json.load(open('$HOME/.claude/settings.json'))"
If it errors, you have a typo (missing comma, wrong quotes, etc.)
3. Is ANTHROPIC_AUTH_TOKEN set?
Must be non-empty. Even "ollama" or "dummy" works.
SYMPTOM: Stuck at "Connecting to Anthropic"
The CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC setting isn't taking effect.
Check settings.json has exactly:
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
Note: It's "1" as a STRING, not the number 1.
Model Problems
SYMPTOM: "Model not found" error
1. Check the model exists:
ollama list
Model name must match EXACTLY (including any :tag)
2. Are you using your CONFIGURED model name?
WRONG: "qwen3-coder-next:latest" (base model)
RIGHT: "qwen3-coder-32k" (your configured model)
3. Is the model fully downloaded?
ollama pull qwen3-coder-next:latest
Watch for completion
SYMPTOM: Model responses are gibberish
1. Context too small?
ollama ps
If CONTEXT shows 4096, you're using the base model, not your
configured model. Fix settings.json.
2. Context overflow?
Check /context. If near 100%, the model is confused.
Run /compact.
3. Wrong model for coding?
Make sure you're using a coding-specific model like qwen3-coder,
not a general chat model.
Timeout Problems
SYMPTOM: "Timeout exceeded" or "Request timed out"
Local models are SLOW. Default timeout is too short.
1. Increase timeout in settings.json:
"API_TIMEOUT_MS": "900000"
That's 15 minutes. Adjust as needed.
2. First request is always slowest
The model needs to load into memory. First query might take 2-3
minutes. Subsequent queries are faster.
3. Is your machine overloaded?
Check Activity Monitor / top. If CPU/memory is maxed, the model
can't run efficiently.
4. Is context too large for your RAM?
If you configured 64K context but only have 32GB RAM, the model
will swap to disk and become extremely slow. Use a smaller context.
Performance Problems
SYMPTOM: Everything is extremely slow
1. Check model size vs your RAM
Model weights: ~20GB
32K context: ~8GB extra
Total needed: ~28GB
If you have 32GB, you're tight. 16GB = definitely swapping.
2. Check nothing else is using resources
Activity Monitor > Memory tab
Close other heavy apps
3. Try a smaller context
Create a model with num_ctx 16384 or 8192 instead of 32768.
4. Verify you're on GPU, not CPU
ollama ps
PROCESSOR should show "100% GPU" not "CPU"
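The memory math in step 1 can be sketched as rough shell arithmetic. The sizes below are the estimates from above, not measurements -- substitute your own:

```shell
# Back-of-envelope RAM check. Numbers are rough estimates, not measurements.
model_gb=20    # ~20GB of model weights
ctx_gb=8       # ~8GB extra for a 32K context
ram_gb=32      # your machine's RAM
need=$((model_gb + ctx_gb))
headroom=$((ram_gb - need))
if [ "$headroom" -lt 0 ]; then
    echo "SWAPPING: need ~${need}GB but only have ${ram_gb}GB -- expect disk thrashing"
elif [ "$headroom" -lt 8 ]; then
    echo "TIGHT: ~${need}GB of ${ram_gb}GB leaves little for the OS and Claude Code"
else
    echo "OK: ~${need}GB of ${ram_gb}GB"
fi
```

With the numbers above you land in TIGHT territory, which matches the "32GB is tight" warning.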
SYMPTOM: Response cuts off mid-sentence
Max tokens is too low. Claude Code usually handles this itself, but if you're using custom model parameters, set num_predict in your Modelfile:
PARAMETER num_predict 4096
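If you take this route, it likely belongs in the same Modelfile as your context setting from chapter 3, e.g.:

```
FROM qwen3-coder-next:latest
PARAMETER num_ctx 32768
PARAMETER num_predict 4096
```

Then recreate your model with ollama create, the same way you did originally.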
Configuration File Issues
SYMPTOM: Settings changes have no effect
1. Are you editing the right file?
~/.claude/settings.json (user settings)
.claude/settings.json (project settings)
User settings apply everywhere. Project settings override.
2. Did you restart Claude Code?
Settings load at startup. Exit and restart.
3. Is JSON valid?
Even a missing comma breaks everything:
{
"env": {
"KEY1": "value", <-- trailing comma breaks it
}
}
4. Environment variables conflicting?
Check: env | grep ANTHROPIC
Shell variables override settings.json. Unset them if testing.
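The one-line validator tells you pass/fail; a slightly longer sketch also points at the offending line. It's demonstrated here on a throwaway /tmp file containing the trailing-comma bug from step 3:

```shell
# Validate a JSON file and report the exact line of the first error.
# Demonstrated on a throwaway /tmp file with a trailing-comma typo.
cat > /tmp/settings-broken.json <<'EOF'
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "ollama",
  }
}
EOF
python3 - /tmp/settings-broken.json <<'PY'
import json, sys
try:
    json.load(open(sys.argv[1]))
    print("OK: valid JSON")
except json.JSONDecodeError as e:
    print(f"INVALID: line {e.lineno}, column {e.colno}: {e.msg}")
PY
```

Once it behaves on the sample, point it at ~/.claude/settings.json instead.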
SYMPTOM: Model works in ollama but not Claude Code
You're probably using different model names.
In settings.json, you might have "qwen3-coder-next:latest". But you created "qwen3-coder-32k".
They're different models with different context settings.
Make sure settings.json uses YOUR configured model name.
Testing the Pipeline
Isolate where the problem is:
1. Test Ollama is running:
curl http://YOUR_IP:11434/api/version
If this fails: Ollama not running or network issue
2. Test model loads with correct context:
ollama run qwen3-coder-32k
In another terminal:
ollama ps
Check CONTEXT column shows 32768
If this fails: Model doesn't exist or Modelfile issue
3. Test the Anthropic endpoint:
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-32k","max_tokens":50,"messages":[{"role":"user","content":"Hello"}]}'
If this fails: Ollama version issue or wrong model name
4. Test Claude Code:
ANTHROPIC_LOG=debug claude
Shows what's happening under the hood
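The server-side checks above can be scripted so you see exactly where the pipeline first breaks. A sketch, assuming you run it on the Ollama server itself (YOUR_IP and the model name are placeholders -- substitute your own):

```shell
# Sketch: run the server-side pipeline checks in order, stop at first failure.
# Assumes it runs on the Ollama server; host/port/model are placeholders.
run_checks() {
    host="$1"; port="$2"; model="$3"

    # Step 1: is Ollama reachable at all?
    curl -sf --max-time 5 "http://$host:$port/api/version" >/dev/null \
        || { echo "FAIL at step 1: server unreachable (Ollama down, or network/firewall)"; return 1; }

    # Step 2: does the model exist under that exact name?
    ollama list | grep -q "$model" \
        || { echo "FAIL at step 2: '$model' not in ollama list"; return 1; }

    # Step 3: does the Anthropic-style endpoint answer?
    curl -sf "http://$host:$port/v1/messages" \
        -H "Content-Type: application/json" \
        -d "{\"model\":\"$model\",\"max_tokens\":10,\"messages\":[{\"role\":\"user\",\"content\":\"Hi\"}]}" >/dev/null \
        || { echo "FAIL at step 3: old Ollama (need 0.14.0+) or wrong model name"; return 1; }

    echo "PASS: server side looks good -- next run ANTHROPIC_LOG=debug claude"
}

run_checks YOUR_IP 11434 qwen3-coder-32k || true   # prints where it first breaks
```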
Getting Help
If still stuck:
1. Verify Ollama version is 0.14.0+ (ollama --version)
2. Verify the model exists with the right name (ollama list)
3. Verify context is set correctly (ollama ps while the model is running)
4. Verify settings.json is valid JSON
5. Verify the model name in settings matches ollama list
6. Try localhost if remote isn't working
7. Check Ollama logs for errors
Quick Diagnostic Commands
# Ollama status
ollama --version
ollama list
lsof -i :11434
# Model with correct context?
ollama run qwen3-coder-32k &
sleep 5
ollama ps
# Should show CONTEXT: 32768
# Network test
curl http://YOUR_IP:11434/api/version
# API test with YOUR model
curl http://YOUR_IP:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-32k","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
# Settings check
cat ~/.claude/settings.json
python3 -c "import json; print(json.load(open('$HOME/.claude/settings.json')))"
# Context check (inside claude)
/context
Summary
Most problems are:
- Using base model with 4K context (MUST create Modelfile)
- Ollama not running or wrong version
- Model name mismatch between ollama list and settings.json
- Timeout too short
- Invalid JSON in settings.json
- Network/firewall blocking
Test each layer separately. Find where it breaks.
The #1 issue: People skip the Modelfile step and wonder why Claude Code seems broken. It's not broken - it just has no context to work with. Create your configured model.