CLAUDE CODE + LOCAL OLLAMA - OVERVIEW
What We're Building
Claude Code is Anthropic's terminal-based agentic coding tool. By default, it phones home to Anthropic's cloud servers. You pay per token. Your code leaves your network.
But here's the thing: Ollama added Anthropic API compatibility in version 0.14.0 (January 2026). That means we can point Claude Code at our own local models instead.
Tonight we're setting that up. All the polish of Anthropic's CLI. None of the cloud dependency.
What is "Vibe Coding"?
Vibe coding is when you describe what you want and working code appears. Instead of typing every character yourself, you tell the AI your intent:
"Create a Perl script that fetches RSS feeds and extracts titles"
And it writes it. Edits files directly. Runs tests. Iterates until it works.
This isn't autocomplete. This is an agent that:
- Reads your existing code
- Understands your project structure
- Makes multi-file changes
- Executes shell commands
- Learns your preferences over time
Claude Code is currently the best tool for this workflow. And now we can run it without cloud dependencies.
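That workflow can be driven straight from the shell. A minimal sketch, assuming claude is already installed and configured (claude -p is Claude Code's non-interactive "print" mode):

```shell
# One-shot "vibe coding": hand Claude Code an intent with -p (print mode)
# and it responds without opening an interactive session.
prompt="Create a Perl script that fetches RSS feeds and extracts titles"
if command -v claude >/dev/null 2>&1; then
  claude -p "$prompt"
else
  # Not installed yet? That's what the next chapters are for.
  echo "claude not found; would send: $prompt"
fi
```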
The Architecture
Here's what we're building:
+------------------+           +------------------+
|   Your Machine   |           |    AI Server     |
|                  |    LAN    |                  |
|   Claude Code    |  ------>  |      Ollama      |
|   (CLI client)   |           |  (qwen3-coder)   |
+------------------+           +------------------+
Or if you're running everything on one machine:
+------------------------------------------+
|               Your Machine               |
|                                          |
|    Claude Code  ------>  Ollama          |
|          (localhost:11434)               |
+------------------------------------------+
Both setups work. We'll walk through the remote-server configuration since it's the more complex case; the localhost setup is the same thing, just simpler.
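Concretely, wiring the two together comes down to a couple of environment variables. A sketch, assuming a recent Claude Code that reads ANTHROPIC_BASE_URL, an Ollama 0.14.0+ server, and a made-up LAN address of 192.168.1.50 (substitute your server's address, or localhost:11434 for the single-machine setup):

```shell
# Aim Claude Code at our Ollama server instead of Anthropic's cloud.
export ANTHROPIC_BASE_URL="http://192.168.1.50:11434"
# Ollama doesn't check the key, but Claude Code expects one to be set.
export ANTHROPIC_API_KEY="ollama"
# Then launch as usual:
#   claude
```

The full configuration (model selection, timeouts) is covered in a later chapter; this is just the shape of it.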
What This Tutorial Covers
By the end of this tutorial, you'll understand:
- Installing Ollama and Claude Code
- Configuring the connection between them
- How context windows work (and why they matter)
- Setting appropriate timeouts for local models
- Using CLAUDE.md to teach Claude about your project
- Power user features: slash commands, context management
- Troubleshooting common problems
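As a taste of the configuration ahead: in Ollama, a context-window tweak means creating a model variant from a Modelfile with a larger num_ctx. A sketch only; the variant name and value are illustrative, and the details come later:

```shell
# Build a qwen3-coder variant with a 32K-token context window.
cat > Modelfile <<'EOF'
FROM qwen3-coder
PARAMETER num_ctx 32768
EOF
# Register it with the Ollama daemon (requires ollama to be running):
#   ollama create qwen3-coder-32k -f Modelfile
```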
What You'll Need
Hardware:
- Mac with Apple Silicon (M1/M2/M3/M4)
- 16GB unified memory minimum, 32GB+ recommended
- About 30GB free disk space
Or:
- Linux machine with decent GPU
- Same memory and storage requirements
Software:
- macOS or Linux (Windows untested by us)
- Node.js (for Claude Code installer)
- Terminal comfort (we live in the command line)
What This Is NOT
This is a casual hobbyist demo, not a professional workshop. We're showing you something cool we figured out and sharing what works.
No certificates. No polished slides. No promises.
If you want hand-holding through every possible configuration issue, this isn't that. We'll cover the common cases and you'll figure out your edge cases.
A Note on Speed
Local inference is slower than cloud APIs. Way slower.
Anthropic's cloud: 20-50 tokens/second
Local qwen3-coder on M4: 5-15 tokens/second
Expect 30-120 seconds per response depending on complexity: a 600-token answer at 5 tokens/second takes two minutes, versus roughly 15 seconds from the cloud. That's the trade-off for privacy and zero cost.
If speed is your priority, pay for the cloud API. If privacy and independence matter more, welcome aboard.
Let's Get Started
Next chapter: Prerequisites. We'll make sure everything is in place before we start installing.