CLAUDE CODE + LOCAL OLLAMA - OVERVIEW
What We're Building
Claude Code is Anthropic's terminal-based agentic coding tool. By default, it phones home to Anthropic's cloud servers. You pay per token. Your code leaves your network.
But here's the thing: Ollama added Anthropic API compatibility in version 0.14.0 (January 2026). That means we can point Claude Code at our own local models instead.
Tonight we're setting that up. All the polish of Anthropic's CLI. None of the cloud dependency.
What is "Vibe Coding"?
Vibe coding is when you describe what you want and working code appears. Instead of typing every character yourself, you tell the AI your intent:
"Create a Perl script that fetches RSS feeds and extracts titles"
And it writes it. Edits files directly. Runs tests. Iterates until it works.
This isn't autocomplete. This is an agent that:
- Reads your existing code
- Understands your project structure
- Makes multi-file changes
- Executes shell commands
- Learns your preferences over time
Claude Code is currently the best tool for this workflow. And now we can run it without cloud dependencies.
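That workflow can be driven straight from the shell. A minimal sketch, assuming claude is already installed and configured (claude -p is Claude Code's non-interactive "print" mode):

```shell
# One-shot "vibe coding": hand Claude Code an intent with -p (print mode)
# and it responds without opening an interactive session.
prompt="Create a Perl script that fetches RSS feeds and extracts titles"
if command -v claude >/dev/null 2>&1; then
  claude -p "$prompt"
else
  # Not installed yet? That's what the next chapters are for.
  echo "claude not found; would send: $prompt"
fi
```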
The Architecture
Here's what we're building:
+------------------+           +------------------+
|   Your Machine   |           |    AI Server     |
|                  |    LAN    |                  |
|   Claude Code    |  ------>  |      Ollama      |
|   (CLI client)   |           |  (qwen3-coder)   |
+------------------+           +------------------+
Or if you're running everything on one machine:
+------------------------------------------+
|               Your Machine               |
|                                          |
|    Claude Code  ------>  Ollama          |
|          (localhost:11434)               |
+------------------------------------------+
Both setups work. We'll walk through the remote-server configuration since it's the more complex case; the localhost setup is the same thing, just simpler.
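Concretely, wiring the two together comes down to a couple of environment variables. A sketch, assuming a recent Claude Code that reads ANTHROPIC_BASE_URL, an Ollama 0.14.0+ server, and a made-up LAN address of 192.168.1.50 (substitute your server's address, or localhost:11434 for the single-machine setup):

```shell
# Aim Claude Code at our Ollama server instead of Anthropic's cloud.
export ANTHROPIC_BASE_URL="http://192.168.1.50:11434"
# Ollama doesn't check the key, but Claude Code expects one to be set.
export ANTHROPIC_API_KEY="ollama"
# Then launch as usual:
#   claude
```

The full configuration (model selection, timeouts) is covered in a later chapter; this is just the shape of it.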
What This Tutorial Covers
By the end of this tutorial, you'll understand:
- Installing Ollama and Claude Code
- Configuring the connection between them
- How context windows work (and why they matter)
- Setting appropriate timeouts for local models
- Using CLAUDE.md to teach Claude about your project
- Power user features: slash commands, context management
- Troubleshooting common problems
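As a taste of the configuration ahead: in Ollama, a context-window tweak means creating a model variant from a Modelfile with a larger num_ctx. A sketch only; the variant name and value are illustrative, and the details come later:

```shell
# Build a qwen3-coder variant with a 32K-token context window.
cat > Modelfile <<'EOF'
FROM qwen3-coder
PARAMETER num_ctx 32768
EOF
# Register it with the Ollama daemon (requires ollama to be running):
#   ollama create qwen3-coder-32k -f Modelfile
```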
What You'll Need
Hardware:
- Mac with Apple Silicon (M1/M2/M3/M4)
- 16GB unified memory minimum, 32GB+ recommended
- About 30GB free disk space
Or:
- Linux machine with decent GPU
- Same memory and storage requirements
Software:
- macOS or Linux (Windows untested by us)
- Node.js (for Claude Code installer)
- Terminal comfort (we live in the command line)
What This Is NOT
This is a casual hobbyist demo, not a professional workshop. We're showing you something cool we figured out and sharing what works.
No certificates. No polished slides. No promises.
If you want hand-holding through every possible configuration issue, this isn't that. We'll cover the common cases and you'll figure out your edge cases.
A Note on Speed
Local inference is slower than cloud APIs. Way slower.
Anthropic's cloud: 20-50 tokens/second
Local qwen3-coder on M4: 5-15 tokens/second
Expect 30-120 seconds per response depending on complexity: a 600-token answer at 5 tokens/second takes two minutes, versus roughly 15 seconds from the cloud. That's the trade-off for privacy and zero cost.
If speed is your priority, pay for the cloud API. If privacy and independence matter more, welcome aboard.
Let's Get Started
Next chapter: Prerequisites. We'll make sure everything is in place before we start installing.