AI Engineering

Private AI Coding Setup with Ollama and Claude [2026]

Dillip Chowdary
Tech Entrepreneur & Innovator · April 08, 2026 · 8 min read

A private AI coding environment does not mean every token stays offline forever. It means you control the default path. In practice, the cleanest setup in 2026 is Ollama for local code assistance and Claude Code for selective, explicit escalation when the problem is genuinely difficult.

That split matters. Ollama keeps everyday prompts, repository context, and routine code work on your machine. Claude Code gives you frontier-level reasoning when you need it, but only when you decide the task is worth sending to the cloud. For engineering teams handling internal services, client code, or regulated data, that local-first workflow is a much better default than piping everything to a hosted model all day.

Key Takeaway

The practical goal is not “Claude but offline.” That does not exist. The goal is a local-first coding stack: route most work to Ollama, keep sensitive repo context on-device, and invoke Claude only for the tasks where stronger reasoning outweighs the privacy tradeoff.

Prerequisites

Before you start, make sure you have the following:

  • Node.js 18+ for Claude Code. Anthropic’s official setup docs list Node.js 18 or newer as the baseline.
  • 8GB RAM minimum for smaller local models, though 16GB+ is a more realistic floor for comfortable coding assistance.
  • 10GB to 30GB free disk, depending on the model sizes you pull with Ollama.
  • A Claude account or Anthropic Console billing if you want cloud escalation through Claude Code.
  • Bash, Zsh, or Fish if you want the exact shell workflow below.

Official references: Ollama download, Ollama OpenAI compatibility docs, Claude Code setup, and Claude Code CLI reference.
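Since the Node.js 18+ baseline is the one hard version requirement, a small preflight check before installing anything can save a confusing failure later. This is a sketch; the `node_major` helper is just illustrative string handling, and the 18+ floor comes from Anthropic's setup docs.

```shell
#!/usr/bin/env bash
# Preflight: confirm Node.js meets the 18+ baseline before installing Claude Code.

node_major() {
  # "v18.19.0" -> "18"
  v="${1#v}"
  echo "${v%%.*}"
}

if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node -v)")
  if [ "$major" -ge 18 ]; then
    echo "Node.js OK (major version $major)"
  else
    echo "Node.js major version $major is too old; Claude Code needs 18+"
  fi
else
  echo "Node.js not found; install Node 18+ first"
fi
```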

Step 1: Install Ollama

Install Ollama first because it becomes your default local engine. This is the piece that keeps routine coding prompts on-device.

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the local runtime
ollama serve

Now pull one coding model that matches your hardware. A smaller coder model is enough to prove the workflow; you can upgrade later.

# Pull a local coding model
ollama pull qwen2.5-coder:7b

# Verify the model exists locally
ollama list

For a first pass, keep the model modest. Engineers often make the mistake of pulling the biggest thing they can find and then blaming the tool when their laptop swaps itself to death. Start with a model you can actually run, verify latency, then move up.

If you want an API-compatible local endpoint for scripts and tooling, Ollama also exposes an OpenAI-style interface at http://localhost:11434/v1/. That makes it easy to plug into custom scripts later without changing your mental model.
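As a sketch of what that looks like from a script, here is the standard chat-completions request shape pointed at the local endpoint. This assumes the qwen2.5-coder:7b model pulled above and uses only the Python standard library; no API key is needed for the local server.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoint


def build_chat_request(model, prompt):
    """Build the URL and JSON body for an OpenAI-style chat completion."""
    url = f"{OLLAMA_BASE}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, body


def ask_local(model, prompt):
    """POST the request to the local Ollama server and return the reply text."""
    url, body = build_chat_request(model, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Example (requires `ollama serve` to be running):
#   print(ask_local("qwen2.5-coder:7b", "Write a one-line hello world"))
```

Because the request shape matches the OpenAI API, the same script works against other compatible backends by changing only `OLLAMA_BASE`.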

Step 2: Install Claude Code

Next, install Claude Code. Anthropic’s current docs support both npm installation and the native installer; the npm route is still the simplest to explain and automate.

# Install Claude Code globally
npm install -g @anthropic-ai/claude-code

# Verify installation and environment
claude doctor

# Start interactive auth
claude

After launch, authenticate with either your Claude App subscription or your Anthropic Console account. For scripting, Anthropic also documents print mode, which is the useful piece for a private-by-default workflow:

# Non-interactive one-shot prompt
claude -p "Summarize the architecture of this repository" --model sonnet

This is the line between local and cloud in your setup. Ollama is the default. Claude Code is the deliberate override.

Step 3: Create Routing Commands

The easiest way to make the setup usable is to stop thinking about providers and start thinking about intent. Create one command for local work and one for frontier work.

mkdir -p ~/.local/bin

cat <<'EOF' > ~/.local/bin/ask-local
#!/usr/bin/env bash
ollama run qwen2.5-coder:7b "$*"
EOF

cat <<'EOF' > ~/.local/bin/ask-claude
#!/usr/bin/env bash
claude -p "$*" --model sonnet
EOF

chmod +x ~/.local/bin/ask-local ~/.local/bin/ask-claude
export PATH="$HOME/.local/bin:$PATH"

Now your workflow is obvious:

# Local-first tasks
ask-local "Explain the error handling in src/api/auth.ts"
ask-local "Write a regex for nginx access logs"
ask-local "Draft unit tests for this function"

# Harder tasks that justify cloud reasoning
ask-claude "Review this repository structure and propose a safer deployment layout"
ask-claude "Find the likely root cause of this intermittent race condition"

This is where the environment becomes operational rather than theoretical. You do not need a big orchestration layer. You need a reliable default and a clear escalation path.
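If you later want a single entry point instead of two commands, the same routing discipline can be collapsed into one dispatcher. This is a sketch of a hypothetical `ask` script, not part of either tool; the model names match the placeholders used above, and the routing decision is kept in its own function so it stays easy to test.

```shell
#!/usr/bin/env bash
# Sketch of a single "ask" dispatcher built on the two scripts above.

route() {
  # Pure routing decision: escalate only on an explicit --cloud flag.
  if [ "${1:-}" = "--cloud" ]; then
    echo "cloud"
  else
    echo "local"
  fi
}

dispatch() {
  if [ "$(route "${1:-}")" = "cloud" ]; then
    shift
    claude -p "$*" --model sonnet
  else
    ollama run qwen2.5-coder:7b "$*"
  fi
}

# Usage:
#   dispatch "Explain this regex"            # stays on-device
#   dispatch --cloud "Plan this migration"   # explicit escalation
```

The point of the explicit flag is that escalation never happens by accident: the default path is always local.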

Step 4: Add Project Guardrails

Claude Code supports project memory through ./CLAUDE.md, which Anthropic documents as a team-shared instruction file. Use it to encode your privacy rules and routing logic so the workflow survives beyond one engineer’s shell history.

cat <<'EOF' > CLAUDE.md
# AI usage rules
- Use local Ollama first for summaries, code search guidance, boilerplate, regex, and simple refactors.
- Escalate to Claude only for cross-file design, hard debugging, migration planning, and test strategy.
- Never paste production secrets, tokens, customer records, or raw dumps into cloud prompts.
- Share minimal snippets instead of full files when cloud help is enough.
- Prefer masked logs and redacted stack traces.
EOF

This file is also the right place to define coding standards, repo conventions, or deployment constraints. If you need to share snippets externally, sanitize them first. TechBytes’ Data Masking Tool is useful for scrubbing tokens, emails, IDs, and customer fields before a cloud prompt. And if you want to clean a generated snippet before committing it, the Code Formatter fits naturally into the same workflow.
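If you want that redaction step to be scriptable rather than manual, a minimal pre-prompt scrub can sit in front of any cloud call. The patterns below are illustrative placeholders, not a complete secret scanner; extend them for the data shapes your own logs actually contain.

```python
import re

# Illustrative patterns only; tune for your own logs and secret formats.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),                    # email addresses
    (re.compile(r"\b(?:sk|ghp|xox[bp])-[A-Za-z0-9_-]{10,}\b"), "<TOKEN>"),   # API-key-shaped strings
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),                    # IPv4 addresses
]


def redact(text):
    """Replace obvious secrets and identifiers before a cloud prompt."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact("auth failed for alice@example.com from 10.0.0.7"))
```

Piping a log excerpt through a scrub like this before `ask-claude` turns the "prefer masked logs" rule from a guideline into a mechanical step.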

The important policy point is simple: privacy comes from routing discipline, not from wishful labeling.

Verification and Expected Output

At this point you want to verify three things: the local runtime is alive, the local model is installed, and Claude Code is authenticated.

# Ollama service health
curl http://localhost:11434

# Installed local models
ollama list

# Local prompt test
ask-local "Reply with LOCAL_OK"

# Claude prompt test
ask-claude "Reply with CLAUDE_OK"

Expected output:

  • curl http://localhost:11434 should return a simple response indicating Ollama is running.
  • ollama list should show qwen2.5-coder:7b or whichever model you pulled.
  • ask-local should answer immediately without opening a browser or auth flow.
  • ask-claude should return a normal text response after your Anthropic login is complete.

If those four checks pass, you have the core environment working: local-first coding help with a selective cloud escalation path.

Troubleshooting

1. Ollama is installed, but nothing responds on port 11434

This usually means the local service is not actually running in the current session.

ollama serve
curl http://localhost:11434

If the second command still fails, check whether another process is binding that port or whether your service manager started Ollama elsewhere.
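A quick way to distinguish "service not running" from "port taken by something else" is a plain TCP probe. This is a stdlib-only sketch; on the shell side, `lsof -i :11434` gives the same answer plus the owning process.

```python
import socket


def port_in_use(host, port, timeout=0.5):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_in_use("127.0.0.1", 11434):
        print("something is listening on 11434 (Ollama, or another process)")
    else:
        print("nothing is listening on 11434; start `ollama serve`")
```

If the probe succeeds but `curl` still fails, the listener is likely not Ollama, which is exactly the "another process is binding that port" case above.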

2. The model downloads fine, but responses are painfully slow or crash

You likely chose a model that is too large for your machine’s RAM or VRAM budget. Drop to a smaller coder model first, then scale back up once the workflow is stable.

ollama pull qwen2.5-coder:7b
ollama run qwen2.5-coder:7b "Write a Python retry helper"

Do not optimize for benchmark bravado. Optimize for sustained local usability.

3. Claude Code installs, but auth or npm permissions fail

Anthropic explicitly warns against using sudo npm install -g. If global npm permissions are messy, use a user-owned npm directory or Anthropic’s native installer instead.

npm install -g @anthropic-ai/claude-code
claude doctor
claude

If login still fails, verify you are in a supported region and that your Anthropic or Claude account has valid billing attached.

What’s Next

Once the basic environment is working, the next upgrade is not another model. It is better routing. Add a few shell shortcuts, document when your team may use cloud prompts, and standardize a redaction step for logs and customer data. That gets you more real privacy than endlessly tuning prompt templates.

From there, you can layer in editor integration, project-specific CLAUDE.md instructions, and local scripts that hit Ollama through its OpenAI-compatible endpoint. But the core architecture should stay the same: local by default, cloud by exception, and explicit rules around what leaves the machine.

That is the setup most engineering teams actually need in 2026.
