OpenClaw with Ollama: Run a Personal AI Assistant on Local Models
Running a private AI assistant on your own hardware used to mean months of infrastructure work. Today, two open-source tools eliminate that complexity: Ollama, which downloads and serves local language models in seconds, and OpenClaw, an agentic AI assistant that can read files, run terminal commands, search the web, and automate workflows — all from a single terminal interface. Together, they give you a personal AI assistant that costs nothing per message and sends zero data to the cloud.
This guide walks through the complete setup for running ollama openclaw as a personal AI assistant, including model selection by hardware tier, the fastest bootstrap path, and the configuration mistakes that break tool calling.
What Is OpenClaw with Ollama — and Why Run It Locally?
Ollama is a command-line runtime that pulls open-source language models (Llama, Qwen, Gemma, Mistral, and dozens more) and serves them via a local HTTP API at http://localhost:11434. It handles quantization, model loading, and memory management automatically — you run one command and the model is ready.
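Because Ollama's native API is plain HTTP, you can exercise it directly from any language. The sketch below builds a request against the /api/generate endpoint; the model tag and num_ctx value are illustrative, and actually sending the request requires a running Ollama instance:

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str, num_ctx: int = 32768):
    """Build a request for Ollama's native /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,               # return a single JSON object, not a stream
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Sending it requires a running Ollama instance:
# with urllib.request.urlopen(build_generate_request("qwen3.5:9b", "Hello")) as r:
#     print(json.loads(r.read())["response"])
```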
OpenClaw is an agentic AI assistant built for real actions, not just conversation. Unlike a chatbot, OpenClaw can execute terminal commands, read and edit files, run web searches, control browsers, and call external APIs. It connects to any OpenAI-compatible backend — or Ollama's native API — so you can swap cloud models for local ones without changing your workflow.
OpenClaw as a personal AI assistant
Most OpenClaw guides focus on coding agents. But OpenClaw's capability set extends far beyond code: it can draft documents, summarize research papers, manage your file system, automate repetitive shell tasks, and serve as a persistent knowledge assistant. The "personal AI assistant" framing matters because it changes how you configure the tool — you want a generalist model with strong instruction-following, not necessarily the highest SWE-bench score.
Why Ollama is the easiest local LLM runtime
Compared to running models via Python scripts or Docker containers, Ollama abstracts all complexity into a single binary. Pull a model with one command, serve it with another. It auto-detects GPU (NVIDIA, AMD, Apple Silicon) and falls back to CPU when no GPU is present. If you want to compare local backends, the OpenClaw + LM Studio setup is the main alternative — but Ollama's CLI integration with OpenClaw is tighter and faster to configure.
What Can You Do with Your Local AI Assistant?
Before committing to a hardware-intensive setup, it helps to know what you are actually getting. OpenClaw with Ollama gives you an assistant that:
- Reads and edits files — summarize a 200-page PDF, refactor a codebase, or rename files in batch based on natural language instructions
- Runs terminal commands — install packages, manage processes, parse log files, and run test suites on your behalf
- Searches the web — OpenClaw ships with a web search and fetch plugin; local models can query the web and extract structured content without a cloud API key
- Drafts and edits documents — write emails, reports, commit messages, and documentation with full access to your local files as context
- Assists with coding — review diffs, explain stack traces, generate boilerplate, and navigate unfamiliar codebases
- Automates workflows — chain multiple steps (fetch data, transform it, write a file, notify via webhook) in a single natural language instruction
Everything runs on your machine. No conversation is sent to an external server unless you configure an external plugin that requires it.
Choosing Your Model for Ollama + OpenClaw: Hardware Tiers
OpenClaw's system prompt alone consumes approximately 17,000 tokens. You need a model with at least a 32K context window to run it reliably; 65K or more is recommended when sub-agents are active. Models below 32K context will truncate mid-task and produce broken tool calls.
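The arithmetic behind that constraint is worth making explicit. A quick sketch, using the approximate token counts above:

```python
# OpenClaw's system prompt consumes roughly 17,000 tokens, so the usable
# window for conversation and tool output is whatever remains.
SYSTEM_PROMPT_TOKENS = 17_000  # approximate figure from this guide

def usable_context(num_ctx: int) -> int:
    """Tokens left for conversation and tool output after the system prompt."""
    return num_ctx - SYSTEM_PROMPT_TOKENS

print(usable_context(32_768))  # 15768: workable, but tight
print(usable_context(65_536))  # 48536: comfortable headroom for sub-agents
print(usable_context(8_192))   # negative: an 8K model truncates mid-task
```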
With that constraint in mind, here are the recommended models by hardware tier. For a CPU-only machine, the Qwen3.5 CPU-only guide covers the resource-constrained path in detail. If you want a broader overview of local model options, the best small LLMs for local use guide covers the full landscape.
- Entry tier (8–16 GB RAM, CPU-only or 8GB GPU): Llama 3.3 8B or Qwen3.5:9b — fast for general tasks, drafting, and simple file operations
- Mid-range (16–24 GB VRAM): Qwen3-Coder:14b or Llama 3.3 70B Q4 — reliable tool calling, good for coding help and multi-step workflows
- High-end (32 GB+ VRAM): Qwen3.5:27b (Q4_K_M quantization) or Qwen3-Coder:32b — best quality for complex reasoning and production use
Model selection tip: Qwen3.5 and Qwen3-Coder models handle OpenClaw's tool-calling format more reliably than Mistral or older Llama variants. Mistral and early Llama models often output raw JSON instead of executing tool calls correctly. If you are on a budget, Llama 3.3 8B is a solid entry point that handles general instructions reliably within 8 GB RAM.
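As a rough rule of thumb, the tiers above can be captured in a small hypothetical helper; the thresholds and model tags simply mirror the list, so tune them to your own hardware:

```python
def suggest_model(vram_gb: int) -> str:
    """Map available VRAM (GB) to a suggested Ollama model tag.

    Illustrative only: boundaries follow the hardware tiers in this guide.
    """
    if vram_gb >= 32:
        return "qwen3.5:27b"       # high-end: best quality for complex reasoning
    if vram_gb >= 16:
        return "qwen3-coder:14b"   # mid-range: reliable tool calling
    return "qwen3.5:9b"            # entry tier: fast for general tasks
```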
Step 1 — Install Ollama
Install Ollama with a single command on macOS and Linux:
curl -fsSL https://ollama.com/install.sh | sh

On Windows, download the installer from ollama.com and run it. After installation, verify Ollama is running:
ollama --version
# Expected: ollama version 0.x.x
curl http://localhost:11434/api/tags
# Returns a JSON list of locally available models

If /api/tags returns a connection error, start the Ollama service manually:
# macOS / Linux
ollama serve
# Windows: Ollama runs as a system tray application — check the tray icon

Step 2 — Install OpenClaw with Ollama
The fastest path uses the ollama launch integration (requires Ollama 0.5 or later). This single command pulls your chosen model if you do not have it locally, installs OpenClaw if it is not already on your system, configures the Ollama gateway, and opens the assistant interface:
# Entry tier (8GB hardware)
ollama launch openclaw --model qwen3.5:9b
# Mid-range (16GB hardware)
ollama launch openclaw --model qwen3-coder:14b
# High-end (32GB+ hardware)
ollama launch openclaw --model qwen3.5:27b

If you prefer a manual install or need platform-specific instructions, follow the steps in the OpenClaw local installation guide for Windows, macOS, and Linux, then configure Ollama as a provider in the next step.
Step 3 — Configure Your Model
If you used ollama launch openclaw, the configuration is automatic. For manual setups, open OpenClaw's settings and set the Ollama provider with the native API URL:
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434",
        "model": "qwen3.5:27b"
      }
    }
  }
}

Critical: Do not use http://localhost:11434/v1. The /v1 OpenAI-compatible endpoint breaks OpenClaw's tool-calling protocol — models output raw tool JSON as plain text instead of executing actions. Always use the native Ollama URL without the /v1 suffix.
Pull your chosen model separately if ollama launch did not do it:
ollama pull qwen3.5:27b
# or for entry-tier hardware:
ollama pull qwen3.5:9b

Step 4 — Start Your First ollama openclaw Session
Once OpenClaw is running and connected to Ollama, interact with your local AI assistant through OpenClaw's terminal or web interface. Run these tasks to verify the setup works end-to-end:
# Verify tool calling works (file system access)
"List all Python files in my home directory and summarize what each one does."
# Verify web search plugin (if enabled)
"Search for the latest Ollama release notes and give me a summary."
# Verify terminal execution
"Check how much disk space I have free on my main drive."

If the model returns clean, structured answers with actual results rather than generic text, tool calling is working correctly. If you see JSON blobs or error messages in the response body, the most likely cause is the /v1 URL misconfiguration covered below.
Troubleshooting: Common ollama openclaw Issues
Broken tool calls — the /v1 URL mistake
The single most common failure mode when connecting OpenClaw to Ollama. If your assistant responds with raw JSON like {"name":"bash","input":{"command":"ls"}} instead of executing the command, check your provider URL:
# WRONG — breaks tool calling
"baseUrl": "http://localhost:11434/v1"
# CORRECT — use native Ollama API
"baseUrl": "http://localhost:11434"

Fix the URL and restart OpenClaw. The model does not need to be re-pulled.
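If you manage the config programmatically, a small standalone check can catch this mistake before OpenClaw ever starts. The function name and return convention here are illustrative:

```python
def check_base_url(base_url: str) -> str:
    """Return the native Ollama URL, stripping a trailing /v1 if present.

    A baseUrl ending in /v1 routes through the OpenAI-compatible endpoint,
    which breaks OpenClaw's tool calling; the native URL omits the suffix.
    """
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        return trimmed.removesuffix("/v1")  # suggest the native URL instead
    return base_url

print(check_base_url("http://localhost:11434/v1"))  # → http://localhost:11434
```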
Context window errors
If OpenClaw reports context overflow or the assistant loses track of earlier instructions mid-task, your model's context window is too small. Verify the model's context length and override it if necessary:
# Check a model's context length
ollama show qwen3.5:27b --modelfile | grep "num_ctx"
# Override context window in OpenClaw config
"options": {
  "num_ctx": 65536
}

Slow inference
If responses exceed 30 seconds per message, the model likely exceeds your hardware tier. Drop to a smaller quantization (Q4_K_M instead of Q8_0) or switch to a smaller model. Verify Ollama is using your GPU rather than CPU:
ollama ps
# Shows loaded model and device (CPU or GPU name)

For production-grade isolation or multi-agent setups that need network-level sandboxing, the NemoClaw + OpenClaw secure sandbox guide covers vLLM-backed deployment with full GPU isolation.
What's Next
With OpenClaw running on Ollama, you have a personal AI assistant that costs nothing to run and keeps all data on your machine. The setup you just completed is the foundation — from here, you can customize it significantly. Custom system prompts let you give OpenClaw a persistent persona and task context that survives across sessions. Plugin extensions add calendar access, custom API integrations, or additional data sources. OpenClaw's sub-agent feature lets you delegate different task types to different models simultaneously — a coding-optimized model for code review while a general model handles writing tasks.
If you want to experiment with alternative model families, the Gemma 4 with Ollama guide is a good next step — Google's open-source models perform well on instruction-following tasks and are worth benchmarking against Qwen for your specific use case.