Unlock The Power Of Local AI: Run Your Own LLM On A Mac Mini With OpenClaw

You picked up a Mac Mini to run OpenClaw. Great choice.

Unfortunately, Anthropic has since steered OpenClaw users toward its pay-per-token API¹, transforming what started as a one-time hardware buy into a (hefty) recurring cost². Even if you go with OpenAI, you’ll still be shelling out a fair amount each month.

💵💵 Running a local model wipes out the monthly fee for your OpenClaw agents, completely. 💵💵

That said, getting everything installed and configured can feel overwhelming, particularly if you’re just starting out with local LLMs.

In this guide, I’ll walk you through setting up a local LLM on your Mac Mini — the smoothest way possible — so it can power your agent at no cost.

Even if you’re a complete beginner, you can follow along.

🤨 “I’ve heard local LLMs aren’t as good — is that actually true?”

A local LLM, when configured correctly, will deliver results that are nearly on par for everyday tasks like handling emails, managing your calendar, setting reminders, controlling smart home devices, and doing basic web research — the kinds of things you’d typically use OpenClaw for.

For more demanding work — say, using OpenClaw for software development — there’s a link at the end that walks you through setting up a fallback model.

⚠️ Note: This isn’t a comprehensive OpenClaw tutorial.
It’s designed to help you get your local LLM up and running alongside your agent(s) as quickly as possible.

Hardware

This guide was tested on a Mac Mini with the following configuration:

OS	macOS Tahoe
Version	26.3.1
Processor	M2
Cores	8
Unified Memory	24GB

If you’re considering buying a Mac Mini, I’d suggest going with at least an M2+ chip and a minimum of 24GB of RAM. You can manage with 16GB, but it’ll be tight, and you may run into issues with larger context windows.

Getting everything ready

Start by installing OpenClaw using the official instructions. If that’s already done, feel free to skip ahead.

1. Install llama.cpp

We’re going to skip Ollama (the recommended local provider) and go with llama.cpp instead. By pairing a quantized model with llama.cpp, we can accelerate inference by up to 70%.

We need to compile llama.cpp from source with metal flags enabled and CUDA disabled. This applies the optimizations necessary to run the model at full speed on your Mac. Just follow the steps below.

1️⃣ First, from your home directory, install a couple of prerequisites via Homebrew.

# paste this into your terminal
$ brew install cmake curl

2️⃣ Next, compile llama.cpp with the correct flags.

# Clone llama.cpp
git clone 

# Configure the build with Metal acceleration
cmake llama.cpp -B llama.cpp/build 
    -DBUILD_SHARED_LIBS=OFF 
    -DGGML_METAL=ON 
    -DGGML_CUDA=OFF

# Compile
cmake --build llama.cpp/build 
    --config Release 
    -j$(sysctl -n hw.ncpu) 
    --clean-first 
    --target llama-cli llama-mtmd-cli llama-server llama-gguf-split

At this point, llama.cpp is built and ready to go.

2. Download the local LLM

As mentioned earlier, the secret to strong performance from a local model is quantization.

Quantization lets us take a larger, more powerful model and “compress” it intelligently so it fits on more modest hardware. The quantized version retains the vast majority of the original model’s capabilities.
Unless you have a beefy GPU or a Mac loaded with the maximum unified memory (80GB+), quantization is essential.

Blindly following the OpenClaw docs while attempting to use a quantized model will only lead to confusion and frustration.

There’s simply no clear, step-by-step resource explaining how to make quantized models work smoothly with agents.

Below is a tested recipe that will get your agent up and running.

Model Choice: Qwen 3.5-9B

Here we’re using Qwen 3.5 (the 9B parameter variant).

As of June 2026, it ranks among the top local models, outperforming Gemma 4-12B. It fits comfortably on both 16GB and 24GB Macs, requiring roughly 6–8GB of RAM. Users also rate it highly for OpenClaw.

Keep in mind that agents need longer context windows, which rules out running the larger 27B version — even with quantization.

1️⃣ Let’s grab the model.

# download the model
 curl -L -o models/Qwen3.5-9B-UD-Q4_K_KL.gguf 
"

2️⃣ Download the template and save it to the templates folder.

mkdir templates && 
curl -o templates/qwen35.jinja 
"

Important: you must use an agent-compatible template for OpenClaw. Without this, nothing will function properly.

3. Launch llama-server

Llama-server will act as our backend API. OpenClaw will connect to this local service instead of reaching out to OpenAI or Anthropic’s API directly.

We’ve already built llama-server and downloaded our model. Let’s do a quick test run.

1️⃣ Run a quick test.

 ./llama.cpp/llama-server 
  -m models/Qwen3.5-9B-UD-Q4_K_XL.gguf 
  --chat-template-file templates/qwen35.jinja 
  --temp 0.7 
  --top-p 0.9 
  --top-k 20 
  -c 64000 
  -ngl 20 
  --host 127.0.0.1 
  --port 8080

You should see output similar to this (with no errors):

 srv  llama_server: /think_off It looks like the output was cut off. Let me continue paraphrasing from where it stopped:

 srv  llama_server: waiting for requests on http://127.0.0.1:8080
If you see the server start up without errors, that means everything is working correctly.
4. Connect OpenClaw to your local model
Now that your local LLM is running, the final step is to point OpenClaw at your llama-server instance instead of an external API provider.
In your OpenClaw configuration, set the API base URL to http://127.0.0.1:8080 and select the appropriate model. This way, all your agent's requests will be handled locally — no API keys, no monthly bills.
Wrapping up
That's all there is to it. You now have a fully functional local LLM powering your OpenClaw agents, running entirely on your Mac Mini with zero ongoing costs.
For most everyday tasks — email, scheduling, reminders, smart home control, and light research — the Qwen 3.5-9B quantized model will serve you well. And if you ever need extra horsepower for more complex work, you can always configure a fallback to a cloud-based model.
Enjoy your free, private, locally-run AI agent.

2️⃣ Now, lets write a launchd daemon, so your local LLM server starts automatically and stays available after reboot. If you're familiar with Linux, launchd is essentially systemd for macOS

Save the following as /Library/LaunchDaemons/com.openclaw.llama-server.plist. You will need to use sudo for this.



Expand this for the plist file
❗Ensure that you replace YOUR_USERNAME with your actual username in the xml.




Label
com.openclaw.llama-server

UserName
YOUR_USERNAME

ProgramArguments

    /Users/YOUR_USERNAME/llama.cpp/llama-server

    -m
    /Users/YOUR_USERNAME/models/Qwen3.5-9B-UD-Q4_K_XL.gguf

    --chat-template-file
    /Users/YOUR_USERNAME/templates/qwen35.jinja

    --temp
    0.7

    --top-p
    0.9

    --top-k
    20

    -c
    64000

    -ngl
    20

    --host
    127.0.0.1

    --port
    8080


WorkingDirectory
/Users/YOUR_USERNAME

RunAtLoad


KeepAlive


StandardOutPath
/tmp/llama-server.log

StandardErrorPath
/tmp/llama-server.err


Now, enable it.
sudo chown root:wheel /Library/LaunchDaemons/com.openclaw.llama-server.plist && 
sudo chmod 644 /Library/LaunchDaemons/com.openclaw.llama-server.plist && 
sudo launchctl bootstrap system /Library/LaunchDaemons/com.openclaw.llama-server.plist
We can check to make sure the service is running properly by monitoring our log file.
tail -f /tmp/llama-server.err
At this point, our local LLM is loaded and running as a background service. The next step is to reconfigure OpenClaw to work with it.
4. Reconfigure OpenClaw to use the local model
We now need to register this local model in the OpenClaw configuration so it can be used by the gateway.
1️⃣ Add the following to the "models" section in .openclaw/openclaw.json:
{
  "models": {
    "providers": {
      "local": {
        "baseUrl": "/v1",
        "apiKey": "sk-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3-9b",
            "name": "Qwen3.5 9B Local",
            "contextWindow": 64000,
            "maxTokens": 8192
          }
        ]
      }
    /* REMOVE THIS COMMENT */
    /* you may add additional providers, like anthropic here */ 
    }
  }
}
Note: the values for contextWindow and maxTokens might need to be tweaked depending on your specific use case.
You'll also want to designate this model as the default for your agents:
"agents": {
    "defaults": {
      "model": {
        "primary": "local/qwen3-9b"
      },
      "models": {
        "local/qwen3-9b": {}
      }
 }
It's a good idea to double-check that the config file is syntactically correct. Run the command below to validate it:
openclaw config validate
2️⃣ Restart the gateway to make the local model available:
openclaw gateway restart
3️⃣ Confirm that OpenClaw has recognized the local model:
openclaw models list --provider local
You can also run a quick test inference:
openclaw infer model run 
  --model local/qwen3-9b 
  --prompt "Reply with exactly: pong" 
  --json
You should get a JSON response back. Important: check that there are no leaked tags in the output. You shouldn't see any, but it's worth verifying for security reasons.
{
  "ok": true,
  "capability": "model.run",
  "transport": "local",
  "provider": "local",
  "model": "qwen3-9b",
  "attempts": [],
  "outputs": [
    {
      "text": "pong",
      "mediaUrl": null
    }
  ]
}
The entire pipeline is now confirmed to be working. To be completely thorough—especially if this is your first agent—let's set up a test skill and make sure the model can reason through problems and execute tool calls correctly.
5. Verify functionality with a test skill
Let's build a simple 'python-calc' skill to confirm that our local model can reason and fire off tool calls as expected.
1️⃣ Run the following to create the skill. This will make the tool available across all of your OpenClaw agents:
mkdir -p ~/.openclaw/workspace/skills/python-calc

cat << 'EOF' > ~/.openclaw/workspace/skills/python-calc/SKILL.md
---
name: python-calc
description: A tool that evaluates mathematical expressions by executing a Python one-liner.
version: 1.0.0
---
## Instructions
1. Extract the precise mathematical expression from the user's request.
2. Use your built-in shell tool to run this command, substituting `` with the expression: `python3 -c "print()"`
3. Wait for the shell tool to return the stdout result.
4. Respond to the user with the exact numeric result produced by the script.
EOF
Once more, restart the gateway.
2️⃣ Now, let's fire off a quick agent call to confirm the tool works as intended:
openclaw agent --local --agent main --verbose on --thinking high --message 
"Use the python-calc skill to calculate 8664 multiplied by 222. 
Do not use skill_workshop. Tell me the final answer."
After a brief moment, if everything is wired up correctly, you should see something like:
The final answer is 1,923,408.
Fantastic!
In practice, you can expect speeds ranging from 20 to 70 tokens per second*. While that falls short of Claude-level performance (130+ tps), it's perfectly usable for an OpenClaw agent running on modest hardware.
Keep in mind that the thinking mode is set to high, so don't worry if responses take a little longer.
If you want to confirm that OpenClaw is actually hitting your local model, open a separate terminal and watch the llama-server log with tail -f /tmp/llama-server.err.
_{*Your mileage may vary}
Wrapping up
Getting a local LLM up and running—especially when you're dealing with custom templates and quantization—can be a real headache. It took two full days of back-and-forth to get it working on a friend's Mac the first time! Thanks to Jacob W. for the inspiration.
That's all there is to it! Hopefully this saves you a lot of 💸.
If it did, or if it saved you some headaches, feel free to buy me a coffee.
☕Cheers!
1 Tweet by Boris Cherny, discussing the "ban" of OpenClaw
2 User spends $420 a month on API fees
3 Using multiple providers with with OpenClaw

Top Posts

Bitcoin’s Indifference to Oil Market Recovery: 5-Year Data Reveals the Reason

Unmasking Vulnerabilities Without Exploits – SecurityWeek

A Confident Path to Modern Data: Azure Storage Migration Made Simple

Unlock the Power of Local AI: Run Your Own LLM on a Mac Mini with OpenClaw

`Salesforce-Style Code Generation in OWL: Build, Test, and Safely Ranked Python Functions with Unit Tests`

`Mastering Structured LLM Outputs: The Definitive Guide to JSON Mode vs Function Calling and When to Deploy Each`

`Mastering LATERAL, Semi, and Anti Unlock Elite-Level Joins for High-Performance PostgreSQL`

`Perception-Driven World Models for Embodied AI`

`OpenAI Unveils LifeSciBench: A 750-Task Benchmark That Grades AI Models on Real Life-Science Research Using Expert-Crafted Rubrics`

`Churn Thresholds: The Hidden Lever in Your Pricing Strategy`

`Bitcoin’s Indifference to Oil Market Recovery: 5-Year Data Reveals the Reason`

`Unmasking Vulnerabilities Without Exploits – SecurityWeek`

`A Confident Path to Modern Data: Azure Storage Migration Made Simple`

`Living on Solar for Years: 12 Myths You Can Finally Stop Believing in 2026`

`Computer Vision Deployments Propel Retail Productivity to New Heights`

`Salesforce-Style Code Generation in OWL: Build, Test, and Safely Ranked Python Functions with Unit Tests`

`Outdated STRC: Retail Investors Sitting on $8.8 Billion of Questionable Value`

`“Klue OAuth Breach Unmasks ‘Icarus’ in Salesforce Data Heist Campaign”`

`Trending`

`Bitcoin’s Indifference to Oil Market Recovery: 5-Year Data Reveals the Reason`

`Unmasking Vulnerabilities Without Exploits – SecurityWeek`

`Latest Posts`

`Not More Data, but Better World Models – Unite.AI`

`OpenAI Is Hiring Head of Preparedness, Amid AI Cyberattack Fears`

Subscribe to Updates

Top Posts

Unlock the Power of Local AI: Run Your Own LLM on a Mac Mini with OpenClaw

Hardware

Getting everything ready

1. Install llama.cpp

2. Download the local LLM

Model Choice: Qwen 3.5-9B

3. Launch llama-server

4. Connect OpenClaw to your local model

Wrapping up

4. Reconfigure OpenClaw to use the local model

5. Verify functionality with a test skill

Wrapping up

Related Posts

`Related Posts`