OpenClaw: My first experience with a self-hosted AI Assistant
Chris Cowley
- 5 minutes read - 948 words

Recently, I decided to jump on the latest AI bandwagon and play around with OpenClaw. While cloud-based large language models are powerful and convenient, I wanted complete control over my data, plus the satisfaction of running everything on my own infrastructure. Enter OpenClaw — a personal AI assistant that runs locally and integrates with messaging platforms you already use.
The TL;DR is that I think OpenClaw should have been called OpenSheep. I know a little bit about sheep because I know several farmers, and they have all said the same thing: when sheep are not looking for new, imaginative ways to kill themselves, they are looking for weird and wonderful ways to give all your money to the vet.

The challenge I set myself was to see how much I could do without actually spending anything. This left me with a few choices to try:
- Google Gemini
- Ollama, which gives you two more options:
- Ollama Cloud, which is a paid service
- Ollama Local, which is free to use on your own hardware
The first thing I tried was Ollama Cloud, because I already use it quite a bit for chat and it works really well. I can choose between most of the major open-weight LLMs and I never get anywhere near the limits of the free tier, even using reasoning models. OpenClaw, however, will eat through the entire week's allowance in a single conversation.
The key thing is that you need to avoid models that are designed for reasoning. These make extra calls for the "thinking" process, which uses up your allowance very quickly. How quickly? A 15-minute conversation will burn through the entire weekly allowance.
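For what it is worth, recent Ollama builds expose a `think` flag on the chat API that can suppress the thinking step on reasoning-capable models; whether OpenClaw lets you pass it through will depend on your setup, so treat this as a sketch of the raw API call (the model name is purely illustrative):

```python
# Sketch: a raw Ollama /api/chat request with thinking disabled.
# Assumes a recent Ollama build that supports the "think" flag.
import json
import urllib.request

def chat_payload(model: str, prompt: str, think: bool = False) -> dict:
    """Build a non-streaming chat request, with thinking off by default."""
    return {
        "model": model,
        "stream": False,
        "think": think,  # False asks the server to skip "thinking" tokens
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Send the request to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Even with thinking off, every tool call in an agentic loop is still a separate round trip, so the savings only go so far.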
The next thing to try was Ollama Local; after all, 99% of the configuration was already done, I just needed to switch to a local model instead of a cloud one. On paper, this is the dream: a fully self-hosted and private AI assistant. On my hardware at least, however, the reality is a long way from the dream. Finding a model that actually functions is incredibly hard. In principle, any model that supports "tools" on Ollama should work, but I tried a LOT of models. I know that, as a rule of thumb, I need to use 8B models or smaller on my machines. I can run bigger ones, but they are too slow even for a patient chat. The problem is that most of these small models do not actually support tools. They say they do, and they may even seem to work, but they fail in weird and silent ways. I tried all sorts:
- Mistral
- Llama
- Qwen
- Different permutations of the above, with different levels of quantization, and different fine-tuning.
- And many more
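A quick way to separate models that genuinely handle tools from those that merely claim to is to send a minimal tool-equipped request straight at Ollama's chat endpoint and see whether a tool call actually comes back. This is a rough sketch against the standard `/api/chat` API; the dummy tool and the model names are mine, not OpenClaw's:

```python
# Sketch: probe whether a local Ollama model really accepts tool definitions.
# Assumes Ollama is running on the default port; model names are illustrative.
import json
import urllib.error
import urllib.request

def build_tool_probe(model: str) -> dict:
    """Build a minimal /api/chat payload with one dummy tool attached."""
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": "What time is it?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_time",
                "description": "Return the current time",
                "parameters": {"type": "object", "properties": {}},
            },
        }],
    }

def supports_tools(model: str, host: str = "http://localhost:11434") -> bool:
    """True only if the model answers the probe with an actual tool call."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_tool_probe(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            body = json.loads(resp.read())
    except urllib.error.HTTPError:
        # Models with no tool support at all typically get rejected outright.
        return False
    # A model that genuinely supports tools should emit a tool call here,
    # not just a text answer claiming it cannot check the time.
    return bool(body.get("message", {}).get("tool_calls"))
```

The catch, as I found out, is the silent failures: a model can pass a one-shot probe like this and still fall apart a few turns into a real OpenClaw session, so treat a pass as necessary rather than sufficient.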
All were a failure! I managed to get something working using Qwen-coder:30b, but on consumer hardware that is just too slow. In fact, as I write this, it has been chugging away at a todo item I gave it for the last 4 hours. Any smaller models failed dismally (and often silently). Even the reasoning models were completely useless. Lots of resources say you should be able to use Llama3:8b or Mistral:7b specifically, but that is simply not true. I tried all sorts of different versions of those models, with different levels of quantization, and they all failed in different ways. Some would not respond at all, some would respond but not actually do anything, and some would do something but then fail when it came to using tools. It was a complete nightmare.
Finally, I tried Google Gemini. This one works really well, but very quickly you run into the same problem as with Ollama Cloud. Even though everyone online says "Gemini has a really generous free tier", that is not the case when it is used for OpenClaw. Gemini's free tier is generous if you use it for chatting, but as soon as you run any agentic workloads you eat through your allowance in minutes. The situation is a bit better than Ollama Cloud, though: the quota resets within 24 hours rather than a week, but it is still not ideal.
Overall, I would call my experiment a failure. I genuinely feel no desire to pay for AI services when all I use is the chat interfaces. With Ollama, I have to really try to use up the weekly allowance, especially as offline models are fine for most of what I do. I was hoping for something similar with OpenClaw, but the complete and utter failure of all the small local models makes it impossible. None of this is made easier by the astounding amount of AI-generated slop out there, with people claiming that certain models work when they don't, or that they support tools when they don't. I have no doubt that there are some models that do work, but finding them is a nightmare, and the fact that they are so slow on consumer hardware makes it a non-starter for me.
For now I will continue using a mixture of Ollama Cloud (GLM-5) and Gemini. I do have a new Ryzen-based mini PC on the way though, so perhaps that will allow me to run something locally. After all, OpenClaw is supposed to be a self-hosted AI assistant, but right now it is more of a "try to find a model that works and then use it in the cloud" assistant. I hope that in the future, as more models are developed and optimized for local use, OpenClaw will become a more viable option for those of us who want to keep our data private and have complete control over our AI assistants.