
I tested local AI on my M1 Mac, expecting magic – and got a reality check instead

The M1 MacBook Pro is an old but still capable device in 2026. (Image: Kyle Kucharski/ZDNET)

ZDNET’s key takeaways

  • Ollama makes it fairly easy to download open-source LLMs.
  • Even small models can run painfully slow.
  • Don’t try this without a new machine with 32GB of RAM.

As a reporter who has covered artificial intelligence for over a decade, I have always known that running AI brings all kinds of computer engineering challenges. For one thing, large language models keep getting bigger, and they demand more and more DRAM to hold their “parameters,” or “neural weights.”

Also: How to install an LLM on MacOS (and why you should)

I have known all that, but I wanted to get a feel for it firsthand. I wanted to run a large language model on my home computer.

Now, downloading and running an AI model can involve a lot of work to set up the “environment.” So, inspired by my colleague Jack Wallen’s coverage of the open-source tool Ollama, I downloaded the MacOS binary of Ollama as my gateway to local AI.

Ollama is relatively easy to use, and it has done nice work integrating with LangChain, Codex, and other tools, which is turning it into a hub for many aspects of local AI work. That is exciting.
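If you want to confirm the app is actually serving models on your machine, a quick check is to query its local HTTP API. The sketch below is mine, not anything from Ollama's documentation, and it assumes the server is listening on its usual default port, 11434, and that the /api/tags endpoint lists installed models, as it does in the versions I've seen.

import requests

# List the models currently installed in a local Ollama instance.
# Assumes the default server address; adjust if yours is different.
def list_local_models(host="http://localhost:11434"):
    resp = requests.get(f"{host}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]

if __name__ == "__main__":
    for name in list_local_models():
        print(name)

If nothing prints, no models are installed yet; if the request fails outright, the Ollama server isn't running.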

Reasons to keep it local

Running LLMs locally, rather than just typing into ChatGPT or Perplexity online, has a lot of appeal for not just programmers, but any information worker.

First, as an information worker, you will be more desirable in the job market if you can download and run a model yourself rather than typing into an online prompt like every free user of ChatGPT. We’re talking basic professional development here.

Second, with a local instance of an LLM, you can keep your sensitive data from leaving your machine. That should be of obvious importance to any information worker, not just coders. In my case, my project goal was to use local models as a way to mine my own trove of articles over the years, as a kind of report on what I’ve written, including things I might have forgotten about. I liked the idea of keeping all the files local rather than uploading them to a cloud service.
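To make that concrete, here is a rough sketch of the article-mining idea, written in Python against Ollama's local HTTP API. The endpoint, model name, and file path are my own illustrative choices, not anything from ZDNET's setup, and nothing in it leaves the machine.

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def summarize_file(path, model="gpt-oss:20b"):
    # Read one local article and ask the locally installed model to summarize it.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    payload = {
        "model": model,
        "prompt": "Summarize the main themes of this article:\n\n" + text,
        "stream": False,  # wait for the complete answer rather than streaming tokens
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=3600)
    resp.raise_for_status()
    return resp.json()["response"]

print(summarize_file("articles/old_column.txt"))  # hypothetical file path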

Also: I tried vibe coding an app as a beginner – here’s what Cursor and Replit taught me

Third, you can avoid fees charged by OpenAI, Google, Anthropic, and the rest. As I wrote recently, prices are set to rise for using LLMs online, so now is a good time to think about ways to do the bulk of your work offline, on your own machine, where the meter is not constantly running.

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Fourth, you have a lot more control. For example, if you do want to do programming, you can tweak LLMs, known as fine-tuning them, to get more focused results. And you can use various locally installed tools such as LangChain, Anthropic’s Claude Code tool, OpenAI’s Codex coding tool, and more.

Also: Why you’ll pay more for AI in 2026, and 3 money-saving tips to try

Even if you just want to do information-worker tasks such as generating reports, working against a local cache of documents or a local database gives you more control than uploading everything to a chatbot.
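To make that point about control concrete, here is a minimal sketch of driving a locally installed Ollama model from LangChain, one of the tools mentioned above. It assumes the langchain-ollama integration package is installed and that the model named here has already been pulled; both names are placeholders rather than anything from the article's setup.

from langchain_ollama import ChatOllama  # pip install langchain-ollama

# Point LangChain at a locally running Ollama model; nothing goes to a cloud API.
llm = ChatOllama(model="gpt-oss:20b", temperature=0.2)

reply = llm.invoke("In two sentences, what does fine-tuning an LLM mean?")
print(reply.content)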

Bare-minimum bare-metal

I set out on this experiment with a bare-minimum machine, as far as what it takes to run an LLM. I wanted to find out what would happen if someone who doesn’t constantly buy new machines tried to do this at home on the same computer they use for everyday tasks.

My MacBook Pro is three years old, with 16 gigabytes of RAM and a one-terabyte SSD that’s three-quarters full, and it runs not the latest MacOS but MacOS Sonoma. It’s the 2021 model (model number MK193LL/A), so while it was top of the line when I bought it at Best Buy in a close-out sale in January 2023, it was already becoming yesterday’s best model back then.

Also: 5 reasons I use local AI on my desktop – instead of ChatGPT, Gemini, or Claude

I know, I know: This is beyond the typical useful life of machines and beyond anyone’s depreciation schedule. Nevertheless, the MacBook was a great upgrade at the time, and it has continued to perform superbly on a daily basis for the typical information-worker tasks: calendar, tons of email, tons of websites, video post-production, podcast audio recording, and more. I never have any complaints. Hey, if it ain’t broke, right?

So the question was, how would this venerable but still mighty machine handle a very different new kind of workload?

Starting Ollama

The start-up screen for Ollama looks like ChatGPT, with a friendly prompt to type into, a “plus” sign to upload a document, and a drop-down menu of models you can install locally, including popular ones such as Qwen.

If you just start typing at the prompt, Ollama will automatically try to download whatever model is showing in the drop-down menu. So, don’t do any typing unless you want to go with model roulette.

The Ollama start-up screen. (Screenshot by Tiernan Ray for ZDNET)

Instead, I looked through the models in the drop-down list, and I realized that some of these models weren’t local — they were in the cloud. Ollama runs a cloud service if you want its infrastructure instead of your own. That can be useful if you want to use much larger models that would overly tax your own infrastructure.

Per the pricing page, Ollama offers some access to the cloud in the free account, with the ability to run multiple cloud models covered by the “Pro” plan at $20 per month, and even more usage in the “Max” plan at $100 per month.

Also: This app makes using Ollama local AI on MacOS devices so easy

Sticking with locally running options, I decided to check out the broader list of models in the model directory maintained by Ollama.

At random, I chose glm-4.7-flash, from the Chinese AI startup Z.ai. Weighing in at 30 billion “parameters,” or neural weights, GLM-4.7-flash would be a “small” large language model by today’s standards, but not tiny, as there are open-source models with fewer than a billion parameters. (A billion parameters was big, not so long ago!)

The directory gives you the command to download a chosen model, which you can copy and paste into the Mac terminal, such as:

ollama run glm-4.7-flash

Be mindful of disk space. Glm-4.7-flash weighs in at 19 gigabytes of disk usage, and remember, that’s small!
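If you end up juggling several of these downloads, it's worth totaling what they already occupy before pulling another one. This is a sketch of my own, assuming Ollama's local /api/tags listing reports a per-model size in bytes, as it does in the builds I've used.

import requests

# Total up the disk footprint of locally installed Ollama models.
resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()
models = resp.json().get("models", [])

total_gb = 0.0
for m in models:
    gb = m.get("size", 0) / 1e9  # size is reported in bytes
    total_gb += gb
    print(f"{m['name']:<30} {gb:5.1f} GB")
print(f"{'total':<30} {total_gb:5.1f} GB")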

In my experience, downloading models seems fairly swift, though not lightning fast. On a gigabit-speed cable modem to my home office provided by Spectrum in New York City, the model was downloading at a rate of 45 megabytes per second at one point, though it later dropped to a slower rate of throughput.

Getting to know the model

My first prompt was fairly straightforward: “What kind of large language model are you?”

I sat watching for a while as the first few characters materialized in response: “[Light bulb icon] Thinking — Let me analyze what makes me a” and that was it.

Also: My go-to LLM tool just dropped a super simple Mac and PC app for local AI – why you should try it

Ten minutes later, it hadn’t gotten much farther.

Let me analyze what makes me a large language model and how to explain this to the user.

First, I need to consider my fundamental nature as an AI system. I should explain that I’m designed to understand and generate human language through patterns in large datasets. The key is to be clear

And everything on the Mac had become noticeably sluggish.

Forty-five minutes later, glm-4.7-flash was still producing thoughts about thinking: “Let me structure this explanation to first state clearly…,” and so on.

Trapped in prompt creep

An hour and 16 minutes later — the model “thought” for 5,197.3 seconds — I finally had an answer to my query about what kind of language model glm-4.7-flash was. The answer turned out not to be all that interesting for all the time spent. It didn’t tell me much about glm that I couldn’t have divined on my own, nor anything significant about how glm differs from other large language models.
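If you would rather put a number on that kind of wait than watch the clock, the local API makes it easy to time a prompt yourself. The sketch below assumes Ollama's non-streaming responses include eval_count (tokens generated) and eval_duration (in nanoseconds), as they have in the versions I've tried; the model name is a placeholder.

import requests

payload = {
    "model": "gpt-oss:20b",   # placeholder; substitute whatever you have installed
    "prompt": "What kind of large language model are you?",
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=7200)
resp.raise_for_status()
data = resp.json()

tokens = data.get("eval_count", 0)
seconds = data.get("eval_duration", 0) / 1e9  # nanoseconds to seconds
if seconds:
    print(f"{tokens} tokens in {seconds:.0f} s, about {tokens / seconds:.2f} tokens per second")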

I figured I was done with glm at this point. Unfortunately, Ollama provides no instructions for removing a model once it’s installed locally. The models are kept in a hidden folder “.ollama” in the current user directory on MacOS, inside another folder called “models.” Inside the models folder are two folders, “blobs” and “manifests.” The bulk of a model is in the blobs folder. Inside the manifests is a folder “library” containing a folder named for each model you’ve downloaded, and inside that, a “latest” folder.

gpt-oss thinking about itself in Ollama. (Screenshot by Tiernan Ray for ZDNET)

Using the terminal, I deleted the contents of blobs and the contents of each model folder, and that solved the matter. (Jack later informed me that the terminal command to remove any model is “ollama rm <model name>”.)
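If you would rather not poke around in the hidden .ollama folder at all, you can script the cleanup with the CLI instead. This sketch uses "ollama list," the standard command for showing installed models, plus the "ollama rm" command Jack mentioned; the model name is just an example.

import subprocess

# Show what's installed locally, then remove the model we're finished with.
subprocess.run(["ollama", "list"], check=True)

def remove_model(name):
    # Equivalent to typing "ollama rm <model name>" in the terminal.
    subprocess.run(["ollama", "rm", name], check=True)

remove_model("glm-4.7-flash")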

Jack had also recommended OpenAI’s recent open-source model, gpt-oss, in the 20-billion-parameter flavor, “20b,” which he said was markedly faster running locally than others he’d tried. So, I went next to that in the directory.

Also: This is the fastest local AI I’ve tried, and it’s not even close – how to get it

This time, after about six minutes, gpt-oss:20b produced — at a pace not snail-like, but not swift either — the response that it is “ChatGPT, powered by OpenAI’s GPT-4 family,” and so on.

That response was followed by a nice table of details. (Oddly, gpt-oss:20b told me it had “roughly 175 billion parameters,” which suggests gpt-oss:20b doesn’t entirely grasp its own 20b identity.)

gpt-oss reflects on itself. (Screenshot by Tiernan Ray for ZDNET)

At any rate, this was fine for a simple prompt. But it was already clear that I was going to have problems with anything more ambitious. The wait for each reply was long enough — a kind of prompt creep, you might say — that I didn’t dare add any more complexity, such as uploading an entire trove of writings.

We’re going to need a newer machine

OpenAI’s actual ChatGPT online service (running GPT-5.2) tells me that the realistic minimum configuration for a computer running gpt-oss:20b is 32 gigabytes of DRAM. The M1 Pro silicon in the MacBook has an integrated GPU, and ChatGPT approvingly pointed out that Ollama serves gpt-oss:20b with support for the Mac’s GPU through its llama.cpp backend, the library it uses to run models.

Also: I tried the only agentic browser that runs local AI – and found only one downside

So, everything should be OK, but I really do need more DRAM than just 16 gigs, and I need to trade up from the now five-year-old M1 to an M4 or M5. It’s rather fascinating to me, after three decades of writing about computers, that for an information worker we are now talking about 32 gigabytes as the minimum reasonable configuration.
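Some rough arithmetic shows why 16 gigabytes is so tight. A model's weights alone need roughly its parameter count times the bytes per parameter at whatever quantization you run, before you account for the operating system, open apps, and the model's working memory. The numbers below are illustrative assumptions, not vendor figures.

# Back-of-the-envelope weight memory for a 20-billion-parameter model
# at a few common quantization levels.
def weight_memory_gb(params_billions, bits_per_param):
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: about {weight_memory_gb(20, bits):.0f} GB before any runtime overhead")

Even at roughly 4 bits, that is on the order of 10 GB of weights, so a 16 GB machine that is also running MacOS, a browser, and email has little headroom left, which is roughly why 32 GB reads as the comfortable floor.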

As I mentioned recently, DRAM is skyrocketing in price because all those cloud data centers are consuming more and more DRAM to run large language models. So, it’s me against the cloud vendors, you could say, and I’ll probably be dipping into the credit card to trade up to a new computer. (Apple will give me about $599 for my M1 MacBook as a trade-in.)

While my fledgling local Ollama effort didn’t yield success, it has given me a newfound appreciation for just how memory-intensive AI is. I always knew that from years of reporting on AI, but now I feel it in my bones: that sense of a response crawling across the screen while you wait.




