Large Language Models seem to be taking over the world. People in all kinds of careers are using LLMs for their jobs. I’ve been experimenting at work with using an LLM to write code with mixed, but promising, results.
After playing around with LLMs at work, I thought it might be interesting to run one locally on my laptop. Thanks to the hard work of several open-source developers, this is pretty easy.
Here are some instructions that I shared with my co-workers. These are specifically for Macs with an M-series processor. On a PC, skip the steps about MLX and use Ollama to download a model, then install the llm-ollama plugin instead of llm-mlx.
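For reference, the PC version of the later steps should look roughly like this. It's only a sketch: it assumes Ollama itself is already installed, and the llama3.2:3b tag is a guess, so use whatever ollama list shows on your machine.
ollama pull llama3.2:3b   # download a model through Ollama instead of MLX
llm install llm-ollama    # the plugin that lets llm talk to Ollama
llm -m llama3.2:3b "Tell me a joke"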
Install uv
If you’re a Python developer, you might already have uv. If not, it’s easy to install:
curl -LsSf https://astral.sh/uv/install.sh | sh
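The installer adds uv to your PATH, though you may need to open a new terminal before your shell picks it up. A quick sanity check:
uv --version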
If you’re not familiar with uv, the online documentation describes it as “An extremely fast Python package and project manager, written in Rust.”
Install llm
Now that uv is installed, you can use it to install Simon Willison’s llm:
uv tool install llm --python 3.12
Note that I specified Python version 3.12. One of the dependencies for the llm-mlx plugin doesn’t support 3.13 yet.
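uv tool install puts llm into its own isolated environment and adds the command to your PATH. Before moving on, you can confirm the install worked:
llm --version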
Install llm-mlx
MLX is an open-source framework for efficient machine learning research on Apple silicon. In practical terms, that means MLX-optimized models run much faster on M-series Macs.
llm install llm-mlx
Again, if you’re not on a Mac, skip this step.
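To confirm the plugin was picked up, you can ask llm what plugins it has installed; llm-mlx should appear in the output:
llm plugins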
Download A Model
The llm CLI makes this easy:
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit
This downloads a small, 4-bit quantized Llama 3.2 model from the mlx-community organization on Hugging Face, optimized for Apple silicon.
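The downloaded model should now show up alongside the built-in defaults when you ask llm which models it knows about:
llm models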
Run The Model
It’s finally time to test everything out:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit "Tell me a joke"
If you followed these steps, you should see a joke. AI jokes are kind of like dad jokes. People seem to groan more than laugh.
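If the stock jokes get stale, you can steer the model with a system prompt using the -s option. The persona here is just an example; swap in whatever instructions you like:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit -s "You are a grumpy pirate" "Tell me a joke"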
Chat With The Model
Rather than make one request at a time, you can also chat with local models:
llm chat -m mlx-community/Llama-3.2-3B-Instruct-4bit
This is how most of us interact with LLMs online. The chat command responds to whatever you type at the prompt until you enter “exit” or “quit”.
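A nice side effect: llm records your prompts and responses in a local SQLite database by default, so you can review a chat after the fact:
llm logs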
What’s Next?
Maybe try a different model and compare the results. The mlx-community organization on Hugging Face has lots of options. Beware: some of these are very large. In addition to being a big download, they require a lot of memory to run locally. For another small model, you might want to try Qwen3:
llm mlx download-model mlx-community/Qwen3-4B-4bit
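With a second model downloaded, you can run the same prompt through each and compare the answers, or set a default so you don’t have to pass -m every time. Both commands below assume the Qwen3 download above finished:
llm -m mlx-community/Qwen3-4B-4bit "Tell me a joke"
llm models default mlx-community/Qwen3-4B-4bit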
Check out Simon Willison’s blog. He shares tons of interesting info covering the world of AI. I have a bad habit of leaving his posts open in tabs. Here are a few I have open right now:
- Building search-based RAG using Claude, Datasette and Val Town – an in-depth look at how he uses LLMs to build tools.
- Run LLMs on macOS using llm-mlx and Apple’s MLX framework – the basis for this blog post.
- Here’s how I use LLMs to help me write code – it’s always interesting to see someone else’s process. Lots of useful tips here.
- Not all AI-assisted programming is vibe coding (but vibe coding rocks) – to some folks, vibe coding is any time you use an AI to assist, but that’s not really what the term means.
Finally, the thing that really interests me is embedding an LLM in an application. I’ll cover that in another post.