Run Your Own AI

Large Language Models seem to be taking over the world. People in all kinds of careers are using them for their jobs. At work, I’ve been experimenting with using an LLM to write code, with mixed but promising results.

After playing around with LLMs at work, I thought it might be interesting to run one locally on my laptop. Thanks to the hard work of several open-source developers, this is pretty easy.

Here are some instructions that I shared with my co-workers. They’re written specifically for Macs with an M-series processor. On a PC, skip the steps about MLX and use Ollama to download a model, then install the llm-ollama plugin instead of llm-mlx.
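
For what it’s worth, once llm is installed (see below) the PC path should look something like this. I haven’t tested it myself, and llama3.2 here is just one example of a small model Ollama offers:

ollama pull llama3.2
llm install llm-ollama
llm -m llama3.2 "Tell me a joke"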

Install uv

If you’re a Python developer, you might already have uv. If not, it’s easy to install:

curl -LsSf https://astral.sh/uv/install.sh | sh

If you’re not familiar with uv, the online documentation describes it as “An extremely fast Python package and project manager, written in Rust.”
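
You can double-check that the install worked:

uv --version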

Install llm

With uv installed, you can use it to install Simon Willison’s llm:

uv tool install llm --python 3.12

Note that I specified Python version 3.12. One of the dependencies for the llm-mlx plugin doesn’t support 3.13 yet.
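
To confirm llm is working, ask it to list the models it knows about. Out of the box it should only show the hosted OpenAI models; the local ones come next:

llm models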

Install llm-mlx

MLX is an open-source framework for efficient machine learning research on Apple silicon. In practice, that means models optimized for MLX run much faster on a Mac.

llm install llm-mlx

Again, if you’re not on a Mac, skip this step.
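
To verify the plugin installed correctly, llm can list its plugins:

llm plugins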

Download A Model

The llm CLI makes this easy:

llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit

This downloads a model from the mlx-community organization on Hugging Face, quantized and optimized to run well on a Mac.
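
That model name is a mouthful. If you don’t feel like typing it every time, you can make it the default; after that, prompts without the -m flag will use it:

llm models default mlx-community/Llama-3.2-3B-Instruct-4bit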

Run The Model

It’s finally time to test everything out:

llm -m mlx-community/Llama-3.2-3B-Instruct-4bit "Tell me a joke"

If you followed these steps, you should see a joke. AI jokes are kind of like dad jokes. People seem to groan more than laugh.
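
Jokes aside, the same command handles more useful prompts. You can pipe text in and ask the model about it, or set a system prompt with -s (the file name here is just an example):

cat README.md | llm -m mlx-community/Llama-3.2-3B-Instruct-4bit "Summarize this file"
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit -s "Answer in one sentence" "What is MLX?"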

Chat With The Model

Rather than make one request at a time, you can also chat with local models:

llm chat -m mlx-community/Llama-3.2-3B-Instruct-4bit

This is how most of us interact with LLMs online. The chat session responds to whatever you type at the prompt until you enter “exit” or “quit”.
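
If you set a default model earlier, you can start a chat without the -m flag, and a system prompt works here too:

llm chat -s "Answer everything like a pirate"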

What’s Next?

Maybe try a different model and compare the results. The mlx-community organization on Hugging Face has lots of options. Beware: some of these are very large. In addition to a large download, they also require a lot of memory to run locally. For another small model, you might want to try Qwen3:

llm mlx download-model mlx-community/Qwen3-4B-4bit
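
Then run the same prompts against it and compare:

llm -m mlx-community/Qwen3-4B-4bit "Tell me a joke"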

Check out Simon Willison’s blog. He shares tons of interesting info covering the world of AI. I have a bad habit of leaving his posts open in tabs. Here are a few I have open right now:

Finally, one thing I find really interesting is embedding an LLM in an application. I’ll cover that in another post.