Large Language Models seem to be taking over the world. People in all kinds of careers are using LLMs for their jobs. I’ve been experimenting at work with using an LLM to write code with mixed, but promising, results.
After playing around with LLMs at work, I thought it might be interesting to run one locally on my laptop. Thanks to the hard work of several open-source developers, this is pretty easy.
Here are some instructions that I shared with my co-workers. These are specifically for Macs with an M-series processor. On a PC, skip the steps about MLX and use Ollama to download a model, then install the llm-ollama plugin instead of llm-mlx.
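For reference, the PC version of the later steps should look roughly like this. It's only a sketch: it assumes Ollama itself is already installed, and the llama3.2:3b tag is a guess, so use whatever ollama list shows on your machine.
ollama pull llama3.2:3b   # download a model through Ollama instead of MLX
llm install llm-ollama    # the plugin that lets llm talk to Ollama
llm -m llama3.2:3b "Tell me a joke"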
Install uv
If you’re a Python developer, you might already have uv. If not, it’s easy to install:
curl -LsSf https://astral.sh/uv/install.sh | sh
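The installer adds uv to your PATH, though you may need to open a new terminal before your shell picks it up. A quick sanity check:
uv --version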
If you’re not familiar with uv, the online documentation describes it as “An extremely fast Python package and project manager, written in Rust.”
Install llm
Now that uv is installed, you can use it to install Simon Willison’s llm:
uv tool install llm --python 3.12
Note that I specified Python version 3.12. One of the dependencies for the llm-mlx plugin doesn’t support 3.13 yet.
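uv tool install puts llm into its own isolated environment and adds the command to your PATH. Before moving on, you can confirm the install worked:
llm --version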
Install llm-mlx
MLX is an open-source framework for efficient machine learning research on Apple silicon. In practical terms, that means MLX-optimized models run much faster on M-series Macs.
llm install llm-mlx
Again, if you’re not on a Mac, skip this step.
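To confirm the plugin was picked up, you can ask llm what plugins it has installed; llm-mlx should appear in the output:
llm plugins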
Download A Model
The llm CLI makes this easy:
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit
This downloads a small, 4-bit quantized Llama 3.2 model from the mlx-community organization on Hugging Face, optimized for Apple silicon.
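The downloaded model should now show up alongside the built-in defaults when you ask llm which models it knows about:
llm models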
Run The Model
It’s finally time to test everything out:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit "Tell me a joke"
If you followed these steps, you should see a joke. AI jokes are kind of like dad jokes. People seem to groan more than laugh.
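If the stock jokes get stale, you can steer the model with a system prompt using the -s option. The persona here is just an example; swap in whatever instructions you like:
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit -s "You are a grumpy pirate" "Tell me a joke"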
Chat With The Model
Rather than make one request at a time, you can also chat with local models:
llm chat -m mlx-community/Llama-3.2-3B-Instruct-4bit
This is how most of us interact with LLMs online. The chat command responds to whatever you type at the prompt until you enter “exit” or “quit”.
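A nice side effect: llm records your prompts and responses in a local SQLite database by default, so you can review a chat after the fact:
llm logs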
What’s Next?
Maybe try a different model and compare the results. The mlx-community organization on Hugging Face has lots of options. Beware: some of these are very large. In addition to being a big download, they require a lot of memory to run locally. For another small model, you might want to try Qwen3:
llm mlx download-model mlx-community/Qwen3-4B-4bit
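With a second model downloaded, you can run the same prompt through each and compare the answers, or set a default so you don’t have to pass -m every time. Both commands below assume the Qwen3 download above finished:
llm -m mlx-community/Qwen3-4B-4bit "Tell me a joke"
llm models default mlx-community/Qwen3-4B-4bit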
Check out Simon Willison’s blog. He shares tons of interesting info covering the world of AI. I have a bad habit of leaving his posts open in tabs. Here are a few I have open right now:
- Building search-based RAG using Claude, Datasette and Val Town – an in-depth look at how he uses LLMs to build tools.
- Run LLMs on macOS using llm-mlx and Apple’s MLX framework – the basis for this blog post.
- Here’s how I use LLMs to help me write code – it’s always interesting to see someone else’s process. Lots of useful tips here.
- Not all AI-assisted programming is vibe coding (but vibe coding rocks) – to some folks, vibe coding is any time you use an AI to assist, but that’s not really what the term means.
Finally, the thing that really interests me is embedding an LLM in an application. I’ll cover that in another post.