Using AI In Your Own Apps

In my previous post I covered how to run a large language model on your own computer, send it prompts, and start a chat. The next step is adding an LLM to your own applications, and the llm library makes this easy.

First, initialize a new project with uv and add the llm and llm-mlx libraries:

uv init llm-test --python 3.12
cd llm-test
uv add llm llm-mlx

Now open up main.py with your favorite editor and make it look something like this:

import llm

def main():
    # Load the local model we installed in the previous post.
    model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
    # Prompt the model and get back a response object.
    response = model.prompt(
        "Tell me a joke."
    )
    # Print the text of the response.
    print(response.text())

if __name__ == "__main__":
    main()

There are three steps to the process:

  1. Get the model we installed in the previous post.
  2. Get a response by prompting the model.
  3. Print the response text.

You can now run your new AI program with uv run main.py to see the result.

Once this is working, see the LLM Python API documentation for more information. For example, maybe you’d like to add a system prompt. Here’s a classic:

import llm

def main():
    model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
    response = model.prompt(
        "Tell me a joke.",
        system="Talk like a pirate."
    )
    print(response.text())

if __name__ == "__main__":
    main()
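
You can also pass model options as keyword arguments to prompt(). The llm-mlx models support options such as temperature and max_tokens; other models may expose different option names, so treat this as a sketch and check the documentation for the model you’re using:

import llm

def main():
    model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
    response = model.prompt(
        "Tell me a joke.",
        system="Talk like a pirate.",
        # Options supported by llm-mlx models; names vary by model/plugin.
        temperature=0.8,
        max_tokens=200,
    )
    print(response.text())

if __name__ == "__main__":
    main()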

In the documentation you’ll also find support for conversations, which let you have an ongoing chat. Here’s a simple example of a CLI chatbot:

import llm

def main():
    model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
    # A conversation keeps track of previous prompts and responses.
    conversation = model.conversation()

    print("Welcome to my LLM chatbot! Type 'exit' to quit.")

    while True:
        question = input("What can I help you with? ")

        if question == "exit":
            break

        # Prompt the conversation rather than the model directly so the
        # model sees the chat history.
        response = conversation.prompt(question)
        print(response.text())

if __name__ == "__main__":
    main()

Note that I created a new conversation with model.conversation(). Then, instead of prompting the model directly, I prompt the conversation. This lets the model remember previous exchanges so you can ask follow-up questions.
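
By default, response.text() waits until the model has finished generating before returning anything. The llm library also supports streaming: iterating over a response yields chunks of text as they’re generated, which makes the chatbot feel much more responsive. Here’s the chat loop rewritten to stream:

import llm

def main():
    model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
    conversation = model.conversation()

    print("Welcome to my LLM chatbot! Type 'exit' to quit.")

    while True:
        question = input("What can I help you with? ")

        if question == "exit":
            break

        # Iterating over the response prints each chunk of text as the
        # model generates it, instead of waiting for the full reply.
        for chunk in conversation.prompt(question):
            print(chunk, end="", flush=True)
        print()

if __name__ == "__main__":
    main()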

From here the possibilities are basically endless. You could use your favorite web framework to create your own web-based chatbot, or use fragments to analyze external content.
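
As a taste of the web direction, here’s a rough sketch of a chat endpoint using Flask (my choice for illustration, not something the llm library requires; you’d need to uv add flask first). To keep the sketch short it shares one conversation across all requests; a real app would track a conversation per user:

from flask import Flask, request, jsonify
import llm

app = Flask(__name__)
model = llm.get_model("mlx-community/Llama-3.2-3B-Instruct-4bit")
# One shared conversation for the whole app, just to keep things simple.
conversation = model.conversation()

@app.route("/chat", methods=["POST"])
def chat():
    # Expects JSON like {"message": "Tell me a joke."}
    question = request.json["message"]
    response = conversation.prompt(question)
    return jsonify({"reply": response.text()})

if __name__ == "__main__":
    app.run(debug=True)

You can try it out with curl:

curl -X POST http://127.0.0.1:5000/chat -H "Content-Type: application/json" -d '{"message": "Tell me a joke."}'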

Next time I’ll cover using a commercial model. Until our personal computers get much faster, we’ll only be able to go so far with local models.