Playing With Local AI
I've poked at some of the generative AI tools on the Internet, but I hesitate to jump in fully because they often have limits or costs, and I have a bevy of hardware I could use to tackle things locally.
I've also poked at a few different "host your own" options, but almost all of them call for a heavy GPU or other serious hardware. I'm not looking to build an LLM, and I'm not (yet) concerned with the performance of the system, so I'd been a little disappointed in the options I'd discovered previously.
The other day I found a piece on Ollama. Their landing page is a bit terse, and it's really unclear how to use it unless you scour their blog, find a different article, or stumble on their GitHub repo. Even then, it's a little bit of "reminding people who know how it's done," or "trust us, this works." The article also mentioned a UI in the form of Open WebUI. Their "get now" link heads right to their GitHub repo, with similarly terse instructions. There is, however, a link to some nicely styled documentation, which has (as do some of the other links) a couple of quick steps to get this all running in Docker. After scanning through it all a few times, the bits started to click.
I quickly tossed the Docker image of Ollama on my big server. It doesn't have a GPU, which is recommended but not required, but it does have a flurry of cores (8 independent quad-core CPUs) and 128GB of RAM. It's a bit of an older system, though, and its generation of CPUs is evidently missing something, because the log messages in the container complained about them. So I moved it to my lesser server, which has a fast 16-core CPU but only 32GB of RAM. It spun right up, let me load a model, and took some test queries.
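Before wiring up any UI, the quickest test query is to hit Ollama's REST API directly with curl. A minimal sketch, assuming the server answers at dockerhost.local (the name I use in the steps below) and a model like llama3.2 has already been pulled:
# Ask the Ollama API for a one-off, non-streaming completion
curl http://dockerhost.local:11434/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}'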
I also tossed the Docker image of Open WebUI on the same server, with the base URL hint pointing to the Ollama container. I couldn't find the admin bit shown in the documentation for loading models through the UI, so I had to do that through the command line, but once a model was added to the server I was able to select it in the web UI. In no time I had the web interface helping me review blog posts, suggest some rewrites (that completely changed the message within), and even iterate a little on a short story. Very much like the other gen AI tools out there, with a couple of Docker commands and a little in-container configuration.
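The model list Open WebUI offers comes from the Ollama server itself, so a quick way to confirm a command-line pull actually landed (again a sketch, assuming the dockerhost.local name from the steps below) is to ask the API for its tags:
# List the models the Ollama server currently knows about
curl http://dockerhost.local:11434/api/tags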
In my searches, and one of the things I was looking for, I noticed that there is a plug-in for my IDE that allows it to use my local Ollama installation for code completion and other AI help. There's even a plug-in for iTerm2 on my Mac to allow Ollama to help create shell commands.
The iTerm2 plugin took a little exploratory tinkering. The AI "tab" existed in the settings, but informed me I needed to add a plugin. There was a link to the plugin, which I quickly downloaded and put in my Applications folder. After restarting iTerm2, I was able to quickly get the configuration set up, but unlike the other tools, it didn't seem to be able to list the models available on the Ollama server. I eventually took a stab and just typed the label for the model into the configuration line, and then it worked. It looks like there are other models that might be closer to what the iTerm2 plugin authors intended, but for my tinkering, this works fine. I may add others, since it's simple enough.
Similarly, there's a plugin for some of the IDEs I use. I spend most of my time in JetBrains IntelliJ IDEA, so that's what I decided to try first. I've tinkered with the JetBrains AI, but it has limits and costs, so I haven't spent the time to get really comfortable using it. It does fine interrogating my code from the editor and offering suggestions, especially for the boilerplate and repetitive code bits we all write so often, so I get why people want to use it. I found suggestions that the CodeGPT plugin is the one to use, with its easy configuration for using a locally hosted Ollama server. Here I banged my head trying to get it to work for far too long, and ended in a face-palm "macOS is helping me too much" moment.
Already comfortable with reaching the Docker container URL from other parts of the same computer, I was frustrated to see that the plugin couldn't reach the server. I checked IP addresses and routes, host names and DNS, and all kinds of things that either shouldn't have been necessary or should have led me quickly to the solution. Notably, I was able to ping the server by IP and host name from my terminal (the aforementioned iTerm2), but I couldn't do the same from the terminal in IDEA. Jumping ahead, the face-palm fix was discovering that macOS has a setting that limits an application's access to local network resources. Curiously, by default apps can reach out to the Internet, but it doesn't seem to trust my network, or maybe it's trying to prevent things on the Mac from reaching things on the LAN. Once I bonked the toggle in the system settings, IDEA and the CodeGPT plugin could see the network nodes and the Ollama server. I asked it to make a unit test for a simple method, and it made some simple tests based on the method's parameters, but not related to the code in the method. This is where the tinkering begins.
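If you hit the same wall, a quick way to separate a real network problem from the macOS Local Network permission is to run the same check from both terminals; a small sketch, using the same server name as the rest of this post:
# Run this from iTerm2, then from the terminal inside IDEA - if one answers and the
# other times out, suspect the Local Network permission rather than DNS or routing
curl --connect-timeout 5 http://dockerhost.local:11434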
For my future self, or any other interested parties, here are the step-by-step bits I used to get here, without the head-banging or face-palming, starting essentially from the examples in the documentation.
# Get Ollama running in Docker - Docker configuration out of scope
# Make a local volume for the container to persist its data
docker volume create ollama
# Run the container - all the basic defaults
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Confirm it's running - response should say "Ollama is running"
curl http://dockerhost.local:11434
# Pull the llama3.2 and codellama LLMs
docker exec ollama ollama pull llama3.2
docker exec ollama ollama pull codellama:13b
# Confirm they exist - Should give a list including the pulled models
docker exec ollama ollama list
# Another volume for data storage
docker volume create open-webui
# Run the container
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://dockerhost.local:11434 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
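# A couple of extra sanity checks (not from the docs, just handy) - the logs should show
# the server starting up, and port 3000 should answer with an HTTP status code
docker logs open-webui
curl -s -o /dev/null -w "%{http_code}\n" http://dockerhost.local:3000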
Note that I used my Docker host as the "dockerhost.local" in the examples above; I probably could have kept the Docker traffic on the Docker network between the server and web UI, but my traffic will be trivial, I'm sure. There are other flags that can be added, and there are warnings all over that there is no security on the containers, but this is a simple test, and since they're only accessible from my LAN, I'm a little less worried about the security of my servers. I also get that there are tidier, more orchestrated Docker ways to do things. I didn't do this on the command line, though, as I use Portainer to configure my Docker containers. I keep meaning to get a better cluster working, but it isn't like I'm made out of time...
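If I did want to keep that traffic between the containers, a user-defined Docker network would probably do it; a rough sketch (untested on my setup, and it would also hide Ollama from the plugins on my Mac, so it's not what I actually want here):
# Both containers join a shared network, and Open WebUI reaches Ollama by container
# name instead of going out over the LAN (note: no published 11434 port)
docker network create ai-net
docker run -d --network ai-net -v ollama:/root/.ollama --name ollama ollama/ollama
docker run -d --network ai-net -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main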
After installing those things, I sent my browser to http://dockerhost.local:3000 and was rewarded with the web UI. I was compelled to create an admin account, and then to log in using it. I could then chat with the Ollama engine. I asked it about itself, bantered about the weather, and asked it to create a short story. Together we added characters and plot moments to the story. I asked it to read a couple of blog posts and give feedback (evidently I'm courteous, conversational, and casual). I asked it to rewrite one based on the feedback it gave, but it turned the post into more of a prose story, deviating from the "news" being shared within. It was a richer story, for sure, but not the story (or remembering) I intended.
For iTerm2, I went to the settings, found the plugin link, downloaded the plugin, and restarted iTerm2. Returning to the settings page let me add the same URL as the OLLAMA_BASE_URL above. I also edited the model tag to be "llama3.2" after finding it wouldn't hit the server to list the available models. After that, hitting command+y pops up a prompt window, in which I asked it to "create a bash script to iterate a list of strings and echo each" as a simple example. It offered a simple for loop that would do the job.
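The suggestion was a loop along these lines (a reconstruction, not its exact output):
# Iterate a list of strings and echo each one
for item in "one" "two" "three"; do
  echo "$item"
done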
For IDEA, and after banging my head, check System Settings under Privacy & Security > Local Network to be sure IDEA has LAN access. After starting IDEA, I hit the settings, found Plugins, and searched for CodeGPT. I found the right one and bonked install. After IDEA restarted, I returned to the settings, found CodeGPT under Tools, and selected Ollama as the provider. I entered the same URL as above, hit the Refresh Models button (after allowing LAN access), and selected the codellama model. I selected the "Enable code completion" checkbox and applied the settings. I opened a project, activated the CodeGPT panel, selected a class in the editor, highlighted a method, and asked CodeGPT to suggest a unit test. It wrote a bit of a story about the test, but offered a couple of simple tests to pass in parameters and validate the response.
Now I have an AI running on my network, on a previously mostly idle server. I've chatted with it, and will poke at it more as I continue reading my "Prompt Engineering for Dummies" ebook and a couple other gen AI things I have. I'm also interested to see if I can give it "my voice," perhaps leveraging these many years of blog posts on this site, and some other writings I have, and maybe we'll make these posts a little more quickly and with more richness. I want to see if I can use it to reduce some of the code effort on my hobby projects, and provide some of the double-checking I enjoyed when I worked with others. Maybe I'll see if I can get an AI to spew out some other content.
The server does have room for a GPU. I've poked around and found a few I can afford in my hobby budget that are geared toward AI instead of graphics processing, so we'll see if I step that game up. I'm also thinking of throwing an M4 Mac Mini at it, to see if that's "as good" as a GPU-powered system, since the new Mac Mini with more RAM and storage costs about the same or less than just adding a big GPU to the other server.