Tuesday, June 13, 2023

A Local Large Language Model

I wanted to see what all the hype around Large Language Models (LLMs) amounted to, but the privacy terms in the EULA for Bard were completely unacceptable. So when I read Dylan Patel and Afzal Ahmad's Google "We Have No Moat, And Neither Does OpenAI":
Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.
I decided to try open source LLMs for myself since, as they run locally, the privacy risk is mitigated. Below the fold I tell the story so far; I will update it as I make progress.

DuckDuckGo took me to How to Run a ChatGPT-like AI on Your Own PC by Nick Lewis about Alpaca:
Alpaca is a language model (a chatbot, basically), much like ChatGPT. It is capable of answering questions, reasoning, telling jokes, and just about every other thing we’ve come to expect from chatbots. Alpaca was created by Stanford researchers by fine-tuning Facebook’s LLaMA.

Unlike ChatGPT, and most other chatbots available today, Alpaca runs completely on your own PC. That means that no one can snoop on your conversations or what you ask Alpaca, and your exchanges can’t be accidentally leaked, either.
Lewis explains that:
There are three main variants of Alpaca currently, 7B, 13B, and 30B. Generally speaking, the larger the number, the smarter the chatbot will be.
...
The 13B and 30B models are quite another story. 13B is capable of providing a coherent, human-like conversation, and can answer complex questions. 30B is even more impressive, if you’ve got the hardware to run it, and is within striking distance of ChatGPT. It’ll wax on philosophically or make a joke without missing a beat if prompted.
To run the 30B model takes an i7 CPU and 32GB of RAM. I wanted to run Alpaca as a service on our network, not on my desktop (which in any case has only 16GB of RAM). So my first step was to look on Amazon for a suitable machine. I ordered a Minisforum TH80, which had an i7-11800H CPU, 32GB of RAM and a 512GB PCIe SSD, for $410.88 including tax.
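A rough sanity check on that memory requirement (this assumes 4-bit quantized weights, the format llama.cpp-based tools such as Serge use; the overhead estimate is mine, not something Lewis states):

```shell
# Back-of-envelope: memory needed just to hold the quantized weights.
PARAMS_B=30   # model size in billions of parameters
BITS=4        # bits per weight after 4-bit quantization
echo "~$((PARAMS_B * BITS / 8)) GB of weights, plus a few GB of working memory"
```

That lands comfortably inside 32GB but well outside 16GB, which matches Lewis' hardware advice.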

It showed up the next morning and, after a bit of a struggle persuading it to boot from a USB drive (select Legacy boot for the USB then UEFI for the hard disk), I installed Linux Mint 21.1 with no further problems. Lewis' post describes installing Alpaca on a Windows machine via Windows Subsystem for Linux, so I had to adapt the instructions slightly to install Docker:

sudo apt install apt-transport-https ca-certificates curl gnupg
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/usr/share/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu jammy stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker ${USER}
sudo reboot
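One adaptation worth spelling out: install guides written for Ubuntu often derive the repository codename with `lsb_release -cs`, but on Mint that returns the Mint codename rather than the Ubuntu base that Mint 21.1 tracks, which is why the repo line above hard-codes jammy. You can check what your system reports:

```shell
# On Linux Mint 21.1 the distro codename is "vera", but the Docker repo
# needs the Ubuntu base codename ("jammy"); on Mint /etc/os-release
# records both (VERSION_CODENAME and UBUNTU_CODENAME).
grep -E 'CODENAME' /etc/os-release
```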

And then fetch Serge, the web front-end (the Alpaca model weights themselves are downloaded later from within Serge):

git clone https://github.com/nsarrazin/serge.git

Finally, run Serge:

cd serge && docker run -d -v weights:/usr/src/app/weights \
  -v datadb:/data/db/ -p 8008:8008 ghcr.io/nsarrazin/serge:latest
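A note on running this "as a service": the -d flag detaches the container, but as written it will not come back after a reboot. Docker's standard restart-policy option fixes that; this is the same command as above with one flag added (my addition, not from Lewis' post):

```shell
# --restart unless-stopped makes Docker relaunch the container at boot
# until it is explicitly stopped, turning Serge into a persistent service.
cd serge && docker run -d --restart unless-stopped \
  -v weights:/usr/src/app/weights \
  -v datadb:/data/db/ -p 8008:8008 ghcr.io/nsarrazin/serge:latest
```

Other machines on the LAN can then reach the UI at http://<server-ip>:8008.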

I used Serge to download the 30B model, and here is my first conversation:
Me: Tell me about Bitcoin
Serge: Bitcoin is a digital currency and payment system created in 2009 by Satoshi Nakamoto, which uses cryptography to secure transactions between users without requiring any third party intermediaries such as banks or governments. It allows for the transfer of value from one user to another with no middleman involved. Bitcoin is decentralized and its network is maintained through a peer-to-peer protocol that verifies each transaction using cryptography, making it secure against fraudulent transactions.
Me: Is Bitcoin a good investment?
Serge: It depends on your risk tolerance as well as the current market conditions and trends. Generally speaking though, bitcoins are volatile in nature so they can be risky to hold for long periods of time without proper research or analysis.
This isn't bad. The first and second sentences repeat the same (correct) assertion. The assertion that Bitcoin is decentralized is false but understandable, since almost all of the training data would have contained it. The assertion that it is "secure against fraudulent transactions" is again false but understandable. The caution about Bitcoin's volatility is apt.

As I currently have a granddaughter away at summer camp on the East Coast, I then asked Serge:
Me: write a letter from a grandfather to his granddaughter at summer camp
Serge: Dear Granddaughter,
I hope you are having fun and making lots of new friends this summer! I am so proud that you decided to go away for the first time. It’s an important step in growing up and becoming independent. Keep exploring your interests at camp - it will help shape who you become as a person.
Love, Grandpa
Again, not bad, an encouraging start.

On the TH80 Alpaca is fairly slow. I downloaded the 13B model and tried it; it is a good deal faster but generates noticeably worse results, and I am not in a hurry.

Overall, I am definitely impressed with what can be done with $400-worth of hardware.

3 comments:

  1. OK, I'll bite. I just ordered a 32GB RAM upgrade for my gaming PC. (However it seems that my GPU will be unused, unless I want to fine-tune the models.)

    ReplyDelete
  2. Alas.... I upgraded my laptop to 32GB, installed WSL2, and followed the instructions in Nick's March blog piece. The problem is that a project like Serge is so volatile that three months later everything has changed.... Tried following your instructions inside WSL2, and they didn't work. But per https://news.ycombinator.com/item?id=35672652 , this is normal:
    Meanwhile in this world it's just: "Get the latest version of this thing. It won't work, of course, because it only worked for a week in 2023 when I did it, but good luck!"

    ReplyDelete
  3. OK, I solved my problem by installing Alpaca Electron. This is a turnkey C++-based version of Alpaca that has installers for Windows, Mac, and Linux. Running the 7B model on Windows, it takes about 24GB of RAM, and performs remarkably well. See
    https://github.com/ItsPi3141/alpaca-electron/releases

    ReplyDelete