Building a language model from scratch, from a tutorial

I’m working from Brian Kitano’s Llama from scratch (or how to implement a paper without crying). It’s very deep, so I probably won’t make it all the way through in the long weekend I’ve allocated for it.

I’ve skimmed the paper but didn’t pay close attention to the implementation details. Luckily, the tutorial provides breadcrumbs into the deeply math-y bits, so no problems there.

I noticed that there are Ruby bindings for most of these ML libraries and was tempted to try implementing it in the language I love. But I would rather not get mired in looking up docs or translating across languages/APIs. Besides, I want to get more familiar with Python (after almost twenty years of not using it).

I started off trying to implement this the way a Linux veteran would, as a basic CLI program. But I switched over to Jupyter: part of building models is analyzing plots, and that’s not going to go well in a terminal. Besides, this way I’m not swimming upstream so much.
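For example, even glancing at a training-loss curve is pleasant inline in a notebook and painful on a CLI. A minimal sketch, with made-up loss values standing in for real training output:

```python
import matplotlib.pyplot as plt

# Made-up loss values, purely to illustrate the notebook workflow;
# in practice these would come out of the actual training loop.
losses = [4.2, 3.1, 2.6, 2.3, 2.1, 2.0]

plt.plot(losses)
plt.xlabel("iteration")
plt.ylabel("training loss")
plt.show()  # renders inline in a Jupyter cell
```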

Per an idea from Making Large Language Models work for you, I’m frequently using ChatGPT to quickly understand the PyTorch neural network APIs in context. Normally, I’d go on a time-consuming side quest getting up to speed on an unfamiliar ecosystem. ChatGPT is reducing that from hours (and possibly a blocker) to a few minutes. Highly recommend reading those slides and trying a few of the ideas in your daily work.
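To make that concrete, here’s the flavor of question I’ve been asking, with a toy example to check my understanding (my own sketch, not code from the tutorial): what does `nn.Embedding` actually do?

```python
import torch
from torch import nn

# An embedding layer maps integer token ids to dense vectors,
# the first step in a transformer like Llama.
vocab_size, embed_dim = 32, 8  # toy sizes, chosen for illustration
embedding = nn.Embedding(vocab_size, embed_dim)

tokens = torch.tensor([[1, 5, 9]])  # a batch of one sequence of 3 token ids
vectors = embedding(tokens)
print(vectors.shape)  # torch.Size([1, 3, 8])
```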

Adam Keys @therealadam