Part I — LLM Fundamentals

This is the original 18-chapter walkthrough. By the end you will have built a working mygpt Python package + CLI that trains a tiny GPT-2-style character-level language model from scratch in PyTorch and generates text. Every line of code is introduced and explained; every “Expected output” block is captured from a real run.


Audience and prerequisites

This part is written for university students. It assumes:

  • working Python knowledge (functions, classes, modules, virtual environments),
  • basic linear algebra (vectors, matrices, matrix multiplication, dot products),
  • basic calculus (derivatives, partial derivatives, the chain rule),
  • basic probability (mean, variance, probability distributions),
  • no prior machine learning knowledge.

You do not need to know what gradient descent is, what an embedding is, or what a transformer is. We build all of that from scratch.

Hardware: CPU is enough. A typical training run finishes in under one minute.


How to read this part

Each chapter follows the same shape:

  1. Concept — the theory and intuition, with diagrams and worked examples.
  2. Math — the equations that make the concept precise, written in LaTeX.
  3. Code — the exact files to create, with their full content.
  4. Run — the uv command to execute it and what to expect in the output.
  5. Experiments — small variations you can run to see the concept move.
  6. Exercises — problems you solve to deepen understanding.

Always run code from the project root. The package layout we choose makes that the only place where imports work cleanly.
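
As a concrete sketch of what a Run step can look like (the module path mygpt.train below is a placeholder; each chapter gives the exact command to use):

  cd path/to/your-project          # the project root created in Chapter 2
  uv run python -m mygpt.train     # placeholder module name; each chapter names the real entry point

The key habit is the same everywhere: invoke the code through uv from the project root, so the package imports resolve consistently.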


Chapters

Foundations

  1. What is a language model?
  2. Project setup with uv
  3. PyTorch in 20 minutes: tensors, autograd, modules
  4. How machines learn: loss, gradients, gradient descent
  5. From text to numbers: tokens and embeddings

The attention mechanism

  6. Single-head self-attention from scratch
  7. A reusable attention module
  8. Multi-head attention

The transformer

  9. The feed-forward network and residual connections
  10. Layer normalization
  11. Putting it together: the transformer block

The GPT model

  12. Position embeddings and the language modeling head
  13. The forward pass with loss
  14. Training loop: gradient descent in practice
  15. Generation: sampling text from a trained model

A real workflow

  16. A reusable character tokenizer
  17. Training on a real text file
  18. Checkpoints, inference, and a CLI

Ready? Start with Chapter 1: What is a language model? When you finish Chapter 18, continue with Part II — Advanced Topics.

Stuck on a chapter? Each chapter has a chapter_states/chNN/ snapshot: a complete, runnable uv package matching its end state. If you get lost partway through Ch. 5 (say), copy chapter_states/ch04/ into your working directory to resume from the end of Chapter 4, a known-good state, as sketched below. See chapter_states/README.md on GitHub.
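
A minimal recovery sketch, assuming the snapshot layout above and that each snapshot is a self-contained uv project (the destination path my-working-dir is just an example):

  cp -r chapter_states/ch04/ my-working-dir    # end state of Chapter 4
  cd my-working-dir
  uv sync                                      # recreate the environment from the snapshot's pyproject/lockfile, if it ships one
  uv run python -m mygpt.train                 # placeholder entry point; check the snapshot's README for the real one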

