Part I — LLM Fundamentals
This is the original 18-chapter walkthrough. By the end, you will have built a working `mygpt` Python package and CLI that trains a tiny GPT-2-style character-level language model from scratch in PyTorch and generates text. Every line of code is introduced and explained; every “Expected output” block is captured from a real run.
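To give you a flavor of where the code ends up, here is a minimal sketch of the autoregressive sampling pattern that the generation chapter builds properly. Everything here is illustrative: the stand-in `model`, the `vocab_size` of 65, and every other name are assumptions, not the book's actual code.

```python
import torch

# Illustrative sketch only: autoregressive sampling, the pattern behind
# "generates text". A real run would use the trained GPT; here a stand-in
# returning random logits keeps the snippet self-contained.
vocab_size = 65  # e.g. the number of distinct characters in the training text

def model(idx):  # stand-in for the trained model: (batch, time) -> logits
    return torch.randn(idx.shape[0], idx.shape[1], vocab_size)

idx = torch.zeros((1, 1), dtype=torch.long)  # a single start token
for _ in range(20):
    logits = model(idx)[:, -1, :]            # logits at the last position
    probs = torch.softmax(logits, dim=-1)    # normalize into a distribution
    next_id = torch.multinomial(probs, num_samples=1)  # sample one token id
    idx = torch.cat([idx, next_id], dim=1)   # append, then predict again
print(idx[0].tolist())  # token ids; the tokenizer would map these back to text
```

The real version replaces the stand-in with the trained GPT and decodes the sampled ids back into characters.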
Audience and prerequisites
This part is written for university students. It assumes:
- working Python knowledge (functions, classes, modules, virtual environments),
- basic linear algebra (vectors, matrices, matrix multiplication, dot products),
- basic calculus (derivatives, partial derivatives, the chain rule),
- basic probability (mean, variance, probability distributions),
- no prior machine learning knowledge.
You do not need to know what gradient descent is, what an embedding is, or what a transformer is. We build all of that from scratch.
Hardware: CPU is enough. A typical training run finishes in under one minute.
How to read this part
Each chapter follows the same shape:
- Concept — the theory and intuition, with diagrams and worked examples.
- Math — the equations that make the concept precise, written in $\LaTeX$.
- Code — the exact files to create, with their full content.
- Run — the `uv` command to execute it and what to expect in the output.
- Experiments — small variations you can run to see the concept move.
- Exercises — problems you solve to deepen understanding.
Always run code from the project root. The package layout we choose makes that the only place where imports work cleanly.
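If you are curious why, here is a small illustration. It assumes chapters are launched as modules (e.g. `uv run python -m ...`), in which case Python typically prepends your launch directory to the import search path:

```python
# Print the first entry of Python's import search path. When a program is
# launched with `python -m ...`, this is typically the directory you launched
# from, which is why `import mygpt` resolves only at the project root.
import sys
print(sys.path[0])
```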
Chapters
Foundations
- What is a language model?
- Project setup with `uv`
- PyTorch in 20 minutes: tensors, autograd, modules
- How machines learn: loss, gradients, gradient descent
- From text to numbers: tokens and embeddings
The attention mechanism
- Single-head self-attention
- A reusable attention module
- Multi-head attention
The transformer
- The feed-forward network and residual connections
- Layer normalization
- Putting it together: the transformer block
The GPT model
- Position embeddings and the language modeling head
- The forward pass with loss
- Training loop: gradient descent in practice
- Generation: sampling text from a trained model
A real workflow
- A reusable character tokenizer
- Training on a real text file
- Checkpoints, inference, and a CLI (see the sketch after this list)
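One preview from that last group: checkpointing in PyTorch is usually a `state_dict` round-trip through `torch.save` and `torch.load`. A minimal, self-contained sketch, where the tiny stand-in module and file name are illustrative rather than the book's actual code:

```python
import torch
from torch import nn

# Illustrative only: the standard PyTorch checkpoint round-trip. A tiny
# stand-in module keeps the snippet runnable without the rest of the book.
model = nn.Linear(4, 4)
torch.save(model.state_dict(), "checkpoint.pt")     # persist the weights
model.load_state_dict(torch.load("checkpoint.pt"))  # restore them later
```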
Ready? Start with Chapter 1: What is a language model? When you finish Chapter 18, continue with Part II — Advanced Topics.
Stuck on a chapter? Each chapter has a `chapter_states/chNN/` snapshot — a complete, runnable `uv` package matching its end state. If you get lost partway through Ch. 5 (say), `cp -r chapter_states/ch04/ <your-working-dir>` to start from a known-good state. See `chapter_states/README.md` on GitHub.
Table of contents
- 1. What is a language model?
- 2. Project setup with uv
- 3. PyTorch in 20 minutes
- 4. How machines learn
- 5. From text to numbers
- 6. Single-head self-attention
- 7. A reusable attention module
- 8. Multi-head attention
- 9. MLP and residual connections
- 10. Layer normalization
- 11. The transformer block
- 12. Position embeddings and the LM head
- 13. The forward pass with loss
- 14. Training loop
- 15. Generation
- 16. A reusable character tokenizer
- 17. Training on a real text file
- 18. Checkpoints, inference, and a CLI