Part II — Advanced Topics
Ten chapters that upgrade mygpt from a hand-rolled toy to a codebase whose architecture and training recipe match modern open-weight LLMs (Llama-style: BPE + RoPE + RMSNorm + GQA + cosine LR + bf16). Same model size as Part I; modern recipe.
Prerequisites
Part I — LLM Fundamentals, finished. An Apple M1 / M2 / M3 / M4 Mac (any RAM tier; 8 GB works), a CUDA GPU, or a willingness to wait on CPU. ~10 GB of free disk for the Chapter 28 Wikipedia subset.
What changes
- Ch.19–21 — training infrastructure: device-aware (MPS/CUDA/CPU), mixed precision (bf16), validation loss + cosine LR schedule + gradient clipping (sketched after this list).
- Ch.22–23 — BPE tokenization: build the algorithm from scratch, then wire it into `mygpt` alongside the existing `CharTokenizer` (merge loop sketched below).
- Ch.24–26 — modern architecture: replace `LayerNorm` with `RMSNorm`, learned position embeddings with RoPE, multi-head attention with GQA (RMSNorm and RoPE sketched below).
- Ch.27–28 — payoff: same training run as Part I but with the modern stack (Ch.27); then a real ~500 MB Wikipedia training run on M1 in 1–3 hours (Ch.28).
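To make the Ch.19–21 pieces concrete, here is a minimal sketch of a device-aware, bf16, cosine-scheduled training step. All names (`pick_device`, `train_step`, the hyperparameter values) are illustrative placeholders rather than mygpt's actual API; only the techniques themselves (device fallback, bf16 autocast, warmup plus cosine decay, gradient clipping) match what the chapters build.

```python
import math
import torch
import torch.nn.functional as F

def pick_device() -> torch.device:
    # Ch.19: prefer CUDA, then Apple-silicon MPS, then fall back to CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

def cosine_lr(step: int, warmup: int, total: int) -> float:
    # Ch.21: LR multiplier, linear warmup then cosine decay to zero.
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

def train_step(model, optimizer, x, y, step, *,
               device, base_lr=3e-4, warmup=100, total=5000):
    for group in optimizer.param_groups:
        group["lr"] = base_lr * cosine_lr(step, warmup, total)
    # Ch.20: bf16 autocast; unlike fp16 it needs no GradScaler.
    # (Autocast backend coverage varies; CUDA is the safest bet.)
    with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
        logits = model(x)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    # Ch.21: clip the global gradient norm so one bad batch can't blow up training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    return loss.item()
```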
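Next, the heart of Ch.22 in a few lines: a deliberately simplified, whole-string sketch of the BPE training loop. Real BPE (and mygpt's `BPETokenizer`) works over word-frequency tables, handles bytes, and builds a vocabulary; this shows only the merge rule.

```python
from collections import Counter

def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    symbols = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged, i = [], 0
        while i < len(symbols):
            # Replace every occurrence of the winning pair with its fusion.
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges

# e.g. train_bpe("low lower lowest", 3) learns ('l', 'o'), then ('lo', 'w'), ...
```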
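For Ch.24–26, the two swaps that fit in a short sketch are RMSNorm and RoPE, shown below in the usual Llama-style formulation. GQA's core trick (fewer KV heads, each shared by a group of query heads) is left to Ch.26. Class and function names here are illustrative, not mygpt's.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Ch.24: normalize by root-mean-square; no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain only

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Ch.25: rotate (first-half, second-half) channel pairs of q or k by
    position-dependent angles; x is (batch, heads, seq, head_dim)."""
    T, D = x.shape[-2], x.shape[-1]
    half = D // 2
    freqs = base ** (-torch.arange(half, device=x.device, dtype=torch.float32) / half)
    angles = torch.arange(T, device=x.device, dtype=torch.float32)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()  # each (T, half), broadcast over batch/heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```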
Backward compatibility: every Part-I checkpoint continues to load. Architecture flags (--norm, --position, --num-kv-heads, --tokenizer) coexist with Part-I defaults; Part-II checkpoints record which combination was used.
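One plausible shape for those flags, sketched with argparse: the flag names come from this page, but the choices and defaults are assumptions rather than mygpt's real CLI.

```python
import argparse

parser = argparse.ArgumentParser(description="hypothetical mygpt training CLI")
# Defaults reproduce Part-I behavior, so old invocations keep working.
parser.add_argument("--norm", choices=["layernorm", "rmsnorm"], default="layernorm")
parser.add_argument("--position", choices=["learned", "rope"], default="learned")
parser.add_argument("--num-kv-heads", type=int, default=None)  # None = plain MHA
parser.add_argument("--tokenizer", choices=["char", "bpe"], default="char")
args = parser.parse_args()

# A Part-II checkpoint would record this combination next to the weights,
# so loading code knows which architecture to rebuild.
arch_config = vars(args)
```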
Stuck on a chapter? Each chapter has a `chapter_states/chNN/` snapshot — a complete, runnable `uv` package matching its end state. If you get lost partway through Ch.25 (say), `cp -r chapter_states/ch24/ <your-working-dir>` to start from a known-good state. See `chapter_states/README.md` on GitHub.
Table of contents
- 19. Device-aware training
- 20. Mixed precision training (bf16)
- 21. Training-loop hardening
- 22. BPE from scratch (algorithm)
- 23. BPETokenizer in mygpt
- 24. RMSNorm replaces LayerNorm
- 25. RoPE — rotary position embeddings
- 26. GQA — grouped-query attention
- 27. Modern recipe vs Ch.17 baseline
- 28. Modern recipe at scale (Wikipedia)