# Hands-On Large Language Models by [[Jay Alammar]], [[Maarten Grootendorst]]

## Summary
<!-- a couple of paragraphs -->

*Hands-On Large Language Models* by Jay Alammar and Maarten Grootendorst is a visual, intuition-first introduction to how LLMs actually work under the hood. Alammar is well known for his illustrated explanations (such as "The Illustrated Transformer"), and the book carries that pedagogical style into a full treatment of tokenization, embeddings, the attention mechanism, and the transformer architecture. It pairs each concept with runnable code so the reader can inspect tokens, embeddings, and attention patterns directly.

Beyond mechanics, the book covers practical applications: text classification, clustering, semantic search, and generation, building toward retrieval and prompt-driven use. Its strength is bridging the gap between a conceptual understanding of transformers and the concrete behavior of real models: how text becomes tokens, how tokens become vectors, and how the model decodes the next token. This makes it an ideal grounding for the internals a practitioner needs to reason about model behavior.

## Table of Contents

- Ch. 1: An Introduction to Large Language Models
- Ch. 2: Tokens and Embeddings
- Ch. 3: Looking Inside Transformer LLMs (attention, architecture)
- Ch. 4: Text Classification
- Ch. 5: Text Clustering and Topic Modeling
- Ch. 6: Prompt Engineering
- Ch. 7: Advanced Text Generation Techniques and Tools
- Ch. 8: Semantic Search and Retrieval-Augmented Generation
- Ch. 9: Multimodal Large Language Models
- Ch. 12: Fine-Tuning Generation Models

## Notes
<!-- main takeaways; LINK to the permanent notes this book grounds -->

- Grounds [[Tokenization]] and the role of the [[Token]] as the model's atomic unit.
- Backs [[Embedding]]: how tokens and text are represented as vectors.
- Primary visual grounding for the [[Attention Mechanism]] and [[Transformer Architecture]].
- Supports [[KV Cache]] as an inference-time optimization of attention.
- Grounds [[Positional Encoding]] (how order is injected into the transformer).
- Underpins [[Decoding Strategy]]: how the next token is selected during generation.

## Quotes

- <!-- placeholder: add a verified short quote here -->

## Relevance to the course

- Primary grounding for Module 1: the internals of how LLMs represent and process text (tokens, embeddings, attention, transformers).

---

## References

-