Build an LLM from Scratch 3: Coding attention mechanisms

Download information and video details: Build an LLM from Scratch 3: Coding attention mechanisms
Author: Sebastian Raschka
Published: 11/3/2025
Views: 5.5K
Description:
This supplementary lecture explains how self-attention, causal attention, and multi-head attention work, building each component step by step. It begins with a simple self-attention mechanism that has no trainable weights, then shows how to compute attention weights for each input token and implements a compact SelfAttention class. The video continues by applying a causal attention mask, adding dropout for additional masking, and creating a compact causal self-attention class. Finally, it covers stacking multiple single-head attention layers and implements multi-head attention via weight splits.

Key timestamps for each section:
00:00 – 3.3.1 A simple self-attention mechanism without trainable weights
41:01 – 3.3.2 Computing attention weights for all input tokens
52:40 – 3.4.1 Computing the attention weights step by step
1:12:33 – 3.4.2 Implementing a compact SelfAttention class
1:21:00 – 3.5.1 Applying a causal attention mask
1:32:33 – 3.5.2 Masking additional attention weights with dropout
1:38:05 – 3.5.3 Implementing a compact causal self-attention class
1:46:55 – 3.6.1 Stacking multiple single-head attention layers
1:58:55 – 3.6.2 Implementing multi-head attention with weight splits

Bonus materials such as efficient multi-head attention implementations and PyTorch buffer explanations are available on GitHub.
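To make the first step concrete, here is a minimal PyTorch sketch of self-attention without trainable weights, roughly along the lines described above. The toy input values and variable names are illustrative assumptions, not the video's exact code.

```python
import torch

# Toy input: 6 tokens, each embedded in 3 dimensions (values are illustrative).
inputs = torch.tensor([
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
])

# 1) Attention scores: dot product of every token with every other token.
attn_scores = inputs @ inputs.T                   # shape: (6, 6)

# 2) Attention weights: softmax over each row so the weights sum to 1.
attn_weights = torch.softmax(attn_scores, dim=-1)

# 3) Context vectors: weighted sum of the input vectors.
context_vecs = attn_weights @ inputs              # shape: (6, 3)
print(context_vecs)
```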
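A compact SelfAttention class with trainable query, key, and value projections could look roughly like the following. The class name matches the description, but the constructor arguments (d_in, d_out, qkv_bias) and internal details are assumptions rather than the video's exact implementation.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_in, d_out, qkv_bias=False):
        super().__init__()
        # Trainable projections for queries, keys, and values.
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)

    def forward(self, x):
        queries = self.W_query(x)
        keys    = self.W_key(x)
        values  = self.W_value(x)
        # Scaled dot-product attention: scale scores by sqrt(d_k) before softmax.
        attn_scores = queries @ keys.transpose(-2, -1)
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        return attn_weights @ values  # context vectors

# Usage with the toy inputs from the previous sketch:
# sa = SelfAttention(d_in=3, d_out=2)
# print(sa(inputs))
```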
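Causal masking, dropout on the attention weights, and multi-head attention via weight splits can be combined into a single module along these lines. This is a sketch under assumptions (argument names, register_buffer for the mask, an output projection), not the video's exact code; the idea is to project once with large Q/K/V matrices and reshape the result into separate heads.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular causal mask stored as a non-trainable buffer.
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the last dimension into heads:
        # (b, tokens, d_out) -> (b, heads, tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        attn_scores = q @ k.transpose(-2, -1)
        # Causal mask: each token may attend only to itself and earlier tokens.
        attn_scores = attn_scores.masked_fill(
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf
        )
        attn_weights = torch.softmax(attn_scores / self.head_dim ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        # Merge heads back: (b, heads, tokens, head_dim) -> (b, tokens, d_out)
        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)

# Usage:
# mha = MultiHeadAttention(d_in=3, d_out=4, context_length=6, dropout=0.1, num_heads=2)
# batch = torch.stack((inputs, inputs))  # (2, 6, 3)
# print(mha(batch).shape)                # torch.Size([2, 6, 4])
```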