Build an LLM from Scratch 3: Coding attention mechanisms

Author: Sebastian Raschka
Published: 11.03.2025
Views: 5.5K
Description:
This supplementary lecture explains how self-attention, causal attention, and multi-head attention work, building each component step by step. It begins with a simple self-attention mechanism that has no trainable weights, then shows how to compute attention weights for all input tokens and implements a compact SelfAttention class. The video continues by applying a causal attention mask, adding dropout for additional masking, and creating a compact causal self-attention class. Finally, it covers stacking multiple single-head attention layers and implements multi-head attention via weight splits. Bonus materials such as efficient multi-head attention implementations and PyTorch buffer explanations are available on GitHub.

Key timestamps:
00:00 – 3.3.1 A simple self-attention mechanism without trainable weights
41:01 – 3.3.2 Computing attention weights for all input tokens
52:40 – 3.4.1 Computing the attention weights step by step
1:12:33 – 3.4.2 Implementing a compact SelfAttention class
1:21:00 – 3.5.1 Applying a causal attention mask
1:32:33 – 3.5.2 Masking additional attention weights with dropout
1:38:05 – 3.5.3 Implementing a compact causal self-attention class
1:46:55 – 3.6.1 Stacking multiple single-head attention layers
1:58:55 – 3.6.2 Implementing multi-head attention with weight splits
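As a concrete reference for the first step (3.3), below is a minimal PyTorch sketch of self-attention without trainable weights: attention scores are plain dot products between token embeddings, normalized with softmax, and used to form context vectors as weighted sums of the inputs. The toy input tensor is an illustrative placeholder, not necessarily the example used in the video.

```python
import torch

# Toy token embeddings (6 tokens, embedding dim 3) - illustrative values only
inputs = torch.tensor([
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
])

attn_scores = inputs @ inputs.T                    # pairwise dot products
attn_weights = torch.softmax(attn_scores, dim=-1)  # each row sums to 1
context_vecs = attn_weights @ inputs               # weighted sums of the inputs
print(context_vecs.shape)                          # torch.Size([6, 3])
```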
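The compact SelfAttention class (3.4) adds trainable query, key, and value projections and scales the dot products by the square root of the key dimension before the softmax. The sketch below follows common PyTorch conventions; names such as W_query, W_key, W_value, d_in, d_out, and qkv_bias are assumptions, not necessarily the video's exact code.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Scaled dot-product self-attention with trainable projections (sketch)."""
    def __init__(self, d_in, d_out, qkv_bias=False):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)

    def forward(self, x):
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)
        attn_scores = queries @ keys.transpose(-2, -1)
        # Scale by sqrt(key dimension) before the softmax
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        return attn_weights @ values
```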
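Causal attention (3.5) prevents each token from attending to later positions by masking the upper triangle of the score matrix before the softmax, and dropout is then applied to the attention weights as an additional form of masking. Below is a sketch under the same naming assumptions, with the mask stored via register_buffer (the PyTorch buffer topic mentioned in the bonus materials); it expects batched input of shape (batch, num_tokens, d_in).

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Single-head causal self-attention with dropout on the weights (sketch)."""
    def __init__(self, d_in, d_out, context_length, dropout, qkv_bias=False):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask kept as a non-trainable buffer
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)
        attn_scores = queries @ keys.transpose(1, 2)
        # Block attention to future positions
        attn_scores.masked_fill_(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf")
        )
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        return attn_weights @ values
```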
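A first multi-head variant (3.6.1) simply stacks several single-head causal attention modules and concatenates their outputs along the feature dimension. A short sketch reusing the CausalAttention class from the previous snippet; the wrapper name is hypothetical.

```python
import torch
import torch.nn as nn

class MultiHeadAttentionWrapper(nn.Module):
    """Naive multi-head attention: independent single-head modules, concatenated."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        self.heads = nn.ModuleList(
            CausalAttention(d_in, d_out, context_length, dropout, qkv_bias)
            for _ in range(num_heads)
        )

    def forward(self, x):
        # Each head returns (batch, num_tokens, d_out); join along the last dim
        return torch.cat([head(x) for head in self.heads], dim=-1)
```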
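The more efficient version (3.6.2) projects queries, keys, and values once and then splits the projections into heads by reshaping, so all heads are computed in one batched matrix multiplication before an output projection recombines them. Again a sketch under assumed names (num_heads, head_dim, out_proj), not necessarily identical to the video's implementation; the usage example at the end uses made-up dimensions.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head causal attention via weight splits (sketch)."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the output dimension into heads
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        attn_scores = q @ k.transpose(2, 3)               # (b, heads, tokens, tokens)
        attn_scores.masked_fill_(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf")
        )
        attn_weights = torch.softmax(attn_scores / self.head_dim ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        # Merge heads back into a single feature dimension
        context = (attn_weights @ v).transpose(1, 2).contiguous().view(b, num_tokens, -1)
        return self.out_proj(context)

torch.manual_seed(123)
batch = torch.randn(2, 6, 3)  # (batch, tokens, d_in) - arbitrary example sizes
mha = MultiHeadAttention(d_in=3, d_out=4, context_length=6, dropout=0.0, num_heads=2)
print(mha(batch).shape)       # torch.Size([2, 6, 4])
```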