Build an LLM from Scratch 3: Coding attention mechanisms

Download information and video details: Build an LLM from Scratch 3: Coding attention mechanisms
Author: Sebastian Raschka
Published: 11/3/2025
Views: 5.5K
Description:
This supplementary lecture explains how self-attention, causal attention, and multi-head attention work, building each component step by step. It begins with a simple self-attention mechanism that has no trainable weights, then shows how to compute attention weights for each input token and implements a compact SelfAttention class. The video continues by applying a causal attention mask, adding dropout for additional masking, and creating a compact causal self-attention class. Finally, it covers stacking multiple single-head attention layers and implements multi-head attention via weight splits.

Key timestamps for each section:
00:00 – 3.3.1 A simple self-attention mechanism without trainable weights
41:01 – 3.3.2 Computing attention weights for all input tokens
52:40 – 3.4.1 Computing the attention weights step by step
1:12:33 – 3.4.2 Implementing a compact SelfAttention class
1:21:00 – 3.5.1 Applying a causal attention mask
1:32:33 – 3.5.2 Masking additional attention weights with dropout
1:38:05 – 3.5.3 Implementing a compact causal self-attention class
1:46:55 – 3.6.1 Stacking multiple single-head attention layers
1:58:55 – 3.6.2 Implementing multi-head attention with weight splits

Bonus materials such as efficient multi-head attention implementations and PyTorch buffer explanations are available on GitHub.
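To make the first step concrete, here is a minimal PyTorch sketch of self-attention without trainable weights, roughly along the lines described above. The toy input values and variable names are illustrative assumptions, not the video's exact code.

```python
import torch

# Toy input: 6 tokens, each embedded in 3 dimensions (values are illustrative).
inputs = torch.tensor([
    [0.43, 0.15, 0.89],
    [0.55, 0.87, 0.66],
    [0.57, 0.85, 0.64],
    [0.22, 0.58, 0.33],
    [0.77, 0.25, 0.10],
    [0.05, 0.80, 0.55],
])

# 1) Attention scores: dot product of every token with every other token.
attn_scores = inputs @ inputs.T                   # shape: (6, 6)

# 2) Attention weights: softmax over each row so the weights sum to 1.
attn_weights = torch.softmax(attn_scores, dim=-1)

# 3) Context vectors: weighted sum of the input vectors.
context_vecs = attn_weights @ inputs              # shape: (6, 3)
print(context_vecs)
```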
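A compact SelfAttention class with trainable query, key, and value projections could look roughly like the following. The class name matches the description, but the constructor arguments (d_in, d_out, qkv_bias) and internal details are assumptions rather than the video's exact implementation.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_in, d_out, qkv_bias=False):
        super().__init__()
        # Trainable projections for queries, keys, and values.
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)

    def forward(self, x):
        queries = self.W_query(x)
        keys    = self.W_key(x)
        values  = self.W_value(x)
        # Scaled dot-product attention: scale scores by sqrt(d_k) before softmax.
        attn_scores = queries @ keys.transpose(-2, -1)
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        return attn_weights @ values  # context vectors

# Usage with the toy inputs from the previous sketch:
# sa = SelfAttention(d_in=3, d_out=2)
# print(sa(inputs))
```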
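Causal masking, dropout on the attention weights, and multi-head attention via weight splits can be combined into a single module along these lines. This is a sketch under assumptions (argument names, register_buffer for the mask, an output projection), not the video's exact code; the idea is to project once with large Q/K/V matrices and reshape the result into separate heads.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular causal mask stored as a non-trainable buffer.
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the last dimension into heads:
        # (b, tokens, d_out) -> (b, heads, tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        attn_scores = q @ k.transpose(-2, -1)
        # Causal mask: each token may attend only to itself and earlier tokens.
        attn_scores = attn_scores.masked_fill(
            self.mask.bool()[:num_tokens, :num_tokens], -torch.inf
        )
        attn_weights = torch.softmax(attn_scores / self.head_dim ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        # Merge heads back: (b, heads, tokens, head_dim) -> (b, tokens, d_out)
        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)

# Usage:
# mha = MultiHeadAttention(d_in=3, d_out=4, context_length=6, dropout=0.1, num_heads=2)
# batch = torch.stack((inputs, inputs))  # (2, 6, 3)
# print(mha(batch).shape)                # torch.Size([2, 6, 4])
```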