Build an LLM from Scratch 2: Working with text data

Duration:

01:28:01


Uploader:

Sebastian Raschka

Published at:

3/2/2025

Views:

10.5K

Description:

This supplementary video, part of the "Build an LLM from Scratch" series, walks through the text data preparation steps for training large language models, including tokenization, byte pair encoding, data loaders, and more. The video covers tokenizing text (00:00), converting tokens into token IDs (14:02), adding special context tokens (23:56), byte pair encoding (30:26), data sampling with a sliding window (44:00), creating token embeddings (1:07:10), and encoding word positions (1:15:45).
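
As a rough illustration of several steps named in the description (tokenizing text, converting tokens into token IDs, adding special tokens, and sliding-window sampling), here is a minimal sketch. It is not the code from the video: it uses a simple word-level tokenizer rather than byte pair encoding, and every function name (`tokenize`, `build_vocab`, `encode`, `sliding_windows`) as well as the `<|unk|>` token is a hypothetical choice made for this example.

```python
import re

def tokenize(text):
    # Split on whitespace and common punctuation, keeping punctuation as tokens.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p.strip() for p in parts if p.strip()]

def build_vocab(tokens):
    # Map each unique token to an integer ID (alphabetical order),
    # then append IDs for two special tokens (assumed names).
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    for special in ("<|endoftext|>", "<|unk|>"):
        vocab[special] = len(vocab)
    return vocab

def encode(text, vocab):
    # Tokens missing from the vocabulary fall back to the <|unk|> ID.
    return [vocab.get(tok, vocab["<|unk|>"]) for tok in tokenize(text)]

def sliding_windows(ids, context_length, stride):
    # Build (input, target) pairs; each target is the input shifted
    # one position to the right, as in next-token prediction.
    pairs = []
    for start in range(0, len(ids) - context_length, stride):
        x = ids[start:start + context_length]
        y = ids[start + 1:start + context_length + 1]
        pairs.append((x, y))
    return pairs

text = "the quick brown fox jumps over the lazy dog."
vocab = build_vocab(tokenize(text))
ids = encode(text, vocab)
pairs = sliding_windows(ids, context_length=4, stride=2)

for x, y in pairs:
    print(x, "->", y)
```

With a stride smaller than the context length, consecutive windows overlap, which yields more training pairs from the same text at the cost of repeated tokens across samples. In a real pipeline the pairs would typically be wrapped in a dataset and data loader, and the IDs would then be passed through token and positional embedding layers.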