Build an LLM from Scratch 2: Working with text data

Uploader: Sebastian Raschka
Published at: 3/2/2025
Views: 10.5K
Description: This supplementary video, part of the "Build an LLM from Scratch" series, walks through the text data preparation steps for training large language models, including tokenization, byte pair encoding, data loaders, and more. The video covers tokenizing text (00:00), converting tokens into token IDs (14:02), adding special context tokens (23:56), byte pair encoding (30:26), data sampling with a sliding window (44:00), creating token embeddings (1:07:10), and encoding word positions (1:15:45).
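To make the first steps in the description concrete, here is a minimal Python sketch of tokenizing text, mapping tokens to IDs, adding special tokens, and sampling input/target pairs with a sliding window. It is an illustration under stated assumptions, not the code from the video: the function names, the regular expression, the special-token strings, and the sample sentence are all hypothetical, and a simple regex tokenizer stands in for byte pair encoding.

```python
import re

def tokenize(text):
    # Split on whitespace and common punctuation, keeping punctuation as tokens.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p for p in parts if p.strip()]

def build_vocab(tokens):
    # Map each unique token to an integer ID, then append two special tokens
    # (the token strings here are assumptions chosen for illustration).
    symbols = sorted(set(tokens)) + ["<|endoftext|>", "<|unk|>"]
    return {tok: idx for idx, tok in enumerate(symbols)}

def encode(text, vocab):
    # Unknown tokens fall back to the <|unk|> ID.
    return [vocab.get(tok, vocab["<|unk|>"]) for tok in tokenize(text)]

def sliding_windows(ids, context_len, stride):
    # Build (input, target) pairs; each target is its input shifted by one token.
    pairs = []
    for start in range(0, len(ids) - context_len, stride):
        inputs = ids[start:start + context_len]
        targets = ids[start + 1:start + context_len + 1]
        pairs.append((inputs, targets))
    return pairs

text = "Hello, world. Hello again, world."
vocab = build_vocab(tokenize(text))
ids = encode(text, vocab)
pairs = sliding_windows(ids, context_len=4, stride=1)

for inputs, targets in pairs:
    print(inputs, "->", targets)
```

A stride smaller than the context length produces overlapping windows, which yields more training pairs from the same text; a stride equal to the context length produces non-overlapping ones.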
