Create a Large Language Model from Scratch with Python – Tutorial

Download information and details for the video Create a Large Language Model from Scratch with Python – Tutorial
Author: freeCodeCamp.org
Published: 25/8/2023
Views: 1.1M
Description:
- 0:00:00 – Course introduction: the goal of building a large language model from scratch using Python
- 0:03:25 – Installing the necessary libraries
- 0:06:24 – Building Pylzma tools
- 0:08:58 – Setting up a Jupyter Notebook
- 0:12:11 – Downloading the Wizard of Oz dataset
- 0:14:51 – Experimenting with a text file
- 0:17:58 – Creating a character-level tokenizer
- 0:19:44 – Different types of tokenizers
- 0:20:58 – Switching from arrays to tensors
- 0:22:37 – Linear algebra basics
- 0:23:29 – Splitting the data into training and validation sets
- 0:25:30 – The premise of a bigram model
- 0:26:41 – Defining inputs and targets
- 0:29:29 – Implementing the inputs and targets
- 0:30:10 – The batch-size hyperparameter
- 0:32:13 – Switching from CPU to CUDA
- 0:33:28 – Overview of PyTorch
- 0:42:49 – CPU versus GPU performance in PyTorch
- 0:47:49 – Additional PyTorch functions
- 1:06:03 – Embedding vectors
- 1:11:33 – Implementing embeddings
- 1:13:06 – Dot product and matrix multiplication
- 1:25:42 – Implementing matmul
- 1:26:56 – Integer versus float types
- 1:29:52 – Recap and the get_batch function
- 1:35:07 – Subclassing nn.Module
- 1:37:05 – Gradient descent
- 1:50:53 – Logits and reshaping
- 1:59:28 – A generate function; giving the model some context
- 2:03:58 – Logits dimensionality
- 2:05:17 – Running a training loop with an optimizer and an explanation of zero_grad
- 2:13:56 – Overview of optimizers
- 2:17:04 – Applications of optimizers
- 2:18:11 – Reporting loss; switching between train and eval mode
- 2:32:54 – Overview of normalization
- 2:35:45 – ReLU, Sigmoid, and Tanh activations
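The character-level tokenizer (0:17:58) and the 90/10 train/validation split (0:23:29) from this first part can be sketched as follows. This is a minimal illustration, not the video's exact code: a short placeholder string stands in for the Wizard of Oz text, and the variable names (`stoi`, `itos`, `encode`, `decode`) are assumptions.

```python
# Minimal sketch of a character-level tokenizer and train/val split.
text = "hello world"  # placeholder; the tutorial uses the Wizard of Oz text

# Vocabulary: every unique character in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> string

def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of integer token ids back to a string."""
    return "".join(itos[i] for i in ids)

data = encode(text)

# 90/10 split into training and validation data.
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```

Encoding and decoding are inverses by construction, so `decode(encode(text)) == text` holds for any text built from the vocabulary.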
- 2:45:15 – Introducing transformers and self-attention
- 2:46:55 – Transformer architecture
- 3:17:54 – Building a GPT model, not a generic transformer
- 3:19:46 – Deep dive into self-attention
- 3:25:05 – GPT architecture
- 3:27:07 – Switching to a MacBook
- 3:31:42 – Implementing positional encoding
- 3:36:57 – Initializing the GPTLanguageModel
- 3:40:52 – A forward pass of the GPTLanguageModel
- 3:46:56 – Standard deviation for model parameters
- 4:00:50 – Transformer blocks
- 4:04:54 – The feed-forward network
- 4:07:53 – Multi-head attention
- 4:12:49 – Dot-product attention
- 4:19:43 – Why we scale by 1/sqrt(dk)
- 4:26:45 – Sequential versus ModuleList processing
- 4:30:47 – Overview of hyperparameters
- 4:32:14 – Fixing errors and refining the model
- 4:34:01 – Beginning training
- 4:35:46 – Downloading OpenWebText and surveying a paper on large language models
- 4:37:56 – Changes to the dataloader and batch getter
- 4:41:20 – Extracting the corpus with WinRAR
- 4:43:44 – Writing a Python data extractor
- 4:49:23 – Adjusting the train and validation splits
- 4:57:55 – Adding a dataloader
- 4:59:04 – Training on OpenWebText
- 5:02:22 – Showing that training works well; model loading and saving
- 5:04:18 – Pickling
- 5:05:32 – Fixing errors and managing GPU memory in the task manager
- 5:14:05 – Parsing command-line arguments
- 5:18:11 – Porting the code to a script
- 5:22:04 – Demonstrating a prompt completion feature and addressing more errors
- 5:24:23 – nn.Module inheritance and generation cropping
- 5:27:54 – Pre-training versus fine-tuning
- 5:33:07 – R&D pointers
- 5:44:38 – Conclusion
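The dot-product attention and 1/sqrt(dk) scaling covered at 4:12:49 and 4:19:43 can be sketched in a few lines. This is a pure-Python illustration of the formula softmax(QK^T / sqrt(d_k))V, not the video's PyTorch implementation; plain lists of floats stand in for tensors, and without the scaling, large d_k would push the softmax toward one-hot weights and tiny gradients.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors; d_k is the key dimension.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the attention weights are uniform, so each output row is simply the mean of the value rows; in a decoder-only GPT a causal mask would additionally hide future positions before the softmax.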


