Training an LLM (In a Nutshell!)
A large language model (LLM) learns to reply to a conversation by learning to predict the next token (roughly, a word) given the preceding conversation. This is framed as a multi-class classification problem in which each token is a separate class. For each token in the vocabulary (i.e. the set of all possible tokens), the LLM outputs the likelihood of it being the next token; together, these likelihoods are the probability parameters of a categorical distribution over the vocabulary. The next token is predicted by sampling from this categorical distribution. Once a token has been predicted, it is appended to the preceding conversation and used to predict the token after it.

Training an LLM typically consists of three main stages (different LLMs may have different training schemes):

1. Unsupervised pre-training
2. Supervised fine-tuning
3. Reinforcement learning

Stages 2 and 3 are commonly referred to, collectively, as the fine-tuning stage.

Unsupervised pre-training

In the unsupervised pre-tra...
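To make the autoregressive sampling loop described at the start of this section concrete, here is a minimal NumPy sketch. The names `model`, `sample_next_token`, and `generate` are assumptions for illustration only: `model` stands for any function that maps a sequence of token ids to a vector of logits, one per token in the vocabulary.

```python
import numpy as np

def sample_next_token(logits, rng):
    # Softmax turns the model's raw scores into the probability
    # parameters of a categorical distribution over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sampling from that distribution picks the next token id.
    return int(rng.choice(len(probs), p=probs))

def generate(model, token_ids, max_new_tokens, rng):
    # Autoregressive loop: each predicted token is appended to the
    # context and used when predicting the token after it.
    for _ in range(max_new_tokens):
        logits = model(token_ids)  # one score per vocabulary token
        token_ids = token_ids + [sample_next_token(logits, rng)]
    return token_ids
```

This is only a sketch of the prediction loop itself; a real LLM's `model` would be a neural network, and practical decoders add details such as temperature or top-k/top-p filtering before sampling.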