Posts

Showing posts from August, 2025

Training an LLM (In a Nutshell!)

A large language model (LLM) learns to reply to a conversation by learning to predict the next token (akin to a word) given the preceding conversation. This is framed as a multi-class classification problem in which each token in the vocabulary (the set of all possible tokens) is a separate class: the LLM outputs the likelihood of each token being the next one, and these likelihoods form the probability parameters of a categorical distribution. The next token is predicted by sampling from this categorical distribution. Once a token has been predicted, it is appended to the preceding conversation and used to predict the token after it, as sketched below. Training an LLM typically consists of three main stages (different LLMs may use different training schemes):

1. Unsupervised pre-training
2. Supervised fine-tuning
3. Reinforcement learning

Stages 2 and 3 are commonly referred to collectively as the fine-tuning stage.

Unsupervised pre-training

In the unsupervised pre-tra...
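To make the sampling loop concrete, here is a minimal sketch in Python with NumPy. The tiny vocabulary and the fake_llm_logits stand-in (which just returns random scores) are illustrative assumptions in place of a real neural network; only the softmax-then-sample autoregressive loop reflects the idea described above.

import numpy as np

# Toy vocabulary; a real LLM has tens of thousands of tokens.
vocab = ["hello", "world", "how", "are", "you", "<eos>"]

def fake_llm_logits(context_ids):
    # Hypothetical stand-in for the model: returns one unnormalized
    # score (logit) per vocabulary token, given the context so far.
    rng = np.random.default_rng(seed=len(context_ids))
    return rng.normal(size=len(vocab))

def softmax(logits):
    # Turn logits into the probability parameters of a categorical distribution.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def generate(prompt_ids, max_new_tokens=10):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(fake_llm_logits(ids))            # one probability per token
        next_id = np.random.choice(len(vocab), p=probs)  # sample the next token
        ids.append(next_id)                              # feed it back in (autoregressive loop)
        if vocab[next_id] == "<eos>":
            break
    return [vocab[i] for i in ids]

print(generate([vocab.index("hello")]))

In practice the distribution is usually reshaped before sampling (temperature, top-k, top-p), but the autoregressive loop itself is the same.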

Upcoming blog posts

Here is a list of topics that I plan to post about in the future! They will be part of the "In a Nutshell" series introducing important topics on AI!

[x] Training an LLM (In a Nutshell!)
[ ] Deep Reinforcement Learning (In a Nutshell!)
[ ] AlphaFold 3: Triangle Attention (In a Nutshell!)
[ ] Diffusion Models (In a Nutshell!)
[ ] Large Multimodal Models (In a Nutshell!)
[ ] Importance of tool calling in LLMs!
[ ] DeepSeek-V3: Multi-head latent attention!

Do let me know if you would like me to post about any topic!
