Fine-tuning neural networks with LoRA (Low-Rank Adaptation)
Low-rank adaptation (LoRA) is a popular method for fine-tuning neural networks that trains a relatively small number of parameters while achieving performance comparable to full fine-tuning of all parameters (Hu et al., 2021). Instead of fine-tuning the original weights of the base model, a much smaller set of additional weights is trained. This reduces computational cost, allowing for cheaper and faster fine-tuning.

Figure 1: Illustration of low-rank adaptation (LoRA) during training (left) and inference (right).

LoRA works by using two low-rank matrices, the down-projection matrix \( \mathbf{A} \in \mathbb{R}^{d \times r} \) and the up-projection matrix \( \mathbf{B} \in \mathbb{R}^{r \times d} \), to represent a low-rank update \( \mathbf{A}\mathbf{B} \in \mathbb{R}^{d \times d} \) to an original weight matrix from the base model, \( \mathbf{W} \in \mathbb{R}^{d \times d} \) (Figure 1). \(\mathbf{W}\) represents the connections between two layers in a neural network. Since the rank \( r \) is chosen so that \( r \ll d \), LoRA fine-tunes only the \( 2dr \) parameters of \( \mathbf{A} \) and \( \mathbf{B} \), as opposed to the \( d^2 \) parameters of the full weight matrix.
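To make this concrete, here is a minimal sketch of a LoRA layer in PyTorch, assuming a frozen square weight matrix \( \mathbf{W} \) and the shapes defined above. The class name LoRALinear, the rank \( r = 8 \), and the initialization details are illustrative assumptions, not from the original post or any particular library.

```python
# Minimal LoRA layer sketch (illustrative; names and init are assumptions).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen d x d weight matrix W with trainable low-rank
    factors A (d x r) and B (r x d), so the effective weight during
    fine-tuning is W + A @ B."""

    def __init__(self, d: int, r: int):
        super().__init__()
        # Frozen base weight W (in practice, loaded from the base model).
        self.W = nn.Parameter(torch.randn(d, d), requires_grad=False)
        # Trainable low-rank factors. A starts with small random values and
        # B with zeros, so the update A @ B is zero at initialization and
        # the adapted model initially matches the base model.
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(r, d))         # up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank update: x W + (x A) B.
        return x @ self.W + (x @ self.A) @ self.B


layer = LoRALinear(d=512, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * d * r = 8192, versus d**2 = 262144 for full fine-tuning
```

At inference, the learned update can be merged into the base weights (\( \mathbf{W} \leftarrow \mathbf{W} + \mathbf{A}\mathbf{B} \)), so the adapted layer runs with no extra cost over the base model, as the right side of Figure 1 suggests.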