LoRA: An Alternative to Full Fine-Tuning

#Fine-Tuning #DeepLearning #LoRA #arXiv #LLM #LLaMA #GPT #Stable Diffusion

kiio

May 19, 2025 · 4 min read

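Before introducing LoRA, this post explains why it was needed.
Full fine-tuning updates every parameter of an LLM, which makes it very heavy.
LLMs have at least 1B (one billion) parameters, so merely loading the model already requires expensive GPUs.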

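Training for fine-tuning is heavier still. The backward pass needs values recorded during the forward pass, and these, together with the gradients, must also be kept in GPU memory, so memory use grows with the amount of data processed. As a rule of thumb, training requires at least twice the VRAM needed to hold the weights alone.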
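The "at least twice" rule of thumb can be made concrete with a back-of-the-envelope estimate. This is a rough sketch under assumptions that are not stated in the original post: weights and gradients stored in 16-bit precision (2 bytes each), an Adam-style optimizer keeping two 32-bit moment estimates (8 bytes) per parameter, and activations ignored entirely. The function name `estimate_vram_gb` is hypothetical.

```python
def estimate_vram_gb(num_params: float,
                     bytes_per_param: int = 2,
                     optimizer_bytes_per_param: int = 8) -> dict[str, float]:
    """Rough memory estimate in GB (1 GB = 1e9 bytes); activations are ignored.

    Assumptions (hypothetical, for illustration only):
    - weights and gradients are stored in 16-bit precision (2 bytes each)
    - an Adam-style optimizer keeps two 32-bit moments per parameter (8 bytes)
    """
    weights = num_params * bytes_per_param
    grads = num_params * bytes_per_param                # one gradient per trainable weight
    optimizer = num_params * optimizer_bytes_per_param  # optimizer state per weight
    return {
        "load_only": weights / 1e9,                     # just holding the model
        "weights_and_grads": (weights + grads) / 1e9,   # the "at least 2x" case
        "with_adam_states": (weights + grads + optimizer) / 1e9,
    }


# A 1B-parameter model under the assumptions above.
for name, gb in estimate_vram_gb(1e9).items():
    print(f"{name}: {gb:.1f} GB")
# load_only: 2.0 GB
# weights_and_grads: 4.0 GB
# with_adam_states: 12.0 GB
```

Even in this simplified count, fully fine-tuning a 1B-parameter model needs several times the memory of just loading it, and real training stores activations on top of this.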

For this reason, many companies tried to avoid fully fine-tuning LLMs with large parameter counts and looked for alternatives. LoRA emerged from these attempts.

LoRA: Low-Rank Adaptation of Large Language Models

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.

arXiv.org, Edward J. Hu et al.

Introduction to LoRA

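LoRA stands for Low-Rank Adaptation.
It is a technique for efficiently fine-tuning large language or vision models. Instead of training the model's own parameters, it trains only low-rank matrices that contain far fewer parameters, which saves compute and storage.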
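To see how much training only low-rank matrices saves, compare parameter counts for a single weight matrix W of shape d × k. Full fine-tuning updates all d·k entries, while LoRA trains B (d × r) and A (r × k), i.e. r(d + k) parameters. The shape 4096 × 4096 and rank r = 8 below are illustrative choices, not values from the original post, and `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # full fine-tuning updates every entry of W
    lora = r * (d + k)  # LoRA trains B (d x r) and A (r x k) only
    return full, lora


full, lora = lora_param_counts(d=4096, k=4096, r=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

Because r is much smaller than d and k, the trainable count grows linearly with the layer size instead of quadratically.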

Use cases

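Fine-tuning large language models (LLMs): LLaMA, GPT
Image generation models: fine-tuning text-to-image (TTI) models such as Stable Diffusion
Mobile and embedded model optimization: adapting models with a small set of trainable parameters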

LoRA training layers

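Rather than modifying the model's own parameters, LoRA inserts low-rank matrices alongside specific linear layers. The existing model weights are frozen, and only the inserted matrices in those linear layers are trained.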

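In the figure, the blue part is frozen, so the existing weight matrix W is never changed.
Only the two low-rank matrices A and B are trained.
A is initialized from a normal (Gaussian) distribution and B is initialized to zero, so BA = 0 when training starts and the model initially behaves exactly like the pre-trained one. Only these two matrices are trained, and the final output is computed as h = Wx + BAx.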
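The computation above can be sketched in NumPy. This is a minimal illustration, not the API of the package released with the paper or of any other library. The class `LoRALinear`, the Gaussian standard deviation of 0.01, the `scale` factor (the LoRA paper additionally scales BAx by α/r; here it defaults to 1 so the formula matches h = Wx + BAx), and the toy squared-error `sgd_step` are all assumptions made for illustration. The step computes gradients by hand and updates only A and B, leaving W untouched.

```python
import numpy as np


class LoRALinear:
    """Hypothetical minimal LoRA layer: h = W x + scale * B A x, with W frozen."""

    def __init__(self, W: np.ndarray, r: int, scale: float = 1.0, seed: int = 0):
        d, k = W.shape                                 # W maps k-dim inputs to d-dim outputs
        rng = np.random.default_rng(seed)
        self.W = W                                     # frozen pre-trained weight, never updated
        self.A = rng.normal(0.0, 0.01, size=(r, k))    # Gaussian init (std 0.01 is an assumption)
        self.B = np.zeros((d, r))                      # zero init, so B @ A == 0 at the start
        self.scale = scale                             # the paper scales BAx by alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def sgd_step(self, x: np.ndarray, y: np.ndarray, lr: float = 0.1) -> float:
        """One SGD step on 0.5 * ||h - y||^2; returns the loss before the update."""
        Ax = self.A @ x
        h = self.W @ x + self.scale * (self.B @ Ax)
        g = h - y                                         # dL/dh
        grad_B = self.scale * np.outer(g, Ax)             # dL/dB
        grad_A = self.scale * np.outer(self.B.T @ g, x)   # dL/dA, uses B before its update
        self.B -= lr * grad_B
        self.A -= lr * grad_A
        return 0.5 * float(g @ g)


rng = np.random.default_rng(42)
W = rng.normal(size=(4, 3))
x = np.array([1.0, -0.5, 2.0])
y = W @ x + np.array([0.5, -0.5, 0.25, 0.0])   # a target the frozen W alone does not hit

layer = LoRALinear(W.copy(), r=2)
print(np.allclose(layer(x), W @ x))  # True: B starts at zero, so BAx == 0
```

Because B starts at zero, A's gradient is zero on the first step and only B moves; after that both matrices learn while W stays exactly as loaded.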

Summary

W: the pre-trained weights (not retrained in LoRA)
A, B: the matrices LoRA trains
BAx: the LoRA correction term
Wx + BAx: the final output h
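Because Wx + BAx = (W + BA)x, the learned correction can be folded into W after training, which is why the paper reports no additional inference latency. The sketch below checks this identity numerically; the shapes and the random matrices standing in for trained A and B are arbitrary assumptions, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 5, 2
W = rng.normal(size=(d, k))   # frozen pre-trained weight
A = rng.normal(size=(r, k))   # stand-in for a trained LoRA A
B = rng.normal(size=(d, r))   # stand-in for a trained LoRA B
x = rng.normal(size=k)

h_lora = W @ x + B @ (A @ x)  # Wx + BAx: frozen path plus LoRA correction
W_merged = W + B @ A          # fold the correction into a single d x k matrix
h_merged = W_merged @ x       # one matmul at inference time

print(np.allclose(h_lora, h_merged))  # True
print(np.linalg.matrix_rank(B @ A))   # never exceeds r
```

After merging, serving the model costs the same as serving the original W, and the rank check shows that the update BA is limited to rank r.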
