Post-Training & Forgetting
Terminology definition
LLM = Foundation Model ---(Post-Training = Continual Training = Alignment)----> Fine-tuned Model
- Foundation Model -> can be a chat/instruct model or a base/pre-trained model
Methods
- Pre-train style (continue next-token prediction on raw text)
- SFT style (supervised fine-tuning on input/output pairs; see the loss-masking sketch below)
- RL style (reward-based, e.g. RLHF)
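The practical difference between pre-train style and SFT style is which tokens carry the loss. Below is a minimal PyTorch sketch with toy tensors; the token IDs, prompt length, and vocabulary size are all made up for illustration, not taken from the notes.

```python
# Minimal sketch (toy token IDs, no real tokenizer) contrasting pre-train
# style and SFT style losses. Pre-train style computes the next-token loss
# on the whole sequence; SFT style masks out the prompt so only response
# tokens are trained on. RL style (not shown) would instead optimize a
# reward on sampled responses.
import torch
import torch.nn.functional as F

vocab, seq_len = 100, 8
logits = torch.randn(seq_len, vocab)           # model outputs for one sequence
tokens = torch.randint(0, vocab, (seq_len,))   # prompt + response token IDs
prompt_len = 3                                 # first 3 tokens are the prompt

# Pre-train style: predict every next token.
pretrain_loss = F.cross_entropy(logits[:-1], tokens[1:])

# SFT style: same objective, but ignore positions inside the prompt.
labels = tokens.clone()
labels[:prompt_len] = -100                     # -100 = ignored by cross_entropy
sft_loss = F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)

print(float(pretrain_loss), float(sft_loss))
```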
Catastrophic Forgetting
- This is the biggest challenge in post-training: the model forgets skills it already had.
Solutions of Catastrophic Forgetting
- Experience Replay
- Example:
- GPT-2 --(task 1 training data)----> GPT-2 ----(task 2 training data + a small amount of task 1 training data)----> GPT-2 (see the data-mixing sketch below)
- Paper: Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
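A minimal sketch of the mixing step above, assuming each task's data is simply a Python list and the replay ratio is a free hyperparameter (both are illustrative assumptions):

```python
# Minimal sketch of experience replay: when training on task 2, mix in a
# small fraction of task 1 data so the old skill keeps getting rehearsed.
import random

def build_replay_mixture(task2_data, task1_data, replay_ratio=0.05, seed=0):
    """Return task 2 data plus a small random sample of task 1 data."""
    rng = random.Random(seed)
    n_replay = max(1, int(len(task2_data) * replay_ratio))
    replayed = rng.sample(task1_data, min(n_replay, len(task1_data)))
    mixture = list(task2_data) + replayed
    rng.shuffle(mixture)
    return mixture

# Toy usage: about 5% of the mixture comes from task 1.
task1 = [f"task1 example {i}" for i in range(1000)]
task2 = [f"task2 example {i}" for i in range(1000)]
training_data = build_replay_mixture(task2, task1)
```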
- Pseudo Experience Replay
- If no training data for the old tasks can be found, have the model (e.g. GPT-2) generate that data itself (see the sketch below).
- Example: let LLaMA ask and answer its own questions.
- Paper: MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models
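When no old-task data exists, the model generates its own replay data. A minimal sketch, assuming the Hugging Face transformers text-generation pipeline and generic seed prompts (both choices are illustrative, not from the notes):

```python
# Minimal sketch of pseudo experience replay: GPT-2 generates its own
# "old ability" data, which can then be mixed into the new task's data
# exactly like ordinary experience replay.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def pseudo_replay(seed_prompts, n_per_prompt=2, max_new_tokens=40):
    """Let the foundation model produce replay text from short seed prompts."""
    samples = []
    for prompt in seed_prompts:
        outputs = generator(
            prompt,
            max_new_tokens=max_new_tokens,
            num_return_sequences=n_per_prompt,
            do_sample=True,
        )
        samples.extend(out["generated_text"] for out in outputs)
    return samples

replay_data = pseudo_replay(["Once upon a time", "The capital of France"])
```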
- Paraphrase
- Restate the new answer in the foundation model's own words (see the sketch below).
- Input: new
- Output: new --> Foundation Model (paraphrase in its own words) --> old-style output, used as the training target
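A minimal sketch of the paraphrase idea; `ask_foundation_model` is a hypothetical stand-in for whatever LLM call is actually used (API client, local pipeline, etc.):

```python
# Minimal sketch of the paraphrase trick: the new answer is rewritten by
# the foundation model so the training target stays in the model's own
# wording. `ask_foundation_model` is a hypothetical placeholder, not a
# real API.
def ask_foundation_model(prompt: str) -> str:
    raise NotImplementedError("plug in your foundation model here")

def make_paraphrased_example(question: str, new_answer: str) -> dict:
    rewritten = ask_foundation_model(
        "Rewrite the following answer in your own words, keeping it correct.\n"
        f"Question: {question}\nAnswer: {new_answer}"
    )
    # Fine-tune on (question, rewritten) instead of (question, new_answer).
    return {"input": question, "output": rewritten}
```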
- Self-output
- Input: new -> Foundation Model -> output in the model's own (old) style
- That self-generated output replaces the new answer as the training label (see the sketch below).
- Paper: Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
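A minimal sketch of self-output, with hypothetical `ask_foundation_model` and `is_correct` helpers passed in as assumptions; keeping only answers judged correct follows the spirit of the Selective Self-Rehearsal paper.

```python
# Minimal sketch of self-output: the foundation model answers the new
# question itself, and its own (verified) answer becomes the label, so the
# label distribution matches what the model already produces.
from typing import Callable, Optional

def make_self_output_example(
    question: str,
    reference_answer: str,
    ask_foundation_model: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
) -> Optional[dict]:
    """Use the model's own answer as the label when it is judged correct."""
    model_answer = ask_foundation_model(question)
    if is_correct(model_answer, reference_answer):
        # Train on the model's own wording rather than the new reference answer.
        return {"input": question, "output": model_answer}
    return None  # skip (or fall back to other data) when the model is wrong
```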
Info
- RL-based post-training may cause less forgetting (?)
- Note: a different model can also be used for post-training; it does not have to be the foundation model itself.
- Paper: I Learn Better If You Speak My Language