Post-Training & Forgetting
Terminology definition
LLM = Foundation Model ---(Post-Training = Continual Training = Alignment)----> Fine-tuned Model
- Foundation Model -> can be a chat/instruct model or a base/pre-trained model
Methods
- Pre-train style (continue next-token prediction on raw text)
- SFT style (supervised fine-tuning on input/output pairs; see the loss-masking sketch below)
- RL style (reward-based, e.g. RLHF)
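The practical difference between pre-train style and SFT style is which tokens carry the loss. Below is a minimal PyTorch sketch with toy tensors; the token IDs, prompt length, and vocabulary size are all made up for illustration, not taken from the notes.

```python
# Minimal sketch (toy token IDs, no real tokenizer) contrasting pre-train
# style and SFT style losses. Pre-train style computes the next-token loss
# on the whole sequence; SFT style masks out the prompt so only response
# tokens are trained on. RL style (not shown) would instead optimize a
# reward on sampled responses.
import torch
import torch.nn.functional as F

vocab, seq_len = 100, 8
logits = torch.randn(seq_len, vocab)           # model outputs for one sequence
tokens = torch.randint(0, vocab, (seq_len,))   # prompt + response token IDs
prompt_len = 3                                 # first 3 tokens are the prompt

# Pre-train style: predict every next token.
pretrain_loss = F.cross_entropy(logits[:-1], tokens[1:])

# SFT style: same objective, but ignore positions inside the prompt.
labels = tokens.clone()
labels[:prompt_len] = -100                     # -100 = ignored by cross_entropy
sft_loss = F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)

print(float(pretrain_loss), float(sft_loss))
```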
Catastrophic Forgetting
- This is the biggest challenge in post-training: the model forgets skills it already had.
Solutions of Catastrophic Forgetting
- Experience Replay
- Example:
- GPT-2 --(task 1 training data)----> GPT-2 ----(task 2 training data + a small amount of task 1 training data)----> GPT-2 (see the data-mixing sketch below)
- Paper: Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
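A minimal sketch of the mixing step above, assuming each task's data is simply a Python list and the replay ratio is a free hyperparameter (both are illustrative assumptions):

```python
# Minimal sketch of experience replay: when training on task 2, mix in a
# small fraction of task 1 data so the old skill keeps getting rehearsed.
import random

def build_replay_mixture(task2_data, task1_data, replay_ratio=0.05, seed=0):
    """Return task 2 data plus a small random sample of task 1 data."""
    rng = random.Random(seed)
    n_replay = max(1, int(len(task2_data) * replay_ratio))
    replayed = rng.sample(task1_data, min(n_replay, len(task1_data)))
    mixture = list(task2_data) + replayed
    rng.shuffle(mixture)
    return mixture

# Toy usage: about 5% of the mixture comes from task 1.
task1 = [f"task1 example {i}" for i in range(1000)]
task2 = [f"task2 example {i}" for i in range(1000)]
training_data = build_replay_mixture(task2, task1)
```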
- Pseudo Experience Replay
- If no training data for the old tasks can be found, have the model (e.g. GPT-2) generate that data itself (see the sketch below).
- Example: let LLaMA ask and answer its own questions.
- Paper: MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models
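When no old-task data exists, the model generates its own replay data. A minimal sketch, assuming the Hugging Face transformers text-generation pipeline and generic seed prompts (both choices are illustrative, not from the notes):

```python
# Minimal sketch of pseudo experience replay: GPT-2 generates its own
# "old ability" data, which can then be mixed into the new task's data
# exactly like ordinary experience replay.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def pseudo_replay(seed_prompts, n_per_prompt=2, max_new_tokens=40):
    """Let the foundation model produce replay text from short seed prompts."""
    samples = []
    for prompt in seed_prompts:
        outputs = generator(
            prompt,
            max_new_tokens=max_new_tokens,
            num_return_sequences=n_per_prompt,
            do_sample=True,
        )
        samples.extend(out["generated_text"] for out in outputs)
    return samples

replay_data = pseudo_replay(["Once upon a time", "The capital of France"])
```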
- Paraphrase
- Restate the new answer in the foundation model's own words (see the sketch below).
- Input: new
- Output: new --> Foundation Model (paraphrase in its own words) --> old-style output, used as the training target
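A minimal sketch of the paraphrase idea; `ask_foundation_model` is a hypothetical stand-in for whatever LLM call is actually used (API client, local pipeline, etc.):

```python
# Minimal sketch of the paraphrase trick: the new answer is rewritten by
# the foundation model so the training target stays in the model's own
# wording. `ask_foundation_model` is a hypothetical placeholder, not a
# real API.
def ask_foundation_model(prompt: str) -> str:
    raise NotImplementedError("plug in your foundation model here")

def make_paraphrased_example(question: str, new_answer: str) -> dict:
    rewritten = ask_foundation_model(
        "Rewrite the following answer in your own words, keeping it correct.\n"
        f"Question: {question}\nAnswer: {new_answer}"
    )
    # Fine-tune on (question, rewritten) instead of (question, new_answer).
    return {"input": question, "output": rewritten}
```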
- Self-output
- Input: new -> Foundation Model -> output in the model's own (old) style
- That self-generated output replaces the new answer as the training label (see the sketch below).
- Paper: Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
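A minimal sketch of self-output, with hypothetical `ask_foundation_model` and `is_correct` helpers passed in as assumptions; keeping only answers judged correct follows the spirit of the Selective Self-Rehearsal paper.

```python
# Minimal sketch of self-output: the foundation model answers the new
# question itself, and its own (verified) answer becomes the label, so the
# label distribution matches what the model already produces.
from typing import Callable, Optional

def make_self_output_example(
    question: str,
    reference_answer: str,
    ask_foundation_model: Callable[[str], str],
    is_correct: Callable[[str, str], bool],
) -> Optional[dict]:
    """Use the model's own answer as the label when it is judged correct."""
    model_answer = ask_foundation_model(question)
    if is_correct(model_answer, reference_answer):
        # Train on the model's own wording rather than the new reference answer.
        return {"input": question, "output": model_answer}
    return None  # skip (or fall back to other data) when the model is wrong
```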
Info
- RL-based post-training may cause less forgetting (?)
- Note: a different model can also be used for post-training; it does not have to be the foundation model itself.
- Paper: I Learn Better If You Speak My Language