Pretrain & Alignment
Training an LLM involves three steps:
Pre-train -> Supervised Fine-tuning -> RLHF
- Alignment = Supervised Fine-tuning + RLHF; together, these two steps are what is commonly called fine-tuning (see the sketch below)
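A minimal sketch of how the three stages fit together, using Hugging Face transformers and plain PyTorch. The dataset contents, hyperparameters, and prompt format below are illustrative assumptions rather than any model's actual recipe, and stage 3 (RLHF) is only indicated in a comment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stage 1: pre-training produces the base checkpoint (done at large scale
# elsewhere); here we just load its result.
base_name = "meta-llama/Llama-2-7b-hf"  # base model: pre-training only
tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name)

# Stage 2: supervised fine-tuning (SFT) on (instruction, response) pairs.
sft_pairs = [
    ("Explain what RLHF is.",
     "RLHF fine-tunes a model using human preference feedback ..."),
]
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for instruction, response in sft_pairs:
    text = f"### Instruction:\n{instruction}\n### Response:\n{response}"
    batch = tokenizer(text, return_tensors="pt")
    # Causal-LM loss over the whole sequence; real recipes usually mask the
    # instruction tokens so only the response contributes to the loss.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Stage 3: RLHF -- train a reward model on human preference data, then
# optimize the SFT model against it with PPO (e.g. via the trl library);
# omitted here for brevity.
```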
Alignment
Model Naming Tips
- Models whose names contain base are pre-trained only, e.g. Llama-2-7b-base
- Models whose names contain chat or instruct have also gone through alignment, e.g. Llama-2-7b-chat (see the example below)
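A brief illustration of the naming convention, assuming the Hugging Face Hub checkpoints meta-llama/Llama-2-7b-hf (base) and meta-llama/Llama-2-7b-chat-hf (aligned); access to both is gated behind Meta's license. The aligned checkpoint ships a chat template describing the prompt format it was aligned with, while the base checkpoint simply continues raw text.

```python
from transformers import AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"       # pre-training only ("base")
chat_id = "meta-llama/Llama-2-7b-chat-hf"  # pre-training + alignment ("chat")

chat_tok = AutoTokenizer.from_pretrained(chat_id)
messages = [{"role": "user", "content": "Summarize RLHF in one sentence."}]
# Render the conversation with the chat model's own template; the base model
# has no such template and would just be prompted with plain text.
prompt = chat_tok.apply_chat_template(messages, tokenize=False,
                                      add_generation_prompt=True)
print(prompt)
```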
Alignment with Datasets
- Paper:
- The dataset doesn't need to be large, but it needs to be high quality (see the filtering sketch below)
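A minimal sketch of the "small but high quality" idea: rather than training on every available instruction pair, keep only examples that pass quality checks. The heuristics and thresholds below are illustrative assumptions, not the curation procedure of any specific paper.

```python
def is_high_quality(example: dict) -> bool:
    """Toy quality heuristics for an (instruction, response) pair."""
    instruction = example["instruction"].strip()
    response = example["response"].strip()
    if not instruction:                          # malformed example
        return False
    if len(response.split()) < 30:               # too short to be informative
        return False
    if response.lower().startswith("i cannot"):  # unhelpful refusal
        return False
    return True

raw_data = [
    {"instruction": "Explain gradient descent.",
     "response": "Gradient descent iteratively updates parameters ..."},
    {"instruction": "", "response": "N/A"},
]
# A few thousand carefully curated pairs can align a model better than a
# much larger but noisier dataset.
curated = [ex for ex in raw_data if is_high_quality(ex)]
```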
Knowledge Distillation
- Paper (method: select answers generated by a teacher model; see the sketch below)
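A hedged sketch of the "choose answers from the teacher model" idea, i.e. sequence-level distillation: a strong teacher generates responses to a prompt set, and those responses become the student's SFT targets. The model name and prompts are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "large-teacher-model"   # placeholder for a strong aligned model
teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

prompts = ["Explain overfitting to a beginner.", "Write a haiku about GPUs."]
distill_data = []
for prompt in prompts:
    inputs = teacher_tok(prompt, return_tensors="pt")
    output_ids = teacher.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Strip the prompt tokens so only the teacher's answer is kept.
    answer = teacher_tok.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
    distill_data.append({"instruction": prompt, "response": answer})

# distill_data is then used as an ordinary SFT dataset for the smaller
# student model (same loop as the pipeline sketch above).
```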
Alignment before and after
- Paper:
Different Alignment Methods
- Response Tuning (see the hedged sketch below)
- Rule-based adapter
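A heavily hedged sketch of Response Tuning, assuming it means fine-tuning on responses alone, with no instructions in the training text; the training loop itself would be the same SFT loop as in the pipeline sketch above, only the data changes.

```python
instruction_pairs = [
    {"instruction": "Explain dropout.",
     "response": "Dropout randomly zeroes activations during training ..."},
]
# Keep only the responses; the instructions are discarded entirely, so the
# model learns the distribution of good responses without ever seeing an
# instruction during fine-tuning.
response_only_corpus = [ex["response"] for ex in instruction_pairs]
```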
Self-Alignment
Pretrain
Efficient Pretrain
Dataset
The Importance of Dataset Quality
- The "textbook" training data was generated by ChatGPT, which may have affected the results
Rephrasing the Web
- Paper: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
- Under a limited compute budget and a fixed model, it is better to train on more diverse data (see the sketch after this list)
- Paper: Scaling Data-Constrained Language Models
- Paper: Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
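A hedged sketch of the Rephrasing the Web idea: an off-the-shelf instruction-tuned model rewrites noisy web documents into a cleaner style, and the rephrased text is mixed with the original corpus for pre-training, effectively giving the model more diverse data from the same raw source. The rephraser name and prompt wording are placeholders, not the paper's exact recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

rephraser_name = "some-instruction-tuned-model"   # placeholder rephraser
tok = AutoTokenizer.from_pretrained(rephraser_name)
rephraser = AutoModelForCausalLM.from_pretrained(rephraser_name)

STYLE_PROMPT = ("Rewrite the following web text in a clear, encyclopedic style, "
                "keeping all factual content:\n\n{doc}\n\nRewritten text:")

def rephrase(doc: str) -> str:
    inputs = tok(STYLE_PROMPT.format(doc=doc), return_tensors="pt")
    out = rephraser.generate(**inputs, max_new_tokens=512, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:],
                      skip_special_tokens=True)

web_docs = ["lol best phone 2019?? battery sux but camera is gr8 ..."]
# Pre-training then sees both the raw documents and their rephrased versions,
# i.e. more diverse data under a fixed compute budget.
pretrain_corpus = web_docs + [rephrase(d) for d in web_docs]
```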