
Deep Thinking

What is Deep Thinking?

When depth is not enough, make up for it with length.

Question -----> LLM -----> thinking process + answer

  • <think> ....... </think> -> Verification, Exploration, Planning

Example: AlphaGo

  • AlphaGo's thinking process uses MCTS (Monte Carlo Tree Search)

Test-Time Scaling

Methods for Building a Reasoning LLM

These methods can be combined.

Chain of Thought (CoT)

  • Don't need to change the model
  1. Few-shot CoT
  2. Zero-shot CoT
  3. Long CoT
  4. Supervised CoT
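
Since CoT only changes the prompt, the first two variants can be sketched as plain string construction. This is a minimal sketch, not a specific library's API: the helper names `zero_shot_cot` and `few_shot_cot` and the prompt layout are assumptions made for illustration, and the trigger sentence is the commonly used "Let's think step by step."

```python
# Hypothetical helpers for illustration; the prompt layout is an assumption.
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Append a reasoning trigger so the model writes its steps before answering."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(question: str, examples: list[tuple[str, str, str]]) -> str:
    """Prepend worked examples (question, reasoning, answer) before the new question."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    # The new question ends with an open "A:" for the model to continue.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)
```

Either string would then be sent to the unchanged model as the prompt.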

Give the Model a Reasoning Workflow

  • Don't need to change the model

How to explore?

Ask the LLM the same question many times and it will give different answers.

How to choose the right answer?

  1. Majority Vote (Self-consistency)
  2. Confidence (used in CoT decoding)
  3. Add verification
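
The selection strategies above can be sketched over a list of already-sampled final answers. This is a rough sketch under assumptions: `majority_vote` and `best_by_score` are hypothetical names, and the `score` callable stands in for whatever confidence measure or verifier is available. How that score is computed is not specified in these notes.

```python
from collections import Counter
from typing import Callable


def majority_vote(answers: list[str]) -> str:
    """Self-consistency: return the most frequent final answer among the samples."""
    counts = Counter(a.strip() for a in answers)
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return counts.most_common(1)[0][0]


def best_by_score(answers: list[str], score: Callable[[str], float]) -> str:
    """Pick the answer with the highest score (a confidence value or verifier output)."""
    return max(answers, key=score)
```

Both functions assume a non-empty list of answers. For example, `majority_vote(["42", "41", "42"])` returns `"42"`.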

Parallel vs. Sequential vs. Parallel + Sequential

How to verify the step?

Info

Imitation Learning and Reinforcement Learning are special cases of post-training.

Imitation Learning

  • Need to change the model
  • Teach the model to reason

Where does the reasoning process data come from?

Use an LLM to generate reasoning process data. When the final answer is correct, treat the reasoning process as correct too and use it for training.
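
The filtering step can be sketched as follows. This is a minimal sketch, assuming the generations have already been parsed into a reasoning part and a final answer. The `Sample` type, the `keep_correct` function, and the exact-match comparison are illustrative assumptions, not a prescribed pipeline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    reasoning: str  # generated reasoning process
    answer: str     # final answer extracted from the generation


def keep_correct(samples: list[Sample], gold: dict[str, str]) -> list[Sample]:
    """Keep only samples whose final answer matches the reference answer."""
    return [
        s
        for s in samples
        if s.question in gold and s.answer.strip() == gold[s.question].strip()
    ]
```

The surviving samples form the training set. A correct final answer does not guarantee that every reasoning step is correct, so this filter is a heuristic.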

Reinforcement Learning