Transformer

Reference

Sequence to Sequence (Seq2Seq)

  • Transformer is a type of Seq2Seq model.

Application

  1. Seq2Seq for Multi-label Classification, where an object can belong to multiple classes, so the model itself decides how many labels to output
  2. Seq2Seq for Syntactic Parsing
  3. Deep Learning for Human Language Processing
  4. Seq2Seq for Object Detection

Architecture

Encoder

  • RNN, CNN, and Self-Attention are all viable choices for model encoders.
  • Input: sequence of vectors
  • Output: sequence of vectors

Transformer Encoder Architecture

  • Simplified Transformer Encoder Architecture
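
A minimal PyTorch sketch of one simplified encoder block, assuming the usual layout of self-attention followed by a feed-forward network, each wrapped in a residual connection and layer normalization; the dimensions and module choices are illustrative, not taken from the lecture.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One simplified Transformer encoder block: self-attention + FFN,
    each followed by a residual (skip) connection and layer norm."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)       # self-attention: Q = K = V = x
        x = self.norm1(x + attn_out)           # residual + layer norm
        x = self.norm2(x + self.ffn(x))        # residual + layer norm
        return x                               # same shape as the input sequence

# Input: a sequence of vectors; output: a sequence of vectors of the same length.
x = torch.randn(2, 10, 256)
print(EncoderBlock()(x).shape)                 # torch.Size([2, 10, 256])
```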

Residual

Problem: Vanishing gradients in deep networks. Solution: Add skip connections to allow gradients to flow through the network.
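
A minimal sketch of the idea: the sub-layer output is added back onto its input, so an identity path always carries gradient through the block. The `Linear` layer here is just a stand-in sub-layer.

```python
import torch

x = torch.randn(8, requires_grad=True)
f = torch.nn.Linear(8, 8)        # stand-in for any sub-layer

y = x + f(x)                     # skip connection: identity path around the sub-layer
y.sum().backward()

# Even if the sub-layer's gradient contribution were tiny, the identity path
# passes the upstream gradient to x directly, so it cannot vanish entirely.
print(x.grad)
```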

Decoder

Decoders can be broadly categorized into two types: Autoregressive and Non-Autoregressive.

  • Autoregressive decoder: generates tokens one at a time, feeding each output back in as the next input

  • Non-Autoregressive decoder: generates all output tokens at once (both loops are sketched below)
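
A minimal sketch contrasting the two decoding loops, assuming a `decoder` callable that maps token ids to a distribution over the vocabulary at each position; the greedy argmax choice and the dummy decoder are illustrative only.

```python
import torch

def autoregressive_decode(decoder, bos_id, eos_id, max_len=20):
    """Autoregressive: generate one token at a time, feeding each output back in."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = decoder(torch.tensor([tokens]))     # (1, len, vocab_size)
        next_id = int(logits[0, -1].argmax())        # greedy choice of the next token
        tokens.append(next_id)
        if next_id == eos_id:                        # stop once END is produced
            break
    return tokens

def non_autoregressive_decode(decoder, length):
    """Non-autoregressive: predict every output position in one parallel pass."""
    placeholder = torch.zeros(1, length, dtype=torch.long)
    logits = decoder(placeholder)                    # (1, length, vocab_size)
    return logits.argmax(dim=-1)[0].tolist()

# Dummy decoder (random logits) just to make both loops runnable.
vocab_size = 50
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], vocab_size)
print(autoregressive_decode(dummy, bos_id=1, eos_id=2))
print(non_autoregressive_decode(dummy, length=8))
```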

"Masked" Self-Attention

Each position only attends to earlier positions, because the later tokens have not been generated yet.
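
A small NumPy sketch of the causal ("masked") attention pattern: position i is blocked from attending to positions j > i. For simplicity the input vectors are used directly as queries, keys, and values; a real layer would apply learned projections first.

```python
import numpy as np

def masked_self_attention(x):
    """x: (seq_len, d). Each position may only attend to itself and earlier positions."""
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)                         # dot-product attention scores
    mask = np.triu(np.ones((seq_len, seq_len)), k=1)      # 1s above the diagonal = "future"
    scores = np.where(mask == 1, -np.inf, scores)         # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over allowed positions
    return weights @ x                                    # weighted sum of value vectors

x = np.random.randn(4, 8)
print(masked_self_attention(x).shape)                     # (4, 8); row i uses only x[0..i]
```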

Connecting the Encoder and the Decoder

Cross-Attention

Info

To be completed.
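
Cross-attention is where the decoder reads from the encoder: the decoder's vectors supply the queries, while the encoder output supplies the keys and values. A minimal sketch, assuming this standard formulation; dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 12, d_model)    # keys and values: from the encoder
decoder_hidden = torch.randn(1, 5, d_model)  # queries: from the decoder's masked self-attention

out, attn_weights = cross_attn(query=decoder_hidden, key=encoder_out, value=encoder_out)
print(out.shape)            # (1, 5, 256): one vector per decoder position
print(attn_weights.shape)   # (1, 5, 12): how each decoder step attends over encoder outputs
```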

Training Tip

Teacher Forcing

  • Label first -> compute the cross-entropy loss (one-hot ground-truth vector vs. the output probability distribution)
  • During training, the decoder is directly fed the correct answer as its input, instead of its own previous output (see the sketch below)
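
A minimal sketch of the loss under teacher forcing: the decoder input is the ground-truth sequence shifted right by one BEGIN token, and cross-entropy is computed between each predicted distribution and the one-hot ground-truth token. The GRU stands in for the real decoder; vocabulary size and the BEGIN id are illustrative.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 256
embed = nn.Embedding(vocab_size, d_model)
decoder = nn.GRU(d_model, d_model, batch_first=True)      # stand-in for the real decoder
to_vocab = nn.Linear(d_model, vocab_size)

target = torch.randint(0, vocab_size, (1, 6))             # ground-truth token ids
bos = torch.zeros(1, 1, dtype=torch.long)                 # BEGIN token (id 0, illustrative)

decoder_input = torch.cat([bos, target[:, :-1]], dim=1)   # teacher forcing: feed the correct
hidden, _ = decoder(embed(decoder_input))                 #   previous token, not the model's own output
logits = to_vocab(hidden)                                  # (1, 6, vocab_size)

# Cross-entropy between each predicted distribution and the one-hot ground truth.
loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), target.view(-1))
print(loss.item())
```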

Copy Mechanism

  • Pointer Network

Guided Attention

  • Monotonic Attention, Location-aware Attention
  • Likely important for tasks such as speech recognition and TTS
  • In some tasks, input and output are monotonically aligned (a guided-attention-style penalty is sketched below)
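
A minimal sketch of one way to encourage monotonic alignment, a guided-attention-style penalty as used in some TTS work (not necessarily the exact formulation in the lecture): attention mass far from the diagonal of the alignment matrix is penalized.

```python
import numpy as np

def guided_attention_penalty(attn, g=0.2):
    """attn: (T_out, T_in) attention weights. Penalize mass far from the diagonal,
    since in tasks like TTS the alignment is expected to be roughly monotonic."""
    T_out, T_in = attn.shape
    n = np.arange(T_out)[:, None] / T_out     # normalized output positions
    t = np.arange(T_in)[None, :] / T_in       # normalized input positions
    weight = 1.0 - np.exp(-((n - t) ** 2) / (2 * g ** 2))  # ~0 near diagonal, ~1 far away
    return float((attn * weight).mean())

attn = np.full((10, 12), 1 / 12)              # a flat (badly aligned) attention map
print(guided_attention_penalty(attn))         # higher value = stronger penalty
```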