Transformer
Sequence to Sequence (Seq2Seq)
- Transformer is a type of Seq2Seq model.
Application
- Seq2Seq for multi-label classification: an object can belong to multiple classes.
- Seq2Seq for Syntactic Parsing
- Deep Learning for Human Language Processing
- Seq2Seq for Object Detection
Architecture
Encoder
- RNN, CNN, and Self-Attention are all viable choices for model encoders.
- Input: sequence of vectors
- Output: sequence of vectors
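As a rough illustration of the "sequence of vectors in, sequence of vectors out" idea, here is a minimal single-head self-attention sketch; all weights, shapes, and names are made up for the example and are not from the lecture.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) -> (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project inputs to queries / keys / values
    scores = q @ k.T / (k.shape[-1] ** 0.5)  # pairwise attention scores
    weights = F.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ v                       # weighted sum of values per position

seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([4, 8]) — same sequence length as the input
```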
Transformer Encoder Architecture
- Simplified Transformer Encoder Architecture
Residual
Problem: Vanishing gradients in deep networks. Solution: Add skip connections to allow gradients to flow through the network.
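A minimal sketch of a residual (skip) connection around a sub-layer, roughly as it appears in a Transformer encoder block; the module and names here are illustrative, not the exact architecture from the lecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        # Stand-in sub-layer; in the real encoder this is self-attention or a feed-forward network.
        self.sublayer = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # Skip connection: the input is added back to the sub-layer output,
        # so gradients always have an identity path to flow through.
        return self.norm(x + self.sublayer(x))

block = ResidualBlock(d_model=8)
out = block(torch.randn(4, 8))  # same shape in and out: (seq_len, d_model)
```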
Decoder
Decoders can be broadly categorized into two types: Autoregressive and Non-Autoregressive.
- Autoregressive (AT) decoder: generates the output one token at a time, feeding each generated token back in as input for the next step.
- Non-Autoregressive (NAT) decoder: generates all output tokens at once, in a single pass.
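A toy sketch contrasting the two decoding styles; the decoder here is a random stand-in, and the BOS/EOS token ids are assumptions for illustration.

```python
import torch

BOS, EOS, VOCAB = 0, 1, 100

def toy_decoder(prefix):
    """Stand-in for a real decoder: returns logits over the vocabulary for the next token."""
    return torch.randn(VOCAB)

# Autoregressive: generate one token at a time, feeding each output back as input.
tokens = [BOS]
for _ in range(10):
    next_token = int(torch.argmax(toy_decoder(tokens)))
    tokens.append(next_token)
    if next_token == EOS:
        break

# Non-autoregressive: predict every output position in parallel.
target_len = 10
logits = torch.randn(target_len, VOCAB)           # one distribution per output position
nat_tokens = torch.argmax(logits, dim=-1).tolist()
```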
"Masked" Self-Attention
Each position only attends to the positions before it, because the later tokens have not been generated yet.
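A minimal sketch of the causal mask that implements this; the shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

seq_len, d = 5, 16
q, k, v = (torch.randn(seq_len, d) for _ in range(3))

scores = q @ k.T / (d ** 0.5)
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()  # lower triangle: allowed pairs
scores = scores.masked_fill(~causal_mask, float("-inf"))       # block attention to future tokens
weights = F.softmax(scores, dim=-1)                            # future positions get weight 0
out = weights @ v
```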
Encoder-Decoder Connection
Cross-Attention
info
To be completed.
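Since this part is still to be filled in, here is only an assumed sketch of how cross-attention is usually wired: queries come from the decoder, while keys and values come from the encoder output.

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 12, d_model)  # (batch, source length, d_model)
decoder_in = torch.randn(1, 7, d_model)    # (batch, target length so far, d_model)

out, attn_weights = cross_attn(query=decoder_in, key=encoder_out, value=encoder_out)
print(out.shape)           # torch.Size([1, 7, 64]) — one vector per decoder position
print(attn_weights.shape)  # torch.Size([1, 7, 12]) — decoder positions attending over the source
```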
Training Tips
Teacher Forcing
- Label the target first, then compute the cross-entropy loss between the ground-truth one-hot vector and the predicted probability distribution.
- During training, the decoder is directly fed the correct answer (the ground truth) as input, rather than its own previous output.
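A hedged sketch of teacher forcing; the decoder below is a toy stand-in and only the data flow matters: the decoder input is the ground truth shifted right, and cross-entropy is computed against the next ground-truth token.

```python
import torch
import torch.nn as nn

vocab, d_model, tgt_len = 100, 32, 6
decoder = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, vocab))  # toy decoder

target = torch.randint(0, vocab, (1, tgt_len))  # ground-truth output sequence
decoder_input = target[:, :-1]                  # feed the correct previous tokens ...
logits = decoder(decoder_input)                 # ... instead of the model's own outputs
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab),                  # (batch * length, vocab)
    target[:, 1:].reshape(-1),                  # next-token targets
)
loss.backward()
```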
Copy Mechanism
- Pointer Network
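A rough sketch of the copy idea behind pointer-style networks; the gate p_gen and all tensors here are assumptions for illustration. The output distribution mixes normal generation with copying tokens directly from the input.

```python
import torch
import torch.nn.functional as F

vocab, src_len = 100, 8
gen_logits = torch.randn(vocab)                  # decoder's usual vocabulary logits
copy_attn = F.softmax(torch.randn(src_len), 0)   # attention over the source tokens
src_ids = torch.randint(0, vocab, (src_len,))    # vocab id of each source token
p_gen = torch.sigmoid(torch.randn(()))           # gate: generate vs. copy

final = p_gen * F.softmax(gen_logits, 0)
final = final.scatter_add(0, src_ids, (1 - p_gen) * copy_attn)  # add copy mass to source ids
```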
Guided Attention
- Monotonic Attention, Location-aware Attention
- This can be important for speech recognition and TTS.
- In some tasks, input and output are monotonically aligned.
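One common way to encode this prior is a guided-attention-style loss that penalizes attention mass far from the diagonal; the formula below is one variant used in TTS work and is included here only as a sketch.

```python
import torch

T_enc, T_dec, g = 8, 10, 0.2
n = torch.arange(T_enc).float().unsqueeze(1) / T_enc      # encoder position ratio
t = torch.arange(T_dec).float().unsqueeze(0) / T_dec      # decoder position ratio
penalty = 1.0 - torch.exp(-((n - t) ** 2) / (2 * g * g))  # small near the diagonal

attn = torch.softmax(torch.randn(T_enc, T_dec), dim=0)    # toy attention weights
guided_attention_loss = (attn * penalty).mean()           # pushes attention toward the diagonal
```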