Analyses of Deep Learning (STATS 385)

Batch normalization

Accepts as input:

feature vector of size $W \times H \times D$
bias vector of size $D$
gain vector of size $D$

Outputs a feature vector of the same size. The layer operates on each of the $D$ channels independently: each channel is first normalized to zero mean and unit variance, then multiplied by its gain and shifted by its bias. The purpose of this layer is to ease the optimization process.
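The normalize, scale, and shift steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `batch_norm`, the nested-list layout `features[w][h][d]`, and the stabilizing constant `eps` are assumptions made for the example. It also computes the statistics over the spatial positions of a single input, whereas in practice they are typically pooled over a mini-batch as well.

```python
import math


def batch_norm(features, gain, bias, eps=1e-5):
    """Normalize each channel of a W x H x D feature map, then scale and shift.

    features: nested list indexed as features[w][h][d] (layout is an assumption)
    gain, bias: lists of length D, one entry per channel
    eps: small constant added to the variance to avoid division by zero
         (an assumption; not part of the description above)
    """
    W, H, D = len(features), len(features[0]), len(gain)
    out = [[[0.0] * D for _ in range(H)] for _ in range(W)]
    for d in range(D):
        # Gather every spatial value of channel d.
        values = [features[w][h][d] for w in range(W) for h in range(H)]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for w in range(W):
            for h in range(H):
                # Zero mean, unit variance, then per-channel gain and bias.
                normalized = (features[w][h][d] - mean) * inv_std
                out[w][h][d] = gain[d] * normalized + bias[d]
    return out
```

With a gain of 1 and a bias of 0, each output channel has approximately zero mean and unit variance; other values of gain and bias let the layer recover a different scale and offset per channel.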

Softmax layer

Takes the output of the classifier, exponentiates the score assigned to each class, and then normalizes the results so that they sum to one. The result can be interpreted as a vector of probabilities over the different classes.
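The exponentiate-then-normalize operation can be written as a short Python function. This is a sketch under stated assumptions: the name `softmax` and the list-of-floats input are choices made for the example, and subtracting the maximum score before exponentiating is a common numerical-stability step that is not part of the description above (it leaves the result unchanged).

```python
import math


def softmax(scores):
    """Map a list of class scores to a list of probabilities that sum to one."""
    # Shifting by the maximum avoids overflow in exp() for large scores;
    # the shift cancels in the ratio, so the output is unaffected.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax([0.0, 0.0])` assigns probability 0.5 to each of the two classes, and a higher score always receives a higher probability.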



Stanford University, Fall 2019