Batch normalization

Accepts as input a feature vector and outputs another feature vector of the same size. The layer operates on each channel of the feature vector independently: each channel is first normalized to zero mean and unit variance, and the result is then multiplied by a learned gain and shifted by a learned bias. The purpose of this layer is to ease the optimization process.
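The per-channel computation described above can be sketched as follows, a minimal NumPy version assuming the input is a batch of feature vectors with shape (batch, channels); the function name and the small epsilon are illustrative choices, not part of the original text:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: batch of feature vectors, shape (batch, channels).
    # Statistics are computed per channel, across the batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize each channel to zero mean, unit variance
    # (eps guards against division by zero).
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Multiply by a learned gain and shift by a learned bias.
    return gamma * x_hat + beta
```

With gamma = 1 and beta = 0 the output channels have (approximately) zero mean and unit variance; the learned gain and bias let the network recover other scales if that helps optimization.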

Softmax layer

Takes the output of the classifier, exponentiates the score assigned to each class, and then normalizes the result to unit sum. The result can be interpreted as a vector of probabilities over the different classes.
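The exponentiate-and-normalize step can be sketched as follows; the max subtraction is a standard numerical-stability trick (it leaves the result unchanged) and is an addition of this sketch, not something the text above specifies:

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum score before exponentiating to avoid
    # overflow; this does not change the normalized result.
    e = np.exp(scores - scores.max())
    # Normalize to unit sum so the output reads as probabilities.
    return e / e.sum()
```

The output is non-negative and sums to one, so it can be read directly as class probabilities, with higher scores mapping to higher probabilities.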