Analyses of Deep Learning (STATS 385)

Sigmoid

The sigmoid, defined as $f(x) = \frac{1}{1 + e^{-x}}$, is a non-linear function that squashes a real-valued number to the range $(0, 1)$ and suffers from saturation.

Saturation of activation

An activation saturates when its gradient is almost zero over some region of its input. This is an undesirable property, since the near-zero gradients passed back through the activation result in slow learning.
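
As an illustration of saturation, the following minimal sketch evaluates the sigmoid and its derivative at a few points. The helper names `sigmoid` and `sigmoid_grad` and the chosen sample points are assumptions made for this example, and the naive formula is only meant for moderate values of $x$.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, f(x) = 1 / (1 + exp(-x)). Naive form, fine for moderate x."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid, f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at x = 0 and shrinks quickly as |x| grows.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}   f(x) = {sigmoid(x):.5f}   f'(x) = {sigmoid_grad(x):.2e}")
```

The derivative is largest at $x = 0$, where it equals $0.25$, and it is already below $10^{-4}$ at $x = 10$, which is the near-zero-gradient region described above.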

Tanh

This non-linearity squashes a real-valued number to the range $(-1, 1)$. Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered.
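
The two properties can be checked numerically. This is a small sketch using Python's `math.tanh`. The symmetric sample points, the helper `tanh_grad`, and the inline sigmoid used for comparison are assumptions chosen for illustration.

```python
import math

# Sample points placed symmetrically around zero (an arbitrary choice).
xs = [-5.0, -1.0, 0.0, 1.0, 5.0]

tanh_out = [math.tanh(x) for x in xs]
sigmoid_out = [1.0 / (1.0 + math.exp(-x)) for x in xs]

def tanh_grad(x):
    """Derivative of tanh, 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

# Zero-centered: on symmetric inputs tanh averages to about 0, the sigmoid to about 0.5.
mean_tanh = sum(tanh_out) / len(tanh_out)
mean_sigmoid = sum(sigmoid_out) / len(sigmoid_out)
print("mean tanh output:   ", mean_tanh)
print("mean sigmoid output:", mean_sigmoid)

# Saturation: the gradient is 1 at the origin and close to zero for large |x|.
print("tanh'(0) =", tanh_grad(0.0), "  tanh'(5) =", tanh_grad(5.0))
```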

ReLU

The most popular non-linearity in modern deep learning, partly because it does not saturate for positive inputs, defined as $f(x) = \max(x, 0)$.
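
A minimal sketch of the function and its gradient follows. The names `relu` and `relu_grad` are assumptions for this example, and taking the gradient at exactly $x = 0$ to be zero is a common convention rather than something stated in the definition.

```python
def relu(x):
    """ReLU, f(x) = max(x, 0)."""
    return max(x, 0.0)

def relu_grad(x):
    """Gradient of ReLU: 1 for x > 0, 0 for x < 0 (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

# For positive inputs the gradient stays at 1 however large x becomes.
for x in (-3.0, 0.5, 10.0, 1000.0):
    print(f"x = {x:7.1f}   relu(x) = {relu(x):7.1f}   relu'(x) = {relu_grad(x):.1f}")
```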

Dead filter

A filter whose output is always negative, and is therefore mapped by ReLU to zero, no matter what the input is. Because the gradient of ReLU is zero for negative inputs, backpropagation never updates the filter, and eventually, due to weight decay, its weights shrink to zero and it "dies".
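
The mechanism can be sketched with a single ReLU unit trained by plain gradient descent with weight decay. Everything concrete here is a hypothetical choice made for illustration: the initial weights, the large negative bias, the inputs drawn from $[-1, 1]$, the upstream gradient of 1, the learning rate, and the decay coefficient.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

w = [0.5, -0.3]      # hypothetical filter weights
b = -10.0            # hypothetical bias, negative enough to keep the unit off
lr = 0.1             # hypothetical learning rate
weight_decay = 0.01  # hypothetical L2 weight-decay coefficient
data_updates = 0     # counts steps where the data gradient was non-zero

for step in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in w]        # inputs bounded in [-1, 1]
    z = sum(wi * xi for wi, xi in zip(w, x)) + b      # pre-activation, always negative here
    relu_grad = 1.0 if z > 0 else 0.0                 # gradient of ReLU at z
    upstream = 1.0                                    # stand-in for the gradient from the loss
    grad_w = [upstream * relu_grad * xi for xi in x]  # data gradient for the weights
    grad_b = upstream * relu_grad                     # data gradient for the bias
    if relu_grad != 0.0:
        data_updates += 1
    # Gradient step: only the weight-decay term can move the weights of a dead unit.
    w = [wi - lr * (gw + weight_decay * wi) for wi, gw in zip(w, grad_w)]
    b = b - lr * grad_b                               # bias is not decayed (an assumption)

print("steps with a non-zero data gradient:", data_updates)
print("weights after training:", w)
print("bias after training:", b)
```

In this setup the pre-activation can never exceed $0.5 + 0.3 - 10 < 0$, so the data gradient is zero at every step and each weight is only multiplied by $1 - 0.1 \cdot 0.01$ per step, decaying toward zero.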

Leaky ReLU

A possible fix to the dead filter problem is to define ReLU with a small positive slope $a$ in the negative part, i.e., $f(x) = \left\{\begin{array}{lr} ax, & \text{for } x < 0\\ x, & \text{for } x \geq 0 \end{array}\right.$.
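
A minimal sketch of this definition follows. The function names and the default slope `a=0.01` are assumptions for the example, since the text above only requires the slope to be small.

```python
def leaky_relu(x, a=0.01):
    """Leaky ReLU: a * x for x < 0, x for x >= 0. The default a is an assumed value."""
    return x if x >= 0 else a * x

def leaky_relu_grad(x, a=0.01):
    """Gradient of leaky ReLU: a for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0 else a

# Unlike ReLU, negative inputs still pass back a small non-zero gradient.
for x in (-5.0, -0.5, 0.0, 2.0):
    print(f"x = {x:5.1f}   f(x) = {leaky_relu(x):7.3f}   f'(x) = {leaky_relu_grad(x):.2f}")
```

Because the gradient for negative inputs is $a$ rather than zero, a filter whose output is always negative still receives updates from backpropagation.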

Stanford University, Fall 2019