The sigmoid, defined as , is a non-linear function that suffers from saturation.
Saturation of activation
An activation that has an almost zero gradient at certain regions.
This is an undesirable property since it results in slow learning.
This non-linearity squashes a real-valued number to the range
Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered.
The most popular non-linearity in modern deep learning, partly due to its non-saturating nature, defined as
A filter which always results in negative values that are mapped by ReLU to zero, no matter what the input is. This causes backpropagation to never update the filter and eventually, due to weight decay, it becomes zero and "dies".
A possible fix to the dead filter problem is to define ReLU with a small slope in the negative part, i.e.,