There are several types of activation functions used in neural networks to introduce nonlinearity so that the network can learn features. This article gives a brief overview of the most commonly used activation functions.
The identity function is used as an activation function mostly for the input layer of a neural network. It is a linear function of the form f(x) = x.
In this case, the output is identical to the input. Linear activations are not used in the hidden layers of neural networks because a stack of linear layers collapses into a single linear model, equivalent to ordinary regression, and the network then fails to "learn" nonlinear features.
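The collapse of stacked linear layers can be sketched numerically. In this minimal example (the weight shapes are arbitrary illustrative choices), two layers with identity activations produce exactly the same output as one layer with the combined weight matrix:

```python
import numpy as np

# Sketch: two layers with identity (linear) activations collapse
# into a single linear map, so depth adds no representational power.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # first layer weights (illustrative shape)
W2 = rng.standard_normal((2, 4))   # second layer weights
x = rng.standard_normal(3)         # input vector

# Two "layers" with identity activation...
two_layer = W2 @ (W1 @ x)
# ...equal one layer with the combined weight matrix W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layer, one_layer))  # True
```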
The step function and the threshold function are commonly used activation functions. A step function outputs 1 if the input is zero or positive, and 0 if the input is negative.
The threshold function is almost like a step function, with the only difference being that the cutoff is an arbitrary number θ instead of 0.
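The two functions can be sketched as follows; the particular threshold value passed to `threshold` below is an arbitrary illustrative choice:

```python
# Minimal sketch of the step and threshold activations described above.
def step(z):
    """Return 1 for z >= 0, else 0."""
    return 1 if z >= 0 else 0

def threshold(z, theta):
    """Like step, but fires at an arbitrary threshold theta instead of 0."""
    return 1 if z >= theta else 0

print(step(-2), step(0), step(3))                 # 0 1 1
print(threshold(0.4, 0.5), threshold(0.6, 0.5))   # 0 1
```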
ReLU (the rectified linear unit) is the most popular activation function in convolutional neural networks and deep learning.
R(z) is zero when z is less than zero, and R(z) is equal to z when z is greater than or equal to zero; that is, R(z) = max(0, z). This function is not differentiable everywhere: it is differentiable except at the single point z = 0. At that point, the "derivative" of ReLU is actually a subgradient, and any value in [0, 1] is a valid choice.
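A minimal sketch of ReLU and its (sub)gradient; the convention of taking 0 as the subgradient at z = 0 is one common choice among the valid options:

```python
# Sketch of ReLU and the subgradient convention at z = 0.
def relu(z):
    """R(z) = max(0, z)."""
    return z if z > 0 else 0.0

def relu_grad(z):
    # Derivative is 0 for z < 0 and 1 for z > 0; at z = 0 we pick 0,
    # though any value in [0, 1] is a valid subgradient there.
    return 1.0 if z > 0 else 0.0

print(relu(-1.5), relu(2.0))            # 0.0 2.0
print(relu_grad(-1.5), relu_grad(2.0))  # 0.0 1.0
```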
The sigmoid function is one of the most widely used activation functions in neural networks. The need for the sigmoid function stems from the fact that many learning algorithms require the activation function to be differentiable, and hence continuous. The step function is not suitable in those situations because it is not continuous. There are two broad types of sigmoid functions:
A binary sigmoid function is of the form f(z) = 1 / (1 + e^(-kz)), where k is the steepness or slope parameter of the sigmoid function. By varying the value of k, sigmoid functions with different slopes can be obtained. It has a range of (0, 1).
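A minimal sketch of the binary sigmoid with the steepness parameter k; with k = 1 it reduces to the standard logistic function:

```python
import math

# Sketch of the binary sigmoid f(z) = 1 / (1 + e^(-k*z)).
# Larger k gives a steeper transition around z = 0.
def binary_sigmoid(z, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * z))

print(binary_sigmoid(0.0))         # 0.5
print(binary_sigmoid(2.0, k=1.0))  # ~0.88
print(binary_sigmoid(2.0, k=5.0))  # ~0.99995 (steeper slope)
```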
A bipolar sigmoid function is of the form f(z) = (1 - e^(-z)) / (1 + e^(-z)).
The range of a sigmoid function can be rescaled depending on the application; for the bipolar form, the range (-1, +1) is most commonly adopted.
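A minimal sketch of the bipolar sigmoid, showing that its outputs fall in (-1, +1):

```python
import math

# Sketch of the bipolar sigmoid f(z) = (1 - e^(-z)) / (1 + e^(-z)),
# which maps inputs into the open interval (-1, +1).
def bipolar_sigmoid(z):
    return (1.0 - math.exp(-z)) / (1.0 + math.exp(-z))

print(bipolar_sigmoid(0.0))   # 0.0
print(bipolar_sigmoid(4.0))   # ~0.96, approaches +1
print(bipolar_sigmoid(-4.0))  # ~-0.96, approaches -1
```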
The hyperbolic tangent function is another continuous activation function, which is bipolar in nature. It is widely adopted in networks trained with the backpropagation algorithm. The hyperbolic tangent function is of the form tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)).
This function is similar in shape to the bipolar sigmoid function; in fact, tanh(z) equals the bipolar sigmoid evaluated at 2z.
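The relationship to the bipolar sigmoid can be checked numerically; this sketch verifies the identity tanh(z) = (1 - e^(-2z)) / (1 + e^(-2z)) at a few sample points:

```python
import math

# Sketch: tanh(z) equals the bipolar sigmoid evaluated at 2z.
def bipolar_sigmoid(z):
    return (1.0 - math.exp(-z)) / (1.0 + math.exp(-z))

for z in (-1.0, 0.0, 0.5, 2.0):
    assert abs(math.tanh(z) - bipolar_sigmoid(2.0 * z)) < 1e-12

print(math.tanh(1.0))  # ~0.7616
```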
Note that the binary sigmoid produces values between 0 and 1. In some cases, however, it is desirable to have values ranging between -1 and +1; the bipolar sigmoid and the hyperbolic tangent serve that purpose, or the activation function can be rescaled accordingly.