Activation functions such as Sigmoid, tanh, and softsign suffer from the vanishing gradient problem in deep networks: their derivatives approach zero as the input moves away from the origin in either direction, so the weights of the earlier layers receive very small updates and learning slows or stalls. This is one reason why activation functions like ReLU, which avoid this saturation, are often preferred in deep learning.
[Reference: Huawei HCIA-AI Certification, Deep Learning Overview – Activation Functions and Vanishing Gradient Problem]
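For illustration, here is a minimal NumPy sketch (the sample inputs and the 20-layer depth are illustrative choices, not from the source) showing how the sigmoid derivative saturates away from the origin and how the per-layer gradient factors compound during backpropagation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0, decays toward 0 as |x| grows

# The derivative shrinks quickly as the input moves away from the origin (saturation)
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")

# Backpropagation multiplies one such factor per layer, so even the best-case
# factor of 0.25 decays exponentially with depth (hypothetical 20-layer network).
depth = 20
print(f"Upper bound on gradient scale after {depth} sigmoid layers: {0.25 ** depth:.2e}")
```

Running this shows the derivative dropping from 0.25 at the origin to roughly 4.5e-5 at x = 10, and the best-case product over 20 layers on the order of 1e-12, which is why early layers effectively stop learning.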