What is an Activation Function?
TL;DR
A function that introduces non-linearity into neural networks. ReLU, Sigmoid, and GELU are common examples.
Activation Function: Definition & Explanation
An activation function is a non-linear function applied to each neuron's output in a neural network. Without activation functions, even a multi-layer network collapses into a single linear transformation and cannot learn complex patterns. Common activation functions include:

- ReLU (Rectified Linear Unit): outputs max(0, x); the most widely used today
- Sigmoid: outputs values between 0 and 1, often used to represent probabilities
- Tanh: outputs values between -1 and 1
- GELU: used in Transformers such as GPT and BERT
- Swish/SiLU: proposed by Google researchers as an improvement over ReLU

Choosing the right activation function significantly affects training efficiency and model performance, and plays a crucial role in avoiding the vanishing gradient problem.