What is the Attention Mechanism?
TL;DR
A neural network mechanism that dynamically computes the relevance between elements of input data. The core innovation of the Transformer.
Attention Mechanism: Definition & Explanation
The Attention Mechanism is a neural network technique that dynamically calculates how strongly each element of the input relates to every other element, letting the model "focus" on the most relevant information. The Self-Attention mechanism popularized by the 2017 Transformer paper "Attention Is All You Need" computes the relationships between all tokens in the input in parallel, enabling the model to capture long-range context.

Concretely, each token is projected into three vectors: a Query, a Key, and a Value. Relevance scores are computed between Queries and Keys, and these scores weight the Values so that more important information contributes more to the output. Multi-Head Attention runs several such attention computations ("heads") in parallel, each capturing a different aspect of the relationships between tokens.

This mechanism is the foundational technology of all current LLMs, including GPT, Claude, and Gemini. Improved variants such as Flash Attention and Sliding Window Attention have been developed to process long contexts more efficiently.
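The Query/Key/Value computation described above can be sketched in a few lines. This is a minimal NumPy illustration of scaled dot-product self-attention as defined in the Transformer paper, not production code; the projection matrices and toy input here are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Relevance scores: how much each Query attends to each Key,
    # scaled by sqrt(d_k) to keep the dot products in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)        # shape: (seq_len, seq_len)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    # Weighted sum of Values: important tokens contribute more.
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
# In self-attention, Q, K, and V are all linear projections of the same input.
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
```

Multi-Head Attention repeats this computation several times with independent projection matrices and concatenates the results, which is what lets each head specialize in a different kind of relationship.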