What Is a Transformer?
TL;DR
The neural network architecture that underpins modern AI models. Self-attention is its core innovation.
Transformer: Definition & Explanation
The Transformer is a neural network architecture introduced in 2017 by a Google research team in the landmark paper 'Attention Is All You Need.' Its core mechanism — self-attention — lets each element of the input compute its relationship to every other element simultaneously, giving the model an efficient grasp of context. Compared with earlier recurrent architectures such as RNNs (Recurrent Neural Networks) and LSTMs, the Transformer handles long-range context better and computes in parallel, which makes large-scale training feasible. GPT, BERT, Gemini, Claude, and virtually all major LLMs today are built on the Transformer architecture. Its applications have expanded well beyond natural language processing to include computer vision (Vision Transformer), audio processing, and more.
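To make the self-attention idea concrete, here is a minimal sketch of single-head scaled dot-product attention using NumPy. The function name, matrix shapes, and random toy inputs are illustrative assumptions, not part of any library; real implementations add multiple heads, masking, and learned projections inside a larger network.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token offers
    V = X @ W_v  # values: the content each token carries
    d_k = Q.shape[-1]
    # every token scores its relationship to every other token at once
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output row is a context-aware mix of all value vectors
    return weights @ V

# toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (3, 4): one context-enriched vector per token
```

Because the score matrix is computed in one matrix multiplication rather than a step-by-step recurrence, all token pairs are processed in parallel — the property that makes Transformer training scale so well on modern hardware.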