What is MoE (Mixture of Experts)?

TL;DR

An AI architecture that efficiently processes inputs by routing them to specialized sub-networks.

MoE (Mixture of Experts): Definition & Explanation

MoE (Mixture of Experts) is an AI architecture composed of multiple specialized sub-networks (experts) and a gating mechanism that selects the appropriate experts for each input. Rather than using all parameters for every input, it activates only a subset of experts, keeping computational cost low even with a large total parameter count. For example, Mixtral 8x7B has about 47B total parameters but activates only about 13B per token during inference.

GPT-4 is also reported to use an MoE architecture. Since Google's Switch Transformer demonstrated the effectiveness of large-scale MoE, the approach has been adopted in models such as Grok and DeepSeek-V2.

MoE enables significant reductions in inference cost while preserving the model's total capacity, making it an important technique for scaling large models. Its adoption in LLM development is expected to continue expanding.
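The routing idea described above can be sketched in a few lines. This is a minimal, illustrative top-k gating layer using NumPy; all names (`moe_forward`, `gate_w`, etc.) are invented for this example, and real MoE layers add batching, load-balancing losses, and learned training, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a small feed-forward weight matrix; the gate scores experts per input.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route input x to only the top_k highest-scoring experts."""
    scores = softmax(x @ gate_w)               # gate probabilities over experts
    top = np.argsort(scores)[-top_k:]          # indices of the top_k experts
    weights = scores[top] / scores[top].sum()  # renormalize over the selected experts
    # Only top_k expert matmuls run; the other experts stay inactive for this input.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)
print(y.shape)  # → (8,)
```

Here only 2 of the 4 experts do any work per input, which is the same sparsity that lets a 47B-parameter model run with roughly 13B active parameters.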
