What Is a Multi-Armed Bandit?
TL;DR
An optimization method that tries multiple options (arms) and automatically shifts allocation toward the better-performing ones as results come in. Instead of splitting traffic evenly for the whole test as A/B testing does, it balances the 'explore vs. exploit' trade-off during the test and so reduces lost opportunity. Used in CRO (conversion rate optimization), recommendations, and ad serving.
Multi-Armed Bandit: Definition & Explanation
A Multi-Armed Bandit is an optimization method named after the problem of choosing among several slot machines (each one an 'arm') with a limited number of pulls, aiming to maximize total reward. While trying each option (a landing-page variant, ad creative, recommended item, etc.), it automatically shifts traffic toward the options performing best on a metric such as click-through or conversion rate.

The crux is the 'exploration vs. exploitation' dilemma: should you try options you have not yet sampled enough (explore), or bet on the best performer so far (exploit)? Balancing the two minimizes total lost opportunity, known as regret. Representative algorithms include epsilon-greedy (with a small probability epsilon, pick a random arm; otherwise pick the arm with the best observed average), UCB (Upper Confidence Bound: pick the arm with the highest optimistic estimate of its reward, which favors arms that are still uncertain), and Thompson sampling (Bayesian: draw a sample from each arm's posterior distribution of its reward rate and pick the arm with the highest draw).

Versus A/B testing: A/B testing splits traffic evenly, waits for statistical significance, then switches all traffic to the winner. This gives clear learning but a large opportunity cost during the test, because the losing variants keep receiving their full share. A bandit shifts allocation toward good arms while the test is running, so the opportunity cost is small, but clean effect measurement and causal interpretation are weaker because arms receive unequal, adaptively chosen traffic.

Application areas: CRO and landing-page optimization, ad-creative serving, recommendations, price/coupon optimization, and news-headline testing. An extension that uses context is the 'contextual bandit,' which chooses the best arm based on user attributes. Caveats: (★) bandits are effective for short campaigns or when there are many options, but use A/B testing when you need clear causal conclusions; (★) you must design for delayed rewards and for non-stationarity (the best arm can change over time). 2026 trends: automated optimization is becoming standard in CRO and recommendations, and in selecting among LLM outputs.
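The three algorithms and the contrast with an even A/B split can be compared in a small simulation. This is a minimal sketch, not a production implementation. It assumes 0/1 rewards (convert or not) and three variants with hypothetical true conversion rates, and it uses the textbook UCB1 bonus sqrt(2 ln t / n) and a Beta(1, 1) prior for Thompson sampling. All class and function names are illustrative; `UniformSplit` is a hypothetical stand-in for an A/B test's even split. Regret is computed as the expected conversions lost compared with always showing the best variant.

```python
import math
import random


class Policy:
    """Shared bookkeeping: pull count and running mean reward per arm."""

    def __init__(self, n_arms, rng):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.rng = rng

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class UniformSplit(Policy):
    """Hypothetical A/B-style baseline: equal round-robin traffic for the whole test."""

    def select(self):
        return sum(self.counts) % len(self.counts)


class EpsilonGreedy(Policy):
    def __init__(self, n_arms, rng, epsilon=0.1):
        super().__init__(n_arms, rng)
        self.epsilon = epsilon

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore: random arm
        return max(range(len(self.counts)), key=self.values.__getitem__)  # exploit


class UCB1(Policy):
    def select(self):
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm  # try every arm once before using the bound
        log_t = math.log(sum(self.counts))
        # Optimistic index: observed mean + exploration bonus that shrinks with more pulls.
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a] + math.sqrt(2 * log_t / self.counts[a]))


class ThompsonBernoulli(Policy):
    """Assumes 0/1 rewards (e.g. click / no click) and a uniform Beta(1, 1) prior per arm."""

    def __init__(self, n_arms, rng):
        super().__init__(n_arms, rng)
        self.alpha = [1.0] * n_arms  # 1 + successes
        self.beta = [1.0] * n_arms   # 1 + failures

    def select(self):
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)  # best posterior draw

    def update(self, arm, reward):
        super().update(arm, reward)
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def run(policy_cls, rates, steps=20_000, seed=0):
    """Simulate Bernoulli arms; return (conversions, expected regret, pulls per arm)."""
    env = random.Random(seed)
    policy = policy_cls(len(rates), random.Random(seed + 1))
    conversions = 0
    for _ in range(steps):
        arm = policy.select()
        reward = 1 if env.random() < rates[arm] else 0
        policy.update(arm, reward)
        conversions += reward
    best = max(rates)
    regret = sum(n * (best - r) for n, r in zip(policy.counts, rates))
    return conversions, regret, policy.counts


if __name__ == "__main__":
    rates = [0.04, 0.05, 0.07]  # hypothetical conversion rates of three variants
    for cls in (UniformSplit, EpsilonGreedy, UCB1, ThompsonBernoulli):
        conv, regret, pulls = run(cls, rates)
        print(f"{cls.__name__:18s} conversions={conv:5d} regret={regret:6.1f} pulls={pulls}")
```

Running the file prints conversions, regret, and per-arm pulls for each policy. The adaptive policies should concentrate pulls on the 0.07 arm, while the uniform split keeps giving each arm a third of the traffic; exact numbers depend on the seeds and rates chosen.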
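The contextual-bandit idea can be illustrated in its simplest form, assuming user attributes reduce to a small set of discrete segments: keep an independent Thompson sampler per segment, so each segment learns its own best arm. The segment names, the two-arm setup, and `TRUE_RATES` are hypothetical simulation inputs, and `SegmentedThompson` is an illustrative name. Practical contextual bandits with many or continuous features (LinUCB is a well-known example) instead share a model across contexts, which this sketch does not do.

```python
import random
from collections import defaultdict


class SegmentedThompson:
    """Tabular contextual bandit: an independent Beta-Bernoulli Thompson sampler per segment."""

    def __init__(self, n_arms, rng=None):
        self.n_arms = n_arms
        self.rng = rng or random.Random()
        # segment -> per-arm [alpha, beta]; Beta(1, 1) is a uniform prior
        self.params = defaultdict(lambda: [[1.0, 1.0] for _ in range(n_arms)])

    def select(self, segment):
        draws = [self.rng.betavariate(a, b) for a, b in self.params[segment]]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, segment, arm, reward):
        self.params[segment][arm][0] += reward       # success count
        self.params[segment][arm][1] += 1 - reward   # failure count


# Hypothetical ground truth: the best creative differs by device segment.
TRUE_RATES = {"mobile": [0.08, 0.03], "desktop": [0.03, 0.08]}


def run_segments(steps=10_000, seed=0):
    env = random.Random(seed)
    bandit = SegmentedThompson(n_arms=2, rng=random.Random(seed + 1))
    picks = {seg: [0, 0] for seg in TRUE_RATES}
    for _ in range(steps):
        segment = env.choice(list(TRUE_RATES))  # user attribute observed before choosing
        arm = bandit.select(segment)
        reward = 1 if env.random() < TRUE_RATES[segment][arm] else 0
        bandit.update(segment, arm, reward)
        picks[segment][arm] += 1
    return picks


if __name__ == "__main__":
    print(run_segments())
```

A single non-contextual bandit on this data would converge to one arm for everyone; splitting by segment lets mobile users mostly see arm 0 and desktop users mostly see arm 1, at the cost of learning each segment from its own traffic only.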