Sep 16, 2022

Without going too much into the math, the solution to the bandit problem is easy to understand: the optimal strategy is to start with a period of exploration, where you pull levers at random and gather information. When you have more information about what works and what doesn’t, you shift to spending the majority of your time pulling the best lever (exploitation), but you keep exploring the other options in case your current best option isn’t the very best that exists.

Here’s the thing: exploration never stops. Even if, in your heart of hearts, you’re positively certain you’ve found the best possible option, you keep experimenting, because the information you gather by experimenting is still valuable.
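If you'd rather see that as code, here is a minimal sketch of one simple way to do it, usually called epsilon-greedy. It is not necessarily the mathematically optimal strategy, just an easy one that has the same shape: a short stretch of purely random pulls, then mostly the best-looking lever, with a small slice of random pulls that never goes away. The function name `run_bandit`, the payout rates, and the `warmup` and `epsilon` values are all made up for illustration.

```python
import random


def run_bandit(true_rates, pulls=10_000, warmup=100, epsilon=0.1, seed=0):
    """Play epsilon-greedy against levers that pay 1 or 0.

    true_rates: hidden payout probability of each lever (hypothetical values).
    warmup:     number of purely random pulls at the start.
    epsilon:    share of later pulls that stay random, forever.
    Returns how many times each lever was pulled.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n    # pulls per lever
    totals = [0.0] * n  # total payout per lever

    def average(i):
        # A lever we have never pulled is scored as 0.0 (an assumption).
        return totals[i] / counts[i] if counts[i] else 0.0

    for t in range(pulls):
        if t < warmup or rng.random() < epsilon:
            lever = rng.randrange(n)           # explore: pick at random
        else:
            lever = max(range(n), key=average)  # exploit: best average so far

        reward = 1.0 if rng.random() < true_rates[lever] else 0.0
        counts[lever] += 1
        totals[lever] += reward

    return counts


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8]))
```

Notice that `epsilon` never drops to zero. Even on the last pull there is still a chance the code tries a lever it believes is worse, which is exactly the "never stop experimenting" part.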

The only way to beat the bandit is to keep trying new things.

