Sep 16, 2022

Without going too much into the math, the solution to the bandit problem is easy to understand: you start with a period of exploration, where you pull levers at random and gather information. Once you know more about what works and what doesn’t, you shift to spending most of your time pulling the lever that has paid off best so far (exploitation), but you keep exploring the other options in case your current best isn’t the very best that exists.

Here’s the thing: the exploration phase never stops. Even if, in your heart of hearts, you’re positively certain you’ve found the best possible option, you never stop experimenting, because the information you gather by experimenting is still valuable.

The only way to beat the bandit is to keep trying new things.
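
If you like to see ideas in code, here is a minimal sketch of that strategy. It uses a simple rule usually called epsilon-greedy, which is one common way to balance exploring and exploiting, not necessarily the mathematically optimal one. The function name `run_bandit`, the payout rates, and the parameters `epsilon_start`, `epsilon_floor`, and `decay` are all made up for illustration. The part that matters is `epsilon_floor`: the chance of trying a random lever shrinks over time but never reaches zero.

```python
import random


def run_bandit(true_rates, pulls=5000, epsilon_start=1.0,
               epsilon_floor=0.05, decay=0.995, seed=0):
    """Simulate an epsilon-greedy player on levers with hidden payout rates.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # how many times each lever was pulled
    means = [0.0] * n_arms   # running average payout seen per lever
    epsilon = epsilon_start  # current probability of exploring

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: pull a lever at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the lever with the best average so far.
            arm = max(range(n_arms), key=lambda a: means[a])

        # The lever pays 1 with its hidden probability, otherwise 0.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Update the running average for the lever we pulled.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

        # Explore less over time, but never drop below the floor.
        epsilon = max(epsilon_floor, epsilon * decay)

    return counts, means


counts, means = run_bandit([0.2, 0.5, 0.8])
print("pulls per lever:", counts)
print("estimated payout rates:", [round(m, 2) for m in means])
```

With these settings the player starts out pulling almost entirely at random, then settles on the lever that looks best, while still spending roughly one pull in twenty on a random lever for as long as the game runs.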
