Epsilon-Greedy Algorithm in Bandit Problems

Introduction Bandit problems are the simplest possible reinforcement learning scenario. Here the bandit machine can have k arms and pulling each arm leaves the user a reward. One of the arms will be giving higher rewards in the long run and moreover this pattern could be changing over a time period. Think of the scenario of advertisements… Continue reading Epsilon-Greedy Algorithm in Bandit Problems