Exploration and Primal-dual Methods in Bandits and Reinforcement Learning