Provably Efficient Reinforcement Learning: From Single-Agent MDPs to Markov Games