Reinforcement Learning in Buchberger's Algorithm