Reinforcement Learning – Intelligent Weighting of Monte Carlo and Temporal Differences
In reinforcement learning, the way the value function is updated determines how information spreads across the state or state-action space, from which the value-based control policy is derived. It is therefore important that information propagates across the value domain effectively. Two common ways to update the value function are Monte Carlo updating and temporal-difference updating. They are
