Paper accepted to RLC showing that adaptive step sizes for policy gradient methods have to balance the exploration/exploitation trade-off.