RL: Policy Optimization I: REINFORCE, A2C, A3C