Policy Optimization I: REINFORCE, Actor-Critic