







In most actor-critic (AC) algorithms, we fit only the value function V. Fitting the Q function as well is less common.
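
One reason V alone is usually enough: the advantage can be estimated from V with a single sampled transition, A(s, a) ≈ r + gamma * V(s') - V(s). A minimal sketch, assuming a tabular value estimate; the names `advantage_from_v`, `V`, `gamma`, and the toy states are hypothetical, not from these notes:

```python
def advantage_from_v(V, s, r, s_next, done, gamma=0.99):
    """One-step advantage estimate: A(s, a) ~ r + gamma * V(s') - V(s)."""
    # No bootstrap term when the episode ended at s_next.
    bootstrap = 0.0 if done else gamma * V[s_next]
    return r + bootstrap - V[s]


# Toy value table (assumed values, for illustration only).
V = {"s0": 1.0, "s1": 2.0}
adv = advantage_from_v(V, "s0", r=0.5, s_next="s1", done=False, gamma=0.9)
```

Here the critic only ever needs V; no separate Q estimate appears in the update.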








Batch: offline, Monte Carlo returns. Online: bootstrapping, TD.
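
The two kinds of critic target can be sketched side by side. This is an illustrative sketch, not a full actor-critic implementation; the function names and the `gamma` default are assumptions:

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Batch target: discounted reward-to-go, computed after the episode ends."""
    returns = []
    g = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def td_target(r, v_next, done, gamma=0.99):
    """Online target: one reward plus the bootstrapped value of the next state."""
    return r + (0.0 if done else gamma * v_next)


mc = monte_carlo_returns([1.0, 1.0, 1.0], gamma=0.5)
td = td_target(1.0, v_next=2.0, done=False, gamma=0.5)
```

The Monte Carlo version needs the whole trajectory before any target exists, while the TD version can be computed after every single step.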

With a fast emulator, use the left one.




This strategy works well at the beginning of training.




