So, the process is similar to a one-to-many RNN?

learns much more efficiently than model-free methods
iteratively gets better
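The "iteratively gets better" note suggests the generic model-based loop: collect trials, refit the predictive model on all data so far, repeat. A minimal sketch of that loop, assuming hypothetical `collect_trials` and `fit_model` callables (not the authors' exact pipeline):

```python
def improve_iteratively(n_rounds, collect_trials, fit_model):
    """Generic model-based improvement loop (sketch, assumed interface):
    each round gathers new experience (random at first, model-guided
    once a model exists) and refits the model on the growing dataset."""
    dataset, model = [], None
    for _ in range(n_rounds):
        dataset += collect_trials(model)  # gather new trials
        model = fit_model(dataset)        # retrain on everything so far
    return model

# Toy stand-ins: "trials" are numbers, the "model" is their mean.
trials = iter(range(10))
collect = lambda model: [next(trials), next(trials)]
fit = lambda data: sum(data) / len(data)
final_model = improve_iteratively(3, collect, fit)
```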






fewer than 300 trials (~25 min of robot time per task)
visual prediction from the observation




During training of the model, there is no reward; only some random motions are programmed. At task time, there is a reward function: basically, try to move a pixel to the goal position.
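That task-time reward can be sketched as a cost on predicted pixel positions: for each candidate action, the learned visual model predicts where the tracked pixel would end up, and the planner picks the action whose prediction lands closest to the goal. A minimal sketch, where `predict_pixel` stands in for the learned model (hypothetical name, not the authors' API):

```python
import numpy as np

def pixel_goal_cost(predicted_pixel, goal_pixel):
    """Cost = Euclidean distance between the predicted pixel
    location and the goal location (both (row, col) coordinates)."""
    return float(np.linalg.norm(np.asarray(predicted_pixel, dtype=float)
                                - np.asarray(goal_pixel, dtype=float)))

def plan_best_action(candidate_actions, predict_pixel, goal_pixel):
    """Score each sampled action by the cost of the pixel position
    the model predicts for it; return the cheapest action."""
    costs = [pixel_goal_cost(predict_pixel(a), goal_pixel)
             for a in candidate_actions]
    return candidate_actions[int(np.argmin(costs))]

# Toy stand-in for the learned predictor: the action vector simply
# displaces the tracked pixel from its current position.
start = np.array([10.0, 10.0])
predict = lambda action: start + action

actions = [np.array([1.0, 0.0]), np.array([0.0, 5.0]), np.array([4.0, 4.0])]
best = plan_best_action(actions, predict, goal_pixel=np.array([14.0, 14.0]))
```

In the real system the candidate actions would be whole action sequences scored by a sampling-based planner, but the pixel-to-goal cost is the same idea.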