Reinforcement Learning and DQN
Achievements of Reinforcement Learning
Produced the world's best Backgammon player (Tesauro 1995)
Learned acrobatic helicopter autopilots (Ng, Abbeel, Coates et al. 2006+)
Widely used in the placement and selection of advertisements on the web (e.g. A/B tests)
Used to make strategic decisions in Jeopardy! (IBM's Watson 2011)
Achieved human-level performance on Atari games from pixel-level visual input, in conjunction with deep learning (Google DeepMind 2015)
In all these cases, performance was better than could be obtained by any other method, and it was achieved without human instruction.