
    [comment]: # Monte Carlo Tree Search (UCT): The Evolution of a Code Monkey


    This article is based on Introduction to Monte Carlo Tree Search by Jeff Bradberry.
    Jeff Bradberry also provides a complete set of examples, written in Python:
    board game server
    board game client
    Tic Tac Toe board
    AI implementation of Tic Tac Toe

    A-Yuan's Day 1 at Work - Monte Carlo Tree Search - The Generic Game Interfaces: board and player


    Note: a perfect information game is a game in which no information is hidden from any player.


    class Board(object):
        """Define general rules of a game.

        State: an object that is only used inside the board class.
            Normally, a state includes game board information (e.g. chessmen
            positions, action index, current action, current player, etc.).
        Action: an object that describes a move.
        num_players: the number of players of the board.
        """
        num_players = 2

        def start(self):
            """Start the game.
            Return: the initial state.
            """
            return None

        def display(self, state, action, _unicode=True):
            """Display the board.
            state: current state.
            action: current action.
            Return: display information.
            """
            return None

        def parse(self, action):
            """Parse player input text into an action.
            The method is used by a human player to parse human input.
            action: player input action text.
            Return: the action if the input is valid, otherwise None.
            """
            return None

        def next_state(self, state, action):
            """Calculate the next state based on the current state and action.
            state: the current state.
            action: the current action.
            Return: the next state.
            """
            # Placeholder: a concrete board applies the action before returning.
            return tuple(state)

        def is_legal(self, history, action):
            """Check if an action is legal.
            The method is used by a human player to validate human input.
            history: an array of history states.
            action: the action to check.
            Return: True if the action is legal, otherwise False.
            """
            # Generic default; a concrete board may use a cheaper direct check.
            return action in self.legal_actions(history)

        def legal_actions(self, history):
            """Calculate the legal actions from the history states.
            The method is mainly used by AI players.
            history: an array of history states.
            Return: an array of legal actions.
            """
            # Placeholder: a concrete board computes the actions here.
            return []

        def current_player(self, state):
            """Get the current player.
            state: the current state.
            Return: the current player number.
            """
            return None

        def winner(self, history):
            """Get the winning player.
            history: an array of history states.
            Return: 0 if the game has not ended and has no winner yet,
                the winning player's number if there is a winner,
                num_players + 1 for a draw.
            """
            return 0

        def winner_message(self, winner):
            """Get the game result.
            winner: the winning player number.
            Return: the winner message, i.e. the game result.
            """
            return ""


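    class Player(object):
        def __init__(self, board):
            # Assumed constructor: the methods below rely on self.board.
            self.board = board
            self.states = []

        def update(self, state):
            """Update current state into all states.
            state: the current state.
            """
            self.states.append(state)

        def display(self, state, action):
            """Display the board.
            state: the current state.
            action: the current action.
            Return: display information.
            """
            return self.board.display(state, action)

        def winner_message(self, msg):
            """Display the winner message.
            msg: winner information.
            Return: winner message.
            """
            return self.board.winner_message(msg)

        def get_action(self):
            """Get the player's next action.
            Return: the next action.
            """
            # Each concrete player (human, MonteCarlo) decides how to choose.
            raise NotImplementedError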

    Note: the display and winner_message methods supply the board's information to the game client. This keeps the client decoupled from the board.
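To make the interface more concrete, here is a minimal sketch of a 3x3 Tic Tac Toe board that follows the same method names. It is not Jeff Bradberry's implementation: the class name `TicTacToeBoard`, the 10-item tuple used as the state, the use of a cell index as the action, and the helper `play_random_game` are all assumptions made for illustration.

```python
import random


class TicTacToeBoard:
    """Hypothetical 3x3 board that follows the Board interface.

    State: a tuple of 10 ints. Items 0-8 are the cells (0 = empty,
    1 or 2 = owner) and item 9 is the player to move.
    Action: a cell index from 0 to 8.
    """
    num_players = 2
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def start(self):
        return (0,) * 9 + (1,)

    def current_player(self, state):
        return state[9]

    def next_state(self, state, action):
        cells = list(state[:9])
        cells[action] = state[9]
        # Players are numbered 1 and 2, so 3 - player switches the turn.
        return tuple(cells) + (3 - state[9],)

    def legal_actions(self, history):
        state = history[-1]
        return [i for i in range(9) if state[i] == 0]

    def is_legal(self, history, action):
        return action in self.legal_actions(history)

    def winner(self, history):
        state = history[-1]
        for a, b, c in self.LINES:
            if state[a] != 0 and state[a] == state[b] == state[c]:
                return state[a]
        if 0 not in state[:9]:
            return self.num_players + 1   # draw
        return 0                          # game still in progress


def play_random_game(board, seed=0):
    """Play random legal actions until the board reports a result."""
    rng = random.Random(seed)
    history = [board.start()]
    while board.winner(history) == 0:
        action = rng.choice(board.legal_actions(history))
        history.append(board.next_state(history[-1], action))
    return history
```

Note that `play_random_game` only talks to the board through `start`, `legal_actions`, `next_state` and `winner`, which is the same decoupling an AI player relies on.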

    A-Yuan's Day 2 at Work - Monte Carlo Tree Search - The MonteCarlo Player

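    A-Jing said, "An AI game-playing application needs at least two concrete players: a human player and a MonteCarlo player."
    "The human player gives a human an interactive interface."
    "Right. The MonteCarlo player is an AI player, and it is the focus of our discussion. In its implementation of get_action, it uses the board to simulate the moves that could follow, and picks the best move based on the results of those simulations."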

    import time

    class MonteCarlo(object):
        def __init__(self, board, **kwargs):
            # ...
            # Time budget, in seconds, for one call to get_action
            self.calculation_time = float(kwargs.get('time', 30))
            # Maximum number of actions played within one simulation
            self.max_actions = int(kwargs.get('max_actions', 1000))
            # ...
        def get_action(self):
            # ...
            # Control period of simulation
            moves = 0
            begin = time.time()
            while time.time() - begin < self.calculation_time:
                self.run_simulation()  # one loop, one simulated path
                moves += 1
            # ...
        def run_simulation(self):
            # ...
            # Control number of simulation actions
            for t in range(1, self.max_actions + 1):
                # ...
                pass
            # ...
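The two limits in the skeleton can be tried out in isolation. The sketch below is only an illustration: `budgeted_search` and `simulate_one_game` are hypothetical names, the "game" is a stand-in that ends at random, and `time.monotonic` is used in place of `time.time` because it is not affected by system clock adjustments.

```python
import random
import time


def simulate_one_game(max_actions, rng):
    """Stand-in for run_simulation: take 'actions' until the pretend game
    ends or max_actions is reached. Returns the number of actions taken."""
    for t in range(1, max_actions + 1):
        if rng.random() < 0.1:      # pretend the game ended on this action
            return t
    return max_actions


def budgeted_search(calculation_time=0.05, max_actions=1000, seed=0):
    """Run simulations until the time budget is spent.

    Returns (games, longest): how many simulations fit into the budget and
    the largest number of actions any single simulation took.
    """
    rng = random.Random(seed)
    games = 0
    longest = 0
    begin = time.monotonic()
    while time.monotonic() - begin < calculation_time:
        longest = max(longest, simulate_one_game(max_actions, rng))
        games += 1
    return games, longest
```

The time limit bounds how many paths are explored per move; the action limit bounds how deep any one path can go.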



    A-Yuan's Day 3 at Work - Monte Carlo Tree Search - The Steps of Monte Carlo Tree Search

    "Yes. That is why Monte Carlo Tree Search has many variants. The algorithm we are studying now is one of them: Upper Confidence bound applied to Trees (UCT). We will look into it tomorrow."

    Figure: Monte Carlo Tree Search Steps (flowchart)
    • Outer loop (limits the simulation period; one loop, one path): Start -> Reach time limitation? If yes: Select the best action and return -> End. If no: enter the inner loop.
    • Inner loop (limits max actions; one loop, one action): Meet max actions? If yes: Back-Propagation, then back to the time check. If no: Get children actions -> Meet selection criteria? If yes: Selection; if no: Expansion. Either way: Simulation -> Has Winner? If yes: Back-Propagation, then back to the time check; if no: back to the max-actions check.
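    "**Selection** picks the best child action, based on the statistics collected so far for all child actions."
    "**Expansion** picks a child action at random when the statistics collected so far are not enough to compute the next action."
    "**Simulation** plays the game forward to the next step."
    "**Back-Propagation** updates the statistics recorded along the path once the game has ended, according to its result."
    "As the figure above shows, the selection algorithm is crucial: it is effectively what evaluates the value of each action."
    "Good. Today we went through the steps of Monte Carlo Tree Search."
    "Tomorrow we can study Upper Confidence bound applied to Trees (UCT)."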
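The four steps can be sketched on a toy game. The example below is an assumption-laden illustration, not the code from the referenced article: the game (players alternately take 1 to 3 stones, and whoever takes the last stone wins), the function names, and the `plays`/`wins` dictionaries keyed by `(player, stones_left)` are all made up for this sketch. Selection here uses the plain win rate; UCT is not applied yet.

```python
import random


def legal_actions(stones):
    """A move takes 1, 2 or 3 stones, but never more than are left."""
    return [n for n in (1, 2, 3) if n <= stones]


def run_simulation(stones, player, plays, wins, rng, max_actions=100):
    """One pass of the outer loop: walk one path, then back-propagate."""
    visited = []     # (player who moved, stones left after the move)
    expand = True    # add at most one new node per simulation
    winner = 0
    for _ in range(max_actions):
        children = [stones - a for a in legal_actions(stones)]
        if all((player, s) in plays for s in children):
            # Selection: every child has statistics, take the best win rate.
            nxt = max(children,
                      key=lambda s: wins[(player, s)] / plays[(player, s)])
        else:
            # Expansion: statistics are incomplete, pick a child at random.
            nxt = rng.choice(children)
        if expand and (player, nxt) not in plays:
            expand = False
            plays[(player, nxt)] = 0
            wins[(player, nxt)] = 0
        # Simulation: advance the game by one action.
        visited.append((player, nxt))
        stones = nxt
        if stones == 0:
            winner = player          # taking the last stone wins
            break
        player = 3 - player
    # Back-propagation: update the statistics along the visited path.
    for key in visited:
        if key in plays:
            plays[key] += 1
            if key[0] == winner:
                wins[key] += 1


def best_action(stones, player=1, simulations=2000, seed=0):
    """Run many simulations, then return (best action, win rate per action)."""
    rng = random.Random(seed)
    plays, wins = {}, {}
    for _ in range(simulations):
        run_simulation(stones, player, plays, wins, rng)
    rates = {}
    for a in legal_actions(stones):
        key = (player, stones - a)
        rates[a] = wins[key] / plays[key] if plays.get(key) else 0.0
    return max(rates, key=rates.get), rates
```

With 3 stones left, taking all three ends the game at once, so every simulation through that child is recorded as a win. Pure win-rate selection is greedy, though, and can lock onto an action too early on larger positions.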

    A-Yuan's Day 4 at Work - Monte Carlo Tree Search - Upper Confidence bound applied to Trees (UCT)

    Confidence intervals

    \[
    \bar{x}_i \pm \sqrt{\frac{z \ln{n}}{n_i}} \\
    \text{where:} \\
    \qquad \bar{x}_i \text{ : the mean of choice } i. \\
    \qquad n_i \text{ : the number of plays of choice } i. \\
    \qquad n \text{ : the total number of plays.} \\
    \qquad z \text{ : 1.96 for 95\% confidence level.}
    \]
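As a small illustration of how the upper end of this interval can drive selection, the sketch below computes the bound for each action and picks the largest. The function names and the `(wins, plays)` layout of `stats` are assumptions for this example, and `z` is kept as a parameter exactly as in the formula above.

```python
from math import log, sqrt


def upper_bound(wins, plays, total_plays, z=1.96):
    """Upper end of the interval: mean + sqrt(z * ln(n) / n_i)."""
    mean = wins / plays
    return mean + sqrt(z * log(total_plays) / plays)


def select(stats, z=1.96):
    """Pick the action with the largest upper bound.

    stats maps an action name to (wins, plays); every plays must be > 0.
    """
    total = sum(plays for _, plays in stats.values())
    return max(stats, key=lambda a: upper_bound(*stats[a], total, z))


stats = {"a": (6, 10), "b": (1, 2)}
print(select(stats))        # b
print(select(stats, z=0))   # a
```

Action `a` has the higher win rate (0.6 against 0.5), but `b` has been played only twice, so its interval is much wider and its upper bound is larger. Setting `z` to 0 removes the exploration term and falls back to the plain win rate.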


    A-Yuan's Day 5 at Work - Monte Carlo Tree Search - A Graphical Walkthrough of Upper Confidence bound applied to Trees (UCT)


    • First, in the initial state, none of the child actions has any statistics.
    Figure: Monte Carlo Tree Search Steps - Initialize State. No statistics records for any child action. The root C has four children: L1_1, L1_2, L1_3, L1_4.
    • So we start with expansion: pick a child action at random and keep simulating until the game ends. Then back-propagate, recording the statistics for the expanded action.
    Figure: Monte Carlo Tree Search Steps - Expansion. L1_2 is expanded and simulated through L1_2_1 and L1_2_1_1 down to L1_2_1_1_1 (Lose), so L1_2 records 0/1.
    • After several expansions, the selection criteria are met, and selection begins: the best child action is chosen.
    Figure: Monte Carlo Tree Search Steps - Selection. After some expansions, all child actions have records. Select the one with the max win rate. L1_1: 2/5, L1_2: 3/4, L1_3: 0/1, L1_4: 4/6.
    • Continue with expansion, simulation, and back-propagation.
    Figure: Monte Carlo Tree Search Steps - More Expansion, Simulation and Back-Propagation. This can change which action is the best one. L1_1: 2/5, L1_2: 3/5, L1_3: 0/1, L1_4: 4/6. Under L1_2, the new child L1_2_1 records 0/1 after a path through L1_2_1_1 that ends in L1_2_1_1_1 (Lose).
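The change of the best action in the last two figures can be checked with a few lines of arithmetic. This sketch assumes each label such as 3/4 means wins/plays, and `best_by_win_rate` is a hypothetical helper that uses the plain win rate only.

```python
from fractions import Fraction


def best_by_win_rate(stats):
    """Return the action whose wins/plays ratio is the largest."""
    return max(stats, key=lambda a: Fraction(*stats[a]))


# Statistics from the Selection figure: action -> (wins, plays).
before = {"L1_1": (2, 5), "L1_2": (3, 4), "L1_3": (0, 1), "L1_4": (4, 6)}
# One more simulation through L1_2 ends in a loss: 3/4 becomes 3/5.
after = dict(before, L1_2=(3, 5))

print(best_by_win_rate(before))   # L1_2
print(best_by_win_rate(after))    # L1_4
```

Before the extra simulation L1_2 leads with 3/4 = 0.75; afterwards it drops to 3/5 = 0.6, below the 4/6 of L1_4.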


    Saturday, October X, 2016

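    • Computing the value of an action
      • Can the value be computed when no win has occurred yet?
      • Can we determine that an action has no value, so that it can be pruned early?


    • Can an AI program understand the rules? For example, understand that the horse moves in an L shape (马走日).
    • Can an AI program work out domain rules by itself, such as opening methods and material evaluation?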


  • Original article: https://www.cnblogs.com/steven-yang/p/5993205.html