  • Rock Paper Scissors


    https://www.freecodecamp.org/learn/machine-learning-with-python/machine-learning-with-python-projects/rock-paper-scissors

    A simple competitive game: use an algorithm to learn the opponent's patterns and beat them.

    For this challenge, you will create a program to play Rock, Paper, Scissors. A program that picks at random will usually win 50% of the time. To pass this challenge your program must play matches against four different bots, winning at least 60% of the games in each match.
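    The 50% figure holds when ties are left out of the win rate. The sketch below simulates two random players and computes the win rate as wins / (wins + losses), which is an assumption based on the result logs later in this post. The function name and the seed are hypothetical choices made for illustration.

```python
import random

# Each key beats its value.
BEATS = {"R": "S", "P": "R", "S": "P"}


def simulate_random_vs_random(games, seed=0):
    """Play `games` rounds between two uniformly random players."""
    rng = random.Random(seed)
    wins = losses = ties = 0
    for _ in range(games):
        me, opp = rng.choice("RPS"), rng.choice("RPS")
        if me == opp:
            ties += 1
        elif BEATS[me] == opp:
            wins += 1
        else:
            losses += 1
    return wins, losses, ties


wins, losses, ties = simulate_random_vs_random(30000)
# Ties are excluded from the denominator (assumed to mirror the test harness).
win_rate = 100 * wins / (wins + losses)
print(wins, losses, ties, round(win_rate, 1))
```

    About a third of the games are ties, and the remaining games split roughly evenly, so a random player lands near 50% and cannot reach the 60% threshold reliably.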

    You can access the full project description and starter code on repl.it.

    Markov chain approach

    https://forum.freecodecamp.org/t/cant-beat-abbey-rock-paper-scissors-project/447449/2

    https://github.com/marius-mm/freeCodeCampProjects/blob/main/Machine%20Learning%20with%20Python/RockPaperScissors/RPS.py

    The author says this beats three of the four opponents, but in my own tests it beat only two, using the 60% win rate as the threshold.

    import random
    import mchmm as mc
    import numpy as np
    
    winDict = {"R": "P", "S": "R", "P": "S"}
    
    strategy = 1
    
    
    def player(prev_play, opponent_history=[]):
        global strategy
        # firstCall
        if len(opponent_history) <= 0:
            opponent_history.append("R")
            opponent_history.append("S")
        if len(prev_play) <= 0:
            prev_play = "P"
        # /firstCall
    
        opponent_history.append(prev_play)
    
        if strategy == 1:
            memory = 800
            guess = predict(prev_play, opponent_history, memory)
    
        return guess
    
    
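    def predict(prev_play, opponent_history, memoryLength):
        # Keep at most memoryLength moves by dropping the oldest one.
        if len(opponent_history) > memoryLength:
            opponent_history.pop(0)
    
        # Fit a first-order Markov chain to the opponent's moves, predict the
        # move most likely to follow prev_play, and return the move that beats it.
        chain = mc.MarkovChain().from_data(opponent_history)
        predictionNextItem = giveMostProbableNextItem(chain, prev_play)
        winningMove = winDict[predictionNextItem]
        return winningMove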
    
    
    def contains_duplicates(X):
        X = np.round(X,4)
        return len(np.unique(X)) != len(X)
    
    
    def giveIndexOfState(chain, item):
        return np.where(chain.states == item)[0][0]
    
    
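    def giveMostProbableNextItem(chain, lastItem):
        # Take the transition-probability row for lastItem and pick the
        # state with the highest probability.
        retval = chain.states[
            np.argmax(chain.observed_p_matrix[giveIndexOfState(chain, lastItem)])
        ]
    
        return retval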

    According to the forum discussion, the Markov chain needs to be made longer.

    do you mean with chain length like different states? RS RR RP SR … instead of S R P ?

    That’s exactly what I mean. That’s also what Abbey is doing, so you will have to use a longer chain than she does or use her chain against her to win.

    https://forum.freecodecamp.org/t/rock-paper-scissors-help-with-abbey/452902

    Abbey uses a Markov chain of length 2, so competing against her requires a chain of length at least 2.

    Abbey is a Markov chain player, using a length of 2, so you’ve got a good example there. A longer Markov chain can defeat her or an appropriate length 2 chain will work as well. As I have mentioned here before, it is possible to know who you are playing, through various means, and employ the correct algorithm against them. It’s possible to beat all the players more than 80% of the time. As you can see from reading Abbey’s code, a Markov chain algorithm isn’t very complex.

    A Markov chain isn’t the only algorithm that will work here, but it is one of the simpler ones. This project is not current fashionable machine learning (think neural nets) but old school machine learning. There is quite a bit of information on RPS strategy on the web once you get past all the RPS bot tutorials that use neural nets to recognize human hands playing RPS.
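    To make the idea of a longer chain concrete, here is a minimal sketch of an order-n predictor in plain Python, where the state is the opponent's last n moves rather than a single move. It does not reproduce Abbey's code or the linked solution; the class name `NGramPredictor` and the default move are hypothetical.

```python
from collections import Counter, defaultdict

# Move that beats each predicted opponent move.
COUNTER_MOVE = {"R": "P", "P": "S", "S": "R"}


class NGramPredictor:
    """Order-n Markov predictor: state = the opponent's last `order` moves."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)  # state -> Counter of next moves
        self.history = []

    def update(self, move):
        # Record which move followed the previous `order` moves.
        if len(self.history) >= self.order:
            state = "".join(self.history[-self.order:])
            self.counts[state][move] += 1
        self.history.append(move)

    def predict(self, default="R"):
        # Fall back to a default until the current state has been seen.
        if len(self.history) < self.order:
            return default
        state = "".join(self.history[-self.order:])
        if not self.counts[state]:
            return default
        return self.counts[state].most_common(1)[0][0]


predictor = NGramPredictor(order=2)
for move in "RPS" * 5:
    predictor.update(move)

expected = predictor.predict()
print(expected, COUNTER_MOVE[expected])
```

    Raising `order` lengthens the chain, at the cost of needing more observations before each state has useful counts.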

    The mchmm library

    https://github.com/maximtrp/mchmm

    Discrete Markov chains

    Initializing a Markov chain using some data.

    >>> import mchmm as mc
    >>> a = mc.MarkovChain().from_data('AABCABCBAAAACBCBACBABCABCBACBACBABABCBACBBCBBCBCBCBACBABABCBCBAAACABABCBBCBCBCBCBCBAABCBBCBCBCCCBABCBCBBABCBABCABCCABABCBABC')

    Now, we can look at the observed transition frequency matrix:

    >>> a.observed_matrix
    array([[ 7., 18.,  7.],
           [19.,  5., 29.],
           [ 5., 30.,  3.]])

    And the observed transition probability matrix:

    >>> a.observed_p_matrix
    array([[0.21875   , 0.5625    , 0.21875   ],
           [0.35849057, 0.09433962, 0.54716981],
           [0.13157895, 0.78947368, 0.07894737]])
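    The probability matrix above is the frequency matrix with each row divided by its row sum. A minimal sketch in plain Python reproduces the numbers; `row_normalize` is a hypothetical helper written for illustration, not part of mchmm.

```python
# Observed transition counts copied from the example above.
observed = [
    [7.0, 18.0, 7.0],
    [19.0, 5.0, 29.0],
    [5.0, 30.0, 3.0],
]


def row_normalize(matrix):
    """Divide every row by its sum so that each row becomes a distribution."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result


probabilities = row_normalize(observed)
for row in probabilities:
    print([round(p, 8) for p in row])
```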

    You can visualize your Markov chain. First, build a directed graph with graph_make() method of MarkovChain object. Then render() it.

    >>> graph = a.graph_make(
          format="png",
          graph_attr=[("rankdir", "LR")],
          node_attr=[("fontname", "Roboto bold"), ("fontsize", "20")],
          edge_attr=[("fontname", "Iosevka"), ("fontsize", "12")]
        )
    >>> graph.render()

    Here is the result:

    [Figure: rendered Markov chain graph, images/mc.png]

     

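    Hidden Markov model approach

    Use hmmlearn to learn from the observation sequence and recover the hidden state patterns.

    Results

    The results of a run are shown below.

    For players 1 and 4 (quincy and mrugesh), who play in a regular pattern, prediction works well.

    For players 2 and 3 (abbey and kris), who play adversarially, prediction works poorly.

    This suggests that an HMM is suited to learning hidden regularities rather than countering an adaptive opponent.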

    --------- you vs quincy ----------
    Final results: {'p1': 749, 'p2': 103, 'tie': 148}
    Player 1 win rate: 87.91079812206573%
    --------- you vs abbey ----------
    Final results: {'p1': 331, 'p2': 392, 'tie': 277}
    Player 1 win rate: 45.78146611341632%
    --------- you vs kris ----------
    Final results: {'p1': 455, 'p2': 398, 'tie': 147}
    Player 1 win rate: 53.341148886283705%
    --------- you vs mrugesh ----------
    Final results: {'p1': 776, 'p2': 187, 'tie': 37}
    Player 1 win rate: 80.5815160955348%
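    The win rates in these logs are consistent with p1 / (p1 + p2), with ties excluded. A short sketch that recomputes them from the logged counts and checks each against the 60% threshold; the `win_rate` helper is hypothetical, not the harness's own function.

```python
def win_rate(results):
    """Percentage of decided games (ties excluded) won by player 1."""
    decided = results["p1"] + results["p2"]
    return 100 * results["p1"] / decided if decided else 0.0


# Counts copied from the logs above.
logs = {
    "quincy": {"p1": 749, "p2": 103, "tie": 148},
    "abbey": {"p1": 331, "p2": 392, "tie": 277},
    "kris": {"p1": 455, "p2": 398, "tie": 147},
    "mrugesh": {"p1": 776, "p2": 187, "tie": 37},
}

for name, results in logs.items():
    rate = win_rate(results)
    print(f"{name}: {rate:.2f}% passes={rate >= 60}")
```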

    code

    import random
    from hmmlearn import hmm
    import numpy as np
    import math
    
    states = ["0", "1", "2", "3", "4"]
    n_states = len(states)
    
    
    observations_dict = {
                            0: "R",
                            1: "P",
                            2: "S"
                        }
    n_features = len(observations_dict)
    
    
    def player(prev_play, opponent_history=[], verbose=False):
        # print("call player")
        # print(prev_play)
        # print(len(opponent_history))
    
        global n_states
    
        play_list = ["R", "P", "S"]
        win_dict = {"R": "P", "P": "S", "S": "R"}
    
        if prev_play in play_list:
            opponent_history.append(prev_play)
    
        # default
        me_play = random.choice(play_list)
    
        learning_point = 40
    
        look_back = 4
    
        if len(opponent_history) > learning_point:
            if verbose:
                print("now enter learn and predict mode")
                print(f"enter learn stage, with learning window {learning_point}")
    
            # observations = opponent_history[-learning_point:]
            observations = opponent_history[:]
            observations = [[play_list.index(x)] for x in observations]
            observations = np.array(observations)
    
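            # Note: in recent hmmlearn releases the discrete-symbol model used
            # here is provided as hmm.CategoricalHMM; MultinomialHMM there
            # expects count vectors instead of single symbol indices.
            model = hmm.MultinomialHMM(n_components=n_states,
                                       n_iter=100,
                                       tol=1,
                                       verbose=False,
                                       init_params="ste")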
    
            model_trained = model.fit(observations)
    
            start = model_trained.startprob_
            if verbose:
                print("-------- start ---------")
                print(start)
    
            transition = model_trained.transmat_
            if verbose:
                print("-------- transition ---------")
                print(transition)
    
            emission = model_trained.emissionprob_
            if verbose:
                print("-------- emission ---------")
                print(emission)
    
            if verbose:
                print(f"enter predict stage, with look back {look_back}")
    
            obs_now = opponent_history[-look_back:]
            obs_now = "".join(obs_now)
    
            # print(obs_now)
            options = [obs_now + v for v in play_list]
    
            options_prob = [0, 0, 0]
            for i, one_option in enumerate(options):
                one_option = list(one_option)
                one_option = [[play_list.index(x)] for x in one_option]
                one_option = np.array(one_option)
                one_prob = model_trained.score(one_option)
                options_prob[i] = one_prob
    
                if verbose:
                    print(f"possible option {one_option} with probability {one_prob}")
    
            options_prob = np.array(options_prob)
    
            best_index = np.argmax(options_prob)
            best_play = play_list[best_index]
    
            if verbose:
                print(f"opponent most possible next play is {best_play}")
    
            me_play = win_dict[best_play]
    
        return me_play
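    The player above ranks the three candidate next moves by the log-likelihood that `score` returns for the recent moves plus each candidate. As a rough illustration of what that number is, here is a plain-Python forward algorithm for a discrete HMM. The start, transition, and emission values are made-up toy parameters, not values learned by hmmlearn, and `forward_log_likelihood` is a hypothetical name.

```python
import math
from itertools import product


def forward_log_likelihood(obs, start, trans, emit):
    """Log-probability of an observation sequence under a discrete HMM."""
    n = len(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return math.log(sum(alpha))


# Toy parameters (assumed): 2 hidden states, symbols 0=R, 1=P, 2=S.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

# Score "recent moves + candidate" for each candidate, as the player does.
recent = [0, 0, 2]  # R, R, S
scores = {m: forward_log_likelihood(recent + [m], start, trans, emit) for m in range(3)}
best = max(scores, key=scores.get)
print(scores, best)

# Sanity check: probabilities of all length-2 sequences sum to 1.
total = sum(
    math.exp(forward_log_likelihood(list(seq), start, trans, emit))
    for seq in product(range(3), repeat=2)
)
print(round(total, 10))
```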

    API

    https://hmmlearn.readthedocs.io/en/latest/api.html#multinomialhmm

    MultinomialHMM

    class hmmlearn.hmm.MultinomialHMM(n_components=1, startprob_prior=1.0, transmat_prior=1.0, algorithm='viterbi', random_state=None, n_iter=10, tol=0.01, verbose=False, params='ste', init_params='ste')

    Hidden Markov Model with multinomial (discrete) emissions.

    Reference

    https://github.com/alicelynch/hmm-python-meetup/blob/master/notebooks/Hidden%20Markov%20Model.ipynb

    model = hmm.MultinomialHMM(n_components=n_states,
                               n_iter=100,
                               tol=1,
                               verbose=True,
                               init_params="ste")
    
    
    model_trained = model.fit(observations)

    Other applications

    Natural language generator

    https://github.com/mfilej/nlg-with-hmmlearn/blob/master/train.py

    Stock price prediction

    https://github.com/HvyD/HMM-Stock-Predictor/blob/master/HMM%20Tesla%20Stock%20Predictor.ipynb

    https://zhuanlan.zhihu.com/p/166552799

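    Most-probable-subsequence prediction

    https://forum.freecodecamp.org/t/machine-learning-with-python-projects-rock-paper-scissors/412794/4

    During the match, count how often each move has followed the opponent's most recent N moves, then predict the opponent's next move as the one with the highest count.

    This works well against all four opponents.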

    Unfortunately I don’t seem to have saved a copy.
    But basically you use the last n moves to predict what Abbey will do next. The important step is to let your program build up, on the fly, a list of combinations containing the last n moves plus entry n+1 (Abbey’s reaction), together with their counts. So you start with an empty list, and every time new data rolls in, you check whether this entry is already in the list: if yes, increase its count by 1; otherwise set it to 1.
    Say we have the following example (n=2 for simplicity; to beat Abbey you’ll need to increase n). Incoming data: [P,P,R,S,P,P,R…]
    Initially the list is empty.
    When we have [P,P,R] (length = n+1), we enter ‘PPR’ = 1 (elements 0 to n of incoming data) into our list
    Then ‘PRS’ = 1 (elements 1 to n+1 of incoming data)
    Then ‘RSP’ = 1 (elements 2 to n+2 of incoming data)
    Then ‘SPP’ = 1
    Then we see that we already have ‘PPR’ in our list, so we increase it to 2…
    Hope this helps, otherwise please feel free to ask! Sorry, I don’t seem to have the code anymore.
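    The counting walkthrough in the quote can be reproduced in a few lines. This is a sketch of the n=2 example only; `count_windows` and `predict_next` are hypothetical helper names.

```python
from collections import Counter


def count_windows(moves, n):
    """Count every run of n moves together with the move that followed it."""
    counts = Counter()
    for i in range(len(moves) - n):
        counts["".join(moves[i:i + n + 1])] += 1
    return counts


def predict_next(counts, last_n):
    """Return the move seen most often after the given n-move context."""
    candidates = {move: counts[last_n + move] for move in "RPS"}
    return max(candidates, key=candidates.get)


incoming = list("PPRSPPR")
counts = count_windows(incoming, n=2)
print(dict(counts))                 # 'PPR' appears twice, the others once
print(predict_next(counts, "PP"))   # move most often seen after 'PP'
```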

    wtf = {}
    
    def player(prev_play, opponent_history=[]):
      global wtf
    
      n = 5
    
      if prev_play in ["R","P","S"]:
        opponent_history.append(prev_play)
    
      guess = "R" # default, until statistic kicks in
    
      if len(opponent_history)>n:
        inp = "".join(opponent_history[-n:])
    
        if "".join(opponent_history[-(n+1):]) in wtf.keys():
          wtf["".join(opponent_history[-(n+1):])]+=1
        else:
          wtf["".join(opponent_history[-(n+1):])]=1
    
        possible =[inp+"R", inp+"P", inp+"S"]
    
        for i in possible:
          if not i in wtf.keys():
            wtf[i] = 0
    
        predict = max(possible, key=lambda key: wtf[key])
    
        if predict[-1] == "P":
          guess = "S"
        if predict[-1] == "R":
          guess = "P"
        if predict[-1] == "S":
          guess = "R"
    
    
      return guess

    RNN approach: training while playing

    https://github.com/fanqingsong/boilerplate-rock-paper-scissors/blob/master/RPS.py

    This approach uses an RNN with online learning: after each opponent move the model is updated online, and the updated model then predicts the opponent's next move.
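    Training such a network needs (input window, next move) pairs built from the opponent's history. The sketch below shows one way to build them in plain Python, with moves encoded as integers for the input and one-hot vectors for the target. The helper names are hypothetical, and the window length of 4 is taken from the `look_back` value in the code further down.

```python
MOVE_INDEX = {"R": 0, "P": 1, "S": 2}


def one_hot(move):
    """Encode a move as a 3-element one-hot vector."""
    vec = [0, 0, 0]
    vec[MOVE_INDEX[move]] = 1
    return vec


def make_training_pairs(history, look_back=4):
    """Slide a window over the history: look_back moves in, the next move out."""
    xs, ys = [], []
    for i in range(len(history) - look_back):
        window = history[i:i + look_back]
        xs.append([MOVE_INDEX[m] for m in window])
        ys.append(one_hot(history[i + look_back]))
    return xs, ys


xs, ys = make_training_pairs(list("RPSRPS"), look_back=4)
for x, y in zip(xs, ys):
    print(x, "->", y)
```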

    Results: abbey is still hard to beat with an RNN.

    -------- you vs quincy -------------

    Final results: {'p1': 988, 'p2': 6, 'tie': 6}
    Player 1 win rate: 99.3963782696177%
    -------- you vs abbey -------------
    Final results: {'p1': 431, 'p2': 301, 'tie': 268}
    Player 1 win rate: 58.879781420765035%
    -------- you vs kris -------------
    Final results: {'p1': 768, 'p2': 227, 'tie': 5}
    Player 1 win rate: 77.1859296482412%
    -------- you vs mrugesh -------------
    Final results: {'p1': 828, 'p2': 169, 'tie': 3}
    Player 1 win rate: 83.04914744232697%

    code

    # Online-learning player: an LSTM is retrained on the recorded windows after
    # every opponent move and then used to predict the opponent's next move.
    
    import numpy as np
    import random
    import keras as K
    from keras.models import Sequential
    # Dropout and Input are only needed by the commented-out layers below.
    from keras.layers import Dense, Dropout, Input, LSTM
    
    look_back = 4
    win_dict = {"R": "P", "S": "R", "P": "S"}
    
    
    def create_nn_model():
        init = K.initializers.glorot_uniform(seed=1)
        simple_adam = K.optimizers.Adam()
    
        model = Sequential()
        # model.add(Input(shape=(look_back,)))
        model.add(LSTM(10, input_shape=(1,look_back)))
        # model.add(Dense(20, activation='relu'))
        model.add(Dense(10, activation='relu'))
        # model.add(Dropout(0.3))
        # model.add(Dense(5, activation='relu'))
        # model.add(Dropout(0.3))
        model.add(Dense(3, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer=simple_adam, metrics=['accuracy'])
    
        return model
    
    
    def player(prev_play, opponent_history, model, batch_x, batch_y, review_epochs=10):
        # print(f"now player1 is in turn, opponent play is {prev_play}")
    
        plays = ["R","P","S"]
        play_dict = {"R":0,"P":1,"S":2}
        plays_categorial = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    
        opponent_history_len = len(opponent_history)
        # print(f"opponent_history_len = {opponent_history_len}")
    
        if opponent_history_len < look_back:
            if prev_play:
                opponent_history.append(prev_play)
            guess = random.randint(0,2)
            return plays[guess]
    
        one_x = [play_dict[move] for move in opponent_history[-look_back:]]
    
        one_y = play_dict[prev_play]
        one_y = plays_categorial[one_y]
    
        batch_x.append(one_x)
        batch_y.append(one_y)
    
        for i in range(0, review_epochs):
            # print(f"now train by epoch {i}")
            batch_x = np.array(batch_x)
            # print(batch_x.shape)
            batch_x_final = np.reshape(batch_x, (batch_x.shape[0], 1, batch_x.shape[1]))
            # print(batch_x.shape)
    
            batch_y = np.array(batch_y)
            # print(batch_y.shape)
    
            # print(batch_x_final.shape)
            # print(batch_y.shape)
            model.train_on_batch(batch_x_final, batch_y)
    
        opponent_history.append(prev_play)
    
        current_x = [play_dict[move] for move in opponent_history[-look_back:]]
        current_x = np.array([current_x])
        current_x = np.reshape(current_x, (current_x.shape[0], 1, current_x.shape[1]))
        predict_y = model.predict_on_batch(current_x)
        predict_y = predict_y.tolist()
        # print(predict_y)
        predict_y = predict_y[0]
        guess = np.argmax(predict_y)
        # print(guess)
    
        opponent_play = plays[guess]
        me_play = random.choice(['R', 'P', 'S'])
        me_play = win_dict.get(opponent_play, me_play)
    
        return me_play

    References

    Iris species classification with Keras

    https://www.jianshu.com/p/1d88a6ed707e

    https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/

    Example: predicting stock closing prices with an RNN

    https://github.com/omerbsezer/LSTM_RNN_Tutorials_with_Demo/blob/master/StockPricesPredictionProject/pricePredictionLSTM.py

    # create and fit the LSTM network, optimizer=adam, 25 neurons, dropout 0.1
    model = Sequential()
    model.add(LSTM(25, input_shape=(1, look_back)))
    model.add(Dropout(0.1))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    model.fit(trainX, trainY, epochs=1000, batch_size=240, verbose=1)
    
    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)

     

    Tensorflow 2.0 LSTM training model

    https://www.programmersought.com/article/57304583087/

    # Import library
    import tensorflow as tf
    from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
    from tensorflow import keras
    import numpy as np
    from scipy import sparse
    
    import os
    
    # Make only GPU 1 visible to TensorFlow
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"
    
    # Set random number seed
    tf.random.set_seed(22)
    np.random.seed(22)
    assert tf.__version__.startswith('2.')
    
    batchsz = 256 # batch size
    
    # the most frequent words
    total_words = 4096 # Number of words in the dictionary to be encoded
    max_review_len = 1995 # How many words does the sequence contain
    embedding_len = 100 # Length of each word encoding
    
    units = 64 # Output dimension of each LSTM layer
    epochs = 100  # Train for 100 epochs
    
    # Read in the data, stored here as a sparse matrix
    matrixfile = "textword_numc_sparse.npz" # Your own text samples, already converted to a numeric representation
    targetfile = "target_5k6mer_tfidf.txt" # Labels (binary classification)
    
    allmatrix = sparse.load_npz(matrixfile).toarray() 
    target = np.loadtxt(targetfile)
    print("allmatrix shape: {};target shape: {}".format(allmatrix.shape, target.shape))
    
    x = tf.convert_to_tensor(allmatrix, dtype=tf.int32)
    x = keras.preprocessing.sequence.pad_sequences(x, maxlen=max_review_len)
    y = tf.convert_to_tensor(target, dtype=tf.int32)
    
    idx = tf.range(allmatrix.shape[0])
    idx = tf.random.shuffle(idx)
    
    # Split into training, validation, and test sets in a 7:1:2 ratio
    x_train, y_train = tf.gather(x, idx[:int(0.7 * len(idx))]), tf.gather(y, idx[:int(0.7 * len(idx))])
    x_val, y_val = tf.gather(x, idx[int(0.7 * len(idx)):int(0.8 * len(idx))]), tf.gather(y, idx[int(0.7 * len(idx)):int(0.8 * len(idx))])
    x_test, y_test = tf.gather(x, idx[int(0.8 * len(idx)):]), tf.gather(y, idx[int(0.8 * len(idx)):])
    print(x_train.shape,x_val.shape,x_test.shape)
    
    db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    db_train = db_train.shuffle(6000).batch(batchsz, drop_remainder=True).repeat()
    db_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
    db_val = db_val.batch(batchsz, drop_remainder=True)
    db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
    db_test = db_test.batch(batchsz, drop_remainder=True)
    
    # Build a model
    network = Sequential([layers.Embedding(total_words, embedding_len,input_length=max_review_len),
                          layers.LSTM(units, dropout=0.5, return_sequences=True, unroll=True),
                          layers.LSTM(units, dropout=0.5, unroll=True),
                          # If using gru, just replace the upper two layers with
                          # layers.GRU(units, dropout=0.5, return_sequences=True, unroll=True),
                          # layers.GRU(units, dropout=0.5, unroll=True),
                          layers.Flatten(),
                          #layers.Dense(128, activation=tf.nn.relu),
                          #layers.Dropout(0.6),
                          layers.Dense(1, activation='sigmoid')])
    
    # View model summary
    network.build(input_shape=(None, max_review_len))
    network.summary()
    
    # Compile
    network.compile(optimizer=keras.optimizers.Adam(0.001),
                      loss=tf.losses.BinaryCrossentropy(),
                      metrics=['accuracy'])
    
    # Training. Note that steps_per_epoch is set here, so db_train needs repeat(); otherwise a warning is raised. See: https://blog.csdn.net/weixin_44022515/article/details/103884654
    
    network.fit(db_train, epochs=epochs, validation_data=db_val,steps_per_epoch=x_train.shape[0]//batchsz)
    
    network.evaluate(db_test)

    Keras online-learning API: train_on_batch

    https://www.programmersought.com/article/2809219970/

    for batch_no in range(100):
        X_train, Y_train = np.random.rand(32, 3), np.random.rand(32, 1)
        logs = model.train_on_batch(X_train, Y_train)

    https://keras.io/api/models/model_training_apis/#trainonbatch-method

    Runs a single gradient update on a single batch of data.
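    To illustrate what a single gradient update on one batch means, here is a toy version in plain Python for a one-variable linear model with mean squared error. It sketches the concept only and is not how Keras implements `train_on_batch`; the function names, the learning rate, and the data are assumptions.

```python
def mse(w, b, batch):
    """Mean squared error of the model y = w * x + b on a batch."""
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)


def single_update(w, b, batch, lr=0.1):
    """One gradient-descent step on one batch; returns the new (w, b)."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b


batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # samples from y = 2x + 1
w, b = 0.0, 0.0
before = mse(w, b, batch)
w, b = single_update(w, b, batch)
after = mse(w, b, batch)
print(before, after)
```

    Calling the update once per new observation, as the RPS player does after every opponent move, is what makes the training online.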

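    Source: http://www.cnblogs.com/lightsong/ Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise this notice must be retained and a link to the original article must be placed prominently on the page.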
  • Original article: https://www.cnblogs.com/lightsong/p/14725190.html