  • Thoughts on using variables and classes


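    While writing a reinforcement learning algorithm, my plan was: keep each exp's information in a dict; keep the success/failure record of each exp in a second dict whose values are lists; and finally put the exp info dicts into a list sorted by each exp's lp value.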

    Exp_info = {'name': xxx,
                'lp': xxx}

    Exp_dic = {xxx: [1, 0], www: [0, 1]}

    Order_list = [Exp_info, Exp_info, ...]

    I then reproduced the code I had at the time:

    import numpy as np

    test_dict = {
        'a': True,
        'b': False,
        'c': True,
        'd': False,
        'e': False
    }
    hp = 0
    old_lp = 0
    p = 0.005
    exp_tool_info = {'name': '',
                     'lp': 0}
    dict_exp = {}
    ORDER_dict_list = []
    for i in range(3):
        for k, v in test_dict.items():

            exp_tool_info['name'] = k
            dict_exp.setdefault(k, [])

            if v:
                dict_exp[k].append(1)
            else:
                dict_exp[k].append(0)
            hp = np.mean(dict_exp[k])
            old_lp = exp_tool_info['lp']
            print(old_lp)
            exp_tool_info['lp'] = 0.095 * old_lp + 0.005 * hp
            temp_list = [i['name'] for i in ORDER_dict_list]
            print(temp_list)
            if exp_tool_info['name'] not in temp_list:
                ORDER_dict_list.append(exp_tool_info)
                print('========', ORDER_dict_list)
            ORDER_dict_list.sort(key=lambda x: x['lp'], reverse=True)  # sort in descending order of lp
            print('++++++++++++++')

    print(ORDER_dict_list)
    print(dict_exp)

    Output:

    0
    []
    ======== [{'name': 'a', 'lp': 0.005}]
    ++++++++++++++
    0.005
    ['b']
    ++++++++++++++
    0.000475
    ['c']
    ++++++++++++++
    0.005045125
    ['d']
    ++++++++++++++
    0.000479286875
    ['e']
    ++++++++++++++
    4.5532253125e-05
    ['a']
    ++++++++++++++
    0.005004325564046875
    ['b']
    ++++++++++++++
    0.0004754109285844531
    ['c']
    ++++++++++++++
    0.005045164038215523
    ['d']
    ++++++++++++++
    0.00047929058363047473
    ['e']
    ++++++++++++++
    4.55326054448951e-05
    ['a']
    ++++++++++++++
    0.005004325597517265
    ['b']
    ++++++++++++++
    0.0004754109317641402
    ['c']
    ++++++++++++++
    0.005045164038517593
    ['d']
    ++++++++++++++
    0.00047929058365917134
    ['e']
    ++++++++++++++
    [{'name': 'e', 'lp': 4.5532605447621275e-05}]
    {'a': [1, 1, 1], 'b': [0, 0, 0], 'c': [1, 1, 1], 'd': [0, 0, 0], 'e': [0, 0, 0]}

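    The output shows that the lp of every exp keeps increasing and that Order_list ends up holding a single entry. My intent was to update the name on each iteration and update its lp along with it, so that Order_list would end up as a list of Exp_info entries, one per exp.

    The result was unexpected: the probability of every exp changes on each pass, and even an exp that failed on its very first attempt has a nonzero probability, which was puzzling.

    After some analysis, I found that the address of exp_info never changes. Every iteration mutates the same dict object, so each lp is computed from the previous exp's lp instead of starting from 0, and the name check can no longer tell the exps apart.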
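
The aliasing described above can be reproduced in isolation. The following is a minimal sketch rather than the original code: the names `shared`, `aliased`, and `independent` are made up for illustration, and the name check from the original loop is left out so that the effect is easier to see. Appending the same dict several times stores references to one object, while building a new dict per name gives independent entries.

```python
# One dict created once, mutated in place, and appended on every iteration
shared = {'name': '', 'lp': 0}
aliased = []
for name in ['a', 'b', 'c']:
    shared['name'] = name
    aliased.append(shared)  # stores a reference to the same object, not a copy

# A new dict created on every iteration
independent = []
for name in ['a', 'b', 'c']:
    independent.append({'name': name, 'lp': 0})

print([d['name'] for d in aliased])      # ['c', 'c', 'c']
print(len({id(d) for d in aliased}))     # 1 (a single underlying object)
print([d['name'] for d in independent])  # ['a', 'b', 'c']
```

In the original loop the list already contains the shared dict after the first iteration, and that dict always carries the current name, so the `not in temp_list` check never passes again and nothing else is appended.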

    Solution: create an Exp class so that each exp object carries its own lp and name bound together. That resolves the problem.

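    class Exp(object):
        """One object per exp, so each name keeps its own lp and history."""

        def __init__(self, name: str):
            self.name = name
            self.lp = 0
            # A separate dict per instance: no state is shared between exps
            self.exp_info = {'name': self.name,
                             'lp': self.lp,
                             'exp_success_fail_list': []}

        def set_lp(self, lp):
            # Keep the attribute and the dict entry in sync
            self.lp = lp
            self.exp_info['lp'] = self.lp

        def get_exp_tool_info(self):
            return self.exp_info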
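
To check that one object per name gives the intended result, the reproduction script can be rerun on the same `test_dict`. This is a sketch under assumptions, not the code from the project. The compact `Exp` below is a stand-in for the class above so that the block runs on its own. `run_rounds` and `registry` are hypothetical names. The update uses `(1 - p) * old_lp + p * hp`, as in the class-based code later in the post, rather than the `0.095` factor from the reproduction script.

```python
class Exp:
    """Compact stand-in for the Exp class: one object per exp name."""

    def __init__(self, name):
        self.name = name
        self.lp = 0.0
        self.history = []  # 1 = success, 0 = failure


def run_rounds(outcomes, rounds=3, p=0.005):
    registry = {}  # name -> Exp, so every name owns a distinct object
    for _ in range(rounds):
        for name, ok in outcomes.items():
            if name not in registry:
                registry[name] = Exp(name)
            exp = registry[name]
            exp.history.append(1 if ok else 0)
            hp = sum(exp.history) / len(exp.history)  # historical success rate
            exp.lp = (1 - p) * exp.lp + p * hp
    # Highest lp first
    return sorted(registry.values(), key=lambda e: e.lp, reverse=True)


test_dict = {'a': True, 'b': False, 'c': True, 'd': False, 'e': False}
ranked = run_rounds(test_dict)
for exp in ranked:
    print(exp.name, exp.lp, exp.history)
```

With this structure the list has five entries, the exps that always fail keep an lp of 0, and the two that always succeed rank first.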

    A method from another class, the attack class (partial code):

    def dic_lst(self, exp):
        # Reuse the object already registered under this name, if there is one
        for known in self.EXP_ORDER:
            if known.exp_info['name'] == exp.exp_info['name']:
                return known

        # First time this name is seen: register the new object
        self.EXP_ORDER.append(exp)
        return exp

    for i, exp in enumerate(exps):
        print('++++++++++++exp++++++++++++', exp)
        exp_obj = Exp(exp)
        exp_obj = self.dic_lst(exp_obj)

        # The code that actually runs the exp is omitted in this excerpt;
        # `success` is a placeholder name for its outcome.
        if success:
            exp_obj.exp_info['exp_success_fail_list'].append(1)
        else:
            exp_obj.exp_info['exp_success_fail_list'].append(0)

        # Keep only the 10 most recent outcomes
        if len(exp_obj.exp_info['exp_success_fail_list']) > 10:
            exp_obj.exp_info['exp_success_fail_list'].pop(0)

        # hp: historical success probability over the recorded outcomes
        hp = np.mean(exp_obj.exp_info['exp_success_fail_list'])

        old_lp = exp_obj.exp_info['lp']
        print('-----------old_lp-------------', old_lp)
        exp_obj.set_lp((1 - p) * old_lp + p * hp)
        print('-----------------lp---------------', exp_obj.get_exp_tool_info()['lp'])
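
The per-exp bookkeeping in the loop above (a window of the last 10 outcomes plus a smoothed lp) can also be packaged in one place. The following is a sketch under assumptions: `ExpStats`, `record`, and the `window` parameter are hypothetical names that do not appear in the original code. It uses `collections.deque` with `maxlen`, which drops the oldest entry automatically and so replaces the manual `pop(0)`.

```python
from collections import deque


class ExpStats:
    """Hypothetical helper: tracks recent outcomes and a smoothed lp."""

    def __init__(self, name, window=10, p=0.005):
        self.name = name
        self.p = p
        self.lp = 0.0
        self.outcomes = deque(maxlen=window)  # oldest outcome is dropped automatically

    def record(self, success):
        self.outcomes.append(1 if success else 0)
        hp = sum(self.outcomes) / len(self.outcomes)  # historical success rate
        self.lp = (1 - self.p) * self.lp + self.p * hp
        return self.lp


stats = ExpStats('demo')
for _ in range(12):
    stats.record(True)
stats.record(False)
print(len(stats.outcomes), stats.outcomes[-1], stats.lp)
```

A list of such objects can then be ordered with `sorted(items, key=lambda s: s.lp, reverse=True)`, which matches the descending sort used in the reproduction script.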

    What matters in this post is the underlying idea.


  • Original article: https://www.cnblogs.com/kevin-red-heart/p/13175300.html