zoukankan      html  css  js  c++  java
  • 沉淀,再出发:结合案例看python

    沉淀,再出发:结合案例看python

    一、前言

      关于python,如果不经过大型程序开发的洗礼,我们很难说自己已经懂得了python了,因此,我们需要通过稍微结构化的编程来学习python。

    二、一个案例

       首先我们看一下需要具备的前提知识。

       2.1、新建pandas表格

    man_num = 100
    women_num = 100
    pd.DataFrame( [['w'+str(i) for i in random.sample(range(1,women_num+1),women_num)] for j in range(man_num)], index = ['m'+str(i) for i in range(1,man_num+1)], columns = ['level'+str(i) for i in range(1,women_num+1)] )

        通过上述的方式,我们创建出了一个表格:

       其中random的sample方法,我们可以从下面的例子中理解,在上面就是挑选自身的一种置换来填充每一个单元,for j in range(man_num)其实就是重复多少行的意思:

    import random
    
    list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for i in range(3):
        slice = random.sample(list, 5)  # 从list中随机获取5个元素,作为一个片断返回
        print(slice)
        print(list, '
    ')  # 原有序列并没有改变

          2.2、reset_index()和to_csv()

        def to_csv(self, path=None, index=True, sep=",", na_rep='',
                   float_format=None, header=False, index_label=None,
                   mode='w', encoding=None, compression=None, date_format=None,
                   decimal='.'):
            """
            Write Series to a comma-separated values (csv) file
            Parameters
            ----------
            path : string or file handle, default None
                File path or object, if None is provided the result is returned as
                a string.
            na_rep : string, default ''
                Missing data representation
            float_format : string, default None
                Format string for floating point numbers
            header : boolean, default False
                Write out series name
            index : boolean, default True
                Write row names (index)
            index_label : string or sequence, default None
                Column label for index column(s) if desired. If None is given, and
                `header` and `index` are True, then the index names are used. A
                sequence should be given if the DataFrame uses MultiIndex.
            mode : Python write mode, default 'w'
            sep : character, default ","
                Field delimiter for the output file.
            encoding : string, optional
                a string representing the encoding to use if the contents are
                non-ascii, for python versions prior to 3
            compression : string, optional
                A string representing the compression to use in the output file.
                Allowed values are 'gzip', 'bz2', 'zip', 'xz'. This input is only
                used when the first argument is a filename.
            date_format: string, default None
                Format string for datetime objects.
            decimal: string, default '.'
                Character recognized as decimal separator. E.g. use ',' for
                European data
            """
            from pandas.core.frame import DataFrame
            df = DataFrame(self)
            # result is only a string if no path provided, otherwise None
            result = df.to_csv(path, index=index, sep=sep, na_rep=na_rep,
                               float_format=float_format, header=header,
                               index_label=index_label, mode=mode,
                               encoding=encoding, compression=compression,
                               date_format=date_format, decimal=decimal)
            if path is None:
               return result

         2.3、stack()和unstack()

         在用pandas进行数据重排时,经常用到stack和unstack两个函数。stack的意思是堆叠,堆积,unstack即“不要堆叠”。常见的数据的层次化结构有两种,一种是表格,一种是“花括号”,即下面这样的l两种形式:

     

           表格在行列方向上均有索引(类似于DataFrame),花括号结构只有“列方向”上的索引(类似于层次化的Series),结构更加偏向于堆叠(Series-stack,方便记忆)。stack函数会将数据从”表格结构“变成”花括号结构“,即将其行索引变成列索引,反之,unstack函数将数据从”花括号结构“变成”表格结构“,即要将其中一层的列索引变成行索引。

    import numpy as np
    import pandas as pd
    from pandas import Series,DataFrame
    data=DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['street1','street2']),columns=pd.Index(['one','two','three']))
    print(data)
    print('-----------------------------------------
    ')
    data2=data.stack()
    data3=data2.unstack()
    print(data2)
    print('-----------------------------------------
    ')
    print(data3)

       2.4、DataFrame上的concat()

        2.5、案例

    1    有一座城市,当地风俗是,想结婚的男子必须先向心仪的女子求婚,而女子则需要等待求婚。
    2    牧师每年会邀请人数相同的适婚男女参与一次集体相亲。一次相亲活动可能有很多轮,男子会首先向自己最爱的女子求婚,女子则会在所有的追求者中选择她的最爱;
    如果男子被拒绝,下一轮会向他第二喜欢的女子求婚;上一轮已经订婚的女子如果得到她更爱的人的求婚,则会毫不留情地抛弃未婚夫,和更爱的人在一起。
    被抛弃的男子需要重新参与求婚。如此反复,等大家都订婚,就举办集体婚礼。
    3 假设: 4 1)参加求婚的男女数量保持一致 5 2)每个男子都按喜爱程度对女子进行排序,比如最爱a,其次爱b,再次爱c 6 3)每个女子也同样给每个男子排序

        2.6、安装 pyecharts和pyecharts_snapshot

        2.7、创造样例

     1 import pandas as pd
     2 import random
     3 
     4 from configuration import MAN_NUM,WOMAN_NUM
     5 
     6 
     7 def create_sample():
     8     man_num = MAN_NUM
     9     women_num = WOMAN_NUM
    10 
    11     #设置男女生喜好样本
    12     print('==============================生成样本数据==============================')
    13     man = pd.DataFrame( [['w'+str(i) for i in random.sample(range(1,women_num+1),women_num)] 
    14                           for i in range(man_num)],
    15                         index = ['m'+str(i) for i in range(1,man_num+1)],
    16                         columns = ['level'+str(i) for i in range(1,women_num+1)]
    17                         )
    18 
    19     women = pd.DataFrame( [['m'+str(i) for i in random.sample(range(1,man_num+1),man_num)] 
    20                           for i in range(women_num)],
    21                         index = ['w'+str(i) for i in range(1,women_num+1)],
    22                         columns = ['level'+str(i) for i in range(1,man_num+1)]
    23                         )
    24     return (man,women)
    25 
    26 if __name__ == '__main__':
    27     create_sample(man,women)

       2.8、生成映射表

     1 import pandas as pd
     2 
     3 #设置姻缘关系表
     4 def create_mapping_table(man,women):
     5 
     6     man_ismapping = pd.DataFrame({
     7             'man_id':man.index,
     8             'target':'n',
     9             'love_level':0,
    10             'range':0
    11             }).set_index('man_id')
    12 
    13     women_ismapping = pd.DataFrame({
    14             'women_id':women.index,
    15             'target':'n',
    16             'love_level':0,
    17             'range':0
    18             }).set_index('women_id')
    19     return (man_ismapping,women_ismapping)
    20 
    21 if __name__ == '__main__':
    22     create_mapping_table(man,women)

        2.9、创建目录,完成初始化

     1 from configuration import  TEST_NUM
     2 import os
     3 from scr.create_sample import create_sample
     4 from scr.create_mapping_table import create_mapping_table
     5 import pandas as pd
     6 
     7 def loop_script():
     8     test_num = TEST_NUM
     9     if not os.path.exists('./data'):
    10         os.makedirs('./data')
    11 
    12     for i in range(1,test_num+1):
    13         print('==============================开始创建测试文件夹{}=============================='.format(i))
    14         path = './data/test' + str(i)
    15         if not os.path.exists(path):
    16             os.makedirs(path)
    17         sample_data = create_sample()
    18         man = sample_data[0]
    19         women = sample_data[1]
    20         man.reset_index().to_csv(path+'/'+'man_sample.csv', index=0)
    21         women.reset_index().to_csv(path+'/'+'woman_sample.csv', index=0)
    22         print('==============================样本数据生成成功==============================')
    23         print('==============================创建婚姻关系表==============================')
    24         man = pd.read_csv(path+'/'+'man_sample.csv').set_index('index')
    25         women = pd.read_csv(path+'/'+'woman_sample.csv').set_index('index')
    26         mapping_data = create_mapping_table(man, women)
    27         man_ismapping = mapping_data[0]
    28         women_ismapping = mapping_data[1]
    29         man_ismapping.reset_index().to_csv(path+'/'+'man_ismapping.csv', index=0)
    30         women_ismapping.reset_index().to_csv(path+'/'+'women_ismapping.csv', index=0)
    31         print('==============================婚姻关系表创建完成==============================')

       2.10、业务逻辑计算

     1 import pandas as pd
     2 from configuration import  TEST_NUM
     3 
     4 def calculation():
     5     test_num =  TEST_NUM
     6     for i in range(1,test_num+1):
     7         path = './data/test' + str(i)
     8         man = pd.read_csv(path + '/' + 'man_sample.csv').set_index('index')
     9         women = pd.read_csv(path + '/' + 'woman_sample.csv').set_index('index')
    10         man_ismapping = pd.read_csv(path + '/' + 'man_ismapping.csv').set_index('man_id')
    11         women_ismapping = pd.read_csv(path + '/' + 'women_ismapping.csv').set_index('women_id')
    12         print('==============================测试集{}模拟开始=============================='.format(i))
    13         print('==============================开始模拟求婚过程==============================')
    14         level_num = 0
    15         while man_ismapping['love_level'].min() == 0:
    16             level_num += 1
    17             print('==============================开始第{}天婚姻配对=============================='.format(level_num))
    18             u_mapping_man = man_ismapping[man_ismapping.target == 'n'].index.tolist()
    19 
    20             if level_num < 2:
    21                 level_col = 'level' + str(level_num)
    22                 man_choose = man[man.index.isin(u_mapping_man)][level_col].to_frame().reset_index()
    23                 man_choose.columns = ['man_id', 'women_id']
    24                 man_choose['range'] = 1
    25             else:
    26                 m_id = u_mapping_man
    27                 l = []
    28                 for man_id in m_id:
    29                     col_n = int(man_ismapping[man_ismapping.index == man_id].range[0])
    30                     level_col = 'level' + str(col_n + 1)
    31                     women_id = man[man.index == man_id][level_col][0]
    32                     rg = col_n + 1
    33                     l.append([man_id, women_id, rg])
    34                 man_choose = pd.DataFrame(l, columns=['man_id', 'women_id', 'range'])
    35 
    36             for r in range(0, len(man_choose)):
    37                 relationship = man_choose[man_choose.index == r]
    38                 m = [i for i in relationship['man_id']][0]
    39                 w = [i for i in relationship['women_id']][0]
    40                 find = women[women.index == w].unstack().reset_index()
    41                 find.columns = ['level', 'women_id', 'man_id']
    42                 find = int([i for i in find[find['man_id'] == m]['level']][0].split('level')[1])
    43                 o_love_level = [i for i in women_ismapping[women_ismapping.index == w]['love_level']][0]
    44                 rg = [i for i in relationship['range']][0]
    45                 if o_love_level == 0:
    46                     women_ismapping.loc[w, 'love_level'] = find
    47                     women_ismapping.loc[w, 'target'] = m
    48                     women_ismapping.loc[w, 'range'] = level_num
    49                     man_ismapping.loc[m, 'love_level'] = rg
    50                     man_ismapping.loc[m, 'target'] = w
    51                     man_ismapping.loc[m, 'range'] = rg
    52                 elif o_love_level > find:
    53                     m_o = women_ismapping.loc[w, 'target']
    54                     man_ismapping.loc[m_o, 'love_level'] = 0
    55                     man_ismapping.loc[m_o, 'target'] = 'n'
    56                     man_ismapping.loc[m, 'love_level'] = rg
    57                     man_ismapping.loc[m, 'target'] = w
    58                     man_ismapping.loc[m, 'range'] = rg
    59                     women_ismapping.loc[w, 'love_level'] = find
    60                     women_ismapping.loc[w, 'target'] = m
    61                     women_ismapping.loc[w, 'range'] = level_num
    62                 else:
    63                     man_ismapping.loc[m, 'range'] = rg
    64                     pass
    65 
    66         print('==============================婚姻配对完成==============================')
    67         print('共进行了{}次牵线搭桥,在第{}天举办集体婚礼。'.format(level_num, level_num + 1))
    68 
    69         print('==============================导出配对明细表==============================')
    70         man_love_level_mean = man_ismapping.love_level.mean()
    71         women_love_level_mean = women_ismapping.love_level.mean()
    72         detail = [[level_num,level_num,man_love_level_mean,women_love_level_mean]]
    73         pd.DataFrame(detail,columns=['match_range','party_time','man_love_level_mean','women_love_level_mean'])
    74                 .to_csv(path+'/'+'test_detail.csv', index=0)
    75         man_ismapping.reset_index().to_csv(path+'/'+'man_match_table.csv', index=0)
    76         women_ismapping.reset_index().to_csv(path+'/'+'woman_match_table.csv', index=0)
    77         print('==============================导出完毕==============================')
    78 
    79 if __name__ == '__main__':
    80     calculation()

       2.11、绘制图表

     1 import pandas as pd
     2 from configuration import  TEST_NUM
     3 from pyecharts import Bar
     4 import matplotlib.pyplot as plt
     5 import warnings
     6 warnings.filterwarnings("ignore")
     7 
     8 def drawing():
     9     test_num =  TEST_NUM
    10     l_table = []
    11     for i in range(1,test_num+1):
    12         path = './data/test' + str(i)
    13         man_match_table = pd.read_csv(path + '/' + 'man_match_table.csv').set_index('man_id')
    14         woman_match_table = pd.read_csv(path + '/' + 'woman_match_table.csv').set_index('women_id')
    15 
    16         man_match_table = man_match_table.groupby('love_level').count()['range'].sort_values(ascending=False)
    17         woman_match_table = woman_match_table.groupby('love_level').count()['range'].sort_values(ascending=False)
    18 
    19         man_attr = man_match_table.index.tolist()
    20         man_v = man_match_table.values.tolist()
    21         bar_man = Bar('男生匹配对象喜爱程度分布',width=900,height=500)
    22         bar_man.add('频数',man_attr,man_v,mark_line=["max"],label_color = ['#9932CC'])
    23         bar_man.render(path + '/' + '男生匹配对象喜爱程度分布.html')
    24 
    25         women_attr = woman_match_table.index.tolist()
    26         women_v = woman_match_table.values.tolist()
    27         bar_women = Bar('女生匹配对象喜爱程度分布',width=900,height=500)
    28         bar_women.add('频数',women_attr,women_v,mark_line=["max"],label_color = ['#FF3030'])
    29         bar_women.render(path + '/' + '女生匹配对象喜爱程度分布.html')
    30 
    31         detail = pd.read_csv(path + '/' + 'test_detail.csv')
    32         l_table.append(detail)
    33         detail_table = pd.concat(l_table)
    34         s_match_range = detail_table['match_range']
    35         s_man_love_level_mean = detail_table['man_love_level_mean']
    36         women_love_level_mean = detail_table['women_love_level_mean']
    37     fig = plt.figure(figsize=(10, 6), facecolor='gray')
    38     ax1 = fig.add_subplot(2, 2, 1)
    39     ax1.hist(s_match_range,bins = 20,
    40                         histtype = 'bar',
    41                         align = 'mid',
    42                         orientation = 'vertical',
    43                         alpha=0.5,
    44                         normed =False
    45                         )
    46     plt.grid(True)
    47     ax2 = fig.add_subplot(2, 2, 2)
    48     ax2.hist(s_man_love_level_mean, bins=20,
    49                          histtype='bar',
    50                          align='mid',
    51                          orientation='vertical',
    52                          alpha=0.5,
    53                          normed=False
    54                          )
    55     plt.grid(True)
    56     ax3 = fig.add_subplot(2, 2, 3)
    57     ax3.hist(women_love_level_mean, bins=20,
    58                          histtype='bar',
    59                          align='mid',
    60                          orientation='vertical',
    61                          alpha=0.5,
    62                          normed=False
    63                          )
    64     plt.grid(True)
    65     plt.savefig('./data/detail_graph.png',dpi=400)
    66     plt.show()
    67     print('==========================制图完成==========================')

         2.12、配置文件和程序入口

     1 #基本参数配置
     2 
     3 #MAN_NUM:男生样本数量
     4 #WOMAN_NUM:女生样本数量
     5 #TEST_NUM:测试次数
     6 
     7 
     8 MAN_NUM = 100
     9 WOMAN_NUM = 100
    10 TEST_NUM = 50
    1 from scr.loop_script import loop_script
    2 from scr.calculation import calculation
    3 from scr.drawing import drawing
    4 
    5 if __name__ == '__main__':
    6     loop_script()
    7     calculation()
    8     drawing()
    9     print('========================================任务结束========================================')

    三、总结

          通过一个案例,我们更好的理解了python的相关语法和应用,以及结合其他工具的强大的绘图能力和表达能力,对语言的高度浓缩和精炼等等。

      参考资料:https://mp.weixin.qq.com/s?__biz=MzAxMjUyNDQ5OA==&mid=2653557526&idx=1&sn=fa6afe368f3395afc80758f88c81868f&chksm=806e3dabb719b4bd452fe6f2389a56966a65e8e542d34ca475fc8061ace1a591f8ce27be6e8e&mpshare=1&scene=23&srcid=1016fiPYJuCJ4b33lW5V0jpx#rd

  • 相关阅读:
    request.getParameter() 、 request.getInputStream()和request.getReader() 使用体会
    HTTP之Content-Length
    关于spring3中No Session found for current thread!and Transaction的配置和管理(转)
    Java数据类型和MySql数据类型对应一览
    Spring MVC 解读——View,ViewResolver(转)
    LeetCode 441. Arranging Coins
    LeetCode 415. Add Strings
    LeetCode 400. Nth Digit
    LeetCode 367. Valid Perfect Square
    LeetCode 326. Power of Three
  • 原文地址:https://www.cnblogs.com/zyrblog/p/9796759.html
Copyright © 2011-2022 走看看