
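    Settle Down, Then Set Out Again: Python Through a Case Study

    I. Preface

      Unless we have been through the trial of developing a larger program, we can hardly claim to understand Python. We therefore need some reasonably structured programming projects to learn it.

    II. A Case Study

       First, let's look at the prerequisite knowledge.

       2.1 Creating a pandas table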

    import random
    import pandas as pd
    man_num = 100
    women_num = 100
    # Each row is a random permutation of w1..w100: one man's ranking of all the women
    man = pd.DataFrame( [['w'+str(i) for i in random.sample(range(1,women_num+1),women_num)] for j in range(man_num)], index = ['m'+str(i) for i in range(1,man_num+1)], columns = ['level'+str(i) for i in range(1,women_num+1)] )

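        This creates a table with one row per man (m1 to m100) and one column per preference level (level1 to level100).

       The sample method of random can be understood from the example below. In the code above it picks a random permutation of the whole range to fill each row, and for j in range(man_num) simply sets how many rows are generated: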

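    import random

    # Named items/picked rather than list/slice to avoid shadowing the built-ins
    items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for i in range(3):
        picked = random.sample(items, 5)  # randomly pick 5 distinct elements, returned as a new list
        print(picked)
        print(items, '\n')  # the original sequence is left unchanged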
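
When the sample size equals the size of the population, as in the table built in 2.1, random.sample returns a random permutation. A minimal sketch of building one preference row that way; the helper name random_ranking is made up for illustration and does not appear in the original code:

```python
import random

def random_ranking(prefix, n):
    """Return the labels prefix1..prefixn in random order (hypothetical helper)."""
    # Sampling n items out of n without replacement yields a permutation
    order = random.sample(range(1, n + 1), n)
    return [prefix + str(i) for i in order]

row = random_ranking('w', 5)
print(row)  # some ordering of w1..w5
```

Every label appears exactly once in the row, which is what makes it usable as a complete ranking.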

          2.2 reset_index() and to_csv()

        def to_csv(self, path=None, index=True, sep=",", na_rep='',
                   float_format=None, header=False, index_label=None,
                   mode='w', encoding=None, compression=None, date_format=None,
                   decimal='.'):
            """
            Write Series to a comma-separated values (csv) file
            Parameters
            ----------
            path : string or file handle, default None
                File path or object, if None is provided the result is returned as
                a string.
            na_rep : string, default ''
                Missing data representation
            float_format : string, default None
                Format string for floating point numbers
            header : boolean, default False
                Write out series name
            index : boolean, default True
                Write row names (index)
            index_label : string or sequence, default None
                Column label for index column(s) if desired. If None is given, and
                `header` and `index` are True, then the index names are used. A
                sequence should be given if the DataFrame uses MultiIndex.
            mode : Python write mode, default 'w'
            sep : character, default ","
                Field delimiter for the output file.
            encoding : string, optional
                a string representing the encoding to use if the contents are
                non-ascii, for python versions prior to 3
            compression : string, optional
                A string representing the compression to use in the output file.
                Allowed values are 'gzip', 'bz2', 'zip', 'xz'. This input is only
                used when the first argument is a filename.
            date_format: string, default None
                Format string for datetime objects.
            decimal: string, default '.'
                Character recognized as decimal separator. E.g. use ',' for
                European data
            """
            from pandas.core.frame import DataFrame
            df = DataFrame(self)
            # result is only a string if no path provided, otherwise None
            result = df.to_csv(path, index=index, sep=sep, na_rep=na_rep,
                               float_format=float_format, header=header,
                               index_label=index_label, mode=mode,
                               encoding=encoding, compression=compression,
                               date_format=date_format, decimal=decimal)
            if path is None:
                return result
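
In the case that follows, every table is saved with reset_index().to_csv(...) and later read back with read_csv(...).set_index(...). The sketch below shows that round trip on a made-up two-row table, writing to an in-memory buffer instead of a file so that it runs anywhere:

```python
import io
import pandas as pd

df = pd.DataFrame({'level1': ['w2', 'w1'], 'level2': ['w1', 'w2']},
                  index=['m1', 'm2'])

# reset_index() turns the unnamed index into an ordinary column called 'index'
flat = df.reset_index()

# index=False keeps pandas from also writing the new 0..n-1 row numbers
buffer = io.StringIO()
flat.to_csv(buffer, index=False)

# Reading the text back and restoring the index recovers the table
buffer.seek(0)
restored = pd.read_csv(buffer).set_index('index')
print(restored)
```

Calling reset_index() first is one way to make sure the row labels end up in a named column, which is why the later scripts can call set_index('index') after reading.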

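         2.3 stack() and unstack()

         When rearranging data with pandas, the two functions stack and unstack come up often. stack means to pile things up, and unstack means "do not pile up". Hierarchical data commonly comes in two forms: a table and a "curly-brace" (nested) layout.

           A table has labels in both the row and the column direction (like a DataFrame), whereas the curly-brace structure has labels only on the rows, arranged in levels (like a hierarchically indexed Series); it is the more "stacked" of the two (Series and stack go together, which makes this easy to remember). stack converts the table structure into the curly-brace structure by moving the column labels into the innermost level of the row index. Conversely, unstack converts the curly-brace structure back into a table by moving one level of the row index (the innermost by default) into the columns.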

    import numpy as np
    import pandas as pd
    from pandas import DataFrame
    data=DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['street1','street2']),columns=pd.Index(['one','two','three']))
    print(data)
    print('-----------------------------------------\n')
    data2=data.stack()      # table -> Series with a two-level (street, number) index
    data3=data2.unstack()   # back to the original table
    print(data2)
    print('-----------------------------------------\n')
    print(data3)

       2.4 concat() on DataFrames
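
The drawing step of the case merges the one-row summary of each test run with pd.concat. A minimal sketch of that use, with made-up numbers standing in for real results:

```python
import pandas as pd

# Two one-row summaries, shaped like the per-test detail files of the case
# (the values are invented for illustration)
run1 = pd.DataFrame([[12, 3.5]], columns=['match_range', 'man_love_level_mean'])
run2 = pd.DataFrame([[15, 4.1]], columns=['match_range', 'man_love_level_mean'])

# concat places the frames one under the other; ignore_index renumbers rows 0..n-1
combined = pd.concat([run1, run2], ignore_index=True)
print(combined)
```

Without ignore_index=True both rows would keep their original label 0, which is harmless for plotting but awkward for label-based lookups.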

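        2.5 The case

    There is a city whose custom is that a man who wants to marry must first propose to the woman he admires, while a woman waits to be proposed to.
    Every year the priest invites equal numbers of marriageable men and women to a collective matchmaking event. An event may run for many rounds. Each man first proposes to the woman he loves most, and each woman chooses her favorite among all her suitors.
    A man who is rejected proposes to his second-favorite woman in the next round. A woman who got engaged in an earlier round and then receives a proposal from someone she loves more will drop her fiancé without hesitation and join the man she prefers.
    The man who was dropped has to start proposing again. This repeats until everyone is engaged, and then a collective wedding is held.
    Assumptions:
    1) The numbers of men and women taking part are equal.
    2) Every man ranks the women by how much he loves them, for example a first, then b, then c.
    3) Every woman likewise ranks every man.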
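
The procedure described above is commonly known as the stable marriage problem, solved by men-proposing deferred acceptance. Before the pandas version, here is a compact sketch in plain Python. It is not the author's implementation: it handles one free man at a time rather than in synchronized rounds, and the function name and the tiny preference lists are made up for illustration.

```python
def propose_until_engaged(men_prefs, women_prefs):
    """Both arguments map a person to a list of names, most preferred first."""
    # rank[w][m] = position of m in w's list (lower means more loved)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman each man asks
    fiance = {}                              # woman -> man she is engaged to
    free_men = list(men_prefs)

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = fiance.get(w)
        if current is None:
            fiance[w] = m                    # she was free: accept
        elif rank[w][m] < rank[w][current]:
            fiance[w] = m                    # she prefers the newcomer
            free_men.append(current)         # the dropped man proposes again
        else:
            free_men.append(m)               # rejected: he tries his next choice
    return {m: w for w, m in fiance.items()}

men = {'m1': ['w1', 'w2'], 'm2': ['w1', 'w2']}
women = {'w1': ['m2', 'm1'], 'w2': ['m1', 'm2']}
matching = propose_until_engaged(men, women)
print(matching)
```

Both men want w1, she prefers m2, so m1 ends up with his second choice w2.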

        2.6 Installing pyecharts and pyecharts_snapshot

        2.7 Creating the sample

    import pandas as pd
    import random

    from configuration import MAN_NUM,WOMAN_NUM


    def create_sample():
        man_num = MAN_NUM
        women_num = WOMAN_NUM

        # Build the preference samples for the men and the women
        print('==============================Generating sample data==============================')
        # Each row of man is one man's ranking of all the women (a random permutation)
        man = pd.DataFrame( [['w'+str(i) for i in random.sample(range(1,women_num+1),women_num)]
                              for _ in range(man_num)],
                            index = ['m'+str(i) for i in range(1,man_num+1)],
                            columns = ['level'+str(i) for i in range(1,women_num+1)]
                            )

        # Each row of women is one woman's ranking of all the men
        women = pd.DataFrame( [['m'+str(i) for i in random.sample(range(1,man_num+1),man_num)]
                              for _ in range(women_num)],
                            index = ['w'+str(i) for i in range(1,women_num+1)],
                            columns = ['level'+str(i) for i in range(1,man_num+1)]
                            )
        return (man,women)

    if __name__ == '__main__':
        # create_sample takes no arguments; it returns the two preference tables
        man, women = create_sample()

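       2.8 Generating the mapping table

    import pandas as pd

    # Build the engagement tables: one row per person, initially unmatched
    # target: current partner ('n' = none), love_level: rank of that partner (0 = none),
    # range: how far down the list proposals have gone
    def create_mapping_table(man,women):

        man_ismapping = pd.DataFrame({
                'man_id':man.index,
                'target':'n',
                'love_level':0,
                'range':0
                }).set_index('man_id')

        women_ismapping = pd.DataFrame({
                'women_id':women.index,
                'target':'n',
                'love_level':0,
                'range':0
                }).set_index('women_id')
        return (man_ismapping,women_ismapping)

    if __name__ == '__main__':
        # Tiny hand-made preference tables, only to smoke-test the function
        man = pd.DataFrame({'level1': ['w1']}, index=['m1'])
        women = pd.DataFrame({'level1': ['m1']}, index=['w1'])
        print(create_mapping_table(man,women))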

        2.9 Creating the directories and finishing initialization

    from configuration import  TEST_NUM
    import os
    from scr.create_sample import create_sample
    from scr.create_mapping_table import create_mapping_table
    import pandas as pd

    def loop_script():
        test_num = TEST_NUM
        if not os.path.exists('./data'):
            os.makedirs('./data')

        for i in range(1,test_num+1):
            print('==============================Creating test folder {}=============================='.format(i))
            path = './data/test' + str(i)
            if not os.path.exists(path):
                os.makedirs(path)
            sample_data = create_sample()
            man = sample_data[0]
            women = sample_data[1]
            # reset_index() moves the row labels into a column named 'index'
            man.reset_index().to_csv(path+'/'+'man_sample.csv', index=False)
            women.reset_index().to_csv(path+'/'+'woman_sample.csv', index=False)
            print('==============================Sample data generated==============================')
            print('==============================Creating engagement tables==============================')
            man = pd.read_csv(path+'/'+'man_sample.csv').set_index('index')
            women = pd.read_csv(path+'/'+'woman_sample.csv').set_index('index')
            mapping_data = create_mapping_table(man, women)
            man_ismapping = mapping_data[0]
            women_ismapping = mapping_data[1]
            man_ismapping.reset_index().to_csv(path+'/'+'man_ismapping.csv', index=False)
            women_ismapping.reset_index().to_csv(path+'/'+'women_ismapping.csv', index=False)
            print('==============================Engagement tables created==============================')

       2.10 Business-logic computation

    import pandas as pd
    from configuration import  TEST_NUM

    def calculation():
        test_num =  TEST_NUM
        for i in range(1,test_num+1):
            path = './data/test' + str(i)
            man = pd.read_csv(path + '/' + 'man_sample.csv').set_index('index')
            women = pd.read_csv(path + '/' + 'woman_sample.csv').set_index('index')
            man_ismapping = pd.read_csv(path + '/' + 'man_ismapping.csv').set_index('man_id')
            women_ismapping = pd.read_csv(path + '/' + 'women_ismapping.csv').set_index('women_id')
            print('==============================Test set {}: simulation started=============================='.format(i))
            print('==============================Simulating the proposals==============================')
            level_num = 0
            # A love_level of 0 marks a man who is not engaged yet
            while man_ismapping['love_level'].min() == 0:
                level_num += 1
                print('==============================Day {}: matching begins=============================='.format(level_num))
                u_mapping_man = man_ismapping[man_ismapping.target == 'n'].index.tolist()

                if level_num < 2:
                    # Day 1: every man proposes to his first choice
                    level_col = 'level' + str(level_num)
                    man_choose = man[man.index.isin(u_mapping_man)][level_col].to_frame().reset_index()
                    man_choose.columns = ['man_id', 'women_id']
                    man_choose['range'] = 1
                else:
                    # Later days: each free man proposes to the next woman on his list
                    l = []
                    for man_id in u_mapping_man:
                        col_n = int(man_ismapping.loc[man_id, 'range'])
                        level_col = 'level' + str(col_n + 1)
                        women_id = man.loc[man_id, level_col]
                        rg = col_n + 1
                        l.append([man_id, women_id, rg])
                    man_choose = pd.DataFrame(l, columns=['man_id', 'women_id', 'range'])

                for r in range(0, len(man_choose)):
                    m = man_choose.loc[r, 'man_id']
                    w = man_choose.loc[r, 'women_id']
                    rg = man_choose.loc[r, 'range']
                    # Find where man m sits in woman w's ranking (1 = most loved)
                    find = women[women.index == w].unstack().reset_index()
                    find.columns = ['level', 'women_id', 'man_id']
                    find = int(find[find['man_id'] == m]['level'].iloc[0].split('level')[1])
                    o_love_level = women_ismapping.loc[w, 'love_level']
                    if o_love_level == 0:
                        # She is free: accept the proposal
                        women_ismapping.loc[w, 'love_level'] = find
                        women_ismapping.loc[w, 'target'] = m
                        women_ismapping.loc[w, 'range'] = level_num
                        man_ismapping.loc[m, 'love_level'] = rg
                        man_ismapping.loc[m, 'target'] = w
                        man_ismapping.loc[m, 'range'] = rg
                    elif o_love_level > find:
                        # She prefers the newcomer: release her current fiancé
                        m_o = women_ismapping.loc[w, 'target']
                        man_ismapping.loc[m_o, 'love_level'] = 0
                        man_ismapping.loc[m_o, 'target'] = 'n'
                        man_ismapping.loc[m, 'love_level'] = rg
                        man_ismapping.loc[m, 'target'] = w
                        man_ismapping.loc[m, 'range'] = rg
                        women_ismapping.loc[w, 'love_level'] = find
                        women_ismapping.loc[w, 'target'] = m
                        women_ismapping.loc[w, 'range'] = level_num
                    else:
                        # Rejected: only record how far down his list he has gone
                        man_ismapping.loc[m, 'range'] = rg

            print('==============================Matching complete==============================')
            print('{} rounds of matchmaking were held; the collective wedding takes place on day {}.'.format(level_num, level_num + 1))

            print('==============================Exporting the match details==============================')
            man_love_level_mean = man_ismapping.love_level.mean()
            women_love_level_mean = women_ismapping.love_level.mean()
            detail = [[level_num,level_num,man_love_level_mean,women_love_level_mean]]
            detail_table = pd.DataFrame(detail,columns=['match_range','party_time','man_love_level_mean','women_love_level_mean'])
            detail_table.to_csv(path+'/'+'test_detail.csv', index=False)
            man_ismapping.reset_index().to_csv(path+'/'+'man_match_table.csv', index=False)
            women_ismapping.reset_index().to_csv(path+'/'+'woman_match_table.csv', index=False)
            print('==============================Export finished==============================')

    if __name__ == '__main__':
        calculation()
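
The inner loop in 2.10 calls unstack() once per proposal to find out how a woman ranks a given man. One possible alternative, sketched here on a made-up 2x3 table rather than taken from the original code, is to build the whole (woman, man) to rank lookup once with stack() and then read single entries from it:

```python
import pandas as pd

# Made-up preference table: rows are women, columns are preference levels
women = pd.DataFrame(
    [['m2', 'm1', 'm3'],
     ['m1', 'm3', 'm2']],
    index=['w1', 'w2'],
    columns=['level1', 'level2', 'level3'],
)

# stack() gives a Series indexed by (woman, level) whose values are the men
stacked = women.stack().reset_index()
stacked.columns = ['women_id', 'level', 'man_id']

# Strip the 'level' prefix to get the numeric rank, then index by (woman, man)
stacked['rank'] = stacked['level'].str.replace('level', '', regex=False).astype(int)
rank = stacked.set_index(['women_id', 'man_id'])['rank']

print(rank.loc[('w1', 'm3')])  # 3: m3 is w1's third choice
```

With such a Series in hand, the per-proposal lookup becomes a single rank.loc[(w, m)] instead of reshaping one row of the table every time.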

       2.11 Drawing the charts

    import pandas as pd
    from configuration import  TEST_NUM
    from pyecharts import Bar  # pyecharts 0.x API; from 1.0 on, Bar lives in pyecharts.charts
    import matplotlib.pyplot as plt
    import warnings
    warnings.filterwarnings("ignore")

    def drawing():
        test_num =  TEST_NUM
        l_table = []
        for i in range(1,test_num+1):
            path = './data/test' + str(i)
            man_match_table = pd.read_csv(path + '/' + 'man_match_table.csv').set_index('man_id')
            woman_match_table = pd.read_csv(path + '/' + 'woman_match_table.csv').set_index('women_id')

            # Count how many people ended up with their 1st, 2nd, 3rd ... choice
            man_match_table = man_match_table.groupby('love_level').count()['range'].sort_values(ascending=False)
            woman_match_table = woman_match_table.groupby('love_level').count()['range'].sort_values(ascending=False)

            man_attr = man_match_table.index.tolist()
            man_v = man_match_table.values.tolist()
            bar_man = Bar('Men: preference rank of the matched partner',width=900,height=500)
            bar_man.add('Frequency',man_attr,man_v,mark_line=["max"],label_color = ['#9932CC'])
            bar_man.render(path + '/' + 'man_love_level_distribution.html')

            women_attr = woman_match_table.index.tolist()
            women_v = woman_match_table.values.tolist()
            bar_women = Bar('Women: preference rank of the matched partner',width=900,height=500)
            bar_women.add('Frequency',women_attr,women_v,mark_line=["max"],label_color = ['#FF3030'])
            bar_women.render(path + '/' + 'woman_love_level_distribution.html')

            detail = pd.read_csv(path + '/' + 'test_detail.csv')
            l_table.append(detail)

        # Combine the one-row summaries of all test runs once, after the loop
        detail_table = pd.concat(l_table)
        s_match_range = detail_table['match_range']
        s_man_love_level_mean = detail_table['man_love_level_mean']
        women_love_level_mean = detail_table['women_love_level_mean']
        fig = plt.figure(figsize=(10, 6), facecolor='gray')
        ax1 = fig.add_subplot(2, 2, 1)
        # density replaces the normed argument, which was removed in matplotlib 3.1
        ax1.hist(s_match_range,bins = 20,
                            histtype = 'bar',
                            align = 'mid',
                            orientation = 'vertical',
                            alpha=0.5,
                            density =False
                            )
        plt.grid(True)
        ax2 = fig.add_subplot(2, 2, 2)
        ax2.hist(s_man_love_level_mean, bins=20,
                             histtype='bar',
                             align='mid',
                             orientation='vertical',
                             alpha=0.5,
                             density=False
                             )
        plt.grid(True)
        ax3 = fig.add_subplot(2, 2, 3)
        ax3.hist(women_love_level_mean, bins=20,
                             histtype='bar',
                             align='mid',
                             orientation='vertical',
                             alpha=0.5,
                             density=False
                             )
        plt.grid(True)
        plt.savefig('./data/detail_graph.png',dpi=400)
        plt.show()
        print('==========================Charts finished==========================')

         2.12 Configuration file and program entry point

    # Basic parameters

    # MAN_NUM: number of men in the sample
    # WOMAN_NUM: number of women in the sample
    # TEST_NUM: number of test runs


    MAN_NUM = 100
    WOMAN_NUM = 100
    TEST_NUM = 50

    from scr.loop_script import loop_script
    from scr.calculation import calculation
    from scr.drawing import drawing

    if __name__ == '__main__':
        loop_script()
        calculation()
        drawing()
        print('========================================All tasks finished========================================')

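    III. Summary

          Working through one case gives us a better grasp of Python's syntax and how it is applied, of the plotting and expressive power it gains when combined with other tools, and of how condensed and concise the language is.

      Reference: https://mp.weixin.qq.com/s?__biz=MzAxMjUyNDQ5OA==&mid=2653557526&idx=1&sn=fa6afe368f3395afc80758f88c81868f&chksm=806e3dabb719b4bd452fe6f2389a56966a65e8e542d34ca475fc8061ace1a591f8ce27be6e8e&mpshare=1&scene=23&srcid=1016fiPYJuCJ4b33lW5V0jpx#rd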

  • Original article: https://www.cnblogs.com/zyrblog/p/9796759.html