  • The map() function

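    1 map() is one of Python's higher-order functions. A higher-order function is a function that accepts another function as an argument, and functional programming is the highly abstract paradigm built on this idea.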

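    To understand higher-order functions, first note that a function can be assigned to a variable, and that a function's name is itself just a variable that can be rebound to another value. For that reason, choose variable names carefully so that they do not shadow function names.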
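A minimal sketch of the point above. The function `double` and the alias `f` are illustrative names, not part of the original post.

```python
def double(x):
    """Return twice the argument."""
    return 2 * x

# A function name is just a variable that refers to a function object,
# so the object can be bound to a second name and called through it.
f = double
print(f(21))  # 42

# Rebinding the original name replaces the function for any later caller.
double = 10
print(callable(double))  # False
print(f(21))             # 42, because f still refers to the function object
```

This is why reusing a name such as `map`, `list` or `sum` for a variable can make later calls fail in confusing ways.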

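    What makes map() special is that its first argument is a variable referring to a function and its second argument is an iterable, usually a list. It passes each element of the list to the function and collects the return values into a new sequence: a list in Python 2, a lazy map object in Python 3.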
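As a small illustration of this behaviour, the sketch below applies a hypothetical `square` function to a list and compares the result with an explicit loop.

```python
def square(x):
    """Return x squared."""
    return x * x

numbers = [1, 2, 3, 4, 5]

# map() calls square once per element; list() collects the results.
squares = list(map(square, numbers))
print(squares)  # [1, 4, 9, 16, 25]

# The same computation written as an explicit loop.
manual = []
for n in numbers:
    manual.append(square(n))
print(manual == squares)  # True
```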

    Reference: https://www.liaoxuefeng.com/wiki/897692888725344/923030148673312

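    2 Because map() passes the elements of its second argument to the function one at a time, it can be used to split a string into its individual characters.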

    In Python 2 map() returns a list, whereas in Python 3 it returns a map object. To display the results, wrap the call in list().

    a = 12300000
    def shuchu(k):
        return k
    print(map(shuchu, str(a)))
    b = list(map(shuchu, str(a)))
    print(b)
    # <map object at 0x7ff47f3c0828>
    # ['1', '2', '3', '0', '0', '0', '0', '0']
    
    # A dict cannot be passed directly as the mapping function; the call below is wrong
    d = {1:'(', -1:')'}
    c = list(map(d, [1,1,-1]))
    # TypeError: 'dict' object is not callable
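One possible way around the TypeError above, sketched under the assumption that a plain lookup is what was intended: pass a bound method of the dict, which is callable, instead of the dict itself.

```python
d = {1: '(', -1: ')'}

# A dict is not callable, but its bound method d.get is.
c = list(map(d.get, [1, 1, -1]))
print(c)  # ['(', '(', ')']

# d.get returns None for a key that is not in the dict.
with_missing = list(map(d.get, [1, 2]))
print(with_missing)  # ['(', None]

# A list comprehension does the same lookup and raises KeyError on a missing key.
c2 = [d[k] for k in [1, 1, -1]]
print(c2)  # ['(', '(', ')']
```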

    Reference: https://www.cnblogs.com/linshuhui/p/8980927.html

    3 The pandas Series.map() method can be used to join selected columns of two DataFrames.

    import pandas as pd
    import numpy as np
    
    df1 = pd.DataFrame( {'A':[1,2,3,'not in df2 index'],
                         'B':['a','b','c','d'],
                         'C':['Tom','Jack','Bob','roushi']
                       })
    print(df1)
    df2 = pd.DataFrame( {'A':[1,2,3,4],
                         'B':[6,7,8,9]})
    print(df2)
    # Each value in df1['A'] is looked up in the index of df2['B'];
    # a value with no matching index label maps to NaN
    df1['mapped'] = df1['A'].map(df2['B'])
    print(df1)
    #                   A  B       C
    # 0                 1  a     Tom
    # 1                 2  b    Jack
    # 2                 3  c     Bob
    # 3  not in df2 index  d  roushi
    #    A  B
    # 0  1  6
    # 1  2  7
    # 2  3  8
    # 3  4  9
    #                   A  B       C  mapped
    # 0                 1  a     Tom     7.0
    # 1                 2  b    Jack     8.0
    # 2                 3  c     Bob     9.0
    # 3  not in df2 index  d  roushi     NaN

    4 Basic usage

    a. Mapping with a dict

    import pandas as pd
    from pandas import Series, DataFrame
    
    data = DataFrame({'food':['bacon','pulled pork','bacon','Pastrami',
       'corned beef','Bacon','pastrami','honey ham','nova lox'],
         'ounces':[4,3,12,6,7.5,8,3,5,6]})
    meat_to_animal = {
     'bacon':'pig',
     'pulled pork':'pig',
     'pastrami':'cow',
     'corned beef':'cow',
     'honey ham':'pig',
     'nova lox':'salmon' }
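    # str.lower() converts every uppercase character in a string to lowercase.
    # It is needed here because the keys of meat_to_animal are all lowercase,
    # while some entries in the food column are capitalized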
    data['animal'] = data['food'].map(str.lower).map(meat_to_animal)
    print(data)
    print(data.info())
    a = data['food'].map(lambda x: meat_to_animal[x.lower()])
    print(a)
    #           food  ounces  animal
    # 0        bacon     4.0     pig
    # 1  pulled pork     3.0     pig
    # 2        bacon    12.0     pig
    # 3     Pastrami     6.0     cow
    # 4  corned beef     7.5     cow
    # 5        Bacon     8.0     pig
    # 6     pastrami     3.0     cow
    # 7    honey ham     5.0     pig
    # 8     nova lox     6.0  salmon
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 9 entries, 0 to 8
    # Data columns (total 3 columns):
    # food      9 non-null object
    # ounces    9 non-null float64
    # animal    9 non-null object
    # dtypes: float64(1), object(2)
    # memory usage: 296.0+ bytes
    # None
    # 0       pig
    # 1       pig
    # 2       pig
    # 3       cow
    # 4       cow
    # 5       pig
    # 6       cow
    # 7       pig
    # 8    salmon
    # Name: food, dtype: object
    
    import pandas as pd
    df1 = pd.DataFrame({'a':[1,2,3,4,5],
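                        'b':['one','two','three','four','five']})
    df2 = pd.DataFrame({'c':[5,4,2,1,2,3]})
    # Build a dict from df1's two columns, then map df2['c'] through it
    d = df2['c'].map(dict(zip(df1['a'],df1['b'])))
    print(d)
    # 0     five
    # 1     four
    # 2      two
    # 3      one
    # 4      two
    # 5    three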
    # Name: c, dtype: object

    b. Combining with lambda: when the function is simple, a lambda is the quickest way to write it.

    import pandas as pd
    from pandas import Series, DataFrame
    
    index = pd.date_range('2017-08-15', periods=10)
    ser = Series(list(range(10)), index=index)
    print(ser)
    ser.index = ser.index.map(lambda x: x.day)
    print(ser)
    # 2017-08-15    0
    # 2017-08-16    1
    # 2017-08-17    2
    # 2017-08-18    3
    # 2017-08-19    4
    # 2017-08-20    5
    # 2017-08-21    6
    # 2017-08-22    7
    # 2017-08-23    8
    # 2017-08-24    9
    # Freq: D, dtype: int64
    # 15    0
    # 16    1
    # 17    2
    # 18    3
    # 19    4
    # 20    5
    # 21    6
    # 22    7
    # 23    8
    # 24    9
    # dtype: int64
    
    # Multiply the elements of two lists pairwise, then sum the products
    # Note that when map() is given lists, it passes their elements one at a time
    a = [1,2,3,4]
    b = [2,3,4,5]
    sumab = sum(map(lambda x,y:x*y, a,b))
    print(sumab)
    # 40
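The two-list case above can also be written without a lambda. The sketch below uses `operator.mul` from the standard library and shows an equivalent `zip` form; it also illustrates that map() stops at the shortest input when the lengths differ.

```python
import operator

a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

# With two iterables, map() calls the function with one element from each.
dot = sum(map(operator.mul, a, b))
print(dot)  # 40

# The same sum written as a generator expression over zip().
dot_zip = sum(x * y for x, y in zip(a, b))
print(dot_zip)  # 40

# When the inputs have different lengths, map() stops at the shortest one.
short = list(map(operator.mul, [1, 2, 3], [10, 20]))
print(short)  # [10, 40]
```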

    c. Mapping with a function
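The original gives no example for this case. A minimal sketch, assuming pandas is installed: `size_label` is a hypothetical helper, and the threshold of 6 ounces is chosen only for illustration.

```python
import pandas as pd

def size_label(ounces):
    """Hypothetical helper: label a weight in ounces as 'large' or 'small'."""
    return 'large' if ounces >= 6 else 'small'

ounces = pd.Series([4, 3, 12, 6, 7.5, 8, 3, 5, 6])

# Series.map() calls the named function once per element.
labels = ounces.map(size_label)
print(labels.tolist())
# ['small', 'small', 'large', 'large', 'large', 'large', 'small', 'small', 'large']
```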

    d. Mapping with a Series

    See item 3.

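    5 Pool (from multiprocessing) and ThreadPool (from multiprocessing.dummy) both provide a map() method. The first runs its work in processes, the second in threads.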

    Using Pool:

    import gc
    import datetime as dt
    import pandas as pd
    from multiprocessing import Pool

    # fenpei() and tfun() are the author's own helpers and are not shown here:
    # fenpei() appears to return the i-th of processnum chunks of the data,
    # and tfun() is the worker that processes one [i, chunk] pair.
    listdata = []
    processnum = 12
    user_repay = pd.read_hdf('../data/user_repay_second.h5')

    for i in range(processnum):
        datai = fenpei(user_repay, i, processnum)
        # print(datai['index'].nunique())  # binned by index
        listdata.append([i, datai])
    del datai
    gc.collect()
    time1 = dt.datetime.now()
    # Each worker process receives one element of listdata
    with Pool(processnum) as p:
        p.map(tfun, listdata)
    print((dt.datetime.now() - time1).total_seconds())
    del listdata
    gc.collect()

    Using ThreadPool:

    import time
    from datetime import datetime
    from multiprocessing.dummy import Pool as ThreadPool
    from functools import partial
    
    def add(x, y):
        print(datetime.now(), "enter add func...")
        time.sleep(2)
        print(datetime.now(), "leave add func...")
        return x+y
    def add_wrap(args):
        return add(*args)
    
    if __name__ == "__main__":
        pool = ThreadPool(4) # pool size is 4
        print(pool.map(add_wrap, [(1,2),(3,4),(5,6)]))
        #close the pool and wait for the worker to exit
        pool.close()
        pool.join()
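As an alternative to writing a wrapper such as add_wrap, the pool's `starmap()` method unpacks each tuple into positional arguments, and `functools.partial` can fix one argument so that plain `map()` works. This is a sketch rather than the original author's code; `run_examples` is an illustrative name.

```python
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool

def add(x, y):
    """Return the sum of two numbers."""
    return x + y

def run_examples():
    # The with-block shuts the pool down once both calls have returned.
    with ThreadPool(4) as pool:
        # starmap() unpacks each tuple, so add(1, 2), add(3, 4), add(5, 6) are called.
        sums = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])
        # partial() fixes y=10, leaving a one-argument function for map().
        plus_ten = pool.map(partial(add, y=10), [1, 2, 3])
    return sums, plus_ten

if __name__ == "__main__":
    print(run_examples())  # ([3, 7, 11], [11, 12, 13])
```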

    References: https://blog.csdn.net/moxiaomomo/article/details/77075125

    https://www.liaoxuefeng.com/wiki/1016959663602400/1017629247922688

  • Original article: https://www.cnblogs.com/xxswkl/p/11069588.html