  • How to fix slow loc/iloc loops over a pandas DataFrame in Python


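    I have recently been using DataFrame to process large amounts of data and found it very slow. Tracing the cause, I found that loc and iloc are both slow when called inside a loop, and with a large dataset the processing becomes extremely slow. I looked through a lot of material without finding a detailed explanation, so here is the fix I settled on.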


    Use the pandas.Series.apply method, which processes a whole column of data quickly.
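To make the contrast concrete, here is a minimal sketch that runs the same per-value computation twice: once with a row-by-row loc/iloc loop and once with a single Series.apply call. The column name temp_c, the helper to_fahrenheit and the row count are assumptions made up for illustration, not taken from the original post, and the timings depend on your machine and pandas version.

```python
import time
import pandas as pd

# Hypothetical data: the column name "temp_c" and the row count are assumptions.
n = 2000
df = pd.DataFrame({"temp_c": [float(i % 40) for i in range(n)]})

def to_fahrenheit(c):
    """Hypothetical per-value function standing in for the real processing."""
    return c * 9 / 5 + 32

# Slow pattern: one iloc read and one loc write per row.
df["temp_f_loop"] = 0.0
src = df.columns.get_loc("temp_c")  # integer position of the source column
start = time.perf_counter()
for i in range(len(df)):
    df.loc[i, "temp_f_loop"] = to_fahrenheit(df.iloc[i, src])
loop_seconds = time.perf_counter() - start

# Same computation with a single Series.apply call on the column.
start = time.perf_counter()
df["temp_f_apply"] = df["temp_c"].apply(to_fahrenheit)
apply_seconds = time.perf_counter() - start

print(f"loop:  {loop_seconds:.4f} s")
print(f"apply: {apply_seconds:.4f} s")
```

Both columns hold the same values; only the way the function is driven over the rows differs.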

    Series.apply(func, convert_dtype=True, args=(), **kwds)


    func : function
        Function to apply to each value of the Series
    convert_dtype : boolean, default True
        Try to find a better dtype for the elementwise function results. If False, leave as dtype=object
    args : tuple
        Positional arguments to pass to the function in addition to the value
    **kwds
        Additional keyword arguments will be passed as keywords to the function
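As a small illustration of the last two parameters, the sketch below passes one extra positional argument through args and one extra keyword argument through **kwds. The helper scale_and_shift and its parameter names factor and offset are hypothetical, chosen only for this example.

```python
import pandas as pd

series = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

def scale_and_shift(x, factor, offset=0):
    """Hypothetical helper: x is the Series value, the rest come from apply."""
    return x * factor + offset

# factor is supplied positionally via args, offset as a keyword via **kwds.
result = series.apply(scale_and_shift, args=(2,), offset=1)
print(result)
```

Each value is passed as the first argument x, so the call computes x * 2 + 1 for every element.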


    # First, import the libraries and create the data
    >>> import pandas as pd
    >>> import numpy as np
    >>> series = pd.Series([20, 21, 12], index=['London','New York','Helsinki'])
    >>> series
    London      20
    New York    21
    Helsinki    12
    dtype: int64
    # Example 1: square each value
    >>> def square(x):
    ...     return x**2
    >>> series.apply(square)
    London      400
    New York    441
    Helsinki    144
    dtype: int64
    >>> series.apply(lambda x: x**2)
    London      400
    New York    441
    Helsinki    144
    dtype: int64
    # Example 2: subtract a custom value
    >>> def subtract_custom_value(x, custom_value):
    ...     return x-custom_value
    >>> series.apply(subtract_custom_value, args=(5,))
    London      15
    New York    16
    Helsinki     7
    dtype: int64
    # Use a function from the NumPy library
    >>> series.apply(np.log)
    London      2.995732
    New York    3.044522
    Helsinki    2.484907
    dtype: float64



  • Original article: https://www.cnblogs.com/gaoss/p/7657044.html