  • 6 ways to Sort Pandas Dataframe: Pandas Tutorial

    Often you want to sort a pandas data frame in a specific way. Typically, you sort it by the values of one or more columns, or by its row index (row names). Pandas data frames have two useful methods for this:

    1. sort_values(): to sort pandas data frame by one or more columns
    2. sort_index(): to sort pandas data frame by row index

    Each of these functions comes with numerous options, such as sorting in a specific order (ascending or descending), sorting in place, controlling where missing values go, and choosing the sorting algorithm.
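
    As one illustration of the algorithm option, sort_values() accepts a kind argument. Passing kind='mergesort' requests a stable sort, so rows with equal keys keep their original relative order. The following is a minimal sketch on a made-up frame; the column names key and tag are hypothetical and not part of the gapminder data.

```python
import pandas as pd

# Hypothetical toy frame: 'key' has ties, 'tag' records the original order.
df = pd.DataFrame({"key": [2, 1, 2, 1], "tag": ["a", "b", "c", "d"]})

# A stable sort keeps tied rows in their original relative order.
stable = df.sort_values("key", kind="mergesort")

print(stable["tag"].tolist())
```

    With a stable algorithm, "b" stays ahead of "d" and "a" stays ahead of "c", because that is how they were ordered before sorting.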

    Here is a quick Pandas tutorial on multiple ways of using sort_values() and sort_index() to sort pandas data frame using a real data set (gapminder).

    Let us first load the gapminder data from the Software Carpentry URL.

    import pandas as pd

    # data_url must hold the URL of the gapminder CSV (its definition is not shown here)
    # read data from url as pandas dataframe
    gapminder = pd.read_csv(data_url)
    # print the first three rows
    print(gapminder.head(n=3))
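
    If the URL is not at hand, a small hand-built frame can stand in for experimenting with the calls in this tutorial. The sketch below is an assumption, not the real data set: sample is a hypothetical name, and its six rows (including their row labels) are copied from outputs printed elsewhere in this tutorial.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: six rows taken from this tutorial's outputs.
sample = pd.DataFrame(
    {
        "country": ["Afghanistan", "Gambia", "Japan", "Japan", "Rwanda", "Lesotho"],
        "year": [1952, 1952, 2002, 2007, 1992, 1952],
        "pop": [8425333.0, 284320.0, 127065841.0, 127467972.0, 7290203.0, 748747.0],
        "continent": ["Asia", "Africa", "Asia", "Asia", "Africa", "Africa"],
        "lifeExp": [28.801, 30.000, 82.000, 82.603, 23.599, 42.138],
        "gdpPercap": [779.445314, 485.230659, 28604.59190, 31656.06806,
                      737.068595, 298.846212],
    },
    index=[0, 552, 802, 803, 1292, 876],  # original row labels
)

# Same call pattern as used on the full data set.
print(sample.sort_values("lifeExp").head(n=3))
```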

    1. How to Sort Pandas Dataframe based on the values of a column?

    We can sort a pandas dataframe by the values of a single column by passing the name of that column to sort_values(). For example, we can sort by the values of the “lifeExp” column in the gapminder data like this:

    sort_by_life = gapminder.sort_values('lifeExp')

    print(sort_by_life.head(n=3))
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659

    Note that by default sort_values() returns a new, sorted data frame and leaves the original unchanged. The new data frame is in ascending order (small values first, large values last). With head() we can see that the first rows have the smallest life expectancy. Using tail() on the sorted data frame, we can see that the last rows have the highest life expectancy.

    print(sort_by_life.tail(n=3))
                 country  year          pop continent  lifeExp    gdpPercap
    802            Japan  2002  127065841.0      Asia   82.000  28604.59190
    671  Hong Kong China  2007    6980412.0      Asia   82.208  39724.97867
    803            Japan  2007  127467972.0      Asia   82.603  31656.06806
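
    To make the "returns a new data frame" behaviour concrete, here is a small sketch on a three-row frame. The names df and sorted_df are hypothetical, and the values are taken from the outputs above.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Japan", "Rwanda", "Gambia"],
                   "lifeExp": [82.603, 23.599, 30.000]})

# sort_values() returns a sorted copy; df itself is not modified.
sorted_df = df.sort_values("lifeExp")

print(sorted_df["country"].tolist())  # ascending by lifeExp
print(df["country"].tolist())         # original order is untouched
```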


    2. How to Sort Pandas Dataframe based on the values of a column (Descending order)?

    To sort a dataframe based on the values of a column but in descending order so that the largest values of the column are at the top, we can use the argument ascending=False.

    sort_by_life = gapminder.sort_values('lifeExp', ascending=False)

    In this example, we can see that after sorting the dataframe by lifeExp with ascending=False, the countries with the largest life expectancy are at the top.

    print(sort_by_life.head(n=3))
                 country  year          pop continent  lifeExp    gdpPercap
    803            Japan  2007  127467972.0      Asia   82.603  31656.06806
    671  Hong Kong China  2007    6980412.0      Asia   82.208  39724.97867
    802            Japan  2002  127065841.0      Asia   82.000  28604.59190

    3. How to Sort Pandas Dataframe based on a column and put missing values first?

    Often a data frame contains missing values, and when sorting on a column with missing values we might want the rows with missing values to come first or last.

    We can specify the position of missing values using the argument na_position. With na_position='first', the rows with missing values are placed first (the default is 'last').

    sort_na_first = gapminder.sort_values('lifeExp', na_position='first')

    In this example, the lifeExp column has no missing values, which is why no NaN values appear at the top when sorting with na_position='first'.

    print(sort_na_first.head(n=3))
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659
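
    Because gapminder has no missing lifeExp values, the effect of na_position is not visible above. The sketch below uses a made-up four-row frame with one NaN to show the difference; the country labels A to D and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing lifeExp value.
df = pd.DataFrame({"country": ["A", "B", "C", "D"],
                   "lifeExp": [50.0, np.nan, 30.0, 70.0]})

na_first = df.sort_values("lifeExp", na_position="first")
na_last = df.sort_values("lifeExp")  # default: na_position="last"

print(na_first["country"].tolist())  # row with NaN comes first
print(na_last["country"].tolist())   # row with NaN comes last
```

    In both cases the non-missing rows are still in ascending order; only the placement of the NaN row changes.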

    4. How to Sort Pandas Dataframe based on a column in place?

    By default, sorting a pandas data frame with sort_values() or sort_index() creates a new data frame. If you don't want to create a new data frame and just want to sort in place, you can use the argument inplace=True. Here is an example of sorting a pandas data frame in place without creating a new data frame.

    gapminder.sort_values('lifeExp', inplace=True)

    We can see that the data frame is now sorted: the smallest lifeExp values are at the top, and the row indices are no longer in order.

    print(gapminder.head(n=3))
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659

    Note that the order of the row index in the sorted data frame differs from that of the data frame before sorting.
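
    One detail worth knowing about in-place sorting: with inplace=True the method returns None, so its result should not be assigned back to the variable. A minimal sketch on a hypothetical three-row frame, using row labels and values from the output above:

```python
import pandas as pd

df = pd.DataFrame({"lifeExp": [82.603, 23.599, 30.000]},
                  index=[803, 1292, 552])

# With inplace=True the frame is modified and the call returns None.
result = df.sort_values("lifeExp", inplace=True)

print(result)             # None
print(df.index.tolist())  # row labels follow the sorted rows
```

    Writing df = df.sort_values(..., inplace=True) would therefore replace df with None.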

    5. How to Sort Pandas Dataframe based on Index (in place)?

    We can use sort_index() to sort a pandas dataframe by its row index or row names. In this example the row index consists of numbers, and because we sorted the data frame by lifeExp in the earlier example, the row index is jumbled up. We can sort by row index (with the inplace=True option) and recover the original row order.

    gapminder.sort_index(inplace=True)

    Now we can see that the row indices start from 0 and are sorted in ascending order. Compare this to the previous example, where the first row index is 1292 and the row indices are not sorted.

    print(gapminder.head(n=3))
           country  year         pop continent  lifeExp   gdpPercap
    0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
    1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
    2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
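
    The round trip described here (sort by values, then sort by index to get the original order back) can be checked on a small frame. This is a sketch with hypothetical names; sort_index() also accepts ascending=False, which is shown for comparison.

```python
import pandas as pd

# Default integer index 0..3, as in the freshly loaded gapminder frame.
df = pd.DataFrame({"lifeExp": [28.801, 30.332, 31.997, 23.599]})

shuffled = df.sort_values("lifeExp")   # row labels are now out of order
restored = shuffled.sort_index()       # back to label order 0, 1, 2, 3
reverse = shuffled.sort_index(ascending=False)

print(shuffled.index.tolist())
print(restored.index.tolist())
print(reverse.index.tolist())
```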

    6. How to Sort Pandas Dataframe Based on the Values of Multiple Columns?

    Often, you might want to sort a data frame based on the values of multiple columns. We can pass the columns we want to sort by as a list to sort_values(). For example, to sort by the values of two columns:

    sort_by_life_gdp = gapminder.sort_values(['lifeExp', 'gdpPercap'])

    We can see that the lifeExp column is sorted in ascending order, and rows with the same lifeExp value are ordered by gdpPercap.

    print(sort_by_life_gdp.head(n=3))
              country  year        pop continent  lifeExp   gdpPercap
    1292       Rwanda  1992  7290203.0    Africa   23.599  737.068595
    0     Afghanistan  1952  8425333.0      Asia   28.801  779.445314
    552        Gambia  1952   284320.0    Africa   30.000  485.230659

    Note that when sorting by multiple columns, pandas sort_values() sorts by the first column in the list first and uses the second column to order rows that tie on the first. We can see the difference by switching the order of column names in the list.

    sort_by_gdp_life = gapminder.sort_values(['gdpPercap', 'lifeExp'])

    print(sort_by_gdp_life.head(n=3))
                 country  year         pop continent  lifeExp   gdpPercap
    334  Congo Dem. Rep.  2002  55379852.0    Africa   44.966  241.165877
    335  Congo Dem. Rep.  2007  64606759.0    Africa   46.462  277.551859
    876          Lesotho  1952    748747.0    Africa   42.138  298.846212
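
    When sorting by several columns, the ascending argument can also be a list with one flag per column, so each column gets its own direction. The following sketch assumes a small made-up subset of rows (values taken from the outputs above) and sorts by continent ascending, then by lifeExp descending within each continent.

```python
import pandas as pd

df = pd.DataFrame({
    "continent": ["Asia", "Africa", "Asia", "Africa"],
    "country": ["Japan", "Rwanda", "Afghanistan", "Gambia"],
    "lifeExp": [82.603, 23.599, 28.801, 30.000],
})

# One ascending flag per sort column: continent A-Z, lifeExp high-to-low.
mixed = df.sort_values(["continent", "lifeExp"], ascending=[True, False])

print(mixed[["continent", "country", "lifeExp"]])
```

    The Africa rows come first with the higher lifeExp on top, followed by the Asia rows in the same high-to-low order.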
  • Original article: https://www.cnblogs.com/andy-0212/p/11450955.html