  • pandas_dataframe01

    1. How to read only the first few rows from a CSV file
      # Read only the first 2 rows and the specified columns
      import pandas as pd
      df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', nrows=2, usecols=['Model', 'Length'])
      df
      
      #>        Model    Length
      0    Integra    177
      1    Legend    195
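The same two parameters can be tried without network access. The following is a minimal sketch that assumes a small in-memory CSV; the `Width` column and every row after the first two are made-up sample data, not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the remote Cars93 file
csv_text = """Model,Length,Width
Integra,177,68
Legend,195,71
90,180,67
100,193,70
"""

# nrows limits how many data rows are parsed; usecols limits which columns are kept
df_small = pd.read_csv(io.StringIO(csv_text), nrows=2, usecols=["Model", "Length"])
print(df_small)
```

Because `nrows` stops parsing after the requested number of rows, the later rows are never loaded, which helps when previewing a large file.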
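    2. How to create a dataframe from every n-th row of a CSV file
      # Read the file in chunks of 50 rows and keep one row per chunk
      reader = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', chunksize=50)
      # Take the first row of each chunk as a one-row dataframe and combine them;
      # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
      df2 = pd.concat([chunk.iloc[[0]] for chunk in reader])
      
      # Show the first 5 rows
      # Note: the sample output below appears to come from a variant that keeps
      # the last row of each 50-row block, so its values and index differ
      print(df2.head())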
      
      #>                      crim    zn  indus chas                  nox     rm   age  \
          0              0.21977   0.0   6.91    0  0.44799999999999995  5.602  62.0   
          1               0.0686   0.0   2.89    0                0.445  7.416  62.5   
          2   2.7339700000000002   0.0  19.58    0                0.871  5.597  94.9   
          3               0.0315  95.0   1.47    0  0.40299999999999997  6.975  15.3   
          4  0.19072999999999998  22.0   5.86    0                0.431  6.718  17.5   
          
                dis rad  tax ptratio       b  lstat  medv  
          0  6.0877   3  233    17.9   396.9   16.2  19.4  
          1  3.4952   2  276    18.0   396.9   6.19  33.2  
          2  1.5257   5  403    14.7  351.85  21.45  15.4  
          3  7.6534   3  402    17.0   396.9   4.56  34.9  
          4  7.8265   7  330    19.1  393.74   6.56  26.2  
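An alternative to reading in chunks is to let `read_csv` skip the unwanted lines while parsing, by passing a callable to `skiprows`. This is only a sketch on made-up data (the `id` and `value` columns and the step `n = 3` are assumptions chosen for illustration).

```python
import io

import pandas as pd

# Hypothetical data: 10 rows whose "id" column runs from 0 to 9
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

n = 3  # keep one data row out of every 3

# The callable receives the 0-based line index in the file. Line 0 is the
# header, so data row k sits on line k + 1. Returning True skips the line.
df_nth = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda line: line > 0 and (line - 1) % n != 0,
)
print(df_nth)
```

With this approach the header is kept and only data rows 0, n, 2n, ... are parsed, so no intermediate chunks have to be combined afterwards.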
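    3. How to change column values while importing a CSV file
      Change the values of the 'medv' column: assign 'Low' when the value is ≤ 25 and 'High' when it is > 25.
      # Use the converters parameter to change the values of the medv column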
      df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', 
                       converters={'medv': lambda x: 'High' if float(x) > 25 else 'Low'})
      print(df.head())
      
      #>            b  lstat  medv
          0  396.90   4.98   Low  
          1  396.90   9.14   Low  
          2  392.83   4.03  High  
          3  394.63   2.94  High  
          4  396.90   5.33  High 
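A converter receives every cell as a string, which is why the lambda above calls `float(x)` before comparing. The sketch below shows the same idea with a named function on a small in-memory CSV; the function name `label_medv` and the boundary row with `medv` equal to 25.0 are hypothetical additions for illustration.

```python
import io

import pandas as pd

# Hypothetical sample; the last row is added to exercise the boundary value 25
csv_text = """crim,medv
0.00632,24.0
0.02731,21.6
0.02729,34.7
0.03237,25.0
"""


def label_medv(raw: str) -> str:
    """Map a raw CSV cell to 'High' (> 25) or 'Low' (<= 25)."""
    # converters pass each cell as a string, so cast before comparing
    return "High" if float(raw) > 25 else "Low"


df_labeled = pd.read_csv(io.StringIO(csv_text), converters={"medv": label_medv})
print(df_labeled)
```

Note that the converted column holds labels only; the original numeric values of `medv` are not kept in the dataframe.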
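    4. How to import only specified columns from a CSV file
      # Import only the specified columns: crim and medv
      df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', usecols=['crim', 'medv'])
      # Print the first five rows of the dataframe
      print(df.head())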
      print(df.head())
      
      #>          crim  medv
          0  0.00632  24.0
          1  0.02731  21.6
          2  0.02729  34.7
          3  0.03237  33.4
          4  0.06905  36.2
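One detail worth knowing about `usecols` is that the order of names in the list is ignored: the resulting columns follow the order in the file. A minimal sketch on made-up data (the three-column sample is an assumption) shows this and one way to get the listed order back.

```python
import io

import pandas as pd

# Hypothetical three-column sample in the file order crim, zn, medv
csv_text = "crim,zn,medv\n0.00632,18.0,24.0\n0.02731,0.0,21.6\n"

wanted = ["medv", "crim"]

# usecols selects the columns but keeps them in file order (crim, medv)
df_file_order = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# Index with the same list afterwards to reorder them as requested
df_list_order = df_file_order[wanted]

print(list(df_file_order.columns))
print(list(df_list_order.columns))
```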
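    5. How to get the shape, column types, and descriptive statistics of a dataframe
      df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
      
      # Print the number of rows and columns
      print(df.shape)
      
      # Print the dtype of each column (first 5 columns only)
      print(df.dtypes.head())
      
      # Count the columns of each dtype
      # (df.get_dtype_counts() was removed in pandas 1.0; use value_counts on dtypes)
      print(df.dtypes.value_counts())
      
      # Summary statistics for each numeric column (std, quartiles, etc.)
      df_stats = df.describe()
      # Convert the dataframe to a NumPy array
      df_arr = df.values
      # Convert the array to a nested list
      df_list = df.values.tolist()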
      
      #>    (93, 27)
          Manufacturer     object
          Model            object
          Type             object
          Min.Price       float64
          Price           float64
          dtype: object
          float64    18
          object      9
          dtype: int64
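The same inspection steps can be checked on a tiny dataframe. This is a sketch with made-up values (the three rows and the missing price are assumptions), mainly to show that `describe()` summarizes only numeric columns by default and leaves missing values out of `count`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one text column and two float columns
df_demo = pd.DataFrame({
    "Model": ["Integra", "Legend", "90"],
    "Price": [15.9, 33.9, np.nan],
    "Length": [177.0, 195.0, 180.0],
})

n_rows, n_cols = df_demo.shape               # (rows, columns)
dtype_counts = df_demo.dtypes.value_counts()  # number of columns per dtype
stats = df_demo.describe()                    # numeric columns only by default

print(n_rows, n_cols)
print(dtype_counts)
print(stats)
```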
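    6. How to get the row and column that satisfy a given condition
      import numpy as np
      df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
      # print(df)
      # Find the row and column positions of the cells equal to the maximum Price
      row, col = np.where(df.values == np.max(df.Price))
      # Get the maximum value by row and column position
      print(df.iat[row[0], col[0]])
      df.iloc[row[0], col[0]]
      
      # Get the maximum value by row label and column name
      # (df.get_value() was removed in pandas 1.0; df.at is its replacement)
      df.at[row[0], 'Price']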
      
      #>    61.9    
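When only one column is of interest, `Series.idxmax` is a shorter route to the row of the maximum than comparing the whole value array. The sketch below uses a made-up three-row frame (the model names and the missing price are assumptions) and shows both label-based and position-based access.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; one Price is missing to show that NaN is skipped
df_cars = pd.DataFrame({
    "Model": ["Integra", "Legend", "300E"],
    "Price": [15.9, np.nan, 61.9],
})

# idxmax returns the row label of the largest value, ignoring NaN by default
row_label = df_cars["Price"].idxmax()

# Label-based access with .at
max_price = df_cars.at[row_label, "Price"]

# Position-based access with .iat needs integer positions, so convert first
row_pos = df_cars.index.get_loc(row_label)
col_pos = df_cars.columns.get_loc("Price")
same_price = df_cars.iat[row_pos, col_pos]

print(row_label, max_price, same_price)
```

Converting labels to positions explicitly keeps the `.iat` call correct even when the index is not the default 0, 1, 2, ... range.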
  • Original article: https://www.cnblogs.com/huaobin/p/15687061.html