zoukankan      html  css  js  c++  java
  • 使用Pandas加载数据

    1.dataframe对象简述:

    dataframe为pandas中一种有行列索引的二维数据结构,可以看成在普通二维结构上加上行列id标记

    示例为创建一个2X3的dataframe:

     1 import sys
     2 import pandas as pd
     3 import numpy as np
     4 data = pd.DataFrame([[1, 2, 3],[4, 5, 6]], columns=['y0','y1','y2'], index=['x0','x1'])
     5 print ("data:
    ",data)
     6 
     7 '''
     8 data:
     9      y0  y1  y2
    10 x0   1   2   3
    11 x1   4   5   6
    12 '''

    2.利用read函数读取数据到datafame:

    pandas中的read函数可以从各种类型的文件中以及URL中读取数据到一个dataframe

    示例为从一个txt文件中读取三个特征向量,表示长方体的长宽高:

     1 import sys
     2 import pandas as pd
     3 import numpy as np
     4 filepath = "D:\Code\PyCode"
     5 filename = "in.txt"
     6 column_names = ["length", "width", "high"]
     7 #sep="..."规定了分隔符
     8 data = pd.read_table(filepath +"\"+ filename,sep=" ", names = column_names )
     9 print (data,"
    ","data.shape:",data.shape)
    10 '''
    11    length  width  high
    12 0      10     10   100
    13 1      15     11   110
    14 2      22     12   120 
    15  data.shape: (3, 3)
    16 '''

    注意:读取文件到dataframe时,若是指定列的标记,即在read函数中加入names=...,则读取到的data列索引为names指定的id,若是没有这个参数,列索引为源文件的第一行数据

    示例:

     1 import sys
     2 import pandas as pd
     3 import numpy as np
     4 filepath = "D:\Code\PyCode"
     5 filename = "in.txt"
     6 column_names = ["length", "width", "high"]
     7 #sep="..."规定了分隔符
     8 data = pd.read_table(filepath +"\"+ filename,sep=",")
     9 #data = pd.read_table(filepath +"\"+ filename,sep=",", names = column_names )
    10 print (data,"
    ","data.shape:",data.shape)
    11 '''
    12    10  10.1  100
    13 0  15    11  110
    14 1  22    12  120 
    15  data.shape: (2, 3)
    16 '''

    3.对dataframe进行列切片:

    对上面读取到的三行三列的data选取其第二列到第三列:

     1 data2 = data[column_names[1:3]]
     2 print (data2)
     3 print (data2.shape)
     4 '''
     5    length  width  high
     6 0      10     10   100
     7 1      15     11   110
     8 2      22     12   120
     9 (3, 3)
    10    width  high
    11 0     10   100
    12 1     11   110
    13 2     12   120
    14 (3, 2)
    15 '''
    16 data3 = data2[:n]#选取data2的前n行

    4.pandas中读取文件的函数(截图来自《利用python进行数据分析》):

  • 相关阅读:
    js中 offset /client /scroll总结
    python的安装和环境配置
    git详解
    Xmind
    Linux 文件搜索命令
    Linux 文件和目录命令
    Linux 系统关机重启命令
    Linux系统信息命令
    Day07
    ModuleNotFoundError: No module named 'pysqlite2'
  • 原文地址:https://www.cnblogs.com/cnXuYang/p/8392777.html
Copyright © 2011-2022 走看看