Read the file Advertising.csv, whose contents look like this:

    ,TV,Radio,Newspaper,Sales
    1,230.1,37.8,69.2,22.1
    2,44.5,39.3,45.1,10.4
    3,17.2,45.9,69.3,9.3
    4,151.5,41.3,58.5,18.5
    5,180.8,10.8,58.4,12.9
    6,8.7,48.9,75,7.2
    7,57.5,32.8,23.5,11.8
    8,120.2,19.6,11.6,13.2
    9,8.6,2.1,1,4.8
    10,199.8,2.6,21.2,10.6
    11,66.1,5.8,24.2,8.6
    12,214.7,24,4,17.4
    13,23.8,35.1,65.9,9.2
    14,97.5,7.6,7.2,9.7
    15,204.1,32.9,46,19
    16,195.4,47.7,52.9,22.4
    17,67.8,36.6,114,12.5
    18,281.4,39.6,55.8,24.4
    19,69.2,20.5,18.3,11.3
    20,147.3,23.9,19.1,14.6
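Reading manually:

    path = '8.Advertising.csv'
    x = []
    y = []
    with open(path) as f:
        for i, d in enumerate(f):
            if i == 0:  # the first line is the header row
                continue
            d = d.strip()  # strip leading and trailing whitespace
            if not d:  # skip blank lines
                continue
            d = list(map(float, d.split(',')))  # convert every field to float
            x.append(d[1:-1])  # features: TV, Radio, Newspaper
            y.append(d[-1])  # target: Sales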
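Using Python's built-in csv module:

    import csv

    # newline='' lets the csv module handle line endings itself
    with open(path, newline='') as f:
        print(f)
        d = csv.reader(f)
        for line in d:  # each line is a list of strings
            print(line)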
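The csv reader yields every row as a list of strings, so the values still need converting before they can be used as numbers. The following is a minimal sketch of that step. It assumes the column layout shown at the top, and it uses an in-memory `io.StringIO` sample holding the first three data rows in place of the real file, so the name `sample` is illustrative only.

```python
import csv
import io

# In-memory stand-in for the CSV file (first three data rows only).
sample = io.StringIO(
    ",TV,Radio,Newspaper,Sales\n"
    "1,230.1,37.8,69.2,22.1\n"
    "2,44.5,39.3,45.1,10.4\n"
    "3,17.2,45.9,69.3,9.3\n"
)

reader = csv.reader(sample)
header = next(reader)  # consume the header row

x, y = [], []
for row in reader:
    if not row:  # skip blank lines
        continue
    values = [float(v) for v in row]  # strings -> floats
    x.append(values[1:-1])  # drop the row index, keep the three features
    y.append(values[-1])    # last column is Sales

print(header)
print(x)
print(y)
```

Calling `next(reader)` once is a common way to skip the header without tracking a line counter.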
NumPy:

    import numpy as np

    # skiprows=1 skips the header row, which cannot be parsed as floats
    p = np.loadtxt(path, delimiter=',', skiprows=1)
    print(p)
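`np.loadtxt` returns a single 2-D array that still contains the row-index column, so the features and the target have to be sliced out afterwards. Below is a small sketch of that slicing, assuming the same column order as the file above. The in-memory `sample` is a stand-in for the real file and holds only the first three data rows.

```python
import io

import numpy as np

# In-memory stand-in for the CSV file (first three data rows only).
sample = io.StringIO(
    ",TV,Radio,Newspaper,Sales\n"
    "1,230.1,37.8,69.2,22.1\n"
    "2,44.5,39.3,45.1,10.4\n"
    "3,17.2,45.9,69.3,9.3\n"
)

p = np.loadtxt(sample, delimiter=',', skiprows=1)

x = p[:, 1:-1]  # all rows, columns TV / Radio / Newspaper
y = p[:, -1]    # all rows, last column (Sales)

print(x.shape, y.shape)
```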
pandas:

    import pandas as pd

    data = pd.read_csv(path)  # columns: TV, Radio, Newspaper, Sales
    x = data[['TV', 'Radio', 'Newspaper']]  # feature columns
    # x = data[['TV', 'Radio']]
    y = data['Sales']  # target column
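Because the header's first field is empty, pandas reads the row numbers as an extra column named `Unnamed: 0` unless told otherwise. One possible way to handle this, shown here as a sketch and not as the original author's approach, is to pass `index_col=0` so that column becomes the DataFrame index. The in-memory `sample` again stands in for the real file.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (first three data rows only).
sample = io.StringIO(
    ",TV,Radio,Newspaper,Sales\n"
    "1,230.1,37.8,69.2,22.1\n"
    "2,44.5,39.3,45.1,10.4\n"
    "3,17.2,45.9,69.3,9.3\n"
)

# index_col=0 uses the unnamed first column as the row index.
data = pd.read_csv(sample, index_col=0)

x = data[['TV', 'Radio', 'Newspaper']]  # DataFrame of features
y = data['Sales']                       # Series holding the target

print(data.columns.tolist())
print(x.shape, y.shape)
```

Selecting columns by name works the same with or without `index_col=0`; the option only keeps the row numbers out of the data columns.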
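Preprocessing data with sklearn:

    from sklearn import preprocessing
    from sklearn.preprocessing import StandardScaler

    # Here y is assumed to hold Iris class-name strings, not the Sales column
    le = preprocessing.LabelEncoder()
    le.fit(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
    print(le.classes_)
    y = le.transform(y)  # map each class name to an integer code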
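To make the effect of `LabelEncoder` concrete, here is a small self-contained sketch. The list `labels` is made-up illustrative input, not data from the file above. The encoder assigns integer codes according to the sorted order of the class names, and `inverse_transform` maps the codes back to names.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label column, for illustration only.
labels = ['Iris-virginica', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa']

le = LabelEncoder()
le.fit(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

encoded = le.transform(labels)           # class names -> integer codes
decoded = le.inverse_transform(encoded)  # integer codes -> class names

print(le.classes_)
print(encoded)
print(decoded)
```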