100 Days of Machine Learning, Day 2: Simple Linear Regression
1. Data preprocessing
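import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Data preprocessing
# Raw string so that "\100" is not read as an octal escape;
# point the path at your own copy of studentscores.csv
dataset = pd.read_csv(r'D:\100Daysdatasetsstudentscores.csv')
X = dataset.iloc[:, :1].values  # feature matrix (Hours), kept 2-D
Y = dataset.iloc[:, 1].values   # target vector (Scores), 1-D

# Hold out 25% of the rows as the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)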
Dataset
print(dataset.head(5))

   Hours  Scores
0    2.5      21
1    5.1      47
2    3.2      27
3    8.5      75
4    3.5      30
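The preprocessing step slices the feature with `iloc[:, :1]` and the target with `iloc[:, 1]`. The difference matters because scikit-learn estimators expect a 2-D feature matrix and a 1-D target. Here is a minimal sketch of that, assuming only the five rows printed above. The name `sample` is a hypothetical stand-in for the full dataset.

```python
import pandas as pd

# The five rows shown by dataset.head(5); "sample" is a hypothetical stand-in
sample = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5],
    "Scores": [21, 47, 27, 75, 30],
})

X_sample = sample.iloc[:, :1].values  # slice of columns -> 2-D array
Y_sample = sample.iloc[:, 1].values   # single column    -> 1-D array

print(X_sample.shape)  # 2-D: (n_rows, 1)
print(Y_sample.shape)  # 1-D: (n_rows,)
```

Writing `iloc[:, 0]` for the feature would instead give a 1-D array, which `LinearRegression.fit` rejects unless it is reshaped first.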
2. Fitting a simple linear regression model to the training set
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor = regressor.fit(X_train, Y_train)
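After fitting, the model is a straight line, score = slope × hours + intercept, and the two numbers can be read from the `coef_` and `intercept_` attributes. The sketch below illustrates this on the five sample rows shown earlier rather than on the full training set, so the printed values will differ from those of the real model. The names `hours`, `scores` and `demo_regressor` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The five rows shown by dataset.head(5), used only as sample data
hours = np.array([[2.5], [5.1], [3.2], [8.5], [3.5]])
scores = np.array([21, 47, 27, 75, 30])

demo_regressor = LinearRegression().fit(hours, scores)

slope = demo_regressor.coef_[0]        # one coefficient per feature
intercept = demo_regressor.intercept_  # value of the line at hours = 0
print("slope:", slope, "intercept:", intercept)

# Applying the line by hand reproduces what predict() returns
manual_pred = slope * hours[:, 0] + intercept
```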
3. Predicting the results
Y_pred = regressor.predict(X_test)
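The post goes straight from prediction to plotting. One possible addition is to compare the predictions with the held-out scores numerically, for example with `mean_absolute_error` and `r2_score` from `sklearn.metrics`. This is a sketch and not part of the original walkthrough. It runs on the five sample rows only, so the metric values are not meaningful, and the `demo`-prefixed names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# The five rows shown by dataset.head(5), used only as sample data
hours = np.array([[2.5], [5.1], [3.2], [8.5], [3.5]])
scores = np.array([21, 47, 27, 75, 30])

# Same split settings as in the preprocessing step
demo_X_train, demo_X_test, demo_Y_train, demo_Y_test = train_test_split(
    hours, scores, test_size=0.25, random_state=0
)

demo_model = LinearRegression().fit(demo_X_train, demo_Y_train)
demo_Y_pred = demo_model.predict(demo_X_test)

# Average absolute error, in score points
mae = mean_absolute_error(demo_Y_test, demo_Y_pred)
# Coefficient of determination; 1.0 would be a perfect fit
r2 = r2_score(demo_Y_test, demo_Y_pred)
print("MAE:", mae, "R^2:", r2)
```

With the real data, the same two calls can be applied to `Y_test` and `Y_pred` directly.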
4. Visualization
# Training set results
plt.scatter(X_train, Y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.show()

# Test set results
plt.scatter(X_test, Y_test, color='red')
plt.plot(X_test, regressor.predict(X_test), color='blue')
plt.show()
Training set:
Test set: