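scikit-learn has reorganized its modules: sklearn.cross_validation was deprecated in version 0.18 and removed in 0.20, and its functions now live in sklearn.model_selection.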
The old version looked like this:
from sklearn import cross_validation  # only available before scikit-learn 0.20
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape, iris.target.shape)  # (150, 4) (150,)

# Hold out 40% of the samples as the test set
X_train, X_test, y_train, y_test = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

print(X_train.shape, y_train.shape)  # (90, 4) (90,)
print(X_test.shape, y_test.shape)    # (60, 4) (60,)
The current version is:
from sklearn.model_selection import train_test_split
from sklearn import datasets

iris = datasets.load_iris()

# Same split as before, using the new module
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
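test_size is the fraction of samples that go into the test set when it is a float between 0 and 1. If it is an integer, it is the absolute number of test samples. random_state is the seed for the random number generator, so the same seed gives the same split every time.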
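To see how the two parameters behave, here is a small sketch on the same iris data. The variable names (float_sizes, int_sizes, same_split) are just for illustration and are not from the original post.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()  # 150 samples, 4 features

# Float test_size: 40% of the 150 samples end up in the test set
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
float_sizes = (len(X_train), len(X_test))

# Integer test_size: exactly 30 samples end up in the test set
X_train_n, X_test_n, y_train_n, y_test_n = train_test_split(
    iris.data, iris.target, test_size=30, random_state=0)
int_sizes = (len(X_train_n), len(X_test_n))

# Same random_state: repeating the call reproduces the same split
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)
same_split = bool(np.array_equal(X_test, X_test_r))

print(float_sizes, int_sizes, same_split)
```

With a float the split sizes scale with the dataset, while an integer pins the test set to a fixed count. Leaving random_state unset would generally give a different shuffle on each run.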