  • Splitting a dataset with sklearn's train_test_split

    sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set in a given proportion.

    1. Usage:

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)

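    2. Parameters:

    train_data: the feature set of the samples

    train_target: the label set of the samples

    test_size: the size of the test set. A float between 0 and 1 is the fraction of the dataset assigned to the test set; an integer is the absolute number of test samples

    random_state: the seed for the random number generator. On the same dataset, the same seed always produces the same split, while different seeds generally produce different splits

    X_train, y_train: together form the training set

    X_test, y_test: together form the test set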
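
    The parameter descriptions above can be checked with a small sketch. The toy data here (100 numeric rows and alternating 0/1 labels) is made up purely for illustration and is not part of the original post; it compares a float test_size with an integer one and repeats a split with the same random_state.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 two-dimensional samples with alternating labels
X = [[i, i * 2] for i in range(100)]
y = [i % 2 for i in range(100)]

# Float test_size: 20% of the 100 samples go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
frac_sizes = (len(X_train), len(X_test))

# Integer test_size: exactly 15 samples go to the test set
X_train_n, X_test_n, y_train_n, y_test_n = train_test_split(
    X, y, test_size=15, random_state=0)
count_sizes = (len(X_train_n), len(X_test_n))

# Same data and same seed: the split is reproduced exactly
_, X_test_again, _, y_test_again = train_test_split(
    X, y, test_size=0.2, random_state=0)
same_seed_matches = (X_test == X_test_again) and (y_test == y_test_again)

print("float test_size:", frac_sizes)
print("int test_size:", count_sizes)
print("same seed, same split:", same_seed_matches)
```

    Fixing random_state is mainly useful when results need to be comparable across runs, for example when debugging or comparing models on the same split.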

    3. Example:

    Generate a dataset of 100 samples and randomly set aside 20% of it as the test set

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Python 3.6

    # Older scikit-learn releases: from sklearn.cross_validation import train_test_split
    from sklearn.model_selection import train_test_split

    # Build 100 samples: 100 two-dimensional feature vectors with 100 matching labels
    X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
    y = [1] * 50 + [2] * 50

    # Randomly hold out 20% of the data as the test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    print("train:", len(X_train), "test:", len(X_test))

    # Inspect the samples that ended up in the test set
    for features, label in zip(X_test, y_test):
        print("".join(features), label)

    '''
    train: 80 test: 20
    feature two  2
    feature two  2
    feature one  1
    feature two  2
    feature two  2
    feature one  1
    feature one  1
    feature two  2
    feature two  2
    feature two  2
    feature two  2
    feature one  1
    feature two  2
    feature two  2
    feature two  2
    feature one  1
    feature one  1
    feature one  1
    feature two  2
    feature one  1
    '''
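
    In the output above, the test set contains 8 samples with label 1 and 12 with label 2, even though the full dataset is split 50/50. A plain random split does not preserve class proportions. The following is a minimal sketch, not part of the original post, that counts the labels in the test set and, as an optional variation, passes train_test_split's stratify parameter to keep the class ratio.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same data as the example: 50 samples of class 1 and 50 of class 2
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# Plain random split: the class counts in the test set depend on the seed
_, _, _, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
random_counts = Counter(y_test)

# Stratified split: the test set keeps the 50/50 class ratio of y
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)
stratified_counts = Counter(y_test_strat)

print("random:", dict(random_counts))
print("stratified:", dict(stratified_counts))
```

    Whether stratification is appropriate depends on the task; it tends to matter most for small or imbalanced datasets.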
  • Original post: https://www.cnblogs.com/cnXuYang/p/8342364.html