  • sklearn Datasets

    I. Overview of sklearn Datasets

    (1) Dataset splitting

     In machine learning, a dataset is generally divided into two parts:

    • Training data
    • Test data

    1. Training data

     Used to train and build the model; typically about 75% of the whole dataset.

    2. Test data

    Used when evaluating the model, to check whether it is effective; typically about 25% of the whole dataset.
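
    As a rough sketch of what a 75/25 split means (plain NumPy, not a sklearn API; the 150-sample size is just an example), the split can be done by shuffling sample indices:

    import numpy as np

    # a minimal sketch: split 150 samples into 75% train / 25% test by shuffling indices
    rng = np.random.default_rng(0)
    indices = rng.permutation(150)
    n_train = int(150 * 0.75)                  # 112 samples go to the training set
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    print(len(train_idx), len(test_idx))       # 112 38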

    (2) sklearn dataset APIs

     The sklearn API provides several built-in datasets for us to use:

    sklearn.datasets.load_*
    sklearn.datasets.fetch_*

    Loading or fetching popular datasets:

    • datasets.load_*()   loads a small dataset that ships inside the datasets package
    • datasets.fetch_*(data_home=None)   fetches a large dataset that must be downloaded from the network; the first parameter, data_home, is the directory the dataset is downloaded to, defaulting to ~/scikit_learn_data/

    Return type and attributes:

    • load_* and fetch_* both return a datasets.base.Bunch object (a dict-like type); a short access sketch follows this list
    • data: the feature array, a 2-D numpy.ndarray of shape [n_samples, n_features]
    • target: the label array, a 1-D numpy.ndarray of length n_samples
    • DESCR: a text description of the dataset
    • feature_names: the feature names (not provided for the news, handwritten-digit, or regression datasets)
    • target_names: the label names (not provided for regression datasets)
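
    As a quick sketch of how these Bunch attributes are accessed (using load_iris here; as noted above, feature_names/target_names are absent for some datasets):

    from sklearn.datasets import load_iris

    bunch = load_iris()
    print(type(bunch))             # a Bunch, which behaves like a dict
    print(bunch.data.shape)        # (150, 4): n_samples x n_features
    print(bunch.target.shape)      # (150,): one label per sample
    print(bunch.feature_names)     # names of the 4 features
    print(bunch.target_names)      # names of the 3 classes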

    II. Classification and Regression Datasets

    (1) sklearn classification datasets

    1. Using a classification dataset API

    from sklearn.datasets import load_iris

    def classi():
        data = load_iris()
        print(data.data)    # feature values
        print(data.target)  # target values
        print(data.DESCR)   # dataset description

    if __name__ == '__main__':
        classi()

    sklearn.datasets.load_iris loads and returns the iris dataset.

    The data attribute contains the samples and their feature values:

    [[5.1 3.5 1.4 0.2]
     [4.9 3.  1.4 0.2]
     [4.7 3.2 1.3 0.2]
     [4.6 3.1 1.5 0.2]
     [5.  3.6 1.4 0.2]
     [5.4 3.9 1.7 0.4]
     [4.6 3.4 1.4 0.3]
     [5.  3.4 1.5 0.2]
     [4.4 2.9 1.4 0.2]
     [4.9 3.1 1.5 0.1]
     [5.4 3.7 1.5 0.2]
     [4.8 3.4 1.6 0.2]
     [4.8 3.  1.4 0.1]
     [4.3 3.  1.1 0.1]
     [5.8 4.  1.2 0.2]
     [5.7 4.4 1.5 0.4]
     [5.4 3.9 1.3 0.4]
     [5.1 3.5 1.4 0.3]
     [5.7 3.8 1.7 0.3]
     [5.1 3.8 1.5 0.3]
     [5.4 3.4 1.7 0.2]
     [5.1 3.7 1.5 0.4]
     [4.6 3.6 1.  0.2]
     [5.1 3.3 1.7 0.5]
     [4.8 3.4 1.9 0.2]
     [5.  3.  1.6 0.2]
     [5.  3.4 1.6 0.4]
     [5.2 3.5 1.5 0.2]
     [5.2 3.4 1.4 0.2]
     [4.7 3.2 1.6 0.2]
     [4.8 3.1 1.6 0.2]
     [5.4 3.4 1.5 0.4]
     [5.2 4.1 1.5 0.1]
     [5.5 4.2 1.4 0.2]
     [4.9 3.1 1.5 0.2]
     [5.  3.2 1.2 0.2]
     [5.5 3.5 1.3 0.2]
     [4.9 3.6 1.4 0.1]
     [4.4 3.  1.3 0.2]
     [5.1 3.4 1.5 0.2]
     [5.  3.5 1.3 0.3]
     [4.5 2.3 1.3 0.3]
     [4.4 3.2 1.3 0.2]
     [5.  3.5 1.6 0.6]
     [5.1 3.8 1.9 0.4]
     [4.8 3.  1.4 0.3]
     [5.1 3.8 1.6 0.2]
     [4.6 3.2 1.4 0.2]
     [5.3 3.7 1.5 0.2]
     [5.  3.3 1.4 0.2]
     [7.  3.2 4.7 1.4]
     [6.4 3.2 4.5 1.5]
     [6.9 3.1 4.9 1.5]
     [5.5 2.3 4.  1.3]
     [6.5 2.8 4.6 1.5]
     [5.7 2.8 4.5 1.3]
     [6.3 3.3 4.7 1.6]
     [4.9 2.4 3.3 1. ]
     [6.6 2.9 4.6 1.3]
     [5.2 2.7 3.9 1.4]
     [5.  2.  3.5 1. ]
     [5.9 3.  4.2 1.5]
     [6.  2.2 4.  1. ]
     [6.1 2.9 4.7 1.4]
     [5.6 2.9 3.6 1.3]
     [6.7 3.1 4.4 1.4]
     [5.6 3.  4.5 1.5]
     [5.8 2.7 4.1 1. ]
     [6.2 2.2 4.5 1.5]
     [5.6 2.5 3.9 1.1]
     [5.9 3.2 4.8 1.8]
     [6.1 2.8 4.  1.3]
     [6.3 2.5 4.9 1.5]
     [6.1 2.8 4.7 1.2]
     [6.4 2.9 4.3 1.3]
     [6.6 3.  4.4 1.4]
     [6.8 2.8 4.8 1.4]
     [6.7 3.  5.  1.7]
     [6.  2.9 4.5 1.5]
     [5.7 2.6 3.5 1. ]
     [5.5 2.4 3.8 1.1]
     [5.5 2.4 3.7 1. ]
     [5.8 2.7 3.9 1.2]
     [6.  2.7 5.1 1.6]
     [5.4 3.  4.5 1.5]
     [6.  3.4 4.5 1.6]
     [6.7 3.1 4.7 1.5]
     [6.3 2.3 4.4 1.3]
     [5.6 3.  4.1 1.3]
     [5.5 2.5 4.  1.3]
     [5.5 2.6 4.4 1.2]
     [6.1 3.  4.6 1.4]
     [5.8 2.6 4.  1.2]
     [5.  2.3 3.3 1. ]
     [5.6 2.7 4.2 1.3]
     [5.7 3.  4.2 1.2]
     [5.7 2.9 4.2 1.3]
     [6.2 2.9 4.3 1.3]
     [5.1 2.5 3.  1.1]
     [5.7 2.8 4.1 1.3]
     [6.3 3.3 6.  2.5]
     [5.8 2.7 5.1 1.9]
     [7.1 3.  5.9 2.1]
     [6.3 2.9 5.6 1.8]
     [6.5 3.  5.8 2.2]
     [7.6 3.  6.6 2.1]
     [4.9 2.5 4.5 1.7]
     [7.3 2.9 6.3 1.8]
     [6.7 2.5 5.8 1.8]
     [7.2 3.6 6.1 2.5]
     [6.5 3.2 5.1 2. ]
     [6.4 2.7 5.3 1.9]
     [6.8 3.  5.5 2.1]
     [5.7 2.5 5.  2. ]
     [5.8 2.8 5.1 2.4]
     [6.4 3.2 5.3 2.3]
     [6.5 3.  5.5 1.8]
     [7.7 3.8 6.7 2.2]
     [7.7 2.6 6.9 2.3]
     [6.  2.2 5.  1.5]
     [6.9 3.2 5.7 2.3]
     [5.6 2.8 4.9 2. ]
     [7.7 2.8 6.7 2. ]
     [6.3 2.7 4.9 1.8]
     [6.7 3.3 5.7 2.1]
     [7.2 3.2 6.  1.8]
     [6.2 2.8 4.8 1.8]
     [6.1 3.  4.9 1.8]
     [6.4 2.8 5.6 2.1]
     [7.2 3.  5.8 1.6]
     [7.4 2.8 6.1 1.9]
     [7.9 3.8 6.4 2. ]
     [6.4 2.8 5.6 2.2]
     [6.3 2.8 5.1 1.5]
     [6.1 2.6 5.6 1.4]
     [7.7 3.  6.1 2.3]
     [6.3 3.4 5.6 2.4]
     [6.4 3.1 5.5 1.8]
     [6.  3.  4.8 1.8]
     [6.9 3.1 5.4 2.1]
     [6.7 3.1 5.6 2.4]
     [6.9 3.1 5.1 2.3]
     [5.8 2.7 5.1 1.9]
     [6.8 3.2 5.9 2.3]
     [6.7 3.3 5.7 2.5]
     [6.7 3.  5.2 2.3]
     [6.3 2.5 5.  1.9]
     [6.5 3.  5.2 2. ]
     [6.2 3.4 5.4 2.3]
     [5.9 3.  5.1 1.8]]
    (data attribute values)

    There are 150 rows and 4 columns, i.e. 150 samples and 4 features.

    The target attribute holds the target values:

    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
     2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     2 2]
    (target attribute values)

    These are the iris class labels: there are three classes, and since there are 150 samples there are 150 corresponding target values.
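
    To map these integer labels back to class names, the Bunch also exposes target_names (a small sketch):

    from sklearn.datasets import load_iris

    data = load_iris()
    print(data.target_names)                    # ['setosa' 'versicolor' 'virginica']
    print(data.target_names[data.target[:5]])   # class names of the first 5 samples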

    DESCR is a textual description of the dataset:

    **Data Set Characteristics:**
    
        :Number of Instances: 150 (50 in each of three classes)
        :Number of Attributes: 4 numeric, predictive attributes and the class
        :Attribute Information:
            - sepal length in cm
            - sepal width in cm
            - petal length in cm
            - petal width in cm
            - class:
                    - Iris-Setosa
                    - Iris-Versicolour
                    - Iris-Virginica
    (excerpt of DESCR)

    Of course, sklearn provides other datasets as well, for example:

    sklearn.datasets.load_digits()  # load and return the handwritten digits dataset
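
    A minimal sketch of load_digits usage (same pattern as load_iris):

    from sklearn.datasets import load_digits

    digits = load_digits()
    print(digits.data.shape)     # (1797, 64): 8x8 pixel images flattened into 64 features
    print(digits.target[:10])    # labels of the first 10 samples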

    2. A larger classification dataset

    The classification dataset above is fairly small; the following one is considerably larger:

    sklearn.datasets.fetch_20newsgroups(data_home=None, subset='train')

    Parameters:

    • subset: one of 'train', 'test', or 'all'; selects which part of the dataset to load: the training set, the test set, or both.
    • data_home: the download location of the dataset; if None, the default location is ~/scikit_learn_data.

    In addition:

    datasets.clear_data_home(data_home=None)

    This API can be used to clear the data cached under the given data_home directory.
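
    For example (a one-line sketch; with data_home=None it targets the default ~/scikit_learn_data directory):

    from sklearn.datasets import clear_data_home

    # delete everything cached under the default data_home (~/scikit_learn_data)
    clear_data_home(data_home=None)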

    from sklearn.datasets import fetch_20newsgroups

    def fetch_classi():
        data = fetch_20newsgroups(subset='all')
        print(data.data)    # list of raw newsgroup posts
        print(data.target)  # class labels

    if __name__ == '__main__':
        fetch_classi()

    For reference, the full source of fetch_20newsgroups is reproduced below:
    def fetch_20newsgroups(data_home=None, subset='train', categories=None,
                           shuffle=True, random_state=42,
                           remove=(),
                           download_if_missing=True):
        """Load the filenames and data from the 20 newsgroups dataset 
    (classification).
    
        Download it if necessary.
    
        =================   ==========
        Classes                     20
        Samples total            18846
        Dimensionality               1
        Features                  text
        =================   ==========
    
        Read more in the :ref:`User Guide <20newsgroups_dataset>`.
    
        Parameters
        ----------
        data_home : optional, default: None
            Specify a download and cache folder for the datasets. If None,
            all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
    
        subset : 'train' or 'test', 'all', optional
            Select the dataset to load: 'train' for the training set, 'test'
            for the test set, 'all' for both, with shuffled ordering.
    
        categories : None or collection of string or unicode
            If None (default), load all the categories.
            If not None, list of category names to load (other categories
            ignored).
    
        shuffle : bool, optional
            Whether or not to shuffle the data: might be important for models that
            make the assumption that the samples are independent and identically
            distributed (i.i.d.), such as stochastic gradient descent.
    
        random_state : int, RandomState instance or None (default)
            Determines random number generation for dataset shuffling. Pass an int
            for reproducible output across multiple function calls.
            See :term:`Glossary <random_state>`.
    
        remove : tuple
            May contain any subset of ('headers', 'footers', 'quotes'). Each of
            these are kinds of text that will be detected and removed from the
            newsgroup posts, preventing classifiers from overfitting on
            metadata.
    
            'headers' removes newsgroup headers, 'footers' removes blocks at the
            ends of posts that look like signatures, and 'quotes' removes lines
            that appear to be quoting another post.
    
            'headers' follows an exact standard; the other filters are not always
            correct.
    
        download_if_missing : optional, True by default
            If False, raise an IOError if the data is not locally available
            instead of trying to download the data from the source site.
    
        Returns
        -------
        bunch : Bunch object
            bunch.data: list, length [n_samples]
            bunch.target: array, shape [n_samples]
            bunch.filenames: list, length [n_classes]
            bunch.DESCR: a description of the dataset.
        """
    
        data_home = get_data_home(data_home=data_home)
        cache_path = _pkl_filepath(data_home, CACHE_NAME)
        twenty_home = os.path.join(data_home, "20news_home")
        cache = None
        if os.path.exists(cache_path):
            try:
                with open(cache_path, 'rb') as f:
                    compressed_content = f.read()
                uncompressed_content = codecs.decode(
                    compressed_content, 'zlib_codec')
                cache = pickle.loads(uncompressed_content)
            except Exception as e:
                print(80 * '_')
                print('Cache loading failed')
                print(80 * '_')
                print(e)
    
        if cache is None:
            if download_if_missing:
                logger.info("Downloading 20news dataset. "
                            "This may take a few minutes.")
                cache = _download_20newsgroups(target_dir=twenty_home,
                                               cache_path=cache_path)
            else:
                raise IOError('20Newsgroups dataset not found')
    
        if subset in ('train', 'test'):
            data = cache[subset]
        elif subset == 'all':
            data_lst = list()
            target = list()
            filenames = list()
            for subset in ('train', 'test'):
                data = cache[subset]
                data_lst.extend(data.data)
                target.extend(data.target)
                filenames.extend(data.filenames)
    
            data.data = data_lst
            data.target = np.array(target)
            data.filenames = np.array(filenames)
        else:
            raise ValueError(
                "subset can only be 'train', 'test' or 'all', got '%s'" % subset)
    
        module_path = dirname(__file__)
        with open(join(module_path, 'descr', 'twenty_newsgroups.rst')) as rst_file:
            fdescr = rst_file.read()
    
        data.DESCR = fdescr
    
        if 'headers' in remove:
            data.data = [strip_newsgroup_header(text) for text in data.data]
        if 'footers' in remove:
            data.data = [strip_newsgroup_footer(text) for text in data.data]
        if 'quotes' in remove:
            data.data = [strip_newsgroup_quoting(text) for text in data.data]
    
        if categories is not None:
            labels = [(data.target_names.index(cat), cat) for cat in categories]
            # Sort the categories to have the ordering of the labels
            labels.sort()
            labels, categories = zip(*labels)
            mask = np.in1d(data.target, labels)
            data.filenames = data.filenames[mask]
            data.target = data.target[mask]
            # searchsorted to have continuous labels
            data.target = np.searchsorted(labels, data.target)
            data.target_names = list(categories)
            # Use an object array to shuffle: avoids memory copy
            data_lst = np.array(data.data, dtype=object)
            data_lst = data_lst[mask]
            data.data = data_lst.tolist()
    
        if shuffle:
            random_state = check_random_state(random_state)
            indices = np.arange(data.target.shape[0])
            random_state.shuffle(indices)
            data.filenames = data.filenames[indices]
            data.target = data.target[indices]
            # Use an object array to shuffle: avoids memory copy
            data_lst = np.array(data.data, dtype=object)
            data_lst = data_lst[indices]
            data.data = data_lst.tolist()
    
        return data
    (source of fetch_20newsgroups, for reference)

    3. Splitting the dataset

      As noted earlier in the machine learning workflow, a dataset should be split into two parts: a training set and a test set. So the 150 samples above need to be split accordingly, which requires another API:

    sklearn.model_selection.train_test_split(*arrays, **options)

    Parameters:

    • x: feature values of the dataset
    • y: label (target) values of the dataset
    • test_size: size of the test set, usually a float
    • random_state: random seed; different seeds produce different random samplings, while the same seed always produces the same sampling
    • return: training-set features, test-set features, training-set targets, test-set targets (sampled randomly by default)

    For example:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    def classi():
        data = load_iris()
        X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25)
        print("training-set features and targets:\n", X_train, y_train)
        print("test-set features and targets:\n", X_test, y_test)

    Note: the return values of train_test_split are, in order, the training-set features, test-set features, training-set targets, and test-set targets.
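
    A quick sketch to confirm that return order and the 75/25 proportion (random_state is fixed here only for reproducibility):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    data = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.25, random_state=42)

    print(X_train.shape, y_train.shape)   # (112, 4) (112,)
    print(X_test.shape, y_test.shape)     # (38, 4) (38,)

    The full source of train_test_split is reproduced below for reference.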

    def train_test_split(*arrays, **options):
        """Split arrays or matrices into random train and test subsets
    
        Quick utility that wraps input validation and
        ``next(ShuffleSplit().split(X, y))`` and application to input data
        into a single call for splitting (and optionally subsampling) data in a
        oneliner.
    
        Read more in the :ref:`User Guide <cross_validation>`.
    
        Parameters
        ----------
        *arrays : sequence of indexables with same length / shape[0]
            Allowed inputs are lists, numpy arrays, scipy-sparse
            matrices or pandas dataframes.
    
        test_size : float, int or None, optional (default=0.25)
            If float, should be between 0.0 and 1.0 and represent the proportion
            of the dataset to include in the test split. If int, represents the
            absolute number of test samples. If None, the value is set to the
            complement of the train size. By default, the value is set to 0.25.
            The default will change in version 0.21. It will remain 0.25 only
            if ``train_size`` is unspecified, otherwise it will complement
            the specified ``train_size``.
    
        train_size : float, int, or None, (default=None)
            If float, should be between 0.0 and 1.0 and represent the
            proportion of the dataset to include in the train split. If
            int, represents the absolute number of train samples. If None,
            the value is automatically set to the complement of the test size.
    
        random_state : int, RandomState instance or None, optional (default=None)
            If int, random_state is the seed used by the random number generator;
            If RandomState instance, random_state is the random number generator;
            If None, the random number generator is the RandomState instance used
            by `np.random`.
    
        shuffle : boolean, optional (default=True)
            Whether or not to shuffle the data before splitting. If shuffle=False
            then stratify must be None.
    
        stratify : array-like or None (default=None)
            If not None, data is split in a stratified fashion, using this as
            the class labels.
    
        Returns
        -------
        splitting : list, length=2 * len(arrays)
            List containing train-test split of inputs.
    
            .. versionadded:: 0.16
                If the input is sparse, the output will be a
                ``scipy.sparse.csr_matrix``. Else, output type is the same as the
                input type.
    
        Examples
        --------
        >>> import numpy as np
        >>> from sklearn.model_selection import train_test_split
        >>> X, y = np.arange(10).reshape((5, 2)), range(5)
        >>> X
        array([[0, 1],
               [2, 3],
               [4, 5],
               [6, 7],
               [8, 9]])
        >>> list(y)
        [0, 1, 2, 3, 4]
    
        >>> X_train, X_test, y_train, y_test = train_test_split(
        ...     X, y, test_size=0.33, random_state=42)
        ...
        >>> X_train
        array([[4, 5],
               [0, 1],
               [6, 7]])
        >>> y_train
        [2, 0, 3]
        >>> X_test
        array([[2, 3],
               [8, 9]])
        >>> y_test
        [1, 4]
    
        >>> train_test_split(y, shuffle=False)
        [[0, 1, 2], [3, 4]]
    
        """
        n_arrays = len(arrays)
        if n_arrays == 0:
            raise ValueError("At least one array required as input")
        test_size = options.pop('test_size', 'default')
        train_size = options.pop('train_size', None)
        random_state = options.pop('random_state', None)
        stratify = options.pop('stratify', None)
        shuffle = options.pop('shuffle', True)
    
        if options:
            raise TypeError("Invalid parameters passed: %s" % str(options))
    
        if test_size == 'default':
            test_size = None
            if train_size is not None:
                warnings.warn("From version 0.21, test_size will always "
                              "complement train_size unless both "
                              "are specified.",
                              FutureWarning)
    
        if test_size is None and train_size is None:
            test_size = 0.25
    
        arrays = indexable(*arrays)
    
        if shuffle is False:
            if stratify is not None:
                raise ValueError(
                    "Stratified train/test split is not implemented for "
                    "shuffle=False")
    
            n_samples = _num_samples(arrays[0])
            n_train, n_test = _validate_shuffle_split(n_samples, test_size,
                                                      train_size)
    
            train = np.arange(n_train)
            test = np.arange(n_train, n_train + n_test)
    
        else:
            if stratify is not None:
                CVClass = StratifiedShuffleSplit
            else:
                CVClass = ShuffleSplit
    
            cv = CVClass(test_size=test_size,
                         train_size=train_size,
                         random_state=random_state)
    
            train, test = next(cv.split(X=arrays[0], y=stratify))
    
        return list(chain.from_iterable((safe_indexing(a, train),
                                         safe_indexing(a, test)) for a in arrays))
    (source of train_test_split, for reference)

    (2) sklearn regression datasets

    Regression datasets work basically the same way as classification datasets; only the API called differs. For example, the following regression dataset loaders are available:

    sklearn.datasets.load_boston()    # load and return the Boston house-prices dataset
    sklearn.datasets.load_diabetes()  # load and return the diabetes dataset

    For example:

    from sklearn.datasets import load_boston

    def recurr():
        data = load_boston()
        print(data.data)    # feature values
        print(data.target)  # target values (house prices)

    The data attribute holds the samples and their feature values:

    [[6.3200e-03 1.8000e+01 2.3100e+00 ... 1.5300e+01 3.9690e+02 4.9800e+00]
     [2.7310e-02 0.0000e+00 7.0700e+00 ... 1.7800e+01 3.9690e+02 9.1400e+00]
     [2.7290e-02 0.0000e+00 7.0700e+00 ... 1.7800e+01 3.9283e+02 4.0300e+00]
     ...
     [6.0760e-02 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9690e+02 5.6400e+00]
     [1.0959e-01 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9345e+02 6.4800e+00]
     [4.7410e-02 0.0000e+00 1.1930e+01 ... 2.1000e+01 3.9690e+02 7.8800e+00]]
    (data attribute values)

    Each row is one sample, and each sample has several feature values, described as follows:

            - CRIM     per capita crime rate by town
            - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
            - INDUS    proportion of non-retail business acres per town
            - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
            - NOX      nitric oxides concentration (parts per 10 million)
            - RM       average number of rooms per dwelling
            - AGE      proportion of owner-occupied units built prior to 1940
            - DIS      weighted distances to five Boston employment centres
            - RAD      index of accessibility to radial highways
            - TAX      full-value property-tax rate per $10,000
            - PTRATIO  pupil-teacher ratio by town
            - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
            - LSTAT    % lower status of the population
            - MEDV     Median value of owner-occupied homes in $1000's

    The target attribute is the house price, a continuous value:

    [24.  21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 15.  18.9 21.7 20.4
     18.2 19.9 23.1 17.5 20.2 18.2 13.6 19.6 15.2 14.5 15.6 13.9 16.6 14.8
     18.4 21.  12.7 14.5 13.2 13.1 13.5 18.9 20.  21.  24.7 30.8 34.9 26.6
     25.3 24.7 21.2 19.3 20.  16.6 14.4 19.4 19.7 20.5 25.  23.4 18.9 35.4
     24.7 31.6 23.3 19.6 18.7 16.  22.2 25.  33.  23.5 19.4 22.  17.4 20.9
     24.2 21.7 22.8 23.4 24.1 21.4 20.  20.8 21.2 20.3 28.  23.9 24.8 22.9
     23.9 26.6 22.5 22.2 23.6 28.7 22.6 22.  22.9 25.  20.6 28.4 21.4 38.7
     43.8 33.2 27.5 26.5 18.6 19.3 20.1 19.5 19.5 20.4 19.8 19.4 21.7 22.8
     18.8 18.7 18.5 18.3 21.2 19.2 20.4 19.3 22.  20.3 20.5 17.3 18.8 21.4
     15.7 16.2 18.  14.3 19.2 19.6 23.  18.4 15.6 18.1 17.4 17.1 13.3 17.8
     14.  14.4 13.4 15.6 11.8 13.8 15.6 14.6 17.8 15.4 21.5 19.6 15.3 19.4
     17.  15.6 13.1 41.3 24.3 23.3 27.  50.  50.  50.  22.7 25.  50.  23.8
     23.8 22.3 17.4 19.1 23.1 23.6 22.6 29.4 23.2 24.6 29.9 37.2 39.8 36.2
     37.9 32.5 26.4 29.6 50.  32.  29.8 34.9 37.  30.5 36.4 31.1 29.1 50.
     33.3 30.3 34.6 34.9 32.9 24.1 42.3 48.5 50.  22.6 24.4 22.5 24.4 20.
     21.7 19.3 22.4 28.1 23.7 25.  23.3 28.7 21.5 23.  26.7 21.7 27.5 30.1
     44.8 50.  37.6 31.6 46.7 31.5 24.3 31.7 41.7 48.3 29.  24.  25.1 31.5
     23.7 23.3 22.  20.1 22.2 23.7 17.6 18.5 24.3 20.5 24.5 26.2 24.4 24.8
     29.6 42.8 21.9 20.9 44.  50.  36.  30.1 33.8 43.1 48.8 31.  36.5 22.8
     30.7 50.  43.5 20.7 21.1 25.2 24.4 35.2 32.4 32.  33.2 33.1 29.1 35.1
     45.4 35.4 46.  50.  32.2 22.  20.1 23.2 22.3 24.8 28.5 37.3 27.9 23.9
     21.7 28.6 27.1 20.3 22.5 29.  24.8 22.  26.4 33.1 36.1 28.4 33.4 28.2
     22.8 20.3 16.1 22.1 19.4 21.6 23.8 16.2 17.8 19.8 23.1 21.  23.8 23.1
     20.4 18.5 25.  24.6 23.  22.2 19.3 22.6 19.8 17.1 19.4 22.2 20.7 21.1
     19.5 18.5 20.6 19.  18.7 32.7 16.5 23.9 31.2 17.5 17.2 23.1 24.5 26.6
     22.9 24.1 18.6 30.1 18.2 20.6 17.8 21.7 22.7 22.6 25.  19.9 20.8 16.8
     21.9 27.5 21.9 23.1 50.  50.  50.  50.  50.  13.8 13.8 15.  13.9 13.3
     13.1 10.2 10.4 10.9 11.3 12.3  8.8  7.2 10.5  7.4 10.2 11.5 15.1 23.2
      9.7 13.8 12.7 13.1 12.5  8.5  5.   6.3  5.6  7.2 12.1  8.3  8.5  5.
     11.9 27.9 17.2 27.5 15.  17.2 17.9 16.3  7.   7.2  7.5 10.4  8.8  8.4
     16.7 14.2 20.8 13.4 11.7  8.3 10.2 10.9 11.   9.5 14.5 14.1 16.1 14.3
     11.7 13.4  9.6  8.7  8.4 12.8 10.5 17.1 18.4 15.4 10.8 11.8 14.9 12.6
     14.1 13.  13.4 15.2 16.1 17.8 14.9 14.1 12.7 13.5 14.9 20.  16.4 17.7
     19.5 20.2 21.4 19.9 19.  19.1 19.1 20.1 19.9 19.6 23.2 29.8 13.8 13.3
     16.7 12.  14.6 21.4 23.  23.7 25.  21.8 20.6 21.2 19.1 20.6 15.2  7.
      8.1 13.6 20.1 21.8 24.5 23.1 19.7 18.3 21.2 17.5 16.8 22.4 20.6 23.9
     22.  11.9]
    (target values)

    The other regression datasets are used in the same way.
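
    For instance, here is a sketch combining a regression dataset with train_test_split (using load_diabetes, which follows the same Bunch interface):

    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split

    data = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.25)

    print(X_train.shape, X_test.shape)   # 442 samples, 10 features, split roughly 75/25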
