  • Transforming the prediction target of sklearn

    concept

    https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets

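    In supervised learning, target values often need to be transformed before they can serve as the model's target, or so that they suit the model more effectively.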

    These are transformers that are not intended to be used on features, only on supervised learning targets.

    See also Transforming target in regression if you want to transform the prediction target for learning, but evaluate the model in the original (untransformed) space.
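
    The "transform for learning, evaluate in the original space" idea mentioned above can be sketched with sklearn.compose.TransformedTargetRegressor. This is a minimal illustration, not part of the original article: the synthetic data, the choice of LinearRegression, and the log/exp pair are all assumptions made for the example.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y grows exponentially with x.
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.exp(0.8 * X[:, 0] + rng.normal(scale=0.1, size=200))

# The inner regressor is fitted on log(y); predictions are mapped back with exp.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)

y_pred = model.predict(X[:5])          # values in the original (untransformed) space
r2_original_space = model.score(X, y)  # R^2 computed against the untransformed y
print(y_pred, r2_original_space)
```

    The fitted inner model is available as model.regressor_, so its coefficients can be inspected in the transformed (log) space if needed.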

    Models that adapt to the target type

    https://scikit-learn.org/stable/tutorial/basic/tutorial.html#model-persistence

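    Some models accept targets in their raw type (strings or numeric values), as shown below.

    For such models, transforming the target is not required, but it is the more general practice.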

    >>> from sklearn import datasets
    >>> from sklearn.svm import SVC
    >>> iris = datasets.load_iris()
    >>> clf = SVC()
    >>> clf.fit(iris.data, iris.target)
    SVC()
    
    >>> list(clf.predict(iris.data[:3]))
    [0, 0, 0]
    
    >>> clf.fit(iris.data, iris.target_names[iris.target])
    SVC()
    
    >>> list(clf.predict(iris.data[:3]))
    ['setosa', 'setosa', 'setosa']

    LabelBinarizer

    https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html#sklearn.preprocessing.LabelBinarizer

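    Some regression and binary classification algorithms need this tool to transform the target so that they can support multiclass classification.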

    Binarize labels in a one-vs-all fashion.

    Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

    At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

    At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.

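    Code:

    from sklearn import preprocessing
    import numpy as np

    lb = preprocessing.LabelBinarizer()
    lb.fit([1, 2, 6, 4, 2])

    # Classes learned during fit, sorted and de-duplicated.
    print(lb.classes_)

    # One row per label, one column per class.
    print(lb.transform([1, 6]))

    # Map indicator rows back to the original labels.
    transformed_label = np.array([[1, 0, 0, 0], [0, 0, 0, 1]])
    print(lb.inverse_transform(transformed_label))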

    Output:

    [1 2 4 6]
    [[1 0 0 0]
     [0 0 0 1]]
    [1 6]
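
    The one-vs-all scheme described earlier in this section can be sketched by hand as follows. This is only an illustration under assumptions: the iris data and LogisticRegression as the per-class binary classifier are choices made for the example, not something the original article prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelBinarizer

X, y = load_iris(return_X_y=True)

lb = LabelBinarizer()
Y = lb.fit_transform(y)  # shape (n_samples, n_classes), one 0/1 column per class

# Learning time: one binary classifier per class (one per column of Y).
models = [
    LogisticRegression(max_iter=1000).fit(X, Y[:, k])
    for k in range(Y.shape[1])
]

# Prediction time: gather each model's confidence, then let inverse_transform
# pick the class whose model gave the greatest value.
scores = np.column_stack([m.decision_function(X) for m in models])
y_pred = lb.inverse_transform(scores)

accuracy = (y_pred == y).mean()  # training-set accuracy, for illustration only
print(accuracy)
```

    In practice, sklearn.multiclass.OneVsRestClassifier wraps the same idea; the manual loop here is only meant to show where transform and inverse_transform fit in.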

    MultiLabelBinarizer

    https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer

    Converts multilabel targets into binary indicator targets.

    Transform between iterable of iterables and a multilabel format.

    Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

    In multilabel learning, the joint set of binary classification tasks is expressed with a label binary indicator array: each sample is one row of a 2d array of shape (n_samples, n_classes) with binary values where the one, i.e. the non zero elements, corresponds to the subset of labels for that sample. An array such as np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]]) represents label 0 in the first sample, labels 1 and 2 in the second sample, and no labels in the third sample.

    Producing multilabel data as a list of sets of labels may be more intuitive. The MultiLabelBinarizer transformer can be used to convert between a collection of collections of labels and the indicator format:

    >>> from sklearn.preprocessing import MultiLabelBinarizer
    >>> y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
    >>> MultiLabelBinarizer().fit_transform(y)
    array([[0, 0, 1, 1, 1],
           [0, 0, 1, 0, 0],
           [1, 1, 0, 1, 0],
           [1, 1, 1, 1, 1],
           [1, 1, 1, 0, 0]])
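
    The conversion also works with string labels and in the reverse direction. The sketch below uses made-up tags (hypothetical data, not from the original article) to show classes_ and inverse_transform, including a sample with no labels.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tags for three documents (illustrative data only).
y = [{"sci-fi", "thriller"}, {"comedy"}, set()]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)

print(mlb.classes_)              # the learned classes, in sorted order
print(Y)                         # one row per sample, one column per class
print(mlb.inverse_transform(Y))  # back to a list of tuples of labels
```

    Note that inverse_transform returns tuples rather than the original sets, so the container type is not preserved by the round trip, only the label membership.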

    LabelEncoder

    https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder

    Converts labels (strings or numbers) into compact multiclass targets.

    Encode target labels with value between 0 and n_classes-1.

    This transformer should be used to encode target values, i.e. y, and not the input X.

    Read more in the User Guide.

    >>> from sklearn import preprocessing
    >>> le = preprocessing.LabelEncoder()
    >>> le.fit([1, 2, 2, 6])
    LabelEncoder()
    >>> le.classes_
    array([1, 2, 6])
    >>> le.transform([1, 1, 2, 6])
    array([0, 0, 1, 2])
    >>> le.inverse_transform([0, 0, 1, 2])
    array([1, 1, 2, 6])
    >>> le = preprocessing.LabelEncoder()
    >>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
    LabelEncoder()
    >>> list(le.classes_)
    ['amsterdam', 'paris', 'tokyo']
    >>> le.transform(["tokyo", "tokyo", "paris"])
    array([2, 2, 1])
    >>> list(le.inverse_transform([2, 2, 1]))
    ['tokyo', 'tokyo', 'paris']
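
    One practical consequence of fitting the encoder on a fixed label set is that labels not seen during fit cannot be encoded later. A small sketch of that behavior, where the label "berlin" is an arbitrary example chosen for illustration:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["paris", "paris", "tokyo", "amsterdam"])

# Labels that were not present during fit are rejected with a ValueError.
try:
    le.transform(["berlin"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True

print(codes, unseen_rejected)
```

    For this reason it is usually safest to fit the encoder on the full set of training labels before encoding any held-out targets.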
  • Original article: https://www.cnblogs.com/lightsong/p/14202349.html