multiclass
https://scikit-learn.org/stable/modules/multiclass.html#multiclass-classification
Multiclass classification targets problems with more than two classes, where each sample is assigned to exactly one class.
It differs from binary classification in that there are multiple target classes.
Multiclass classification is a classification task with more than two classes. Each sample can only be labeled as one class.
For example, classification using features extracted from a set of images of fruit, where each image may either be of an orange, an apple, or a pear. Each image is one sample and is labeled as one of the 3 possible classes. Multiclass classification makes the assumption that each sample is assigned to one and only one label - one sample cannot, for example, be both a pear and an apple.
While all scikit-learn classifiers are capable of multiclass classification, the meta-estimators offered by sklearn.multiclass permit changing the way they handle more than two classes, because this may have an effect on classifier performance (either in terms of generalization error or required computational resources).
Target format
For example, the target may range over several classes: apple, pear, orange.
String-valued targets are typically converted into a binary indicator matrix with the LabelBinarizer utility.
Valid multiclass representations for type_of_target (y) are:
1d or column vector containing more than two discrete values. An example of a vector y for 4 samples:

>>> import numpy as np
>>> y = np.array(['apple', 'pear', 'apple', 'orange'])
>>> print(y)
['apple' 'pear' 'apple' 'orange']
Dense or sparse binary matrix of shape (n_samples, n_classes) with a single sample per row, where each column represents one class. An example of both a dense and sparse binary matrix y for 4 samples, where the columns, in order, are apple, orange, and pear:

>>> import numpy as np
>>> from sklearn.preprocessing import LabelBinarizer
>>> y = np.array(['apple', 'pear', 'apple', 'orange'])
>>> y_dense = LabelBinarizer().fit_transform(y)
>>> print(y_dense)
[[1 0 0]
 [0 0 1]
 [1 0 0]
 [0 1 0]]
>>> from scipy import sparse
>>> y_sparse = sparse.csr_matrix(y_dense)
>>> print(y_sparse)
  (0, 0)    1
  (1, 2)    1
  (2, 0)    1
  (3, 1)    1
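As a quick check, the type_of_target helper from sklearn.utils.multiclass reports how scikit-learn interprets a target; for the 1d string vector from the first example it detects a multiclass target:

>>> from sklearn.utils.multiclass import type_of_target
>>> type_of_target(np.array(['apple', 'pear', 'apple', 'orange']))
'multiclass'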
OneVsRestClassifier
https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier
A binary classifier cannot handle multiclass targets on its own, but a multiclass classifier can be built by combining several binary models.
For example, plugging a LinearSVC model into this meta-estimator yields a multiclass classifier.
The data is split one-vs-rest: one training set per class (samples of the focal class versus samples of all other classes), and one linear model is fitted on each set to recognize its focal class.
At prediction time, the class whose model scores highest is chosen, as in the sketch below.
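The split can be sketched by hand (illustrative only, assuming LinearSVC and the iris data; OneVsRestClassifier below is the proper tool):

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import LinearSVC
>>> X, y = load_iris(return_X_y=True)
>>> classes = np.unique(y)
>>> # one binary problem per class: the focal class vs. everything else
>>> models = [LinearSVC(random_state=0).fit(X, (y == c).astype(int)) for c in classes]
>>> scores = np.column_stack([m.decision_function(X) for m in models])
>>> y_pred = classes[scores.argmax(axis=1)]  # highest-scoring model wins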
The one-vs-rest strategy, also known as one-vs-all, is implemented in OneVsRestClassifier. The strategy consists in fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy and is a fair default choice.
>>> from sklearn import datasets
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = datasets.load_iris(return_X_y=True)
>>> OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y).predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
OneVsOneClassifier
https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsOneClassifier.html#sklearn.multiclass.OneVsOneClassifier
As above, a binary classifier such as LinearSVC can be lifted to the multiclass setting by combining several binary models through this meta-estimator.
It differs from one-vs-rest in how the data is split: the classes are paired two by two, yielding one training set per pair, and each model learns on one pair and predicts one of its two classes.
At prediction time, every model is run and casts a vote; the class with the most votes wins. A sketch follows.
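A matching hand-rolled sketch of the pairwise split and voting (illustrative only; ties are simply broken by argmax here, whereas OneVsOneClassifier breaks them using aggregate confidence):

>>> import numpy as np
>>> from itertools import combinations
>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import LinearSVC
>>> X, y = load_iris(return_X_y=True)
>>> votes = np.zeros((len(X), 3), dtype=int)
>>> for a, b in combinations(np.unique(y), 2):  # one model per class pair
...     mask = (y == a) | (y == b)
...     pred = LinearSVC(random_state=0).fit(X[mask], y[mask]).predict(X)
...     votes[:, a] += (pred == a)
...     votes[:, b] += (pred == b)
>>> y_pred = votes.argmax(axis=1)  # the most-voted class wins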
OneVsOneClassifier constructs one classifier per pair of classes. At prediction time, the class which received the most votes is selected. In the event of a tie (among two classes with an equal number of votes), it selects the class with the highest aggregate classification confidence by summing over the pair-wise classification confidence levels computed by the underlying binary classifiers.
Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest, due to its O(n_classes^2) complexity. However, this method may be advantageous for algorithms such as kernel algorithms which don't scale well with n_samples. This is because each individual learning problem only involves a small subset of the data whereas, with one-vs-the-rest, the complete dataset is used n_classes times. The decision function is the result of a monotonic transformation of the one-versus-one classification.
>>> from sklearn import datasets
>>> from sklearn.multiclass import OneVsOneClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = datasets.load_iris(return_X_y=True)
>>> OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y).predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
OutputCodeClassifier
https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OutputCodeClassifier.html#sklearn.multiclass.OutputCodeClassifier
As above, binary classifiers such as LinearSVC are combined through this meta-estimator to obtain a multiclass classifier.
Unlike the two strategies above, which differ in how the data is partitioned (and whose partitioning determines the number of models), this method works on the model outputs: each class is represented as a distinct point in some space.
For example, the class values 1, 2, 3 can be mapped into a multidimensional space as [0, 0], [0, 1], [1, 1], where each dimension can only take the values 0 and 1.
One learning model is then trained per dimension of that space.
At prediction time, the models' outputs for a sample form a point; its distance to each class's point is computed, and the nearest class is the prediction.
Error-Correcting Output Code-based strategies are fairly different from one-vs-the-rest and one-vs-one. With these strategies, each class is represented in a Euclidean space, where each dimension can only be 0 or 1. Another way to put it is that each class is represented by a binary code (an array of 0s and 1s). The matrix which keeps track of the location/code of each class is called the code book. The code size is the dimensionality of the aforementioned space. Intuitively, each class should be represented by a code as unique as possible and a good code book should be designed to optimize classification accuracy. In this implementation, we simply use a randomly-generated code book as advocated in [3] although more elaborate methods may be added in the future.
At fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen.
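That cycle can be sketched by hand with a small hand-picked code book (OutputCodeClassifier generates its own randomly and should be preferred; LinearSVC and iris are just stand-ins):

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import LinearSVC
>>> X, y = load_iris(return_X_y=True)
>>> code_book = np.array([[0, 0, 1, 1, 0, 1],   # 6-bit code for class 0 (hand-picked)
...                       [0, 1, 0, 1, 1, 0],   # code for class 1
...                       [1, 0, 0, 0, 1, 1]])  # code for class 2
>>> bits = np.column_stack([                    # one binary classifier per bit
...     LinearSVC(random_state=0).fit(X, code_book[y, j]).predict(X)
...     for j in range(code_book.shape[1])])
>>> dist = ((bits[:, None, :] - code_book[None, :, :]) ** 2).sum(axis=2)
>>> y_pred = dist.argmin(axis=1)                # the nearest code wins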
In OutputCodeClassifier, the code_size attribute allows the user to control the number of classifiers which will be used. It is a percentage of the total number of classes.
A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. In theory, log2(n_classes) / n_classes is sufficient to represent each class unambiguously. However, in practice, it may not lead to good accuracy since log2(n_classes) is much smaller than n_classes.
A number greater than 1 will require more classifiers than one-vs-the-rest. In this case, some classifiers will in theory correct for the mistakes made by other classifiers, hence the name "error-correcting". In practice, however, this may not happen as classifier mistakes will typically be correlated. The error-correcting output codes have a similar effect to bagging.
>>> from sklearn import datasets
>>> from sklearn.multiclass import OutputCodeClassifier
>>> from sklearn.svm import LinearSVC
>>> X, y = datasets.load_iris(return_X_y=True)
>>> clf = OutputCodeClassifier(LinearSVC(random_state=0),
...                            code_size=2, random_state=0)
>>> clf.fit(X, y).predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
       2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Data-splitting methods
https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/
One-Vs-Rest for Multi-Class Classification
For example, a multi-class classification problem with examples for each of the classes 'red', 'blue', and 'green' could be divided into three binary classification datasets as follows:
- Binary Classification Problem 1: red vs [blue, green]
- Binary Classification Problem 2: blue vs [red, green]
- Binary Classification Problem 3: green vs [red, blue]
A possible downside of this approach is that it requires one model to be created for each class. For example, three classes require three models. This could be an issue for large datasets (e.g. millions of rows), slow models (e.g. neural networks), or very large numbers of classes (e.g. hundreds of classes).
One-Vs-One for Multi-Class Classification
For example, consider a multi-class classification problem with four classes: 'red', 'blue', 'green', and 'yellow'. This could be divided into six binary classification datasets as follows:
- Binary Classification Problem 1: red vs. blue
- Binary Classification Problem 2: red vs. green
- Binary Classification Problem 3: red vs. yellow
- Binary Classification Problem 4: blue vs. green
- Binary Classification Problem 5: blue vs. yellow
- Binary Classification Problem 6: green vs. yellow
This is significantly more datasets, and in turn, models than the one-vs-rest strategy described in the previous section.
The formula for calculating the number of binary datasets, and in turn, models, is as follows:
- (NumClasses * (NumClasses - 1)) / 2
We can see that for four classes, this gives us the expected value of six binary classification problems:
- (NumClasses * (NumClasses - 1)) / 2
- (4 * (4 - 1)) / 2
- (4 * 3) / 2
- 12 / 2
- 6
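The same count falls out of enumerating the pairs directly:

>>> from itertools import combinations
>>> len(list(combinations(['red', 'blue', 'green', 'yellow'], 2)))
6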
Target-splitting method
https://machinelearningmastery.com/error-correcting-output-codes-ecoc-for-machine-learning/
Advantage: the number of models can be controlled.
The scikit-learn library provides an implementation of ECOC via the OutputCodeClassifier class.
The class takes as an argument the model to use to fit each binary classifier, and any machine learning model can be used. In this case, we will use a logistic regression model, intended for binary classification.
The class also provides the “code_size” argument that specifies the size of the encoding for the classes as a multiple of the number of classes, e.g. the number of bits to encode for each class label.
For example, if we want an encoding with bit strings six bits long and we have three classes, we can specify the code size as 2:
- encoding_length = code_size * num_classes
- encoding_length = 2 * 3
- encoding_length = 6
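Putting this together, a minimal sketch along the lines of the blog's description (make_classification is just a stand-in three-class dataset; 2 * 3 = 6 underlying binary classifiers are fitted):

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.multiclass import OutputCodeClassifier
>>> X, y = make_classification(n_samples=100, n_features=20, n_informative=5,
...                            n_classes=3, random_state=1)
>>> ecoc = OutputCodeClassifier(LogisticRegression(), code_size=2, random_state=1)
>>> y_pred = ecoc.fit(X, y).predict(X)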
Illustrated overview
https://www.cnblogs.com/us-wjz/articles/11410118.html#_label1_2
The difference between OvO and OvR
Principle: model construction and application are split into two phases, encoding and decoding. In the encoding phase, the K classes are partitioned M times; each partition marks some classes as positive and the rest as negative, and one model is built per partition, so that each class ends up defined as a point in the output space. In the decoding phase, the trained models predict on a test sample; the distance between the sample's predicted point and each class's point is computed, and the nearest class is chosen as the final prediction, as in the toy example below.
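A toy decoding step under this description, with a hypothetical code book and predicted bits, just to show the nearest-point rule:

>>> import numpy as np
>>> code_book = np.array([[0, 0, 1, 1],   # point for class 0 (toy values)
...                       [0, 1, 1, 0],   # point for class 1
...                       [1, 1, 0, 0]])  # point for class 2
>>> point = np.array([1, 1, 0, 1])        # bits predicted by the M = 4 models
>>> print(np.abs(code_book - point).sum(axis=1).argmin())  # nearest -> class 2
2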
LabelBinarizer
https://scikit-learn.org/stable/tutorial/basic/tutorial.html#multiclass-vs-multilabel-fitting
A binary classifier can only handle two-class problems.
OneVsRestClassifier combines multiple binary classifiers to extend them to the multiclass setting.
It accepts numeric targets as well as arrays produced by a LabelBinarizer transformation.
When using multiclass classifiers, the learning and prediction task that is performed is dependent on the format of the target data fit upon:

>>> from sklearn.svm import SVC
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.preprocessing import LabelBinarizer
>>> X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
>>> y = [0, 0, 1, 1, 2]
>>> classif = OneVsRestClassifier(estimator=SVC(random_state=0))
>>> classif.fit(X, y).predict(X)
array([0, 0, 1, 1, 2])
In the above case, the classifier is fit on a 1d array of multiclass labels and the predict() method therefore provides corresponding multiclass predictions. It is also possible to fit upon a 2d array of binary label indicators:

>>> y = LabelBinarizer().fit_transform(y)
>>> classif.fit(X, y).predict(X)
array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 0],
       [0, 0, 0]])
Here, the classifier is fit() on a 2d binary label representation of y, using the LabelBinarizer. In this case predict() returns a 2d array representing the corresponding multilabel predictions. Note that the fourth and fifth instances returned all zeroes, indicating that they matched none of the three labels fit upon.