scikit-learn - 走看看

zoukankan html css js c++ java

scikit-learn
（1）数据标准化（Standardization or Mean Removal and Variance Scaling）

进行标准化缩放的数据均值为0，具有单位方差。

scale函数提供一种便捷的标准化转换操作，如下：

[python] view plain copy

>>> from sklearn import preprocessing #导入数据预处理包

>>> X=[[1.,-1.,2.],

       [2.,0.,0.],

       [0.,1.,-1.]]

>>> X_scaled = preprocessing.scale(X)

>>> X_scaled

array([[ 0.        , -1.22474487,  1.33630621],

       [ 1.22474487,  0.        , -0.26726124],

       [-1.22474487,  1.22474487, -1.06904497]])

[python] view plain copy

>>> X_scaled.mean(axis=0)

array([ 0.,  0.,  0.])

>>> X_scaled.std(axis=0)

array([ 1.,  1.,  1.])

同样我们也可以通过preprocessing模块提供的Scaler（StandardScaler 0.15以后版本）工具类来实现这个功能：

[python] view plain copy

>>> scaler = preprocessing.StandardScaler().fit(X)

>>> scaler

StandardScaler(copy=True, with_mean=True, with_std=True)

>>> scaler.mean_

array([ 1.        ,  0.        ,  0.33333333])

>>> scaler.std_

array([ 0.81649658,  0.81649658,  1.24721913])

>>> scaler.transform(X)

array([[ 0.        , -1.22474487,  1.33630621],

       [ 1.22474487,  0.        , -0.26726124],

       [-1.22474487,  1.22474487, -1.06904497]])

（2）数据规范化（Normalization）
把数据集中的每个样本所有数值缩放到(-1,1)之间。

[python] view plain copy

>>> X = [[ 1., -1., 2.],

     [ 2., 0., 0.],

     [ 0., 1., -1.]]

>>> X_normalized = preprocessing.normalize(X, norm='l2')

>>> X_normalized

array([[ 0.40824829, -0.40824829,  0.81649658],

       [ 1.        ,  0.        ,  0.        ],

       [ 0.        ,  0.70710678, -0.70710678]])

>>> normalizer = preprocessing.Normalizer().fit(X) # fit does nothing

>>> normalizer

Normalizer(copy=True, norm='l2')

>>> normalizer.transform(X)

array([[ 0.40824829, -0.40824829,  0.81649658],

       [ 1.        ,  0.        ,  0.        ],

       [ 0.        ,  0.70710678, -0.70710678]])

>>> normalizer.transform([[-1., 1., 0.]])

array([[-0.70710678,  0.70710678,  0.        ]])

（3）二进制化（Binarization）
将数值型数据转化为布尔型的二值数据，可以设置一个阈值（threshold）

[python] view plain copy

>>> X = [[ 1., -1., 2.],

     [ 2., 0., 0.],

     [ 0., 1., -1.]]

>>> binarizer = preprocessing.Binarizer().fit(X) # fit does nothing

>>> binarizer

Binarizer(copy=True, threshold=0.0) # 默认阈值为0.0

>>> binarizer.transform(X)

array([[ 1.,  0.,  1.],

       [ 1.,  0.,  0.],

       [ 0.,  1.,  0.]])

>>> binarizer = preprocessing.Binarizer(threshold=1.1) # 设定阈值为1.1

>>> binarizer.transform(X)

array([[ 0.,  0.,  1.],

       [ 1.,  0.,  0.],

       [ 0.,  0.,  0.]])

（4）标签预处理（Label preprocessing）

4.1）标签二值化（Label binarization）

LabelBinarizer通常用于通过一个多类标签（label）列表，创建一个label指示器矩阵

[python] view plain copy

>>> lb = preprocessing.LabelBinarizer()

>>> lb.fit([1, 2, 6, 4, 2])

LabelBinarizer(neg_label=0, pos_label=1)

>>> lb.classes_

array([1, 2, 4, 6])

>>> lb.transform([1, 6])

array([[1, 0, 0, 0],

       [0, 0, 0, 1]])

上例中每个实例中只有一个标签（label），LabelBinarizer也支持每个实例数据显示多个标签：

[python] view plain copy

>>> lb.fit_transform([(1, 2), (3,)]) #(1,2)实例中就包含两个label

array([[1, 1, 0],

       [0, 0, 1]])

>>> lb.classes_

array([1, 2, 3])

4.2）标签编码（Label encoding）

[python] view plain copy

>>> from sklearn import preprocessing

>>> le = preprocessing.LabelEncoder()

>>> le.fit([1, 2, 2, 6])

LabelEncoder()

>>> le.classes_

array([1, 2, 6])

>>> le.transform([1, 1, 2, 6])

array([0, 0, 1, 2])

>>> le.inverse_transform([0, 0, 1, 2])

array([1, 1, 2, 6])

也可以用于非数值类型的标签到数值类型标签的转化：

[python] view plain copy

>>> le = preprocessing.LabelEncoder()

>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])

LabelEncoder()

>>> list(le.classes_)

['amsterdam', 'paris', 'tokyo']

>>> le.transform(["tokyo", "tokyo", "paris"])

array([2, 2, 1])

>>> list(le.inverse_transform([2, 2, 1]))

['tokyo', 'tokyo', 'paris']
查看全文

相关阅读:
css3动画入门transition、animation
vue入门教程 (vueJS2.X)
web前端开发代码规范及注意事项
 树莓派 mongodb 安装&报错处理
 mongodb Failed to start LSB: An object/document-oriented dat
js实现replaceAll功能
 mac for smartSVN9 (8,9)破解方法附smartSvn_keygen工具图文
 js可视区域图片懒加载
 Hibernate基础知识
 Hibernate缓存策略

原文地址：https://www.cnblogs.com/changbosha/p/5797664.html