  • Anomaly Detection - Isolation-based Anomaly Detection with Isolation Forest - 3 - Example

    Reference: https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

    Code:

    print(__doc__)
    
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import IsolationForest
    
    rng = np.random.RandomState(42)
    
    # Build the training data: 100 two-dimensional samples whose values are
    # drawn from a standard normal distribution and scaled by 0.3
    X = 0.3 * rng.randn(100, 2)
    # Shift the points by +2 and -2 to form two clusters of 100 samples each,
    # around (2, 2) and (-2, -2); the concatenated training set is (200, 2)
    X_train = np.r_[X + 2, X - 2]  # np.r_ stacks the arrays row-wise (column counts must match)
    
    # Generate some regular new observations from the same distribution
    X = 0.3 * rng.randn(20, 2)
    # After concatenation the test set has shape (40, 2)
    X_test = np.r_[X + 2, X - 2]
    
    # Generate abnormal observations from a uniform distribution over [-4, 4], shape (20, 2)
    X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
    
    # Build the forest; each tree is grown on a sub-sample of 100 points.
    # Note that max_features defaults to the float 1.0, meaning each tree is
    # trained on all features; pass an integer to draw that many features instead.
    clf = IsolationForest(behaviour='new', max_samples=100,
                          random_state=rng, contamination='auto')
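    # Note: behaviour='new' applies to scikit-learn 0.20-0.23; the parameter
    # was deprecated in 0.22 and removed in 0.24, where the new behaviour is
    # the default and the argument should simply be omitted.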
    
    # Fit the forest: each tree picks random split features and split values
    clf.fit(X_train)
    
    # Use the fitted forest to predict: +1 marks inliers, -1 marks outliers
    y_pred_train = clf.predict(X_train)
    print(y_pred_train)
    y_pred_test = clf.predict(X_test)
    print(y_pred_test)
    y_pred_outliers = clf.predict(X_outliers)
    print(y_pred_outliers)
    
    # Plot the decision function over a grid covering the plane
    # xx and yy each have shape (50, 50)
    xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
    # Ravel xx and yy into one-dimensional vectors of shape (2500,),
    # then stack them column-wise with np.c_ into (2500, 2) coordinates
    # and compute the anomaly score of every grid point:
    # normal points get positive scores, anomalous ones negative scores
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    plt.title("IsolationForest")
    # Draw filled contours of the anomaly score over the grid: lighter
    # colours mark more normal regions, darker colours more anomalous ones
    plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
    
    # Mark the training, test, and outlier points on the contour plot to
    # check that they fall where expected: the training and test points sit
    # in the light regions, while the outliers sit in the dark regions
    b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white',
                     s=20, edgecolor='k')
    b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green',
                     s=20, edgecolor='k')
    c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red',
                    s=20, edgecolor='k')
    
    plt.axis('tight')
    # Set the axis limits and the legend
    plt.xlim((-5, 5))
    plt.ylim((-5, 5))
    plt.legend([b1, b2, c],
               ["training observations",
                "new regular observations", "new abnormal observations"],
               loc="upper left")
    plt.show()

    Output:

    Automatically created module for IPython interactive environment
    [ 1 -1  1 -1  1  1 -1 -1  1 -1 -1  1  1  1  1 -1  1 -1 -1  1  1  1 -1  1
     -1  1  1 -1  1  1  1 -1 -1  1  1 -1  1 -1  1 -1  1 -1  1  1  1  1  1 -1
      1  1  1  1  1 -1  1 -1 -1  1  1 -1  1 -1 -1  1  1 -1  1 -1  1 -1  1 -1
      1 -1  1  1  1  1 -1  1  1 -1 -1 -1  1  1  1  1  1 -1  1  1  1  1 -1  1
      1  1  1  1  1 -1  1 -1  1  1 -1 -1  1 -1 -1 -1  1  1  1 -1  1 -1 -1 -1
      1  1 -1  1 -1  1  1 -1  1  1  1 -1 -1  1  1 -1 -1 -1  1 -1  1 -1  1  1
      1  1  1 -1  1  1 -1  1  1 -1  1 -1 -1  1  1 -1  1 -1 -1 -1  1 -1  1 -1
      1 -1  1 -1  1 -1  1  1  1  1 -1 -1  1 -1 -1 -1  1 -1  1  1 -1 -1  1  1
      1  1 -1  1  1  1  1  1]
    [ 1 -1 -1  1 -1 -1 -1  1  1  1 -1 -1  1  1  1  1  1 -1 -1  1  1 -1 -1  1
     -1  1 -1  1  1  1 -1 -1  1  1  1  1  1 -1 -1  1]
    [-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
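
    As a quick summary (this snippet is not in the original post), the ±1 labels above can be tallied to count how many points in each set were flagged as anomalous; the variable names reuse those from the example:

    n_flagged_train = (y_pred_train == -1).sum()
    n_flagged_test = (y_pred_test == -1).sum()
    n_flagged_outliers = (y_pred_outliers == -1).sum()
    print(f"training points flagged: {n_flagged_train} / {len(y_pred_train)}")
    print(f"test points flagged:     {n_flagged_test} / {len(y_pred_test)}")
    print(f"outliers flagged:        {n_flagged_outliers} / {len(y_pred_outliers)}")

    With contamination='auto', a noticeable share of the training and test points is flagged, as the first two arrays show, while all 20 uniform outliers are caught.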

    The resulting figure (not reproduced here) shows the blue anomaly-score contours, with the white training points and green test points in the light regions and the red outliers in the dark regions.

    If contamination is set to 0., declaring that the training data contains no anomalies, the output becomes (the first array below appears to be X_outliers, printed in that run):

    Automatically created module for IPython interactive environment
    [[-0.48822863 -3.37234895]
     [-3.79719405  3.70118732]
     [ 2.68784096  1.56779365]
     [-0.72837644 -2.61364544]
     [-2.74850366 -1.99805681]
     [ 0.39381332  1.71676738]
     [ 1.28157901 -1.76052882]
     [ 3.63892225  1.90317533]
     [ 0.43483242  0.89376597]
     [-0.6431995  -2.01815208]
     [-1.15221857  2.06276888]
     [-3.88485209 -3.07141888]
     [-3.63197886 -3.67416958]
     [ 2.84368467  1.62926288]
     [-0.20660937 -3.21732671]
     [-0.067073   -0.21222583]
     [-2.61438504 -0.52918681]
     [-0.81196212  0.92680078]
     [ 1.08074921 -3.63756792]
     [-1.00309908  1.00687933]]
    [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
    [ 1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1
      1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1]
    [-1 -1 -1 -1  1 -1  1  1 -1  1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1]

    Note that some of the outlier points that happen to lie close to the training clusters are now also predicted as normal.
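
    This follows from how predict() derives its threshold from contamination. The sketch below describes the rule as implemented in recent scikit-learn versions; offset_, score_samples and decision_function are real scikit-learn API, but the exact percentile behaviour is worth verifying against your installed version:

    # score_samples(X) is the raw anomaly score (the lower, the more abnormal);
    # decision_function(X) equals score_samples(X) - clf.offset_, and predict()
    # returns -1 exactly where decision_function(X) is negative.
    # With contamination='auto', offset_ is fixed at -0.5; with a float c,
    # offset_ is placed at the c-quantile of the training scores, so roughly a
    # fraction c of the training set is labelled -1. With contamination=0. the
    # threshold sits at the minimum training score, which is why every training
    # point (and anything scoring at least as "normal") passes as an inlier.
    print(clf.offset_)
    print((clf.decision_function(X_train) < 0).mean())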

    If, keeping contamination=0., we additionally set max_features=2 when building the trees, the result is:

    [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
    [ 1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1
      1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1]
    [-1 -1 -1 -1  1 -1  1  1 -1  1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1]

    The fit looks better, with essentially all test points now validated as normal. Note, however, that this output is identical to the previous contamination=0. run: max_features defaults to the float 1.0, which already means "use all features", so max_features=2 on two-dimensional data is equivalent, and the improvement on the test set comes from contamination alone.

    So configure the parameters according to your own needs.
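
    For reference, here is a hedged sketch (not from the original post) of the constructor parameters discussed above; the values shown are illustrative, not prescriptive:

    from sklearn.ensemble import IsolationForest

    clf = IsolationForest(
        n_estimators=100,       # number of isolation trees in the forest
        max_samples=100,        # sub-sample size used to grow each tree
        max_features=1.0,       # float: fraction of features drawn per tree
                                # (1.0 = all); an int draws that many features
        contamination='auto',   # or a float in (0, 0.5]: the expected share of
                                # outliers, used to place the predict() threshold
        random_state=42,        # seed for reproducibility
    )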

  • Original post: https://www.cnblogs.com/wanghui-garcia/p/11475713.html