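    Indoor Positioning Series (4): Implementing Location Fingerprinting (Testing Various Machine Learning Classifiers)

    The most commonly used algorithm in location fingerprinting is k-nearest neighbors (kNN). The purpose of this post is to learn how to use scikit-learn, Python's machine learning library. I try out a range of common machine learning classifiers and regressors and compare how well each one localizes when used with the fingerprinting method.

    Importing the data

    Data source description: http://www.cnblogs.com/rubbninja/p/6118430.html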

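    # Import the data
    import numpy as np
    import scipy.io as scio
    offline_data = scio.loadmat('offline_data_random.mat')
    online_data = scio.loadmat('online_data.mat')
    offline_location, offline_rss = offline_data['offline_location'], offline_data['offline_rss']
    # Use only the first 1000 online (test) samples
    trace, rss = online_data['trace'][0:1000, :], online_data['rss'][0:1000, :]
    del offline_data
    del online_data

    # Positioning "accuracy": mean Euclidean distance between predicted and true locations.
    # It is an error, so lower is better; locations are in cm, hence the /100 when printing meters.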
    def accuracy(predictions, labels):
        return np.mean(np.sqrt(np.sum((predictions - labels)**2, 1)))
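    Written as a formula, `accuracy` is the mean Euclidean localization error over the N test points (N = 1000 here), where x̂_i, ŷ_i are the predicted and x_i, y_i the true coordinates in cm:

```latex
\bar{e} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2}
```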
    

    kNN regression

    # kNN regression: average the locations of the 40 nearest fingerprints
    from sklearn import neighbors
    knn_reg = neighbors.KNeighborsRegressor(n_neighbors=40, weights='uniform', metric='euclidean')
    %time knn_reg.fit(offline_rss, offline_location)
    %time predictions = knn_reg.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 92 ms
    Wall time: 182 ms
    accuracy:  2.24421479398 m
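    To make explicit what KNeighborsRegressor(40, weights='uniform') computes, here is a minimal brute-force sketch in NumPy. It is an illustration, not scikit-learn's implementation, which can also use tree-based indexes. Each query is compared against every stored fingerprint, and the locations of its k nearest fingerprints are averaged. The function `knn_regress` and the toy data are hypothetical. Because every query scans all n_train fingerprints, brute-force prediction cost grows linearly with the size of the fingerprint database, which is the point raised in the summary.

```python
import numpy as np

def knn_regress(train_rss, train_loc, query_rss, k=40):
    """Brute-force kNN regression with uniform weights (illustrative sketch)."""
    # Squared Euclidean distance from every query to every stored fingerprint,
    # shape (n_query, n_train); each row costs O(n_train * n_aps) to compute.
    d2 = ((query_rss ** 2).sum(axis=1)[:, None]
          + (train_rss ** 2).sum(axis=1)[None, :]
          - 2.0 * query_rss @ train_rss.T)
    # Column indices of the k smallest distances in each row (their order is irrelevant).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Uniform weights: the estimate is the plain mean of the neighbors' (x, y) locations.
    return train_loc[nearest].mean(axis=1)

# Hypothetical toy data: 3 fingerprints with a single AP reading each, locations in cm.
train_rss = np.array([[0.0], [10.0], [20.0]])
train_loc = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]])
# The query 9.0 is closest to 10.0 and then 0.0, so k=2 averages (100, 0) and (0, 0).
print(knn_regress(train_rss, train_loc, np.array([[9.0]]), k=2))
```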
    

    Logistic regression

    # Logistic regression is a classifier, so snap each location to a 1 m grid and
    # encode the grid cell as a single label: label = x_m * 100 + y_m
    labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
    from sklearn.linear_model import LogisticRegressionCV
    clf_l2_LR_cv = LogisticRegressionCV(Cs=20, penalty='l2', tol=0.001)
    predict_labels = clf_l2_LR_cv.fit(offline_rss, labels).predict(rss)
    # Decode the labels back to grid coordinates in meters, then convert to cm
    x = np.floor(predict_labels/100.0)
    y = predict_labels - x * 100
    predictions = np.column_stack((x, y)) * 100
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.08581348591 m
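    The classification variants (logistic regression, SVC, random forest classifier) all rely on the same trick of turning a continuous location into a discrete grid-cell label. The sketch below wraps the encoding and decoding lines into two hypothetical helpers, `encode_cells` and `decode_cells`, and applies them to made-up locations. It also shows a side effect. Rounding moves each axis by at most 50 cm, so even a correctly classified sample keeps a quantization error of up to about 0.71 m. This may be part of why the classifiers trail the regressors here. The packing is only unambiguous while the y grid index stays in [0, 100).

```python
import numpy as np

def encode_cells(loc_cm):
    # Snap (x, y) in cm to the nearest 1 m grid point and pack it into one label:
    # label = x_m * 100 + y_m. Only unambiguous while 0 <= y_m < 100.
    x_m = np.round(loc_cm[:, 0] / 100.0)
    y_m = np.round(loc_cm[:, 1] / 100.0)
    return x_m * 100 + y_m

def decode_cells(labels):
    # Unpack the label and convert back to cm; this yields the grid point, not the original location.
    x_m = np.floor(labels / 100.0)
    y_m = labels - x_m * 100
    return np.column_stack((x_m, y_m)) * 100

loc = np.array([[1234.0, 567.0], [40.0, 960.0]])  # hypothetical locations in cm
labels = encode_cells(loc)                          # 1206 and 10
restored = decode_cells(labels)                     # (1200, 600) and (0, 1000)
# Per-sample quantization error in cm; bounded by 50 * sqrt(2), about 70.7 cm.
print(np.linalg.norm(restored - loc, axis=1))
```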
    

    Support Vector Machine for Regression (SVR)

    # SVR predicts a single value, so one model is trained for x and another for y
    from sklearn import svm
    clf_x = svm.SVR(C=1000, gamma=0.01)
    clf_y = svm.SVR(C=1000, gamma=0.01)
    %time clf_x.fit(offline_rss, offline_location[:, 0])
    %time clf_y.fit(offline_rss, offline_location[:, 1])
    %time x = clf_x.predict(rss)
    %time y = clf_y.predict(rss)
    predictions = np.column_stack((x, y))
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 9min 27s
    Wall time: 12min 42s
    Wall time: 1.06 s
    Wall time: 1.05 s
    accuracy:  2.2468400825 m
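    Both SVM variants use scikit-learn's default RBF kernel, so `gamma` controls how fast the similarity of two fingerprints decays with their RSS distance. It therefore has to be chosen relative to the scale of the RSS values in dB. The NumPy sketch below computes the kernel with a hypothetical helper `rbf_kernel`, not scikit-learn code. Suppose two made-up four-AP fingerprints differ by 3 dB on every AP. With gamma = 0.01 (the SVR setting) their similarity drops to about 0.70. With gamma = 0.001 (the SVC setting) it stays near 0.96.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), the 'rbf' kernel that SVR and SVC use by default.
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

# Hypothetical fingerprints from 4 APs (dBm); the second is 3 dB stronger on every AP,
# so the squared distance between them is 4 * 3^2 = 36.
a = np.array([[-60.0, -70.0, -80.0, -90.0]])
b = a + 3.0
for gamma in (0.01, 0.001):  # the values used for SVR and SVC in this post
    print(gamma, rbf_kernel(a, b, gamma)[0, 0])  # equals exp(-gamma * 36)
```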
    

    Support Vector Machine for Classification (SVC)

    from sklearn import svm
    # Same 1 m grid-cell labels as in the logistic regression example
    labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
    clf_svc = svm.SVC(C=1000, tol=0.01, gamma=0.001)
    %time clf_svc.fit(offline_rss, labels)
    %time predict_labels = clf_svc.predict(rss)
    x = np.floor(predict_labels/100.0)
    y = predict_labels - x * 100
    predictions = np.column_stack((x, y)) * 100
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 1min 16s
    Wall time: 15 s
    accuracy:  2.50931890608 m
    

    Random forest regressor

    from sklearn.ensemble import RandomForestRegressor
    # Random forests handle the two-column (x, y) target directly
    estimator = RandomForestRegressor(n_estimators=150)
    %time estimator.fit(offline_rss, offline_location)
    %time predictions = estimator.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 58.6 s
    Wall time: 196 ms
    accuracy:  2.20778352008 m
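    RandomForestRegressor accepts the two-column location target directly. A compact NumPy sketch of the idea follows, with hypothetical function names and toy data. Each tree is grown on a bootstrap sample using multi-output squared-error splits, and the forest averages the trees. The sketch also draws a random subset of `n_feats` features at each split, as random forests are usually described. Note that scikit-learn's RandomForestRegressor considers all features by default, so in the run above the diversity between trees comes mainly from bootstrapping.

```python
import numpy as np

def build_tree(X, Y, depth, max_depth, n_feats, rng, min_leaf=2):
    # Leaf: predict the mean (x, y) location of the training samples that reach it.
    if depth == max_depth or len(X) < 2 * min_leaf:
        return Y.mean(axis=0)
    best = None
    # Examine only a random subset of the features (APs) at this split.
    for f in rng.choice(X.shape[1], size=n_feats, replace=False):
        for t in np.unique(X[:, f])[:-1]:
            mask = X[:, f] <= t
            if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
                continue
            # Multi-output squared error, summed over both coordinates.
            sse = (((Y[mask] - Y[mask].mean(axis=0)) ** 2).sum()
                   + ((Y[~mask] - Y[~mask].mean(axis=0)) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, f, t, mask)
    if best is None:  # no valid split: make a leaf
        return Y.mean(axis=0)
    _, f, t, mask = best
    return (f, t,
            build_tree(X[mask], Y[mask], depth + 1, max_depth, n_feats, rng, min_leaf),
            build_tree(X[~mask], Y[~mask], depth + 1, max_depth, n_feats, rng, min_leaf))

def tree_predict(node, x):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are arrays.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, Y, n_trees=10, max_depth=4, n_feats=1, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample (with replacement)
        trees.append(build_tree(X[idx], Y[idx], 0, max_depth, n_feats, rng))
    return trees

def forest_predict(trees, X):
    # Regression forest output: average of the individual tree predictions.
    return np.array([np.mean([tree_predict(t, x) for t in trees], axis=0) for x in X])

rng = np.random.default_rng(1)
X = rng.normal(-70.0, 8.0, size=(60, 4))                # hypothetical RSS from 4 APs
Y = np.column_stack((X[:, 0] * 10.0, X[:, 1] * -5.0))   # hypothetical (x, y) targets
trees = fit_forest(X, Y, n_trees=10, max_depth=4, n_feats=2)
print(forest_predict(trees, X[:3]))
```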
    

    Random forest classifier

    from sklearn.ensemble import RandomForestClassifier
    labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
    estimator = RandomForestClassifier(n_estimators=20, max_features=None, max_depth=20)  # limited by memory, so rather few trees
    %time estimator.fit(offline_rss, labels)
    %time predict_labels = estimator.predict(rss)
    x = np.floor(predict_labels/100.0)
    y = predict_labels - x * 100
    predictions = np.column_stack((x, y)) * 100
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 39.6 s
    Wall time: 113 ms
    accuracy:  2.56860790666 m
    

    Linear regression

    from sklearn.linear_model import LinearRegression
    predictions = LinearRegression().fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.83239841667 m
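    LinearRegression models each coordinate as a weighted sum of the RSS values plus an intercept, fitted by ordinary least squares. The sketch below does the same with np.linalg.lstsq, using hypothetical helpers `fit_linear`/`predict_linear` and noise-free toy data, so the true weights are recovered exactly. RSS in dBm is roughly logarithmic in distance, so a single linear map from RSS to coordinates is a strong restriction. That is consistent with the ~3.8 m error above.

```python
import numpy as np

def fit_linear(X, Y):
    # Append a column of ones so the model also learns an intercept.
    Xb = np.column_stack((X, np.ones(len(X))))
    # Least squares for both output columns (x and y) at once: min ||Xb @ W - Y||^2.
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, X):
    return np.column_stack((X, np.ones(len(X)))) @ W

rng = np.random.default_rng(0)
X = rng.normal(-70.0, 8.0, size=(50, 3))  # hypothetical RSS from 3 APs
# Hypothetical true weights; the last row holds the intercepts for x and y.
W_true = np.array([[2.0, -1.0], [0.5, 3.0], [-1.5, 0.0], [100.0, 200.0]])
Y = np.column_stack((X, np.ones(50))) @ W_true  # noise-free linear targets
W = fit_linear(X, Y)
print(np.round(W, 6))
```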
    

    Ridge regression

    from sklearn.linear_model import RidgeCV
    # RidgeCV selects alpha by cross-validation over 10 log-spaced values from 1e-4 to 1e4
    clf = RidgeCV(alphas=np.logspace(-4, 4, 10))
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.83255676918 m
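    RidgeCV adds an L2 penalty alpha * ||W||^2 to the least-squares loss. It picks alpha from the 10 candidates produced by np.logspace(-4, 4, 10), which run from 1e-4 to 1e4 evenly spaced on a log scale. For a fixed alpha the fit has a closed form, sketched below with a hypothetical helper `ridge_fit` and toy data. scikit-learn uses its own solvers and, in RidgeCV by default, an efficient leave-one-out scheme. A larger alpha shrinks the weights, while a tiny alpha reproduces ordinary least squares.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    # Center the data so the intercept is not penalized (as with fit_intercept=True).
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Closed-form ridge solution: W = (Xc^T Xc + alpha * I)^{-1} Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

alphas = np.logspace(-4, 4, 10)  # same candidate grid as RidgeCV above
rng = np.random.default_rng(0)
X = rng.normal(-70.0, 8.0, size=(50, 3))  # hypothetical RSS from 3 APs
W_true = np.array([[2.0, -1.0], [0.5, 3.0], [-1.5, 0.0]])
Y = X @ W_true + np.array([100.0, 200.0])
for alpha in alphas[[0, -1]]:
    W, b = ridge_fit(X, Y, alpha)
    print(alpha, np.linalg.norm(W))  # the weight norm shrinks as alpha grows
```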
    

    Lasso regression

    from sklearn.linear_model import MultiTaskLassoCV
    # The multi-task variant fits x and y jointly
    clf = MultiTaskLassoCV(alphas=np.logspace(-4, 4, 10))
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.83244688001 m
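    MultiTaskLassoCV fits the x and y coordinates jointly. According to scikit-learn's documentation, the multi-task lasso minimizes the objective below. Here n is the number of training fingerprints, each row of W holds one feature's (one AP's) weights for x and y, and α is chosen by cross-validation from the 10 values in np.logspace(-4, 4, 10). The mixed ℓ2,1 norm drives whole rows to zero, so an AP is used either for both coordinates or for neither:

```latex
\min_{W}\; \frac{1}{2n}\,\lVert Y - XW \rVert_{\mathrm{Fro}}^{2} + \alpha\,\lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{i} \sqrt{\sum_{j} w_{ij}^{2}}
```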
    

    Elastic net

    from sklearn.linear_model import MultiTaskElasticNetCV
    clf = MultiTaskElasticNetCV(alphas=np.logspace(-4, 4, 10))
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.832486036 m
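    MultiTaskElasticNetCV combines the multi-task lasso's ℓ2,1 penalty with a squared Frobenius-norm (ridge-like) penalty. Following scikit-learn's documentation, let ρ = l1_ratio, which the call above leaves at its default of 0.5. The objective is then as follows, and ρ = 1 recovers the multi-task lasso:

```latex
\min_{W}\; \frac{1}{2n}\,\lVert Y - XW \rVert_{\mathrm{Fro}}^{2}
+ \alpha\rho\,\lVert W \rVert_{2,1}
+ \frac{\alpha(1-\rho)}{2}\,\lVert W \rVert_{\mathrm{Fro}}^{2}
```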
    

    Bayesian ridge regression

    from sklearn.linear_model import BayesianRidge
    from sklearn.multioutput import MultiOutputRegressor
    # BayesianRidge is single-output, so MultiOutputRegressor fits one model per coordinate
    clf = MultiOutputRegressor(BayesianRidge())
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.83243319129 m
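    BayesianRidge predicts a single value, as do SVR and GradientBoostingRegressor. It is therefore wrapped in MultiOutputRegressor, which fits one independent copy of the estimator per target column. The sketch below reproduces that strategy with hypothetical class names and a plain least-squares base model standing in for BayesianRidge. It follows the same idea as the two separate SVR models trained for x and y earlier.

```python
import numpy as np

class LeastSquares1D:
    """Hypothetical single-output base model (stands in for BayesianRidge, SVR, ...)."""
    def fit(self, X, y):
        Xb = np.column_stack((X, np.ones(len(X))))
        self.w_ = np.linalg.lstsq(Xb, y, rcond=None)[0]
        return self

    def predict(self, X):
        return np.column_stack((X, np.ones(len(X)))) @ self.w_

class PerOutputRegressor:
    """Sketch of the MultiOutputRegressor strategy: one independent model per target column."""
    def __init__(self, make_model):
        self.make_model = make_model  # zero-argument factory returning a fresh base model

    def fit(self, X, Y):
        # Column j of Y (x, then y) gets its own model; nothing is shared between them.
        self.models_ = [self.make_model().fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        return np.column_stack([m.predict(X) for m in self.models_])

rng = np.random.default_rng(0)
X = rng.normal(-70.0, 8.0, size=(40, 3))  # hypothetical RSS from 3 APs
Y = np.column_stack((X @ [2.0, 0.5, -1.5] + 100.0, X @ [-1.0, 3.0, 0.0] + 200.0))
model = PerOutputRegressor(LeastSquares1D).fit(X, Y)
print(model.predict(X[:2]))
```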
    

    Gradient boosting for regression (GBDT)

    from sklearn import ensemble
    from sklearn.multioutput import MultiOutputRegressor
    # GradientBoostingRegressor is also single-output, hence the MultiOutputRegressor wrapper
    clf = MultiOutputRegressor(ensemble.GradientBoostingRegressor(n_estimators=100, max_depth=10))
    %time clf.fit(offline_rss, offline_location)
    %time predictions = clf.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 43.4 s
    Wall time: 17 ms
    accuracy:  2.22100945095 m
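    The model above boosts trees of depth 10 for each coordinate. The core idea is easier to see with depth-1 trees (stumps). Start from the mean. Then repeatedly fit a small tree to the current residuals, which are the negative gradient of the squared loss, and add a shrunken copy of it. The sketch below uses hypothetical function names and toy data for a single coordinate. scikit-learn's defaults also use squared error and a learning_rate of 0.1, but its trees and split search are far more elaborate.

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree minimizing squared error on residuals r."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())  # fallback: constant prediction
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, f, t, lv, rv)
    return best[1:]

def stump_predict(stump, X):
    f, t, lv, rv = stump
    return np.where(X[:, f] <= t, lv, rv)

def fit_boosting(X, y, n_estimators=50, learning_rate=0.1):
    # Start from the mean; for squared loss the negative gradient is the residual y - F(x),
    # so each round fits a stump to the current residuals and takes a shrunken step.
    f0 = y.mean()
    F = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        stump = fit_stump(X, y - F)
        F = F + learning_rate * stump_predict(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_boosting(model, X):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, X) for s in stumps)

rng = np.random.default_rng(0)
X = rng.uniform(-90.0, -40.0, size=(80, 2))   # hypothetical RSS from 2 APs
y = np.sin(X[:, 0] / 10.0) * 100.0 + X[:, 1]  # hypothetical nonlinear target (one coordinate)
model = fit_boosting(X, y)
mse = np.mean((predict_boosting(model, X) - y) ** 2)
print(mse, np.var(y))  # training error vs. the constant-prediction baseline
```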
    

    Multi-layer perceptron regressor (neural network)

    from sklearn.neural_network import MLPRegressor
    # Two hidden layers with 100 units each
    clf = MLPRegressor(hidden_layer_sizes=(100, 100))
    %time clf.fit(offline_rss, offline_location)
    %time predictions = clf.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 1min 1s
    Wall time: 6 ms
    accuracy:  2.4517504109 m
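    hidden_layer_sizes=(100, 100) means two hidden layers of 100 units between the RSS input and the two-dimensional location output. The forward pass whose weights MLPRegressor learns can be sketched in NumPy as below. The helper names, the random untrained weights, and the count of 6 APs are all hypothetical. Training, i.e. fitting these weights by backpropagation, is what scikit-learn's `fit` does and is omitted here. scikit-learn's documentation also notes that MLPs are sensitive to feature scaling, so standardizing the RSS inputs would be a natural first step when tuning.

```python
import numpy as np

def init_params(sizes, rng):
    # sizes like [n_aps, 100, 100, 2]; random weights purely for illustration
    # (scikit-learn uses its own initialization and then trains them).
    weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases

def mlp_forward(X, weights, biases):
    # Hidden layers: affine map followed by ReLU (MLPRegressor's default activation).
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    # Output layer: identity activation, one unit per coordinate (x, y).
    return h @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
n_aps = 6  # hypothetical number of access points
weights, biases = init_params([n_aps, 100, 100, 2], rng)
out = mlp_forward(rng.normal(-70.0, 8.0, size=(5, n_aps)), weights, biases)
print(out.shape)  # (5, 2)
```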
    

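    Summary

    The linear regression models above (linear, ridge, lasso, elastic net and Bayesian ridge, all at about 3.83 m) clearly perform too poorly, so the table below summarizes only the remaining models: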

    Algorithm                               Mean positioning error
    kNN regression                          2.24 m
    Logistic regression                     3.09 m
    Support vector machine (SVR)            2.25 m
    Random forest (regressor)               2.21 m
    Gradient boosting for regression        2.22 m
    Multi-layer perceptron regressor        2.45 m

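    Judging roughly by positioning accuracy, kNN, SVM, RF and GBDT are the four better models. Many of the algorithms above were not carefully tuned, so these results are only rough, and I had no idea at all how to tune the neural network. Note also that SVM is slow to train and tedious to tune. kNN's prediction time should grow in proportion to the amount of training data, so for real-time positioning it is probably inferior to RF and GBDT.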


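    Author: [rubbninja](http://www.cnblogs.com/rubbninja/) Source: [http://www.cnblogs.com/rubbninja/](http://www.cnblogs.com/rubbninja/) About the author: currently working mainly on machine learning and wireless positioning; discussion and corrections are welcome! Copyright notice: the copyright of this article is shared by the author and cnblogs; please credit the source when reposting.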
    Original article: https://www.cnblogs.com/rubbninja/p/6186847.html