  • Indoor Positioning Series (4): Implementing Location Fingerprinting (Testing Various Machine Learning Classifiers)

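    The most commonly used algorithm in location fingerprinting is k-nearest neighbors (kNN). The goal of this post is to learn how to use scikit-learn, the Python machine learning library, by trying a range of common machine learning models and comparing how well they localize when used for location fingerprinting.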

    Loading the data

    Data source description: http://www.cnblogs.com/rubbninja/p/6118430.html

    # Load the offline (fitting) and online (prediction) data
    import numpy as np
    import scipy.io as scio
    offline_data = scio.loadmat('offline_data_random.mat')
    online_data = scio.loadmat('online_data.mat')
    offline_location, offline_rss = offline_data['offline_location'], offline_data['offline_rss']
    trace, rss = online_data['trace'][0:1000, :], online_data['rss'][0:1000, :]
    del offline_data
    del online_data
    
    # Definition of positioning accuracy: mean Euclidean distance between predicted and true locations
    def accuracy(predictions, labels):
        return np.mean(np.sqrt(np.sum((predictions - labels)**2, 1)))
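
To make the arithmetic behind `accuracy` explicit, here is a plain-Python restatement of the same metric. It is only a sketch: the function name `mean_position_error` and the coordinates are hypothetical, and the centimetre unit is an assumption inferred from the article dividing the result by 100 to print metres.

```python
import math

def mean_position_error(predictions, labels):
    """Mean Euclidean distance between predicted and true (x, y) points."""
    distances = [math.hypot(px - lx, py - ly)
                 for (px, py), (lx, ly) in zip(predictions, labels)]
    return sum(distances) / len(distances)

# Hypothetical coordinates, assumed to be in centimetres (not from the data set).
predicted = [(0.0, 0.0), (300.0, 400.0)]
actual = [(30.0, 40.0), (300.0, 100.0)]

# The per-point errors are 50 cm and 300 cm, so the mean is 175 cm.
error_cm = mean_position_error(predicted, actual)
print("mean error:", error_cm / 100, "m")
```

A smaller value is better: despite the name `accuracy`, the quantity is an average error distance.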
    

    kNN regression

    # kNN regression (%time is an IPython magic that reports wall-clock time)
    from sklearn import neighbors
    knn_reg = neighbors.KNeighborsRegressor(40, weights='uniform', metric='euclidean')
    %time knn_reg.fit(offline_rss, offline_location)
    %time predictions = knn_reg.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 92 ms
    Wall time: 182 ms
    accuracy:  2.24421479398 m
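
The `%time` lines only work inside IPython or Jupyter. When running the same steps as a plain script, a small timing helper can stand in for them. The sketch below uses only the standard library; the helper name `timed` and the stand-in workload are hypothetical, not part of the original post.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in the article this would be a call such as
# timed(knn_reg.predict, rss).
total, seconds = timed(sum, range(1000000))
print("Wall time: %.1f ms" % (seconds * 1000))
```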
    

    Logistic regression

    # Logistic regression is a classifier, so each location is first encoded
    # as a grid-cell label: (x rounded to metres) * 100 + (y rounded to metres)
    labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
    from sklearn.linear_model import LogisticRegressionCV
    clf_l2_LR_cv = LogisticRegressionCV(Cs=20, penalty='l2', tol=0.001)
    predict_labels = clf_l2_LR_cv.fit(offline_rss, labels).predict(rss)
    # Decode the predicted labels back into (x, y) coordinates
    x = np.floor(predict_labels/100.0)
    y = predict_labels - x * 100
    predictions = np.column_stack((x, y)) * 100
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    accuracy:  3.08581348591 m
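
The classifiers in this post need discrete labels, so the code packs a 2-D location into a single number and unpacks it again after prediction. The round trip can be sketched in plain Python as follows. The function names and the sample point are hypothetical, and the scheme assumes non-negative coordinates in centimetres with the rounded y value below 100 m, otherwise labels would collide.

```python
def encode_cell(x_cm, y_cm):
    """Label = (x rounded to whole metres) * 100 + (y rounded to whole metres)."""
    return round(x_cm / 100.0) * 100 + round(y_cm / 100.0)

def decode_cell(label):
    """Recover the grid point, in centimetres, from a label."""
    x_m, y_m = divmod(label, 100)
    return x_m * 100, y_m * 100

# Hypothetical point at x = 3.12 m, y = 14.87 m.
label = encode_cell(312.0, 1487.0)
print(label, decode_cell(label))
```

Because locations are snapped to a 1 m grid, some quantization error is built into every classifier result, whichever model is used.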
    

    Support Vector Machine for Regression

    # SVR predicts a single output, so fit one model per coordinate
    from sklearn import svm
    clf_x = svm.SVR(C=1000, gamma=0.01)
    clf_y = svm.SVR(C=1000, gamma=0.01)
    %time clf_x.fit(offline_rss, offline_location[:, 0])
    %time clf_y.fit(offline_rss, offline_location[:, 1])
    %time x = clf_x.predict(rss)
    %time y = clf_y.predict(rss)
    predictions = np.column_stack((x, y))
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 9min 27s
    Wall time: 12min 42s
    Wall time: 1.06 s
    Wall time: 1.05 s
    accuracy:  2.2468400825 m
    

    Support Vector Machine for Classification

    from sklearn import svm
    # Encode each location as a grid-cell label, as for logistic regression
    labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
    clf_svc = svm.SVC(C=1000, tol=0.01, gamma=0.001)
    %time clf_svc.fit(offline_rss, labels)
    %time predict_labels = clf_svc.predict(rss)
    # Decode the predicted labels back into (x, y) coordinates
    x = np.floor(predict_labels/100.0)
    y = predict_labels - x * 100
    predictions = np.column_stack((x, y)) * 100
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    Wall time: 1min 16s
    Wall time: 15 s
    accuracy:  2.50931890608 m
    

    Random forest regressor

    from sklearn.ensemble import RandomForestRegressor
    estimator = RandomForestRegressor(n_estimators=150)
    %time estimator.fit(offline_rss, offline_location)
    %time predictions = estimator.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    Wall time: 58.6 s
    Wall time: 196 ms
    accuracy:  2.20778352008 m
    

    Random forest classifier

    from sklearn.ensemble import RandomForestClassifier
    # Encode each location as a grid-cell label, as for logistic regression
    labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
    estimator = RandomForestClassifier(n_estimators=20, max_features=None, max_depth=20) # memory is limited, so the number of trees is rather small
    %time estimator.fit(offline_rss, labels)
    %time predict_labels = estimator.predict(rss)
    # Decode the predicted labels back into (x, y) coordinates
    x = np.floor(predict_labels/100.0)
    y = predict_labels - x * 100
    predictions = np.column_stack((x, y)) * 100
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    Wall time: 39.6 s
    Wall time: 113 ms
    accuracy:  2.56860790666 m
    

    Linear Regression

    from sklearn.linear_model import LinearRegression
    predictions = LinearRegression().fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    accuracy:  3.83239841667 m
    

    Ridge Regression

    from sklearn.linear_model import RidgeCV
    clf = RidgeCV(alphas=np.logspace(-4, 4, 10))
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    accuracy:  3.83255676918 m
    

    Lasso regression

    from sklearn.linear_model import MultiTaskLassoCV
    clf = MultiTaskLassoCV(alphas=np.logspace(-4, 4, 10))
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    accuracy:  3.83244688001 m
    

    Elastic Net

    from sklearn.linear_model import MultiTaskElasticNetCV
    clf = MultiTaskElasticNetCV(alphas=np.logspace(-4, 4, 10))
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, 'm')
    
    accuracy:  3.832486036 m
    

    Bayesian Ridge Regression

    from sklearn.linear_model import BayesianRidge
    from sklearn.multioutput import MultiOutputRegressor
    # BayesianRidge is single-output, so wrap it to fit x and y separately
    clf = MultiOutputRegressor(BayesianRidge())
    predictions = clf.fit(offline_rss, offline_location).predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    accuracy:  3.83243319129 m
    

    Gradient Boosting for regression

    from sklearn import ensemble
    from sklearn.multioutput import MultiOutputRegressor
    # GradientBoostingRegressor is single-output, so wrap it to fit x and y separately
    clf = MultiOutputRegressor(ensemble.GradientBoostingRegressor(n_estimators=100, max_depth=10))
    %time clf.fit(offline_rss, offline_location)
    %time predictions = clf.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 43.4 s
    Wall time: 17 ms
    accuracy:  2.22100945095 m
    

    Multi-layer Perceptron regressor (neural network)

    from sklearn.neural_network import MLPRegressor
    clf = MLPRegressor(hidden_layer_sizes=(100, 100))
    %time clf.fit(offline_rss, offline_location)
    %time predictions = clf.predict(rss)
    acc = accuracy(predictions, trace)
    print("accuracy: ", acc/100, "m")
    
    Wall time: 1min 1s
    Wall time: 6 ms
    accuracy:  2.4517504109 m
    

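    Summary

    The linear regression models above clearly perform too poorly (about 3.83 m each), so the table below summarizes only the other models: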

    Algorithm                           Positioning accuracy (mean error)
    knn                                 2.24 m
    logistic regression                 3.09 m
    support vector machine              2.25 m
    random forest                       2.21 m
    Gradient Boosting for regression    2.22 m
    Multi-layer Perceptron regressor    2.45 m
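
For a quick comparison, the figures in the table can be sorted from smallest to largest error. The snippet below is only a convenience sketch; the dictionary restates the table's values and the variable names are arbitrary.

```python
# Mean positioning errors in metres, copied from the table above.
errors_m = {
    "knn": 2.24,
    "logistic regression": 3.09,
    "support vector machine": 2.25,
    "random forest": 2.21,
    "Gradient Boosting for regression": 2.22,
    "Multi-layer Perceptron regressor": 2.45,
}

# Smaller error is better, so sort in ascending order.
ranking = sorted(errors_m.items(), key=lambda item: item[1])
for name, err in ranking:
    print("%-34s %.2f m" % (name, err))
```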

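    Judging by these rough figures, kNN, SVM, RF, and GBDT are the four better models. Many of the algorithms above were not carefully tuned, so the results are only approximate, and I have no idea how to tune the neural network at all. Also note that SVM is slow to train and tedious to tune, and that kNN's prediction time should grow in proportion to the amount of training data, so for real-time positioning it is probably worse than RF and GBDT.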
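
The remark about kNN's prediction cost is easiest to see in a brute-force implementation, where each query measures its distance to every stored fingerprint. The following plain-Python sketch illustrates that behaviour with uniform weights. The function name and the toy fingerprints are hypothetical, and it does not model the tree-based indexes that scikit-learn may choose instead of a brute-force search.

```python
import heapq
import math

def knn_predict(train_rss, train_xy, query_rss, k):
    """Brute-force kNN regression with uniform weights.

    One distance is computed per training fingerprint, so the work per
    query grows with len(train_rss).
    """
    scored = []
    for fingerprint, xy in zip(train_rss, train_xy):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(fingerprint, query_rss)))
        scored.append((dist, xy))
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    x = sum(xy[0] for _, xy in nearest) / k
    y = sum(xy[1] for _, xy in nearest) / k
    return x, y

# Hypothetical fingerprints: RSS from two access points (dBm) and positions (cm).
train_rss = [(-40, -70), (-45, -65), (-70, -40), (-80, -30)]
train_xy = [(0, 0), (100, 0), (500, 400), (600, 400)]

estimate = knn_predict(train_rss, train_xy, (-42, -68), k=2)
print(estimate)
```

Tree ensembles such as RF and GBDT evaluate a fixed set of trees per query, so their prediction cost does not depend directly on how many fingerprints were collected.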


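    Author: [rubbninja](http://www.cnblogs.com/rubbninja/) Source: [http://www.cnblogs.com/rubbninja/](http://www.cnblogs.com/rubbninja/) About the author: my current research focuses on machine learning and wireless positioning; discussion and corrections are welcome! Copyright notice: the copyright of this article belongs jointly to the author and cnblogs (博客园); please credit the source when reposting.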
  • Original article: https://www.cnblogs.com/rubbninja/p/6186847.html