  • Python for Data Science

    Chapter 6 - Other Popular Machine Learning Models

    Segment 3 - Instance-based learning with k-Nearest Neighbor

    K-Nearest Neighbor Classification

    A supervised classifier that memorizes the observations in a training set and uses them to predict classification labels for new, unlabeled observations.

    KNN makes predictions based on how similar training observations are to the new, incoming observations.

    The more similar the observation values, the more likely they will be classified with the same label.
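    To make the "memorize, then compare" idea concrete, here is a minimal from-scratch sketch in NumPy. It assumes Euclidean distance and a plain majority vote; scikit-learn's KNeighborsClassifier defaults to the same distance (Minkowski with p=2) and uniform vote weights. The function name knn_predict and the toy points are hypothetical, chosen only for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=3):
    """Hypothetical helper: label each row of X_new by majority vote of its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_new = np.asarray(X_new, dtype=float)
    preds = []
    for x in X_new:
        # Euclidean distance from x to every memorized training observation
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training observations, nearest first
        nearest = np.argsort(dists)[:k]
        # Majority vote; on a tie, Counter keeps the label met first (the closer neighbor)
        label = Counter(y_train[nearest]).most_common(1)[0][0]
        preds.append(label)
    return np.array(preds)

# Toy data: two well-separated groups
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, [[1.1, 1.0], [5.0, 4.8]], k=3))  # [0 1]
```

    Note that "training" here is just storing the arrays; all the work happens at prediction time, which is why KNN is called instance-based (or lazy) learning.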

    K-Nearest Neighbor Use Cases

    • Stock Price Prediction
    • Credit Risk Analysis
    • Predictive Trip Planning
    • Recommendation Systems

    KNN Model Assumptions

    • Dataset has little noise
    • Dataset is labeled
    • Dataset only contains relevant features
    • Dataset has distinguishable subgroups
    • Avoid using KNN on large datasets: each prediction compares the new observation with the stored training observations, so it will probably take a long time.
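    The "only relevant features" assumption matters because every feature contributes to the distance. The sketch below uses made-up points (not from any real dataset) to show that appending one irrelevant, large-range feature can change which observation counts as the nearest neighbor. The helper name nearest is hypothetical.

```python
import numpy as np

def nearest(q, pts):
    """Hypothetical helper: index of the row in pts closest to q (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(pts - q, axis=1)))

# Two relevant features on comparable scales (made-up values)
query = np.array([1.0, 1.0])
candidates = np.array([[1.1, 0.9],   # genuinely similar observation
                       [3.0, 3.0]])  # dissimilar observation
print(nearest(query, candidates))  # 0

# Append a hypothetical irrelevant feature with a large range (think: a random ID number)
query_noisy = np.append(query, 500.0)
candidates_noisy = np.column_stack([candidates, [100.0, 480.0]])
print(nearest(query_noisy, candidates_noisy))  # 1: the irrelevant column now decides
```

    The same effect appears when relevant features simply have very different units, which is why the data is standardized before fitting.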

    Setting up for classification analysis

    import numpy as np
    import pandas as pd
    
    import matplotlib.pyplot as plt
    
    from sklearn import neighbors
    from sklearn import preprocessing
    from sklearn.model_selection import train_test_split
    from sklearn import metrics
    
    np.set_printoptions(precision=4, suppress=True)
    # Jupyter magic: render plots inside the notebook (omit in a plain .py script)
    %matplotlib inline
    plt.rcParams['figure.figsize'] = 7, 4
    # Matplotlib 3.6+ renamed the bundled seaborn styles and 3.8 removed the old names;
    # on older Matplotlib versions use 'seaborn-whitegrid' instead
    plt.style.use('seaborn-v0_8-whitegrid')
    

    Importing your data

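    address = '~/Data/mtcars.csv'
    
    cars = pd.read_csv(address)
    cars.columns = ['car_names','mpg','cyl','disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']
    
    # Features: mpg, disp, hp, wt
    X_prime = cars[['mpg','disp','hp','wt']].values
    # Target: 'am' (column 9), the transmission type (0 = automatic, 1 = manual)
    y = cars['am'].values
    
    # Standardize each feature to mean 0 and standard deviation 1 so no single feature dominates the distance.
    # Note: this uses statistics from the whole dataset, including the rows that later become the test set.
    X = preprocessing.scale(X_prime)
    
    # Hold out 20% of the rows for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=17)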
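    As a quick check of what preprocessing.scale does, the sketch below compares it with the manual formula (x - mean) / std computed per column. The four sample rows are hypothetical stand-ins for X_prime, not the actual file contents.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical stand-in for X_prime: rows of (mpg, disp, hp, wt)-like values
X_prime = np.array([[21.0, 160.0, 110.0, 2.62],
                    [22.8, 108.0,  93.0, 2.32],
                    [21.4, 258.0, 110.0, 3.215],
                    [18.7, 360.0, 175.0, 3.44]])

X = preprocessing.scale(X_prime)

# scale() subtracts each column's mean and divides by its population std (ddof=0)
manual = (X_prime - X_prime.mean(axis=0)) / X_prime.std(axis=0)

print(np.allclose(X, manual))  # True
print(X.mean(axis=0))          # every column mean is now (numerically) 0
print(X.std(axis=0))           # every column standard deviation is now 1
```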
    

    Building and training your model with training data

    clf = neighbors.KNeighborsClassifier()
    clf.fit(X_train,y_train)
    print(clf)
    
    KNeighborsClassifier()
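    The printed KNeighborsClassifier() uses the library defaults, including n_neighbors=5. One way to choose k, and to keep the test rows out of the scaling statistics, is to split first and put a StandardScaler and the classifier in a Pipeline tuned with GridSearchCV. This is a sketch, not the original lesson's code: it uses synthetic data from make_classification (an assumption, since mtcars is not bundled with scikit-learn), and the candidate k values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 4 features, 2 classes
X_raw, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                               n_redundant=0, random_state=17)

# Split first so the test rows never influence the scaling statistics
X_train, X_test, y_train, y_test = train_test_split(X_raw, y, test_size=0.2, random_state=17)

# Inside cross-validation the scaler is re-fit on each training fold only
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])

# Odd values of k avoid voting ties between two classes
search = GridSearchCV(pipe, {'knn__n_neighbors': [1, 3, 5, 7, 9]}, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # test-set accuracy of the best pipeline, refit on all training rows
```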
    

    Evaluating your model's predictions

    y_pred = clf.predict(X_test)
    y_expect = y_test
    
    print(metrics.classification_report(y_expect, y_pred))
    
                  precision    recall  f1-score   support
    
               0       0.80      1.00      0.89         4
               1       1.00      0.67      0.80         3
    
        accuracy                           0.86         7
       macro avg       0.90      0.83      0.84         7
    weighted avg       0.89      0.86      0.85         7
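    The report's counts can be traced back to a confusion matrix. The sketch below uses hypothetical label arrays whose counts match the report (4 true 0s, 3 true 1s, one true 1 predicted as 0); the row order is invented, since the actual test rows are not shown.

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels consistent with the report above (order invented)
y_expect = np.array([0, 0, 0, 0, 1, 1, 1])
y_pred   = np.array([0, 0, 0, 0, 0, 1, 1])

# Rows = true label, columns = predicted label
cm = metrics.confusion_matrix(y_expect, y_pred)
print(cm)
# [[4 0]
#  [1 2]]

precision = metrics.precision_score(y_expect, y_pred, average=None)  # 0.80 for class 0, 1.00 for class 1
recall = metrics.recall_score(y_expect, y_pred, average=None)        # 1.00 for class 0, 0.67 for class 1
macro_recall = metrics.recall_score(y_expect, y_pred, average='macro')  # about 0.83
accuracy = metrics.accuracy_score(y_expect, y_pred)                  # 6 of 7, about 0.86
print(precision, recall, macro_recall, accuracy)
```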
    

    Recall: a measure of your model's completeness.

    • Of all the test points whose true label is 1, the model correctly identified only 67% (recall = 0.67 for class 1).
    • The macro average recall of 0.83 is the unweighted mean of the per-class recalls, (1.00 + 0.67) / 2; the weighted average (0.86) instead weights each class by its support.

    High precision + low recall = few results are returned, but most of the label predictions that are returned are correct.
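    The precision/recall trade-off can be worked through with the standard definitions. For class 1, the report's support of 3, recall of 0.67 and precision of 1.00 imply 2 true positives, 1 false negative and 0 false positives:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}

% Class 1 in the report: TP = 2, FP = 0, FN = 1
\text{precision}_1 = \frac{2}{2 + 0} = 1.00, \qquad
\text{recall}_1 = \frac{2}{2 + 1} \approx 0.67

% Macro average: unweighted mean of the per-class recalls
\text{recall}_{\text{macro}} = \frac{1.00 + 0.67}{2} \approx 0.83
```

    Class 1 is therefore the "high precision, low recall" case: every point predicted as 1 really is 1, but one true 1 was missed.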

  • Original article: https://www.cnblogs.com/keepmoving1113/p/14332330.html