k-nearest neighbors algorithm - Wikipedia
- https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- Not to be confused with k-means clustering.
- In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
- k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
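A minimal sketch of the idea described above, in plain Python rather than scikit-learn. It assumes Euclidean distance, a majority vote for classification, and a plain mean for regression; the function names and the toy data are made up for illustration.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return indices of the k training points closest to query (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k):
    """Classification: majority vote among the k nearest labels.
    Ties go to whichever tied label is met first among the neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k):
    """Regression: mean of the k nearest target values."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Toy data (hypothetical): two well-separated clusters in 2-D.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

label_near_first_cluster = knn_classify(X, labels, (1.1, 1.0), k=3)
value_near_second_cluster = knn_regress(X, targets, (5.0, 5.0), k=3)
print(label_near_first_cluster, value_near_second_cluster)
```

Nothing happens at "training" time: all the work (distance computation and sorting) is done per query, which is what "lazy learning" refers to.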
Study notes: scikit-learn - 浩然119 - cnblogs
- https://www.cnblogs.com/pegasus923/p/9997485.html
- 1.6. Nearest Neighbors — scikit-learn 0.20.2 documentation
- https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-classification
Machine Learning with Python: k-Nearest Neighbor Classifier in Python
- https://www.python-course.eu/k_nearest_neighbor_classifier.php
Refining a k-Nearest-Neighbor classification
- https://www3.nd.edu/~steve/computing_with_data/17_Refining_kNN/refining_knn.html
1.13. Feature selection — scikit-learn 0.20.2 documentation
- https://scikit-learn.org/stable/modules/feature_selection.html
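Summary of k-nearest neighbors (KNN) principles - 刘建平Pinard - cnblogs
- http://www.cnblogs.com/pinard/p/6061661.html
- 1. The three elements of the KNN algorithm
- 2. Brute-force implementation of KNN
- 3. KD-tree implementation of KNN
- 4. Ball-tree implementation of KNN
- 5. Extensions of KNN
- 6. Summary of KNN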
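Summary of using the scikit-learn k-nearest neighbors classes - 刘建平Pinard - cnblogs
- https://www.cnblogs.com/pinard/p/6065607.html
- 1. Overview of the KNN-related classes in scikit-learn
- 2. Parameter summary for the k-nearest neighbors and radius-limited nearest neighbors classes
- 3. Example: classification with KNeighborsClassifier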
Feature engineering: feature selection - 刘建平Pinard - cnblogs
- https://www.cnblogs.com/pinard/p/9032759.html
Feature engineering: feature representation - 刘建平Pinard - cnblogs
- https://www.cnblogs.com/pinard/p/9061549.html
Feature engineering: feature preprocessing - 刘建平Pinard - cnblogs
- https://www.cnblogs.com/pinard/p/9093890.html
Precision and recall, ROC curve and PR curve - 刘建平Pinard - cnblogs
- https://www.cnblogs.com/pinard/p/5993450.html
k selection
- Set a candidate range, e.g. [1, 25], evaluate every k in that range, then compare the results
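A rough sketch of that sweep in plain Python. The notes do not say how each k is scored, so leave-one-out accuracy is used here as an assumption; the data is synthetic and the helper names are hypothetical. With scikit-learn, the usual route would be `KNeighborsClassifier` scored with `cross_val_score`.

```python
import math
import random
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_classify(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

# Synthetic data (assumption): two overlapping Gaussian blobs, 40 points each.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)] + \
    [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(40)]
y = [0] * 40 + [1] * 40

# Evaluate every k in [1, 25] and keep the best-scoring one.
scores = {k: loo_accuracy(X, y, k) for k in range(1, 26)}
best_k = max(scores, key=scores.get)  # on ties, the smallest k wins
print(best_k, scores[best_k])
```

Scoring on held-out points matters here: scoring on the training set itself would always favor k = 1, since every point is its own nearest neighbor.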
Feature selection
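- ablation study: remove some "feature" of the model or algorithm and see how that affects performance.
- Note: if removing a feature leaves the result unchanged, that alone does not prove the feature is useless. The lack of change could mean either:
- the result is conditionally independent of that feature given the others: the remaining features affect the result the same way it does, so it is redundant rather than useless
- the feature is irrelevant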
- test with specified features only
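- Note: a feature may have a positive impact on the result only in combination with other features
- test with all combinations of features
- The most thorough approach is to cover every combination, but it is time-consuming
- A compromise is to use the results of the two tests above to pick a short list of useful features, then test that feature list and compare its performance against using all features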
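The three tests above (ablation, specified features only, all combinations) could be sketched roughly as follows. This is an illustration under assumptions, not a recipe from the linked articles: the classifier is a plain-Python k-NN with k = 5, the score is leave-one-out accuracy, and the synthetic data is built so that feature 0 is informative, feature 1 is a near copy of feature 0 (redundant), and feature 2 is pure noise.

```python
import math
import random
from collections import Counter
from itertools import combinations

def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, features, k=5):
    """Leave-one-out accuracy using only the given feature columns."""
    P = [tuple(row[f] for f in features) for row in X]
    hits = 0
    for i in range(len(P)):
        hits += knn_classify(P[:i] + P[i + 1:], y[:i] + y[i + 1:], P[i], k) == y[i]
    return hits / len(P)

# Synthetic data (assumption): informative, redundant, and noise features.
rng = random.Random(0)
X, y = [], []
for label in (0, 1):
    for _ in range(40):
        f0 = rng.gauss(3 * label, 1)
        X.append((f0, f0 + rng.gauss(0, 0.1), rng.gauss(0, 1)))
        y.append(label)

all_feats = (0, 1, 2)
baseline = loo_accuracy(X, y, all_feats)

# 1. Ablation: drop one feature at a time and compare with the baseline.
ablation = {f: loo_accuracy(X, y, tuple(g for g in all_feats if g != f))
            for f in all_feats}

# 2. Specified features only: here, each feature on its own.
single = {f: loo_accuracy(X, y, (f,)) for f in all_feats}

# 3. All combinations: 2^n - 1 non-empty subsets, so only feasible for small n.
exhaustive = {s: loo_accuracy(X, y, s)
              for r in range(1, len(all_feats) + 1)
              for s in combinations(all_feats, r)}
best_subset = max(exhaustive, key=exhaustive.get)

print("baseline:", baseline)
print("ablation:", ablation)
print("single:", single)
print("best subset:", best_subset, exhaustive[best_subset])
```

For the compromise approach, the exhaustive step would be restricted to a shortlist drawn from the ablation and single-feature results instead of `all_feats`, and the winner compared against `baseline`.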