Reference: http://scikit-learn.org/stable/modules/metrics.html
The sklearn.metrics.pairwise submodule
implements utilities to evaluate pairwise distances (the distance between each pair of samples) or the affinity of sets of samples (how similar the samples in the sets are).
Distance metrics are functions d(a, b) such that d(a, b) < d(a, c) if
objects a and b are considered “more similar” than objects a and c.
Kernels are measures of similarity, i.e. s(a, b) > s(a, c) if
objects a and b are considered “more similar” than objects a and c.
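The two definitions above can be checked on a toy example. This is only a sketch: the three points a, b, c are made up for illustration, and euclidean_distances and rbf_kernel are used as one representative distance and one representative kernel from sklearn.metrics.pairwise.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel

# Hypothetical toy points: b is close to a, c is far from a.
a = np.array([[0.0, 0.0]])
b = np.array([[0.0, 1.0]])
c = np.array([[3.0, 4.0]])

# Distance metric: the more similar pair gets the SMALLER value.
d_ab = euclidean_distances(a, b)[0, 0]
d_ac = euclidean_distances(a, c)[0, 0]

# Kernel (similarity): the more similar pair gets the LARGER value.
s_ab = rbf_kernel(a, b)[0, 0]
s_ac = rbf_kernel(a, c)[0, 0]

print("d(a, b) < d(a, c):", d_ab < d_ac)
print("s(a, b) > s(a, c):", s_ab > s_ac)
```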
1、Cosine similarity
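Cosine similarity is the L2-normalized dot product of vectors.
If $x$ and $y$ are row vectors, their cosine similarity $k$ is defined as:
$$k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$$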
This kernel is a popular choice
for computing the similarity of documents represented as tf-idf vectors.
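A minimal sketch comparing cosine_similarity with the normalized dot product computed by hand. The matrix X is a made-up stand-in for tf-idf document vectors, and cosine_manual is a hypothetical helper written only for this comparison.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy rows standing in for tf-idf vectors of two documents (assumed data).
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 4.0]])


def cosine_manual(x, y):
    """Dot product divided by the product of the L2 norms."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


# Pairwise similarities between all rows of X, shape (2, 2).
K = cosine_similarity(X)
manual = cosine_manual(X[0], X[1])

print(K[0, 1], manual)
```

Each row has similarity 1 with itself, so the diagonal of K is all ones.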
2、Linear kernel
If $x$ and $y$ are column vectors, their linear kernel is:
$$k(x, y) = x^\top y$$
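As a quick check, linear_kernel on a sample matrix should match the plain matrix product of the rows. The data below is made up for illustration. Note that scikit-learn stores samples as rows, so the Gram matrix is X @ X.T.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel

# Three toy samples with two features each (assumed data).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Gram matrix of pairwise dot products, shape (3, 3).
K = linear_kernel(X)

# The same thing computed directly.
K_manual = X @ X.T

print(np.allclose(K, K_manual))
```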
3、Polynomial kernel
Conceptually, the polynomial kernel considers not only the similarity between vectors under the same dimension, but also across dimensions. When used in machine learning algorithms, this makes it possible to account for feature interactions.
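The polynomial kernel is defined as:
$$k(x, y) = (\gamma x^\top y + c_0)^d$$
where $x$ and $y$ are the input vectors, $d$ is the kernel degree, $\gamma$ scales the inner product, and $c_0$ is a constant offset.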
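A small sketch of the polynomial kernel, checked against the formula written out in NumPy. The data and the values chosen for gamma, coef0, and degree are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Two toy samples (assumed data).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Arbitrary example hyperparameters.
gamma, coef0, degree = 0.5, 1.0, 2

K = polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)

# (gamma * <x, y> + coef0) ** degree, computed for every pair of rows.
K_manual = (gamma * (X @ X.T) + coef0) ** degree

print(np.allclose(K, K_manual))
```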
4、Sigmoid kernel
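The sigmoid kernel is defined as:
$$k(x, y) = \tanh(\gamma x^\top y + c_0)$$
where $\gamma$ is known as the slope and $c_0$ as the intercept.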
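A minimal sketch of the sigmoid kernel, compared with tanh applied to the scaled dot products. The data and the gamma and coef0 values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Two toy samples (assumed data).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Arbitrary example hyperparameters.
gamma, coef0 = 0.1, 0.0

K = sigmoid_kernel(X, gamma=gamma, coef0=coef0)

# tanh(gamma * <x, y> + coef0) for every pair of rows.
K_manual = np.tanh(gamma * (X @ X.T) + coef0)

print(np.allclose(K, K_manual))
```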
5、RBF kernel
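The RBF (radial basis function) kernel is defined as:
$$k(x, y) = \exp(-\gamma \|x - y\|^2)$$
If $\gamma = \sigma^{-2}$, the kernel is known as the Gaussian kernel of variance $\sigma^2$.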
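A sketch of the RBF kernel with gamma derived from an assumed sigma, checked against the squared Euclidean distances computed by hand. The data and the value of sigma are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Three toy samples (assumed data).
X = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [2.0, 0.0]])

# Assumed sigma; gamma = sigma ** -2 as in the text.
sigma = 2.0
gamma = sigma ** -2

K = rbf_kernel(X, gamma=gamma)

# Squared Euclidean distance between every pair of rows via broadcasting.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

print(np.allclose(K, K_manual))
```

Identical points have distance 0, so the diagonal of K is all ones, and the values decay toward 0 as points move apart.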
6、Chi-squared kernel
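The chi-squared kernel is defined as:
$$k(x, y) = \exp\left(-\gamma \sum_i \frac{(x[i] - y[i])^2}{x[i] + y[i]}\right)$$
The data is assumed to be non-negative.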
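A sketch checking chi2_kernel against the formula written out in NumPy. The rows of X are made-up, strictly positive histogram-like vectors, which keeps every denominator x[i] + y[i] nonzero. The gamma value is an arbitrary assumption.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

# Toy normalized histograms; strictly positive to avoid 0/0 terms (assumed data).
X = np.array([[0.2, 0.8],
              [0.6, 0.4],
              [0.5, 0.5]])

gamma = 0.5  # arbitrary example value

K = chi2_kernel(X, gamma=gamma)

# exp(-gamma * sum_i (x[i] - y[i])**2 / (x[i] + y[i])) for every pair of rows.
diff = X[:, None, :] - X[None, :, :]
total = X[:, None, :] + X[None, :, :]
K_manual = np.exp(-gamma * (diff ** 2 / total).sum(axis=-1))

print(np.allclose(K, K_manual))
```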
The chi-squared kernel is a very popular choice for training non-linear SVMs in computer vision applications. It can be computed using chi2_kernel and then passed to an sklearn.svm.SVC with kernel="precomputed":
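A minimal sketch of the precomputed-kernel workflow. The four samples, their labels, and gamma=0.5 are toy assumptions chosen for illustration, not real image features.

```python
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# Toy non-negative features and labels (assumed data).
X = [[0, 1], [1, 0], [0.2, 0.8], [0.7, 0.3]]
y = [0, 1, 0, 1]

# Gram matrix between all training samples, shape (4, 4).
K = chi2_kernel(X, gamma=0.5)

# With kernel="precomputed", fit() takes the Gram matrix instead of X.
svm = SVC(kernel="precomputed").fit(K, y)

# predict() also expects kernel values between the new samples (rows)
# and the training samples (columns); here we reuse the training data.
pred = svm.predict(K)
print(pred)
```

For unseen samples X_new, the matrix passed to predict would be chi2_kernel(X_new, X, gamma=0.5), so that its columns line up with the training samples.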
It can also be directly used as the kernel argument:
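A minimal sketch of passing the function itself as the kernel. The toy data is the same kind of assumption as before. SVC calls the function with its default arguments here, so chi2_kernel's default gamma applies.

```python
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

# Toy non-negative features and labels (assumed data).
X = [[0, 1], [1, 0], [0.2, 0.8], [0.7, 0.3]]
y = [0, 1, 0, 1]

# SVC accepts a callable kernel and computes the Gram matrix internally,
# so fit() and predict() take the raw feature vectors.
svm = SVC(kernel=chi2_kernel).fit(X, y)
pred = svm.predict(X)
print(pred)
```

To use a non-default gamma with this form, one option is to wrap the function, for example with functools.partial(chi2_kernel, gamma=0.5).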