  • PRML 5: Kernel Methods

      A kernel function implicitly maps a data point into some high-dimensional feature space and substitutes for the inner product of two feature vectors, so that a classification problem that is not linearly separable in the original space can become linearly separable in the feature space. This trick can be applied to many feature-vector-based models such as SVM, which we have introduced in previous articles.

     

      To test the validity of a kernel function, we use Mercer's theorem: a function $k:\mathbb{R}^m\times\mathbb{R}^m\rightarrow\mathbb{R}$ is a Mercer kernel iff for every finite set $\{\vec{x}_1,\vec{x}_2,...,\vec{x}_n\}$, the corresponding kernel matrix is symmetric positive semi-definite.
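This condition is easy to check numerically for a given kernel matrix. A minimal sketch (the function name and tolerance are illustrative choices, not from the text):

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-10):
    """Check that a kernel matrix is symmetric positive semi-definite."""
    if not np.allclose(K, K.T):
        return False
    # eigvalsh handles symmetric matrices; PSD means no eigenvalue
    # is (numerically) negative.
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# A Gaussian-kernel matrix on arbitrary points is always PSD.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)
print(is_valid_kernel_matrix(K))  # True
```

Note that this only verifies the Mercer condition for one finite point set; the theorem requires it for all finite sets.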

      One widely used kernel is the Gaussian kernel $k(\vec{x}_m,\vec{x}_n)=\exp\{-\frac{1}{2\sigma^2}||\vec{x}_m-\vec{x}_n||^2\}$, whose feature space has infinite dimensionality. Another is the polynomial kernel $k(\vec{x}_m,\vec{x}_n)=(\vec{x}_m^T\vec{x}_n+c)^M$ with $c>0$. In practice, we can construct new kernels from simple valid kernels using closure properties: for example, sums, products, and positive scalings of valid kernels are again valid.
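Both kernels and the closure properties can be illustrated directly (a sketch; parameter values are arbitrary):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def polynomial_kernel(X, Y, c=1.0, M=2):
    # k(x, y) = (x^T y + c)^M with c > 0
    return (X @ Y.T + c) ** M

# Closure: a positive combination of valid kernels is a valid kernel,
# so the combined kernel matrix stays symmetric PSD.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
K = gaussian_kernel(X, X) + 2.0 * polynomial_kernel(X, X)
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-8)
```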

      We can also use a generative model to define kernel functions, such as:

      (1) $k(\vec{x}_m,\vec{x}_n)=\int p(\vec{x}_m\text{ | }\vec{z})\cdot p(\vec{x}_n\text{ | }\vec{z})\cdot p(\vec{z})\cdot d\vec{z}$, where $\vec{z}$ is a latent variable;

      (2) $k(\vec{x}_m,\vec{x}_n)=g(\vec{\theta},\vec{x}_m)^TF^{-1}g(\vec{\theta},\vec{x}_n)$, where $g(\vec{\theta},\vec{x})=\bigtriangledown_{\vec{\theta}}\ln{p(\vec{x}\text{ | }\vec{\theta})}$ is the Fisher score,

       and $F=\frac{1}{N}\sum_{n=1}^N g(\vec{\theta},\vec{x}_n)g(\vec{\theta},\vec{x}_n)^T$ is the Fisher information matrix.
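The Fisher kernel above can be sketched for the simplest case: a univariate Gaussian $p(x\text{ | }\theta)$ with unknown mean $\theta$ and fixed unit variance (an illustrative assumption; the score is then $(x-\theta)/\sigma^2$):

```python
import numpy as np

def fisher_score(x, theta, sigma=1.0):
    # grad_theta ln N(x | theta, sigma^2) = (x - theta) / sigma^2
    return (x - theta) / sigma ** 2

def fisher_kernel(X, theta, sigma=1.0):
    g = fisher_score(X, theta, sigma)   # scores g(theta, x_n), one per point
    F = np.mean(g * g)                  # Fisher information (scalar here)
    # k(x_m, x_n) = g_m F^{-1} g_n, computed for all pairs at once
    return np.outer(g, g) / F

X = np.array([-1.0, 0.5, 2.0])
K = fisher_kernel(X, theta=0.0)
print(K.shape)  # (3, 3)
```

For a one-parameter model the resulting kernel matrix is rank one; with a parameter vector, $F$ becomes a matrix and $F^{-1}$ a true matrix inverse.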

      A Gaussian process is a probabilistic discriminative model built on the assumption that the values of $y(\vec{x})$ evaluated at an arbitrary finite set of points $\{\vec{x}_1,\vec{x}_2,...,\vec{x}_N\}$ are jointly Gaussian distributed. Here we use the kernel matrix to determine the covariance of this joint distribution.

      Gaussian Process for Regression:

      Typically, we choose $k(\vec{x}_m,\vec{x}_n)=\theta_0\exp\{-\frac{\theta_1}{2}||\vec{x}_n-\vec{x}_m||^2\}+\theta_2+\theta_3\vec{x}_m^T\vec{x}_n$, and assume that:

      (1) prior distribution $p(\vec{y}_N)=Gauss(\vec{y}_N\text{ | }\vec{0},K_N)$;
      (2) likelihood $p(\vec{t}_N\text{ | }\vec{y}_N)=Gauss(\vec{t}_N\text{ | }\vec{y}_N,\beta^{-1}I_N)$.

      Then, we have $p(\vec{t}_N)=\int p(\vec{t}_N\text{ | }\vec{y}_N)\cdot p(\vec{y}_N)\cdot d\vec{y}_N=Gauss(\vec{t}_N\text{ | }\vec{0},K_N+\beta^{-1}I_N)$. Here, $p(\vec{t}_N)$ serves as the likelihood of the hyperparameters $\vec{\theta}$, so we can learn $\vec{\theta}$ by maximum likelihood estimation.
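The quantity being maximized is $\ln p(\vec{t}_N)=-\frac{1}{2}\ln|C_N|-\frac{1}{2}\vec{t}_N^TC_N^{-1}\vec{t}_N-\frac{N}{2}\ln 2\pi$ with $C_N=K_N+\beta^{-1}I_N$. A minimal sketch of evaluating it (using a Cholesky factorization for numerical stability, a standard implementation choice rather than anything specified in the text):

```python
import numpy as np

def gp_log_marginal_likelihood(K, t, beta):
    """ln p(t_N) for C = K_N + beta^{-1} I, the hyperparameter likelihood."""
    N = len(t)
    C = K + np.eye(N) / beta
    L = np.linalg.cholesky(C)                 # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))  # C^{-1} t
    return (-0.5 * t @ alpha
            - np.log(np.diag(L)).sum()        # = 0.5 * ln|C|
            - 0.5 * N * np.log(2 * np.pi))

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
t = rng.normal(size=6)
ll = gp_log_marginal_likelihood(K, t, beta=10.0)
```

One would then maximize this over $\vec{\theta}$ (and possibly $\beta$) with a generic gradient-based optimizer.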

      Also, $p(\vec{t}_{N+1})=Gauss(\vec{t}_{N+1}\text{ | }\vec{0},K_{N+1}+\beta^{-1}I_{N+1})$. Hence, denoting $\vec{k}=[k(\vec{x}_1,\vec{x}_{N+1}),k(\vec{x}_2,\vec{x}_{N+1}),...,k(\vec{x}_N,\vec{x}_{N+1})]^T$, we obtain the conditional Gaussian $p(t_{N+1}\text{ | }\vec{t}_N)=Gauss(\vec{k}^T(K_N+\beta^{-1}I_N)^{-1}\vec{t}_N,\ k(\vec{x}_{N+1},\vec{x}_{N+1})-\vec{k}^T(K_N+\beta^{-1}I_N)^{-1}\vec{k}+\beta^{-1})$.
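The predictive mean and variance can be computed directly from these formulas. A sketch using only the exponential term of the kernel above (the toy data and parameter values are illustrative assumptions):

```python
import numpy as np

def rbf(X, Y, theta0=1.0, theta1=1.0):
    # k(x, y) = theta0 * exp(-theta1/2 * ||x - y||^2)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return theta0 * np.exp(-0.5 * theta1 * sq)

def gp_predict(X, t, x_new, beta=25.0):
    """Predictive mean and variance of p(t_{N+1} | t_N) at one test point."""
    C = rbf(X, X) + np.eye(len(t)) / beta        # K_N + beta^{-1} I_N
    k = rbf(X, x_new[None, :])[:, 0]             # the vector k
    c = rbf(x_new[None, :], x_new[None, :])[0, 0] + 1 / beta
    mean = k @ np.linalg.solve(C, t)             # k^T C^{-1} t
    var = c - k @ np.linalg.solve(C, k)          # c - k^T C^{-1} k
    return mean, var

X = np.linspace(0, 1, 8)[:, None]
t = np.sin(2 * np.pi * X[:, 0])
m, v = gp_predict(X, t, np.array([0.5]))
print(float(m), float(v))
```

For this symmetric toy dataset the predictive mean at $x=0.5$ is essentially $\sin(\pi)=0$.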

      Gaussian Process for Classification:

      We make the assumption that $p(t_N=1\text{ | }a_N)=\sigma(a_N)$, where $a_N$ is the latent function value, and take the following steps:

      (1) Calculate $p(\vec{a}_N\text{ | }\vec{t}_N)$ by Laplace approximation;

      (2) Since the GP prior makes $p(a_{N+1},\vec{a}_N)$ jointly Gaussian, $p(a_{N+1}\text{ | }\vec{a}_N)$ is a conditional Gaussian;

      (3) $p(a_{N+1}\text{ | }\vec{t}_N)=\int p(a_{N+1}\text{ | }\vec{a}_N)\cdot p(\vec{a}_N\text{ | }\vec{t}_N)\cdot d\vec{a}_N$;

      (4) $p(t_{N+1}\text{ | }\vec{t}_N)=\int\sigma(a_{N+1})\cdot p(a_{N+1}\text{ | }\vec{t}_N)\cdot da_{N+1}$.
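Step (4) convolves a sigmoid with a Gaussian, which has no closed form; a standard approximation (PRML eq. 4.153) replaces it with $\sigma(\kappa(\sigma^2)\mu)$ where $\kappa(\sigma^2)=(1+\pi\sigma^2/8)^{-1/2}$ and $\mu,\sigma^2$ are the mean and variance of $p(a_{N+1}\text{ | }\vec{t}_N)$ from steps (1)-(3). A sketch comparing it against brute-force numerical integration (the example values of $\mu$ and the variance are arbitrary):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predictive_prob(mu, var):
    """Approximate p(t_{N+1}=1 | t_N) = integral of sigma(a) * N(a | mu, var),
    via the probit-based approximation sigma(kappa(var) * mu)."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return sigmoid(kappa * mu)

# Check against a dense Riemann sum of the same integral.
mu, var = 1.0, 2.0
a = np.linspace(-10, 10, 20001)
gauss = np.exp(-0.5 * (a - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
numeric = np.sum(sigmoid(a) * gauss) * (a[1] - a[0])
print(float(predictive_prob(mu, var)), float(numeric))
```

The two values agree closely, which is why this approximation is the usual final step of GP classification with the logistic sigmoid.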

    References:

      1. Bishop, Christopher M. Pattern Recognition and Machine Learning. Singapore: Springer, 2006.

     
