
    SVM in Python

    Theory behind the Python implementation of SVM

    • The primal SVM optimization problem:

    \[\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^m\xi^{(i)}\]

    \[\text{s.t.}\quad y^{(i)}(w^{T}x^{(i)} + b) \ge 1 - \xi^{(i)},\ i=1,2,\dots,m \\ \xi^{(i)} \ge 0,\ i=1,2,\dots,m\]
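    • Equivalently, since each optimal slack is \(\xi^{(i)} = \max(0,\ 1 - y^{(i)}(w^{T}x^{(i)} + b))\), the primal problem can be written in the unconstrained hinge-loss form:

    \[\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^m \max\bigl(0,\ 1 - y^{(i)}(w^{T}x^{(i)} + b)\bigr)\]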

    • Converting the primal problem to its dual

    \[\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^{m}\alpha^{(i)}\alpha^{(j)}y^{(i)}y^{(j)}K(x^{(i)},x^{(j)})-\sum_{i=1}^m\alpha^{(i)}\]

    \[\text{s.t.}\quad \sum_{i=1}^m\alpha^{(i)}y^{(i)}=0 \\ 0 \le \alpha^{(i)} \le C,\ i=1,2,\dots,m\]

    • For the dual problem above, let \(\alpha^*=(\alpha^*_{1},\alpha^*_{2},\dots,\alpha^*_{m})\). If some component \(\alpha^*_{j}\) satisfies \(0 \lt \alpha^*_{j} \lt C\), then the solution \((w^*,b^*)\) of the primal problem is

    \[w^* = \sum_{i=1}^m\alpha^*_{i}y^{(i)}x^{(i)} \\ b^* = y^{(j)} - \sum_{i=1}^m\alpha^*_{i}y^{(i)}K(x^{(i)}, x^{(j)})\]

    • Formulas used in the SMO algorithm (the subscripts 1 and 2 in the formulas do not refer to the first and second training samples; they denote the first and second parameters being optimized. In the code we use i and j instead, where i and j are indices into the input samples X)

      • Computing \(E^{(i)}\) -> implemented in the calc_E function

      \[E^{(i)} = g(x^{(i)})-y^{(i)} \\ \text{where } g(x^{(i)}) = \sum_{j=1}^m\alpha^{(j)}y^{(j)}K(x^{(j)},x^{(i)}) + b \\ \text{so } E^{(i)} = \Bigl(\sum_{j=1}^m\alpha^{(j)}y^{(j)}K(x^{(j)},x^{(i)}) + b\Bigr) - y^{(i)}\]

      • Computing \(\alpha^{new,unc}_2\) -> implemented in the iterate function

      \[\alpha^{new,unc}_2 = \alpha^{old}_2 + \frac{y^{(j)}(E^{(i)} - E^{(j)})}{\eta},\quad \text{where } \eta = K_{11} + K_{22} - 2K_{12}\]

      Note: on the kernel-mapped samples, \(K_{11}\) means \(K(x^{(i)}, x^{(i)})\), \(K_{22}\) means \(K(x^{(j)}, x^{(j)})\), and \(K_{12}\) means \(K(x^{(i)}, x^{(j)})\). Note also that \(\eta\) must not be less than or equal to 0; if that happens, the iteration function returns 0 immediately.

      • Clipping \(\alpha^{new,unc}_2\) -> implemented in the clip_alpha function

        • If \(y^{(1)} \ne y^{(2)}\)

          \[L=\max(0,\ \alpha^{old}_2-\alpha^{old}_1) \\ H=\min(C,\ C + \alpha^{old}_2 - \alpha^{old}_1)\]

        • If \(y^{(1)}=y^{(2)}\)

          \[L=\max(0,\ \alpha^{old}_2+\alpha^{old}_1-C) \\ H=\min(C,\ \alpha^{old}_2+\alpha^{old}_1)\]

        • Note: if the resulting L and H are equal, return 0 directly
        • Define the function clip_alpha(alpha, L, H)
          
          def clip_alpha(alpha, L, H):
              if alpha > H:
                  alpha = H
              elif alpha < L:
                  alpha = L
              return alpha
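        For example, the clipping behaves as follows (a self-contained restatement of the function above):

```python
def clip_alpha(alpha, L, H):
    # clamp alpha into the feasible interval [L, H]
    if alpha > H:
        return H
    if alpha < L:
        return L
    return alpha

print(clip_alpha(0.7, 0.0, 0.5))   # -> 0.5 (clipped down to H)
print(clip_alpha(-0.2, 0.0, 0.5))  # -> 0.0 (clipped up to L)
print(clip_alpha(0.3, 0.0, 0.5))   # -> 0.3 (already inside [L, H])
```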
          
        • Computing \(\alpha^{new}_1\)

        \[\alpha^{new}_1=\alpha^{old}_1+y^{(1)}y^{(2)}(\alpha^{old}_2-\alpha^{new}_2)\]

        • After computing \(\alpha^{new}_2\), compare \(|\alpha^{new}_2-\alpha^{old}_2|\) with the chosen precision (typically a small fraction); if the difference is below the precision, return 0 directly
      • Selecting the second \(\alpha_2\): choose the \(j\) that maximizes \(|E^{(i)}-E^{(j)}|\); that \(j\) is the one we want, so \(\alpha_2=\alpha_j\). Define the function select_j(Ei, i, model)

        # model encapsulates the SVM parameters (C, toler) and the data (X, y)
        # model also has an E attribute: an m x 2 matrix, initially all zeros; once the
        # alpha corresponding to a sample has been visited, we set model.E[i, :] = [1, Ei]
        def select_j(Ei, i, model):
            j = 0
            max_delta_E = 0
            Ej = 0
            # find all samples that have already been visited
            nonzero_indices = nonzero(model.E[:, 0].A)[0]
            if len(nonzero_indices) > 1:
                # search for the Ej and j that maximize abs(Ei - Ej)
                for index in nonzero_indices:
                    # the two alphas must not come from the same sample
                    if index == i:
                        continue
                    E_temp = calc_E(index, model)
                    delta_E = abs(E_temp - Ei)
                    if delta_E > max_delta_E:
                        max_delta_E = delta_E
                        j = index
                        Ej = E_temp
            else:
                # no usable cache yet: pick a random j different from i
                j = i
                while j == i:
                    j = int(random.uniform(0, model.m))
                Ej = calc_E(j, model)
            return j, Ej
        
      • Checking whether \(\alpha_1\) violates the KKT conditions

      
      if (yi * Ei > toler and alphai > 0) or (yi * Ei < -toler and alphai < C):
          # KKT conditions violated
          pass
      else:
          # KKT conditions hold, return 0 directly
          pass
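      The check above encodes the KKT conditions of the dual problem, tested within the tolerance toler. Writing \(g(x^{(i)})\) for the decision value, the KKT conditions are:

      \[\alpha^{(i)} = 0 \Rightarrow y^{(i)}g(x^{(i)}) \ge 1 \\ 0 \lt \alpha^{(i)} \lt C \Rightarrow y^{(i)}g(x^{(i)}) = 1 \\ \alpha^{(i)} = C \Rightarrow y^{(i)}g(x^{(i)}) \le 1\]

      Since \(E^{(i)} = g(x^{(i)}) - y^{(i)}\) and \((y^{(i)})^2 = 1\), we have \(y^{(i)}E^{(i)} = y^{(i)}g(x^{(i)}) - 1\). So \(y^{(i)}E^{(i)} \gt toler\) with \(\alpha^{(i)} \gt 0\), or \(y^{(i)}E^{(i)} \lt -toler\) with \(\alpha^{(i)} \lt C\), signals a violation.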
      
    • Using the SMO algorithm to solve the dual problem, obtaining \(\alpha\) and from it \(w, b\); the rough idea is as follows

      • Initialize \(\alpha\) as an \(m \times 1\) zero vector. At the start there is no history to guide the choice of \(\alpha^{(1)}\), so begin by picking from the first \(\alpha\)
      • If the selected \(\alpha\) does not violate the KKT conditions, skip it and move on to the next \(\alpha\)
      • Pass the selected \(\alpha^{(1)}\) into the iterate function, which selects the second \(\alpha^{(2)}\) based on the current \(\alpha^{(1)}\) and then updates \(\alpha^{(2)}, \alpha^{(1)}\) using the formulas given above; note that the selected \(\alpha_2\) must not come from the same input sample as \(\alpha_1\)
      • After iterating over all \(\alpha\), find those satisfying the support-vector condition \(0 \lt \alpha \lt C\) and iterate over them again

    Python implementation of SVM

    
    
    #!/usr/bin/env python
    from numpy import *
    
    
    class Model(object):
    
        def __init__(self, X, y, C, toler, kernel_param):
            self.X = X
            self.y = y
            self.C = C
            self.toler = toler
            self.kernel_param = kernel_param
            self.m = shape(X)[0]
            self.mapped_data = mat(zeros((self.m, self.m)))
            for i in range(self.m):
                self.mapped_data[:, i] = gaussian_kernel(self.X, X[i, :], self.kernel_param)
            self.E = mat(zeros((self.m, 2)))
            self.alphas = mat(zeros((self.m, 1)))
            self.b = 0
    
    
    def load_data(filename):
        X = []
        y = []
        with open(filename, 'r') as fd:
            for line in fd.readlines():
                nums = line.strip().split(',')
                X_temp = []
                for i in range(len(nums)):
                    if i == len(nums) - 1:
                        y.append(float(nums[i]))
                    else:
                        X_temp.append(float(nums[i]))
                X.append(X_temp)
        return mat(X), mat(y)
    
    def gaussian_kernel(X, l, kernel_param): 
        sigma = kernel_param 
        m = shape(X)[0]
        mapped_data = mat(zeros((m, 1)))
        for i in range(m):
            # squared Euclidean distance: (x - l)(x - l)^T is a 1x1 matrix
            mapped_data[i] = exp(-float((X[i, :] - l) * (X[i, :] - l).T) / (2 * sigma ** 2))
        return mapped_data
    
    def clip_alpha(L, H, alpha):
        if alpha > H:
            alpha = H
        elif alpha < L:
            alpha = L
        return alpha
    
    def calc_b(b1, b2):
        return (b1 + b2) / 2
    
    def calc_E(i, model):
        yi = float(model.y[i])
        gxi = float(multiply(model.alphas, model.y).T * model.mapped_data[:, i] + model.b)
        Ei = gxi - yi
        return Ei
    
    def select_j(Ei, i, model):
        nonzero_indices = nonzero(model.E[:, 0].A)[0]
        Ej = 0
        j = 0
        max_delta = 0
        if len(nonzero_indices) > 1:
            for index in nonzero_indices:
                if index == i:
                    continue
                E_temp = calc_E(index, model)
                delta = abs(E_temp - Ei)
                if delta > max_delta:
                    max_delta = delta
                    Ej = E_temp
                    j = index
        else:
            j = i
            while j == i:
                j = int(random.uniform(0, model.m))
            Ej = calc_E(j, model)
        return j, Ej
    
    def iterate(i, model):
        yi = model.y[i]
        Ei = calc_E(i, model)
        model.E[i] = [1, Ei]
        # if alphai does not satisfy the KKT conditions, proceed:
        # select alphaj, then update alphai, alphaj and b
        if (yi * Ei > model.toler and model.alphas[i] > 0) or (yi * Ei < -model.toler and model.alphas[i] < model.C):
            # alphai violates the KKT conditions
            # select alphaj
            j, Ej = select_j(Ei, i, model)
            yj = model.y[j] 
            alpha1old = model.alphas[i].copy()
            alpha2old = model.alphas[j].copy()
            eta = model.mapped_data[i, i] + model.mapped_data[j, j] - 2 * model.mapped_data[i, j]   
            if eta <= 0:
                return 0
            alpha2new_unclip = alpha2old + yj * (Ei - Ej) / eta
            if yi == yj:
                L = max(0, alpha2old + alpha1old - model.C)
                H = min(model.C, alpha1old + alpha2old)
            else:
                L = max(0, alpha2old - alpha1old)
                H = min(model.C, model.C - alpha1old + alpha2old)
            if L == H:
                return 0
            alpha2new = clip_alpha(L, H, alpha2new_unclip)
            if abs(alpha2new - alpha2old) < 0.00001:
               return 0
            alpha1new = alpha1old + yi * yj * (alpha2old - alpha2new)
            b1new = (-Ei - yi * model.mapped_data[i, i] * (alpha1new - alpha1old)
                     - yj * model.mapped_data[j, i] * (alpha2new - alpha2old) + model.b)
            b2new = (-Ej - yi * model.mapped_data[i, j] * (alpha1new - alpha1old)
                     - yj * model.mapped_data[j, j] * (alpha2new - alpha2old) + model.b)
            model.b = calc_b(b1new, b2new)
            model.alphas[i] = alpha1new
            model.alphas[j] = alpha2new
            model.E[i] = [1, calc_E(i, model)]
            model.E[j] = [1, calc_E(j, model)]
            return 1
        return 0
    
    def smo(X, y, C, toler, iter_num, kernel_param):
        model = Model(X, y.T, C, toler, kernel_param)
        changed_alphas = 0
        current_iter = 0
        for i in range(model.m):
            changed_alphas += iterate(i, model)
            print("iter:%d i:%d,pairs changed %d"
                  %(current_iter, i, changed_alphas))
        current_iter += 1
        print('start...') 
        while current_iter < iter_num and changed_alphas > 0:
            changed_alphas = 0
            # process the support vectors (0 < alpha < C)
            alphas_indice = nonzero((model.alphas.A > 0) * (model.alphas.A < C))[0]
            for i in alphas_indice:
                changed_alphas += iterate(i, model)
                print("iter:%d i:%d,pairs changed %d"
                      %(current_iter, i, changed_alphas))
            current_iter += 1
        return model.alphas, model.b
    
    • Note: when testing the SVM, the test set must also be mapped through the kernel into m features, so both the training set and the test set need to be passed through the kernel function
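    As a sketch of that testing step (the function names and the numpy-only style here are my own, not part of the implementation above): each test point is mapped against all m training samples with the Gaussian kernel, and the decision value is \(g(x) = \sum_i \alpha_i y^{(i)} K(x^{(i)}, x) + b\), with the predicted label being its sign:

```python
import numpy as np

def gaussian_kernel_row(X_train, x, sigma):
    # K(x_train_i, x) for every training row -> vector of length m
    diff = np.asarray(X_train) - np.asarray(x)
    return np.exp(-np.sum(diff ** 2, axis=1) / (2 * sigma ** 2))

def predict(x, X_train, y_train, alphas, b, sigma):
    # g(x) = sum_i alpha_i * y_i * K(x_i, x) + b; label = sign(g(x))
    k = gaussian_kernel_row(X_train, x, sigma)
    g = float(np.sum(alphas * y_train * k) + b)
    return 1.0 if g >= 0 else -1.0

# hypothetical usage with made-up training data and alphas
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([1.0, -1.0])
alphas = np.array([0.5, 0.5])
print(predict([0.1, 0.0], X_train, y_train, alphas, b=0.0, sigma=1.0))  # -> 1.0
```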
  • Original post: https://www.cnblogs.com/megachen/p/10491461.html