  • Some notes on assignment1 SVM

    import numpy as np

    def svm_loss_vectorized(W, X, y, reg):
      """
      Structured SVM loss function, vectorized implementation.
    
      Inputs and outputs are the same as svm_loss_naive.
      """
      loss = 0.0
      dW = np.zeros(W.shape, dtype='float64') # initialize the gradient as zero
    
      #############################################################################
      # TODO:                                                                     #
      # Implement a vectorized version of the structured SVM loss, storing the    #
      # result in loss.                                                           #
      #############################################################################
      num_train = X.shape[0]
      score = np.dot(X, W)  # (N, C) scores for every sample and class
      # hinge loss max(0, s_j - s_{y_i} + 1): the correct-class scores are pulled
      # out with fancy indexing and broadcast as an N x 1 column
      loss_matrix = np.maximum(0, score - score[np.arange(num_train), np.array(y)].reshape(-1, 1) + 1)
      loss_matrix[np.arange(num_train), np.array(y)] = 0  # the correct class contributes no loss
      loss = np.sum(loss_matrix)
      loss /= num_train
      loss += 0.5 * reg * np.sum(W * W)
      #############################################################################
      #                             END OF YOUR CODE                              #
      #############################################################################
    
      #############################################################################
      # TODO:                                                                     #
      # Implement a vectorized version of the gradient for the structured SVM     #
      # loss, storing the result in dW.                                           #
      #                                                                           #
      # Hint: Instead of computing the gradient from scratch, it may be easier    #
      # to reuse some of the intermediate values that you used to compute the     #
      # loss.                                                                     #
      #############################################################################
      num_classes = W.shape[1]
      # coeff_mat[i, j] = 1 if class j violates the margin for sample i;
      # the correct-class entry holds minus the number of violations for that sample
      coeff_mat = np.zeros((num_train, num_classes))
      coeff_mat[loss_matrix > 0] = 1
      coeff_mat[range(num_train), list(y)] = 0
      coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)
    
      dW = (X.T).dot(coeff_mat)
      dW /= num_train
      dW += reg * W
      #############################################################################
      #                             END OF YOUR CODE                              #
      #############################################################################
    
      return loss, dW

    There is one line in this code that is quite hard to understand at first:

      loss_matrix = np.maximum(0, score - score[np.arange(num_train), np.array(y)].reshape(-1, 1) + 1)
    I stared at it for quite a while before it clicked; if we break it apart, it is not hard at all.
    score[np.arange(num_train), np.array(y)] uses fancy indexing to pull out, for every sample, the score of the correct class (the entry the original figure marks with a small red box). After reshape(-1, 1) it becomes an N×1 column vector.
    In score - score[np.arange(num_train), np.array(y)].reshape(-1, 1) the shapes of the two operands do not match, but NumPy broadcasting replicates the N×1 column across the columns automatically, expanding it to N×C before the subtraction.
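
    A tiny concrete example (my own toy numbers, not from the assignment) of what the fancy indexing and the broadcasting do:

      import numpy as np

      score = np.array([[3.0, 1.0, 2.0],   # scores for 2 samples, 3 classes
                        [0.5, 2.5, 1.0]])
      y = np.array([0, 2])                  # correct class of each sample
      num_train = score.shape[0]

      correct = score[np.arange(num_train), np.array(y)]   # fancy indexing -> array([3., 1.])
      print(correct.reshape(-1, 1))                        # N x 1 column vector

      # broadcasting: the N x 1 column is replicated across all C columns
      margins = score - correct.reshape(-1, 1) + 1
      print(margins)
      # [[ 1.  -1.   0. ]
      #  [ 0.5  2.5  1. ]]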

      num_classes = W.shape[1]
      coeff_mat = np.zeros((num_train, num_classes))
      coeff_mat[loss_matrix > 0] = 1
      coeff_mat[range(num_train), list(y)] = 0
      coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)
    
      dW = (X.T).dot(coeff_mat)
      dW /= num_train 
      dW += reg * W
    dW = (X.T).dot(coeff_mat): the gradient here is computed with a single matrix product. The coefficient matrix coeff_mat decides which samples x_i are added into which column of dW, and with what weight. Once you see what the naive per-sample loop does, it becomes clear how this trick with X.T replaces the loop; in my runs the vectorized version was roughly 16x faster.
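
    A minimal sketch (my own illustration, with random stand-in values) showing that the per-sample loop accumulates exactly what X.T.dot(coeff_mat) computes:

      import numpy as np

      np.random.seed(0)
      N, D, C = 5, 4, 3                      # tiny toy sizes
      X = np.random.randn(N, D)
      coeff_mat = np.random.randn(N, C)      # stands in for the margin-indicator matrix above

      # naive loop: sample i contributes the outer product x_i * coeff_mat[i]
      dW_loop = np.zeros((D, C))
      for i in range(N):
          dW_loop += np.outer(X[i], coeff_mat[i])

      dW_vec = X.T.dot(coeff_mat)            # the vectorized one-liner from the text
      print(np.allclose(dW_loop, dW_vec))    # True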

    A few times along the way the loss kept overflowing and raising errors. It turned out the learning rate was simply too large; changing the exponent from -5 to -6 (i.e. 1e-5 to 1e-6) fixed it. The underlying reason is that there is no learning-rate decay schedule here, so the step size stays large for the whole run.
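
    A minimal sketch of the kind of training loop this refers to, reusing svm_loss_vectorized above on random data (the shapes, batch size and iteration count are my own choices, not the assignment's LinearSVM.train): the step size is fixed and never decayed, which is why a too-large rate can let the loss grow until it overflows.

      import numpy as np

      np.random.seed(0)
      N, D, C = 500, 3073, 10                 # CIFAR-10-like shapes, random stand-in data
      X_train = np.random.randn(N, D)
      y_train = np.random.randint(C, size=N)
      W = 0.0001 * np.random.randn(D, C)

      learning_rate = 1e-6                    # the text's fix: exponent -5 -> -6
      reg = 2.5e4
      for it in range(100):
          idx = np.random.choice(N, 200, replace=False)   # minibatch of 200 samples
          loss, dW = svm_loss_vectorized(W, X_train[idx], y_train[idx], reg)
          W -= learning_rate * dW             # fixed step, no decay schedule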


  • Original post: https://www.cnblogs.com/captain-dl/p/9875556.html