  • [Notes] Implementing the Logistic Regression Algorithm

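    Implementing the Logistic Regression Algorithm

    Implementation

    In PyCharm, create a file named LogisticRegression.py and write the functionality to be implemented in it.

    A convenient starting point is to copy over the earlier LinearRegression class (see the previous posts on linear regression for details), rename the class, and delete whatever is no longer needed. Then add a sigmoid function and a function that returns the vector of predicted probabilities, and modify the loss computation, the gradient computation, and the prediction step so that they follow the logistic regression formulas.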
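    For reference, the quantities that the modified class computes can be written out as follows. This is a summary read off from the code in this post, with m denoting the number of training samples and X_b the feature matrix with a leading column of ones (the notation itself is an assumption, not the author's).

```latex
% Sigmoid and predicted probability
\sigma(t) = \frac{1}{1 + e^{-t}}, \qquad \hat{p} = \sigma(X_b \theta)

% Loss (average cross-entropy over m samples)
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} \log \hat{p}^{(i)} + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - \hat{p}^{(i)}\bigr) \right]

% Gradient used in gradient descent
\nabla J(\theta) = \frac{1}{m} X_b^{T} \bigl( \sigma(X_b \theta) - y \bigr)

% Prediction rule
\hat{y} = \begin{cases} 1, & \hat{p} \ge 0.5 \\ 0, & \hat{p} < 0.5 \end{cases}
```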

    The code is as follows:

      import numpy as np
      from metrics import accuracy_score
    
    
      class LogisticRegression:
    
          def __init__(self):
              """Initialize the Logistic Regression model"""
              self.coef_ = None
              self.interception_ = None
              self._theta = None
    
          def _sigmoid(self,t):
              return 1. / (1. + np.exp(-t))
    
    
          def fit(self, X_train, y_train, eta=0.01, n_iters=1e4):
              """Train the model on the training set using gradient descent"""
              assert X_train.shape[0] == y_train.shape[0], \
                  "the size of X_train must be equal to the size of y_train"
    
              def J(theta, X_b, y):
                  y_hat = self._sigmoid(X_b.dot(theta))
                  try:
                      return -np.sum(y*np.log(y_hat) + (1-y)*np.log(1-y_hat)) / len(y)
                  except:
                      return float('inf')
    
              def dJ(theta, X_b, y):
    
                  return X_b.T.dot(self._sigmoid(X_b.dot(theta)) - y) / len(X_b)
    
              def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
    
                  theta = initial_theta
                  cur_iter = 0
    
                  while cur_iter < n_iters:
                      gradient = dJ(theta, X_b, y)
                      last_theta = theta
                      theta = theta - eta * gradient
                      if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
                          break
      
                      cur_iter += 1
    
                  return theta
    
              X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
              initial_theta = np.zeros(X_b.shape[1])
              self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)
    
              self.interception_ = self._theta[0]
              self.coef_ = self._theta[1:]
    
              return self
    
          def predict(self, X_predict):
              """Given the data set X_predict, return the vector of predicted labels"""
              assert self.interception_ is not None and self.coef_ is not None, \
                  "must fit before predict!"
              assert X_predict.shape[1] == len(self.coef_), \
                  "the feature number of X_predict must be equal to X_train"
    
              proba = self.predict_proba(X_predict)
              return np.array(proba >= 0.5,dtype='int')
    
          def predict_proba(self, X_predict):
              """Given the data set X_predict, return the vector of predicted probabilities"""
              assert self.interception_ is not None and self.coef_ is not None, \
                  "must fit before predict!"
              assert X_predict.shape[1] == len(self.coef_), \
                  "the feature number of X_predict must be equal to X_train"
    
              X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
              return self._sigmoid(X_b.dot(self._theta))
    
    
          def score(self, X_test, y_test):
              """Compute the accuracy of the current model on the test set X_test, y_test"""
    
              y_predict = self.predict(X_test)
              return accuracy_score(y_test, y_predict)
    
          def __repr__(self):
              return "LogisticRegression()"
    
    
      # Plotting helper, defined at module level (outside the class)
      import matplotlib.pyplot as plt
      from matplotlib.colors import ListedColormap
    
    
      def plot_decision_boundary(model, axis):
          """Shade the regions predicted by model; axis is [x0_min, x0_max, x1_min, x1_max]"""
          # Build a grid with roughly 100 points per unit along each axis
          x0, x1 = np.meshgrid(
              np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)),
              np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100)),
          )
          X_new = np.c_[x0.ravel(), x1.ravel()]
    
          y_predict = model.predict(X_new)
          zz = y_predict.reshape(x0.shape)
          custom_cmap = ListedColormap(['#EF9A9A', '#FFF59D', '#90CAF9'])
    
          plt.contourf(x0, x1, zz, cmap=custom_cmap)
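    Since the loss and the gradient are the two pieces changed relative to linear regression, it can be worth checking that they agree with each other. The sketch below restates J and dJ as standalone functions (same formulas as inside fit) and compares the analytic gradient with a central finite-difference approximation. The helper dJ_numeric and the random test data are assumptions made for illustration; they are not part of the original post.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def J(theta, X_b, y):
    """Cross-entropy loss, same formula as in fit()."""
    y_hat = sigmoid(X_b.dot(theta))
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / len(y)


def dJ(theta, X_b, y):
    """Analytic gradient, same formula as in fit()."""
    return X_b.T.dot(sigmoid(X_b.dot(theta)) - y) / len(X_b)


def dJ_numeric(theta, X_b, y, h=1e-6):
    """Central finite-difference gradient (hypothetical helper, for checking only)."""
    grad = np.empty_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = h
        grad[i] = (J(theta + step, X_b, y) - J(theta - step, X_b, y)) / (2 * h)
    return grad


# Made-up data: 50 samples, 2 features, labels from a simple linear rule
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_b = np.hstack([np.ones((len(X), 1)), X])
theta = rng.normal(scale=0.1, size=X_b.shape[1])

# Largest componentwise difference between the two gradients
max_diff = np.max(np.abs(dJ(theta, X_b, y) - dJ_numeric(theta, X_b, y)))
print(max_diff)
```

    If the two gradients differ by more than a small numerical error, either the loss or the gradient formula has probably been transcribed incorrectly.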
    

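    Usage

    (in a notebook)

    Load the required packages and use the iris dataset. Because it has three classes, keep only the rows with y < 2 and only the first two features, then plot the data.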

      import numpy as np
      import matplotlib.pyplot as plt
      from sklearn import datasets
    
      iris = datasets.load_iris()
    
      X = iris.data
      y = iris.target
    
      X = X[y<2,:2]
      y = y[y<2]
    
      plt.scatter(X[y==0,0],X[y==0,1],color='red')
      plt.scatter(X[y==1,0],X[y==1,1],color='blue')
    

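    [Figure: scatter plot of the two classes, red for y = 0 and blue for y = 1]

    After splitting the dataset (with seed 666), call the packaged class: instantiate it and fit it on the training set.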

      from model_selection import train_test_split
    
      X_train,X_test,y_train,y_test = train_test_split(X,y,seed=666)
    
      from LogisticRegression import LogisticRegression
    
      log_reg = LogisticRegression()
      log_reg.fit(X_train,y_train)
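    The modules model_selection and metrics imported here are the author's own helpers from earlier posts and are not shown in this one. To run the example without them, something like the following sketch could stand in. Only the seed keyword and the return order are confirmed by the calls above; the test_ratio parameter, its default of 0.2, and the use of np.random.default_rng are assumptions, so the split produced for seed 666 will generally differ from the author's.

```python
import numpy as np


def train_test_split(X, y, test_ratio=0.2, seed=None):
    """Shuffle the rows and split them into train and test parts (assumed signature)."""
    assert X.shape[0] == y.shape[0], "X and y must have the same number of rows"
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be between 0 and 1"
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))
    test_size = int(len(X) * test_ratio)
    test_idx, train_idx = shuffled[:test_size], shuffled[test_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def accuracy_score(y_true, y_predict):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_predict), "y_true and y_predict must have the same length"
    return np.sum(y_true == y_predict) / len(y_true)


# Small made-up data set to show the shapes that come back
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, seed=666)
print(X_tr.shape, X_te.shape)

acc = accuracy_score(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0]))
print(acc)
```

    scikit-learn's sklearn.model_selection.train_test_split and sklearn.metrics.accuracy_score are the usual library equivalents, though the former takes random_state rather than seed.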
    

    Compute the classification accuracy on the test set:

      log_reg.score(X_test,y_test)
    

    [Output: the accuracy score on the test set]

    The predicted probabilities behind the classification:

      log_reg.predict_proba(X_test)
    

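    [Output: the predicted probability for each test sample]

    The true labels in y_test are:

    [Output: the array of true 0/1 labels]

    Thresholding the probabilities then gives the final predictions, log_reg.predict(X_test):

    [Output: the array of predicted 0/1 labels]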
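    The step from probabilities to labels is the one-line threshold in predict. As a small illustration, with made-up probabilities rather than the output of the model above:

```python
import numpy as np

# Illustrative probabilities only; these are not values from the post
proba = np.array([0.02, 0.49, 0.50, 0.93])

# Same rule as in predict(): probability >= 0.5 maps to class 1, otherwise class 0
labels = np.array(proba >= 0.5, dtype='int')
print(labels)  # [0 0 1 1]
```

    Note that a probability of exactly 0.5 is assigned to class 1 because the comparison is >=.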

    That is a simple application of the logistic regression implementation above.

  • Original article: https://www.cnblogs.com/jokingremarks/p/14321021.html