  • Distributed Implementation of Logistic Regression [Logistic Regression / Machine Learning / Spark]

    1- Problem Statement


    2- Logistic Regression


    3- Theoretical Derivation
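
The derivation itself is not preserved in this copy. As a hedged reconstruction, read off from the code in section 4 rather than from the author's own text, the update that the Spark job appears to implement is plain batch gradient descent on the logistic loss, with the per-sample terms computed in the map step and summed in the reduce step. The symbols m (number of samples) and x_0 = 1 (the bias feature) are notation introduced here for illustration:

```latex
% Hypothesis (sigmoid of the linear score), with x_0 = 1 as the bias feature
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

% Per-sample gradient term produced by the map step
g_j^{(i)} = \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}

% Reduce step sums over all m samples; theta is then updated with learning rate alpha
\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
```

Note that the sum is not divided by m in the code, so the effective step size scales with the size of the dataset.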


    4- Python/Spark Implementation

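    # -*- coding: utf-8 -*-
    from math import exp

    from pyspark import SparkContext

    theta = [0.0, 0.0, 0.0]  # initial theta: bias plus one weight per feature
    alpha = 0.001            # learning rate


    def inner(x, y):
        """Dot product of two equal-length sequences."""
        return sum(i * j for i, j in zip(x, y))


    def func(lst):
        """Per-sample gradient for a row laid out as [1, features..., label]."""
        features, label = lst[:-1], lst[-1]
        h = 1.0 / (1.0 + exp(-inner(features, theta)))  # sigmoid hypothesis
        return [(h - label) * x for x in features]


    sc = SparkContext('local')

    # Each input line holds comma-separated floats with the label last.
    # A constant 1 is prepended as the bias feature.
    rdd = (sc.textFile('/home/freyr/logisticRegression.txt')
             .map(lambda line: [float(v) for v in line.strip().split(',')])
             .map(lambda lst: [1.0] + lst))

    for _ in range(400):
        # Map: per-sample gradients. Reduce: element-wise sum over all samples.
        partheta = (rdd.map(func)
                       .reduce(lambda x, y: [a + b for a, b in zip(x, y)]))

        for j in range(3):
            theta[j] = theta[j] - alpha * partheta[j]

    print('theta = %s' % theta)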
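
To check the update rule without a Spark installation, the same map/reduce gradient step can be sketched in plain Python. This is a minimal sketch, not the author's code: the helper names (`sigmoid`, `sample_gradient`, `add_vectors`, `train`) and the four-row `toy_rows` dataset are hypothetical and introduced only for illustration, and the learning rate and iteration count are chosen for the toy data rather than taken from the post.

```python
from functools import reduce
from math import exp


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + exp(-z))


def sample_gradient(row, theta):
    """Gradient contribution of one row laid out as [1, features..., label]."""
    features, label = row[:-1], row[-1]
    h = sigmoid(sum(t * x for t, x in zip(theta, features)))
    return [(h - label) * x for x in features]


def add_vectors(a, b):
    """Element-wise sum, the same role as the reduce lambda in the Spark job."""
    return [p + q for p, q in zip(a, b)]


def train(rows, alpha=0.001, iterations=400):
    """Batch gradient descent: map each row to its gradient, reduce by summing."""
    theta = [0.0] * (len(rows[0]) - 1)
    for _ in range(iterations):
        grads = [sample_gradient(r, theta) for r in rows]  # "map" step
        total = reduce(add_vectors, grads)                 # "reduce" step
        theta = [t - alpha * g for t, g in zip(theta, total)]
    return theta


# Hypothetical, linearly separable toy data: [bias, x1, x2, label]
toy_rows = [
    [1.0, 0.5, 1.0, 0.0],
    [1.0, 1.0, 0.5, 0.0],
    [1.0, 3.0, 3.5, 1.0],
    [1.0, 4.0, 3.0, 1.0],
]

toy_theta = train(toy_rows, alpha=0.05, iterations=1000)
print('theta = %s' % toy_theta)
```

Because the per-sample gradients are simply summed, the reduce function is associative and commutative, which is what allows Spark to combine partial sums from different partitions in any order.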

     PS: logisticRegression.txt

  • Original article: https://www.cnblogs.com/freyr/p/4501039.html