  • [Spark Machine Learning Crash Course] Models 02: Logistic Regression (Python)

    Contents

      How logistic regression works

      Logistic regression code (Spark Python)


    How logistic regression works

       For details, see the blog post: http://www.cnblogs.com/itmorn/p/7890468.html
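
As a quick reminder of the idea covered in the linked post, here is a minimal, Spark-free sketch of how a binary logistic regression model turns a feature vector into a 0/1 prediction. The weights, intercept, and inputs are made-up values for illustration only, and the helper names `sigmoid` and `predict` are hypothetical; they are not part of the Spark API.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, intercept, features, threshold=0.5):
    """Return 1 if the estimated P(y=1 | x) exceeds the threshold, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + intercept
    return 1 if sigmoid(score) > threshold else 0

# Hypothetical parameters, chosen only to illustrate the decision rule.
weights = [0.8, -0.4]
intercept = -0.1

print(sigmoid(0.0))                              # 0.5
print(predict(weights, intercept, [1.0, 0.5]))   # 1
print(predict(weights, intercept, [0.0, 2.0]))   # 0
```

Training amounts to choosing the weights and intercept that best fit the labeled data; the Spark code in the next section delegates that step to L-BFGS.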


    Logistic regression code (Spark Python)

      Data used in the code: https://pan.baidu.com/s/1jHWKG4I (password: acq1)

    # -*- coding: utf-8 -*-
    from pyspark import SparkContext
    sc = SparkContext('local')
    
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
    from pyspark.mllib.regression import LabeledPoint
    
    # Load and parse the data: convert every value to a float.
    # The first value on each line is the label; the rest are the features.
    def parsePoint(line):
        values = [float(x) for x in line.split(' ')]
        return LabeledPoint(values[0], values[1:])
    
    data = sc.textFile("data/mllib/sample_svm_data.txt")
    # first() fetches one record instead of collecting the whole RDD to the driver
    print(data.first()) #1 0 2.52078447201548 0 0 0 2.004684436494304 2.00034729926846.....
    parsedData = data.map(parsePoint)
    print(parsedData.first()) #(1.0,[0.0,2.52078447202,0.0,0.0,0.0,2.00468....
    
    # Build the model
    model = LogisticRegressionWithLBFGS.train(parsedData)
    
    # Evaluate the model on the training data: fraction of misclassified points
    labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
    trainErr = labelsAndPreds.filter(lambda lp: lp[0] != lp[1]).count() / float(parsedData.count())
    print("Training Error = " + str(trainErr)) #Training Error = 0.366459627329
    
    # Save the model, then load it back
    model.save(sc, "pythonLogisticRegressionWithLBFGSModel")
    sameModel = LogisticRegressionModel.load(sc,"pythonLogisticRegressionWithLBFGSModel")
    
    # The reloaded model should give the same prediction as the original
    print(sameModel.predict(parsedData.first().features)) #1
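
To make the parsing and evaluation steps easier to follow without a Spark installation, the sketch below reproduces them in plain Python. The sample rows are invented, and `parse_point`, `error_rate`, and `toy_predict` are hypothetical helpers: `toy_predict` merely stands in for `model.predict` and is not a trained model.

```python
def parse_point(line):
    """Split a space-separated line into (label, features), all as floats."""
    values = [float(x) for x in line.split(' ')]
    return values[0], values[1:]

def error_rate(labels_and_preds):
    """Fraction of (label, prediction) pairs that disagree."""
    pairs = list(labels_and_preds)
    wrong = sum(1 for label, pred in pairs if label != pred)
    return wrong / float(len(pairs))

# Made-up rows in the same "label feature feature ..." layout as the data file.
lines = ["1 0 2.5 0", "0 1.5 0 0", "1 0 0 3.0", "0 2.0 0 0"]
points = [parse_point(line) for line in lines]

def toy_predict(features):
    """Hypothetical stand-in for model.predict: 1.0 if the second feature is positive."""
    return 1.0 if features[1] > 0 else 0.0

pairs = [(label, toy_predict(features)) for label, features in points]
print(error_rate(pairs))  # 0.25 (one of the four rows is misclassified)
```

The Spark version does the same work with `map`, `filter`, and `count` on an RDD, so it scales to data that does not fit on one machine. Note that this is the error on the training set, which tends to understate the error on unseen data.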


  • Original article: https://www.cnblogs.com/itmorn/p/8023190.html