  • PySpark data preparation

    Iris dataset

    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5.0,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    4.6,3.4,1.4,0.3,Iris-setosa
    5.0,3.4,1.5,0.2,Iris-setosa

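    Code to convert it to LIBSVM format (reads the file named by the first command-line argument and prints the result to standard output)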

    import sys

    # Map the iris class names to numeric labels
    LABELS = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}


    def main(path):
        with open(path, "r") as df:
            for line in df:
                ss = line.strip().split(",")
                if len(ss) != 5:
                    # Skip blank or malformed lines (e.g. a trailing empty line)
                    continue
                label = LABELS[ss[4]]  # KeyError on an unexpected class name
                # LIBSVM line: label followed by 1-based index:value pairs
                print("%d 1:%.1f 2:%.1f 3:%.1f 4:%.1f"
                      % (label, float(ss[0]), float(ss[1]), float(ss[2]), float(ss[3])))


    if __name__ == '__main__':
        main(sys.argv[1])
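The script above hard-codes four features and formats each value with `%.1f`, which would round any value that has more than one decimal place. A minimal sketch of a more general per-line conversion, assuming the class name is always the last comma-separated field; the function name `to_libsvm_line` and the `LABELS` mapping are hypothetical, introduced only for illustration:

```python
# Hypothetical mapping from iris class names to numeric labels
LABELS = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}


def to_libsvm_line(line):
    """Convert one CSV row (features..., class name) to a LIBSVM line.

    Returns None for a blank line.
    """
    fields = line.strip().split(",")
    if fields == [""]:
        return None  # blank line, e.g. at the end of the file
    label = LABELS[fields[-1]]  # KeyError on an unexpected class name
    # LIBSVM feature indices are 1-based. float() rejects non-numeric fields;
    # %r prints the shortest string that round-trips, so no precision is lost.
    features = " ".join(
        "%d:%r" % (i, float(v)) for i, v in enumerate(fields[:-1], start=1)
    )
    return "%d %s" % (label, features)


if __name__ == "__main__":
    print(to_libsvm_line("5.1,3.5,1.4,0.2,Iris-setosa"))
    # prints: 0 1:5.1 2:3.5 3:1.4 4:0.2
```

Because the features come from `enumerate`, the same function works for any number of feature columns.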

    The iris dataset in LIBSVM format

    0 1:5.1 2:3.5 3:1.4 4:0.2
    0 1:4.9 2:3.0 3:1.4 4:0.2
    0 1:4.7 2:3.2 3:1.3 4:0.2
    0 1:4.6 2:3.1 3:1.5 4:0.2
    0 1:5.0 2:3.6 3:1.4 4:0.2
    0 1:5.4 2:3.9 3:1.7 4:0.4
    0 1:4.6 2:3.4 3:1.4 4:0.3
    0 1:5.0 2:3.4 3:1.5 4:0.2
    0 1:4.4 2:2.9 3:1.4 4:0.2
    0 1:4.9 2:3.1 3:1.5 4:0.1
    0 1:5.4 2:3.7 3:1.5 4:0.2
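Each line above has the form `label index:value ...`, with 1-based feature indices. According to the Spark documentation, `MLUtils.loadLibSVMFile` converts these indices to 0-based when it builds each `LabeledPoint`, which is why the PySpark output shows `[0,1,2,3]`. A minimal pure-Python sketch of that parsing step (not Spark's own implementation; `parse_libsvm_line` is a hypothetical name):

```python
def parse_libsvm_line(line):
    """Parse 'label i1:v1 i2:v2 ...' into (label, indices, values).

    Indices are converted from LIBSVM's 1-based convention to 0-based.
    """
    items = line.split()
    label = float(items[0])
    indices, values = [], []
    for item in items[1:]:
        index, value = item.split(":")
        indices.append(int(index) - 1)  # 1-based -> 0-based
        values.append(float(value))
    return label, indices, values


print(parse_libsvm_line("0 1:5.1 2:3.5 3:1.4 4:0.2"))
# prints: (0.0, [0, 1, 2, 3], [5.1, 3.5, 1.4, 0.2])
```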

    Reading the LIBSVM-format data with PySpark (each line becomes a LabeledPoint)

    >>> from pyspark.mllib.util import MLUtils
    >>> examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    >>> examples.take(2)
    [LabeledPoint(0.0, (4,[0,1,2,3],[5.1,3.5,1.4,0.2])), LabeledPoint(0.0, (4,[0,1,2,3],[4.9,3.0,1.4,0.2]))]
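In this output, `(4,[0,1,2,3],[5.1,3.5,1.4,0.2])` is the printed form of a sparse feature vector: its size, the 0-based indices of the stored entries, and their values. Because no `numFeatures` argument was passed, `loadLibSVMFile` determines the size from the input data (4 here). In PySpark, `SparseVector.toArray()` expands such a vector into a dense NumPy array. The plain-Python sketch below shows what that expansion does; `to_dense` is a hypothetical helper, not part of Spark:

```python
def to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) triple into a dense list."""
    dense = [0.0] * size  # entries not listed in `indices` are zero
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


print(to_dense(4, [0, 1, 2, 3], [5.1, 3.5, 1.4, 0.2]))
# prints: [5.1, 3.5, 1.4, 0.2]
print(to_dense(4, [0, 2], [1.0, 3.0]))
# prints: [1.0, 0.0, 3.0, 0.0]
```

For the iris data every feature is present, so the sparse and dense forms hold the same numbers; the sparse form pays off when most features are zero.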

     
  • Original article: https://www.cnblogs.com/luozeng/p/9227669.html