  • [python][spark] wholeTextFiles: an example of reading multiple files

    $pwd 

    /home/training/mydir

    $cat file1.json

    {
    "firstName":"Fred",
    "lastName":"Flintstone",
    "userid":"123"
    }

    $cat file2.json

    {
    "firstName":"Barney",
    "lastName":"Rubble",
    "userid":"456"
    }

    [training@localhost ~]$ hdfs dfs -put /home/training/mydir
    [training@localhost ~]$
    [training@localhost ~]$ hdfs dfs -ls
    Found 4 items
    drwxrwxrwx - training supergroup 0 2017-09-23 19:26 .sparkStaging
    -rw-rw-rw- 1 training supergroup 48 2017-09-25 05:31 cats.txt
    drwxrwxrwx - training supergroup 0 2017-09-25 15:39 mydir ***
    -rw-rw-rw- 1 training supergroup 34 2017-09-23 06:16 test.txt
    [training@localhost ~]$

    myrdd1 = sc.wholeTextFiles("mydir")

    myrdd1.count()
    Out[32]: 2
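    count() returns 2 because wholeTextFiles yields one record per file, a (path, content) pair, whereas sc.textFile would yield one record per line. The pure-Python sketch below mimics both behaviours on a local copy of the two files, so the difference can be checked without a cluster. whole_text_files and text_file are hypothetical local stand-ins, not Spark APIs. The userid 456 for file2.json is taken from the take(2) output.

```python
import os
import tempfile


def whole_text_files(directory):
    """Hypothetical local stand-in for sc.wholeTextFiles: one (path, content) pair per file."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                pairs.append((path, f.read()))
    return pairs


def text_file(directory):
    """Hypothetical local stand-in for sc.textFile: one record per line, across all files."""
    lines = []
    for _path, content in whole_text_files(directory):
        lines.extend(content.splitlines())
    return lines


# Recreate the two files from the example in a temporary directory.
FILE1 = '{\n"firstName":"Fred",\n"lastName":"Flintstone",\n"userid":"123"\n}\n'
FILE2 = '{\n"firstName":"Barney",\n"lastName":"Rubble",\n"userid":"456"\n}\n'

with tempfile.TemporaryDirectory() as mydir:
    for name, body in [("file1.json", FILE1), ("file2.json", FILE2)]:
        with open(os.path.join(mydir, name), "w", encoding="utf-8") as f:
            f.write(body)
    per_file = whole_text_files(mydir)  # one record per file, like myrdd1
    per_line = text_file(mydir)         # one record per line
    print(len(per_file), len(per_line))
```

    Each file has 5 lines, so the per-line view has 10 records while the per-file view has 2, matching myrdd1.count().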

    In [35]: myrdd1.take(2)

    Out[35]:
    [(u'hdfs://localhost:8020/user/training/mydir/file1.json',
    u'{ "firstName":"Fred", "lastName":"Flintstone", "userid":"123" } '),
    (u'hdfs://localhost:8020/user/training/mydir/file2.json',
    u'{ "firstName":"Barney", "lastName":"Rubble", "userid":"456" } ')]
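    Each element is a (path, content) tuple, so a file holding a single JSON object can be parsed with json.loads inside map(). This is a minimal sketch that assumes every file contains exactly one JSON object. parse_json_file and the source_file field are illustrative names, not part of Spark. The sketch runs on the two pairs from take(2), so it works without a cluster.

```python
import json
import posixpath


def parse_json_file(pair):
    """Turn one (path, content) record from wholeTextFiles into a dict.

    Assumes each file holds exactly one JSON object, as file1.json and
    file2.json do; a file with several objects would need another parser.
    """
    path, content = pair
    record = json.loads(content)  # surrounding whitespace is allowed by json.loads
    # Keep the file name so each record can be traced back to its source file.
    record["source_file"] = posixpath.basename(path)
    return record


# Sample pairs copied from the take(2) output above.
pairs = [
    ("hdfs://localhost:8020/user/training/mydir/file1.json",
     '{ "firstName":"Fred", "lastName":"Flintstone", "userid":"123" } '),
    ("hdfs://localhost:8020/user/training/mydir/file2.json",
     '{ "firstName":"Barney", "lastName":"Rubble", "userid":"456" } '),
]

records = [parse_json_file(p) for p in pairs]
for r in records:
    print(r["source_file"], r["firstName"], r["userid"])

# In the pyspark shell the same function would be applied per file:
#   parsed = myrdd1.map(parse_json_file)
#   parsed.collect()
```

    Reading whole files is what makes this work. With textFile, each multi-line JSON object would be split across several records, and json.loads would fail on every fragment.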

  • Original article: https://www.cnblogs.com/gaojian/p/7594782.html