  • [python][spark] wholeTextFiles: an example of reading multiple files

    $pwd 

    /home/training/mydir

    $cat file1.json

    {
    "firstName":"Fred",
    "lastName":"Flintstone",
    "userid":"123"
    }

    $cat file2.json

    {
    "firstName":"Barney",
    "lastName":"Rubble",
    "userid":"456"
    }

    [training@localhost ~]$ hdfs dfs -put /home/training/mydir
    [training@localhost ~]$
    [training@localhost ~]$ hdfs dfs -ls
    Found 4 items
    drwxrwxrwx - training supergroup 0 2017-09-23 19:26 .sparkStaging
    -rw-rw-rw- 1 training supergroup 48 2017-09-25 05:31 cats.txt
    drwxrwxrwx - training supergroup 0 2017-09-25 15:39 mydir ***
    -rw-rw-rw- 1 training supergroup 34 2017-09-23 06:16 test.txt
    [training@localhost ~]$

    myrdd1 = sc.wholeTextFiles("mydir")

    myrdd1.count()
    Out[32]: 2

    In [35]: myrdd1.take(2)

    Out[35]:
    [(u'hdfs://localhost:8020/user/training/mydir/file1.json',
    u'{ "firstName":"Fred", "lastName":"Flintstone", "userid":"123" } '),
    (u'hdfs://localhost:8020/user/training/mydir/file2.json',
    u'{ "firstName":"Barney", "lastName":"Rubble", "userid":"456" } ')]
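
Each element returned by wholeTextFiles is a (path, content) pair, so a natural next step is to parse the content of each file as JSON. The following is a minimal sketch of that step, not part of the original session: `parse_pair` and `sample_pairs` are hypothetical names, and the pairs are copied from the take(2) output above so the snippet runs without a cluster.

```python
import json
import posixpath


def parse_pair(pair):
    """Turn one (path, content) pair into (file name, parsed dict)."""
    path, content = pair
    # The path is an HDFS URI; keep only its last component.
    return posixpath.basename(path), json.loads(content)


# Pairs copied from the take(2) output, standing in for the RDD contents.
sample_pairs = [
    (u'hdfs://localhost:8020/user/training/mydir/file1.json',
     u'{ "firstName":"Fred", "lastName":"Flintstone", "userid":"123" } '),
    (u'hdfs://localhost:8020/user/training/mydir/file2.json',
     u'{ "firstName":"Barney", "lastName":"Rubble", "userid":"456" } '),
]

parsed = [parse_pair(p) for p in sample_pairs]
for name, record in parsed:
    print(name, record["firstName"], record["lastName"], record["userid"])
```

With a live SparkContext, the same function could presumably be applied to the RDD itself, for example with myrdd1.map(parse_pair).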

  • Original article: https://www.cnblogs.com/gaojian/p/7594782.html