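               Accessing HDFS with Hadoop WebHDFS

                                       Author: 尹正杰 (Yin Zhengjie)

    Copyright notice: this is an original work. Reproduction is not permitted, and violations will be pursued legally.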

     

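      WebHDFS and HttpFS are both HTTP/HTTPS REST interfaces to Hadoop. Both let us read and write HDFS data and run several HDFS administrative commands. They can be embedded in programs and scripts, or used from command-line tools such as curl or wget.

      WebHDFS requests sent directly to a single NameNode's HTTP address do not fail over on their own in a high-availability (HA) NameNode setup, whereas HttpFS works with HA clusters.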

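    I. WebHDFS overview

      Applications running inside a Hadoop cluster use Hadoop's native client to work with HDFS data. Sometimes, however, HDFS has to be accessed from outside the cluster in order to process, store, and retrieve HDFS data.
    
      If an application uses the native HDFS protocol, Hadoop must be installed on the server that runs the application, and the Java dependencies must be made available to the application.
    
      Hadoop's WebHDFS provides a rich set of HTTP REST APIs. REST is an architectural style for building large-scale web services, and it lets applications access and use HDFS remotely. Besides making external access to HDFS easier, WebHDFS is also useful when working with two Hadoop clusters that run different Hadoop versions.
    
      Because WebHDFS uses a REST API, it does not depend on the MapReduce or HDFS version, so it can be used with both clusters. For example, it can be used when DistCp has to copy data between the two clusters.
    
      When HDFS data is accessed remotely through WebHDFS, Hadoop does not need to be installed on the client. Well-known tools such as curl and wget are enough. WebHDFS connects directly to the Hadoop cluster to carry out HDFS operations.
    
      WebHDFS uses the basic HTTP methods GET, PUT, POST, and DELETE to operate on the HDFS filesystem remotely.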
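
      To make the mapping between HTTP methods and HDFS operations concrete, here is a minimal Python sketch that builds the method and URL for a handful of WebHDFS operations. The method table follows the official WebHDFS documentation; the function name `webhdfs_request` is a hypothetical helper of mine, and the host and port are the ones used in the examples later in this post.

```python
from urllib.parse import quote, urlencode

# HTTP method used by a few common WebHDFS operations
# (taken from the official WebHDFS REST API documentation).
HTTP_METHOD = {
    "OPEN": "GET",
    "LISTSTATUS": "GET",
    "GETFILESTATUS": "GET",
    "GETCONTENTSUMMARY": "GET",
    "MKDIRS": "PUT",
    "CREATE": "PUT",
    "APPEND": "POST",
    "DELETE": "DELETE",
}


def webhdfs_request(host, port, hdfs_path, op, **params):
    """Return (http_method, url) for one WebHDFS operation.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    op = op.upper()
    query = urlencode({"op": op, **params})
    url = f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"
    return HTTP_METHOD[op], url


method, url = webhdfs_request(
    "hadoop101.yinzhengjie.com", 50070, "/yinzhengjie", "liststatus"
)
print(method, url)

# Parameters whose names contain a dot are passed through a dict.
print(webhdfs_request("hadoop101.yinzhengjie.com", 50070,
                      "/yinzhengjie/webHDFS", "MKDIRS",
                      **{"user.name": "root"}))
```

      The resulting method and URL can be handed to curl (`curl -X <method> "<url>"`) or to any HTTP client library.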
    
      Recommended reading:
        https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
    
      Tip:
        If your HDFS cluster has Kerberos authentication enabled, you will also need to set the following parameters (in hdfs-site.xml):
          dfs.web.authentication.kerberos.principal
          dfs.web.authentication.kerberos.keytab

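    II. Hands-on: accessing HDFS through the WebHDFS REST API with the HDFS command-line tool

      Using WebHDFS is simple: replace the hdfs:// filesystem URI with a webhdfs:// URI that points at the NameNode's HTTP port (50070 in this cluster). Let's walk through a few examples.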
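
      The URI rewrite described above can be sketched in a few lines of Python. This is only an illustration: `to_webhdfs_uri` is a hypothetical helper, and the default port 50070 is simply the NameNode HTTP port used in this cluster, so adjust it to your own configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_webhdfs_uri(hdfs_uri, http_port=50070):
    """Rewrite hdfs://host:rpc_port/path as webhdfs://host:http_port/path.

    Assumption: http_port is the NameNode HTTP port (50070 in this cluster).
    """
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {hdfs_uri}")
    netloc = f"{parts.hostname}:{http_port}"
    return urlunsplit(("webhdfs", netloc, parts.path, "", ""))


print(to_webhdfs_uri("hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie"))
```

      The printed URI can be passed directly to `hdfs dfs -ls`, `-put`, `-get`, or `-rm`.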

    1>. List all files and directories under the HDFS directory "/yinzhengjie"

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /        # Note: when no filesystem is named on the command line, the default filesystem defined by the "fs.defaultFS" property in "core-site.xml" is used.
    Found 4 items
    drwxr-xr-x   - root admingroup          0 2020-08-21 16:40 /bigdata
    drwxr-xr-x   - root admingroup          0 2020-08-20 19:26 /system
    drwx------   - root admingroup          0 2020-08-14 19:19 /user
    drwxr-xr-x   - root admingroup          0 2020-08-21 18:42 /yinzhengjie
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie
    Found 3 items
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 hdfs://hadoop101.yinzhengjie.com:9000/yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/            
    Found 3 items
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie        # access HDFS over the webhdfs protocol
    Found 3 items
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 

    2>. Upload a local file to the HDFS cluster

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 3 items
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -put /etc/fstab webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/fstab        # upload the local file "/etc/fstab" to the HDFS directory "/yinzhengjie/"
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 4 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 

    3>. Download a file or directory from HDFS

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 4 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# ll
    total 0
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -get webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d      # download a directory
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# ll
    total 0
    drwxr-xr-x 2 root root 229 Aug 31 14:32 yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# ll yum.repos.d/
    total 40
    -rw-r--r-- 1 root root 1664 Aug 31 14:32 CentOS-Base.repo
    -rw-r--r-- 1 root root 1309 Aug 31 14:32 CentOS-CR.repo
    -rw-r--r-- 1 root root  649 Aug 31 14:32 CentOS-Debuginfo.repo
    -rw-r--r-- 1 root root  314 Aug 31 14:32 CentOS-fasttrack.repo
    -rw-r--r-- 1 root root  630 Aug 31 14:32 CentOS-Media.repo
    -rw-r--r-- 1 root root 1331 Aug 31 14:32 CentOS-Sources.repo
    -rw-r--r-- 1 root root 5701 Aug 31 14:32 CentOS-Vault.repo
    -rw-r--r-- 1 root root  951 Aug 31 14:32 epel.repo
    -rw-r--r-- 1 root root 1050 Aug 31 14:32 epel-testing.repo
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 4 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# ll
    total 0
    drwxr-xr-x 2 root root 229 Aug 31 14:32 yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -get webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz       # download a file
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# ll
    total 4
    -rw-r--r-- 1 root root  69 Aug 31 14:33 wc.txt.gz
    drwxr-xr-x 2 root root 229 Aug 31 14:32 yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 

    4>. Delete a file or directory in HDFS

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 4 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    drwxr-xr-x   - root admingroup          0 2020-08-14 23:13 /yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -rm -r webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d        # delete a directory
    20/08/31 14:38:12 INFO fs.TrashPolicyDefault: Moved: 'webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/yum.repos.d' to trash at: webhdfs://hadoop101.yinzhengjie.com:50070/user/root/.Trash/Current/yinzhengjie/yum.repos.d
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 3 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 3 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    -rw-r--r--   3 root admingroup         69 2020-08-14 23:22 /yinzhengjie/wc.txt.gz
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -rm webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz            # delete a file
    20/08/31 14:38:28 INFO fs.TrashPolicyDefault: Moved: 'webhdfs://hadoop101.yinzhengjie.com:50070/yinzhengjie/wc.txt.gz' to trash at: webhdfs://hadoop101.yinzhengjie.com:50070/user/root/.Trash/Current/yinzhengjie/wc.txt.gz
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 

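    5>. Other operations

      With the four examples above as a base, exploring the remaining operations on your own should not be difficult. Usage is essentially the same as the hdfs dfs tool I covered previously; just replace the hdfs scheme with webhdfs.
    
      Recommended reading:
        https://www.cnblogs.com/yinzhengjie2020/p/13296680.html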

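    III. Hands-on: accessing HDFS through the WebHDFS REST API with curl

      WebHDFS covers a broad set of operations for accessing and working with HDFS data. Next, let's see how to reach HDFS through the WebHDFS REST API using curl.
    
      I won't go into curl itself here; interested readers can consult other write-ups online, and my notes (linked after the option list) cover the basics. Commonly used curl options: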
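        -A/--user-agent <string>:
          Set the User-Agent string sent to the server
    
        -e/--referer <URL>:
          Set the Referer URL
    
        --cacert <file>:
          CA certificate used to verify the server (SSL)
    
        -k/--insecure:
          Allow SSL connections without verifying the certificate
    
        --compressed:
          Request a compressed response
    
        -H/--header <line>:
          Pass a custom header to the server
    
        -i:
          Show the response body together with the response headers
    
        -I/--head:
          Show only the response headers
    
        -D/--dump-header <file>:
          Write the response headers to the given file
    
        --basic:
          Use HTTP Basic authentication
    
        -u/--user <user[:password]>:
          Set the user name and password for the server
    
        -L:
          On a 3xx response, resend the request to the new location
      
        -O:
          Save the file locally under the file name taken from the URL
    
        -o <file>:
          Save the downloaded content to the given file
    
        --limit-rate <rate>:
          Limit the transfer speed
    
        -0/--http1.0:
          The digit 0; use HTTP 1.0
    
        -v/--verbose:
          Print more detail
    
        -C:
          Resume an interrupted transfer at the given offset
    
        -c/--cookie-jar <file name>:
          Write the cookies received from the server to the given file
    
        -x/--proxy <proxyhost[:port]>:
          Set the proxy server address
    
        -X/--request <command>:
          Send the given request method to the server
    
        -U/--proxy-user <user:password>:
          Set the user name and password for the proxy
    
        -T:
          Upload the given local file to the URL (for example an FTP server, or HTTP PUT)
    
        --data/-d:
          Send the given data in a POST request
    
        -b name=data:
          Send the given cookie to the server
     
      Recommended reading: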
        https://www.cnblogs.com/yinzhengjie/p/7719804.html

    1>. Read a file in HDFS (this example reads "/yinzhengjie/hosts")

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i -L "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/hosts?op=OPEN&user.name=yinzhengjie"      # op selects the operation; user.name names the user making the request
    HTTP/1.1 307 TEMPORARY_REDIRECT
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 07:39:16 GMT
    Date: Mon, 31 Aug 2020 07:39:16 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 07:39:16 GMT
    Date: Mon, 31 Aug 2020 07:39:16 GMT
    Pragma: no-cache
    Content-Type: application/octet-stream
    X-FRAME-OPTIONS: SAMEORIGIN
    Set-Cookie: hadoop.auth="u=yinzhengjie&p=yinzhengjie&t=simple&e=1598895556829&s=ak8QrD/3I7HowelGDzH9uvnDeAGBihJhCbCm0wVqS2M="; Path=/; HttpOnly
    Location: http://hadoop104.yinzhengjie.com:50075/webhdfs/v1/yinzhengjie/hosts?op=OPEN&user.name=yinzhengjie&namenoderpcaddress=hadoop101.yinzhengjie.com:9000&offset=0
    Content-Length: 0
    
    HTTP/1.1 200 OK
    Access-Control-Allow-Methods: GET
    Access-Control-Allow-Origin: *
    Content-Type: application/octet-stream
    Connection: close
    Content-Length: 371
    
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    
    #Hadoop 2.x
    172.200.6.101 hadoop101.yinzhengjie.com
    172.200.6.102 hadoop102.yinzhengjie.com
    172.200.6.103 hadoop103.yinzhengjie.com
    172.200.6.104 hadoop104.yinzhengjie.com
    172.200.6.105 hadoop105.yinzhengjie.com
    [root@hadoop105.yinzhengjie.com ~]# 
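
      The transcript above shows two responses because OPEN is a two-hop operation: the NameNode answers with a 307 redirect, and curl's -L option then fetches the data from the DataNode named in the Location header. The following Python sketch picks that redirect apart offline. It is only an illustration; `redirect_target` is a hypothetical helper and the header block is an abridged copy of the response above.

```python
from urllib.parse import parse_qs, urlsplit

# Abridged copy of the 307 response headers from the transcript above.
RAW = """HTTP/1.1 307 TEMPORARY_REDIRECT
Content-Type: application/octet-stream
Location: http://hadoop104.yinzhengjie.com:50075/webhdfs/v1/yinzhengjie/hosts?op=OPEN&user.name=yinzhengjie&namenoderpcaddress=hadoop101.yinzhengjie.com:9000&offset=0
Content-Length: 0
"""


def redirect_target(raw_headers):
    """Return the Location of a 307 response, or None for any other status."""
    lines = raw_headers.strip().splitlines()
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if status != 307:
        return None
    return headers.get("location")


location = redirect_target(RAW)
parts = urlsplit(location)
print("DataNode:", parts.hostname, parts.port)
print("NameNode RPC address:", parse_qs(parts.query)["namenoderpcaddress"][0])
```

      Note that the client must be able to resolve and reach the DataNode host in the Location header (hadoop104 on port 50075 here), not just the NameNode.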

    2>. Check the status of an HDFS directory

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /
    Found 4 items
    drwxr-xr-x   - root admingroup          0 2020-08-21 16:40 /bigdata
    drwxr-xr-x   - root admingroup          0 2020-08-20 19:26 /system
    drwx------   - root admingroup          0 2020-08-14 19:19 /user
    drwxr-xr-x   - root admingroup          0 2020-08-31 14:38 /yinzhengjie
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i -L "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie?op=LISTSTATUS"        # list the status of the entries in the "/yinzhengjie" directory
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 07:51:31 GMT
    Date: Mon, 31 Aug 2020 07:51:31 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 07:51:31 GMT
    Date: Mon, 31 Aug 2020 07:51:31 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    
    {"FileStatuses":{"FileStatus":[
    {"accessTime":1598855175268,"blockSize":536870912,"childrenNum":0,"fileId":16489,"group":"admingroup","length":490,"modificationTime":1598855175823,"owner":"root","pathSuffix":"fstab","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"},
    {"accessTime":1598859477240,"blockSize":536870912,"childrenNum":0,"fileId":16484,"group":"admingroup","length":371,"modificationTime":1597999554986,"owner":"root","pathSuffix":"hosts","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}
    ]}}
    [root@hadoop105.yinzhengjie.com ~]# 
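
      Since LISTSTATUS returns JSON, a script can consume it directly. The sketch below parses an abridged copy of the response above (several fields are omitted for brevity) and produces rows similar to what `hdfs dfs -ls` prints; `list_entries` is a hypothetical helper name.

```python
import json

# Abridged copy of the LISTSTATUS response body shown above.
BODY = """{"FileStatuses":{"FileStatus":[
{"group":"admingroup","length":490,"owner":"root","pathSuffix":"fstab","permission":"644","replication":3,"type":"FILE"},
{"group":"admingroup","length":371,"owner":"root","pathSuffix":"hosts","permission":"644","replication":3,"type":"FILE"}
]}}"""


def list_entries(body, parent):
    """Return (type, permission, owner, length, full_path) per directory entry."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    base = parent.rstrip("/")
    return [
        (s["type"], s["permission"], s["owner"], s["length"],
         f"{base}/{s['pathSuffix']}")
        for s in statuses
    ]


for entry in list_entries(BODY, "/yinzhengjie"):
    print(*entry)
```

      pathSuffix is relative to the directory that was listed, which is why the parent path has to be supplied by the caller.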

    3>. Check the status of an HDFS file

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i -L "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/hosts?op=GETFILESTATUS" ;echo       # show the status of the file "/yinzhengjie/hosts"
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 07:58:53 GMT
    Date: Mon, 31 Aug 2020 07:58:53 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 07:58:53 GMT
    Date: Mon, 31 Aug 2020 07:58:53 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    
    {"FileStatus":{"accessTime":1598859477240,"blockSize":536870912,"childrenNum":0,"fileId":16484,"group":"admingroup","length":371,"modificationTime":1597999554986,"owner":"root","pathSuffix":"","permission":"644","replication":3,"storagePolicy":0,"type":"FILE"}}
    [root@hadoop105.yinzhengjie.com ~]# 
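
      The accessTime and modificationTime fields are milliseconds since the Unix epoch. The sketch below converts the modificationTime from the response above into a readable timestamp. The UTC+8 offset is my assumption about this cluster's local time zone, inferred from the difference between the HTTP Date headers and the `hdfs dfs -ls` output; `modified_at` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone

# Abridged copy of the GETFILESTATUS response body shown above.
BODY = ('{"FileStatus":{"accessTime":1598859477240,"length":371,'
        '"modificationTime":1597999554986,"owner":"root","pathSuffix":"",'
        '"permission":"644","replication":3,"type":"FILE"}}')

# Assumption: the cluster's local time is UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))


def modified_at(body, tz=timezone.utc):
    """Convert modificationTime (epoch milliseconds) to an aware datetime."""
    millis = json.loads(body)["FileStatus"]["modificationTime"]
    return datetime.fromtimestamp(millis / 1000, tz=tz)


print(modified_at(BODY, LOCAL_TZ).strftime("%Y-%m-%d %H:%M"))  # 2020-08-21 16:45
```

      The printed value matches the modification time that `hdfs dfs -ls` reported for /yinzhengjie/hosts before the file was touched again.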

    4>. Create a directory

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /
    Found 4 items
    drwxr-xr-x   - root admingroup          0 2020-08-21 16:40 /bigdata
    drwxr-xr-x   - root admingroup          0 2020-08-20 19:26 /system
    drwx------   - root admingroup          0 2020-08-14 19:19 /user
    drwxr-xr-x   - root admingroup          0 2020-08-31 16:17 /yinzhengjie
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 
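    [root@hadoop105.yinzhengjie.com ~]# curl -i -X PUT "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/webHDFS?user.name=root&op=MKDIRS&permissions=751" ;echo     # create the "/yinzhengjie/webHDFS" directory. Note: the documented parameter name is "permission"; "permissions" as typed here is not a WebHDFS parameter, which would explain why the listing below shows drwxr-xr-x instead of 751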
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 08:14:10 GMT
    Date: Mon, 31 Aug 2020 08:14:10 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 08:14:10 GMT
    Date: Mon, 31 Aug 2020 08:14:10 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1598897650918&s=rp1JdtIpaV59fm8TFisjCUMH3ARerDWzI4oL+jCezrs="; Path=/; HttpOnly
    Transfer-Encoding: chunked
    
    {"boolean":true}
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 3 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-21 16:45 /yinzhengjie/hosts
    drwxr-xr-x   - root admingroup          0 2020-08-31 16:14 /yinzhengjie/webHDFS
    [root@hadoop105.yinzhengjie.com ~]# 
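
      The octal value passed to MKDIRS and the rwx string printed by `hdfs dfs -ls` describe the same permission bits. As a small worked example, the hypothetical helper below converts a three-digit octal mode into the ls-style string; it deliberately ignores a fourth (sticky-bit) digit.

```python
def octal_to_rwx(octal, is_dir=False):
    """Convert a three-digit octal mode such as "751" to an ls-style string."""
    bits = int(octal, 8)
    flags = "rwxrwxrwx"
    # Bit 8 is the owner's read bit, bit 0 is the "other" execute bit.
    text = "".join(
        flag if bits & (1 << (8 - i)) else "-"
        for i, flag in enumerate(flags)
    )
    return ("d" if is_dir else "-") + text


print(octal_to_rwx("751", is_dir=True))  # drwxr-x--x
print(octal_to_rwx("755", is_dir=True))  # drwxr-xr-x
print(octal_to_rwx("644"))               # -rw-r--r--
```

      A directory created with mode 751 would therefore be listed as drwxr-x--x, whereas the listing above shows drwxr-xr-x (755).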

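    5>. Create a file and write data to it

      I am running Hadoop 2.10.0. Following the official WebHDFS instructions, my attempts to create a file and to append to an existing file both failed. Each of the two official procedures takes two HTTP requests, but after many tries I still could not create a file. If you have it working, please share how.
    
      Reference links:
        https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File
        https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Append_to_a_File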
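
      For reference, the two-request flow described in the official documentation can be sketched in Python as follows. This is an untested sketch, not a verified fix for the failure above: it has not been run against this cluster, the helper names (`create_url`, `put`, `create_file`) and the file name "demo.txt" are hypothetical, and it assumes simple (non-Kerberos) authentication. Step 1 sends a PUT with no body to the NameNode and expects a 307 response; step 2 sends the file content to the DataNode URL from the Location header and expects 201 Created.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def create_url(namenode, hdfs_path, user, overwrite=False):
    """Build the step-1 URL (NameNode, op=CREATE). namenode is "host:port"."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    return f"http://{namenode}/webhdfs/v1{quote(hdfs_path)}?{query}"


def put(url, body=None):
    """Send one PUT without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=30)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("PUT", target, body=body)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def create_file(namenode, hdfs_path, data, user):
    """Two-step create: ask the NameNode, then write to the DataNode."""
    status, location = put(create_url(namenode, hdfs_path, user))
    if status != 307 or not location:
        raise RuntimeError(f"step 1: expected 307 with Location, got {status}")
    status, _ = put(location, body=data)
    if status != 201:
        raise RuntimeError(f"step 2: expected 201 Created, got {status}")
    return location


# Hypothetical usage (requires a reachable cluster):
# create_file("hadoop101.yinzhengjie.com:50070", "/yinzhengjie/demo.txt",
#             b"hello webhdfs\n", "root")
print(create_url("hadoop101.yinzhengjie.com:50070", "/yinzhengjie/demo.txt", "root"))
```

      If the second request fails, it is worth checking that the client can resolve and reach the DataNode host and port named in the Location header, since the data is written there rather than to the NameNode.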

    6>. Delete a directory or file

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 3 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
    drwxr-xr-x   - root admingroup          0 2020-08-31 18:07 /yinzhengjie/webHDFS
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i -X DELETE  "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/webHDFS?op=DELETE&user.name=root";echo     # delete a directory
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 10:07:56 GMT
    Date: Mon, 31 Aug 2020 10:07:56 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 10:07:56 GMT
    Date: Mon, 31 Aug 2020 10:07:56 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1598904476157&s=4aHgz6EwyJfdmjlwOtkXs+8Je94BybNxDUYoon7FIWE="; Path=/; HttpOnly
    Transfer-Encoding: chunked
    
    {"boolean":true}
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 2 items
    -rw-r--r--   3 root admingroup        490 2020-08-31 14:26 /yinzhengjie/fstab
    -rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i -X DELETE  "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie/fstab?op=DELETE&user.name=root";echo       # delete a file
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 10:08:52 GMT
    Date: Mon, 31 Aug 2020 10:08:52 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 10:08:52 GMT
    Date: Mon, 31 Aug 2020 10:08:52 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1598904532486&s=MCjvGp705lVZcZx7hc5UCeERNoRDGC5rsW5E/USXi6c="; Path=/; HttpOnly
    Transfer-Encoding: chunked
    
    {"boolean":true}
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/
    Found 1 items
    -rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
    [root@hadoop105.yinzhengjie.com ~]# 

    7>. Check directory quotas

    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /yinzhengjie
           QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
            none             inf            none             inf            1            2                742 /yinzhengjie
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i   "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie?op=GETCONTENTSUMMARY" ;echo 
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 10:30:13 GMT
    Date: Mon, 31 Aug 2020 10:30:13 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 10:30:13 GMT
    Date: Mon, 31 Aug 2020 10:30:13 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    
    {"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,"quota":-1,"spaceConsumed":29631,"spaceQuota":-1,"typeQuota":{}}}
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -setSpaceQuota 10g /yinzhengjie/
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -setQuota 50 /yinzhengjie/
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /yinzhengjie
           QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
              50              47            10 G          10.0 G            1            2                742 /yinzhengjie
    [root@hadoop105.yinzhengjie.com ~]# 
    [root@hadoop105.yinzhengjie.com ~]# curl -i   "http://hadoop101.yinzhengjie.com:50070/webhdfs/v1/yinzhengjie?op=GETCONTENTSUMMARY" ;echo 
    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Mon, 31 Aug 2020 10:30:52 GMT
    Date: Mon, 31 Aug 2020 10:30:52 GMT
    Pragma: no-cache
    Expires: Mon, 31 Aug 2020 10:30:52 GMT
    Date: Mon, 31 Aug 2020 10:30:52 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    
    {"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,"quota":50,"spaceConsumed":29631,"spaceQuota":10737418240,"typeQuota":{}}}
    [root@hadoop105.yinzhengjie.com ~]# 
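
      The GETCONTENTSUMMARY fields line up with the columns of `hdfs dfs -count -q`: a quota of -1 means no quota is set, and the remaining values can be derived from the other fields. The sketch below does that for the second response above. The formula for the remaining name quota (quota minus directories plus files) is my reading of how REM_QUOTA is computed; it is consistent with the value 47 shown by `hdfs dfs -count`. `remaining_quota` is a hypothetical helper.

```python
import json

# Response body returned after the quotas were set (copied from above).
BODY = ('{"ContentSummary":{"directoryCount":1,"fileCount":2,"length":742,'
        '"quota":50,"spaceConsumed":29631,"spaceQuota":10737418240,'
        '"typeQuota":{}}}')


def remaining_quota(body):
    """Return (remaining_name_quota, remaining_space_bytes); None = no quota."""
    summary = json.loads(body)["ContentSummary"]
    names = None
    if summary["quota"] >= 0:
        used = summary["directoryCount"] + summary["fileCount"]
        names = summary["quota"] - used
    space = None
    if summary["spaceQuota"] >= 0:
        space = summary["spaceQuota"] - summary["spaceConsumed"]
    return names, space


print(remaining_quota(BODY))  # (47, 10737388609)
```

      Running the same function on the first response, where both quota fields are -1, returns (None, None).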

    8>. Other operations

      Recommended reading:
        https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

  • Original article: https://www.cnblogs.com/yinzhengjie2020/p/13352498.html