  • [Original] Tips for remote cluster access and management, and for checking jobs and files

      Honestly, this is something only a lazy person bothers to study: if you are willing to spend the winter in the lab or the machine room rather than staying in the dorm or at home, remote access is a non-issue. But there are always inconvenient moments, and then the remote command line is the only way to see and do anything. The first step of remote operation is setting up SSH access to the cluster:

    Accessing the cluster remotely via SSH

      There are two prerequisites:

    1. At least one machine in the cluster must have port forwarding configured on the public-facing router
    2. That machine must be running the SSH service; SSH listens on port 22

      For example, I usually forward a port to master (the NameNode/JobTracker). On the router the rule looks like: external port: aa, IP: masterIP, internal port: 22. Then from anywhere you can reach the cluster with something like: ssh -p aa hadoop@<public IP of the router>, where hadoop is the user name on master. In other words, a request hitting external port aa is forwarded to port 22 on master, and you log in there as the hadoop user.
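      For convenience, the forwarded port and user name can also live in ~/.ssh/config on the personal machine, so the long command is not needed every time. A minimal sketch; the host alias, port number, and IP below are placeholders, not the real cluster values:

      # ~/.ssh/config on the personal machine (all values illustrative)
      Host cluster-master
          HostName 203.0.113.10   # public IP of the router
          Port 2222               # the external port "aa" forwarded to masterIP:22
          User hadoop             # account name on master
      # afterwards "ssh cluster-master" is enough to log in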

      Of course, if you have ever set up a Hadoop cluster you already know how to configure passwordless SSH: put the public key of the personal computer you usually work from into ~/.ssh/authorized_keys on master. After that, one command and no password gets you onto the cluster; and once you are on master, the hops from master to the other cluster machines are already passwordless.
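      A minimal sketch of that key setup, reusing the hypothetical alias and port from the config example above (ssh-copy-id is the stock OpenSSH helper; appending the key to authorized_keys by hand works just as well):

      $ ssh-keygen -t rsa                          # create a key pair on the personal machine if there is none yet
      $ ssh-copy-id -p 2222 hadoop@203.0.113.10    # append the public key to master's ~/.ssh/authorized_keys
      $ ssh cluster-master                         # should now log in without asking for a password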

    Some handy tools for cluster management

    1. pdsh
      This is a killer tool: a distributed shell that lets one machine run commands across the whole cluster.
      $ pdsh -h
      Usage: pdsh [-options] command ...
      -S                return largest of remote command return values
      -h                output usage menu and quit
      -V                output version information and quit
      -q                list the option settings and quit
      -b                disable ^C status feature (batch mode)
      -d                enable extra debug information from ^C status
      -l user           execute remote commands as user
      -t seconds        set connect timeout (default is 10 sec)
      -u seconds        set command timeout (no default)
      -f n              use fanout of n nodes
      -w host,host,...  set target node list on command line
      -x host,host,...  set node exclusion list on command line
      -R name           set rcmd module to name
      -M name,...       select one or more misc modules to initialize first
      -N                disable hostname: labels on output lines
      -L                list info on all loaded modules and exit
      -g query,...      target nodes using genders query
      -X query,...      exclude nodes using genders query
      -F file           use alternate genders file `file'
      -i                request alternate or canonical hostnames if applicable
      -a                target all nodes except those with "pdsh_all_skip" attribute
      -A                target all nodes listed in genders database
      available rcmd modules: ssh,rsh,exec (default: rsh)
      pdsh -w ssh:brix-[00-09],lbt,gbt uptime

      The command above runs uptime on every machine from brix-00 to brix-09 plus lbt and gbt, and prints the results on the current machine. Note that I have aliased pdsh; with a stock pdsh the command would normally fail (the default rcmd module is rsh), so define the alias below first and it will work

      alias pdsh='PDSH_RCMD_TYPE=ssh pdsh'

      The output then looks like this:

      gbt:  17:33:21 up  2:31,  1 user,  load average: 0.00, 0.01, 0.05
      lbt:  17:33:18 up  2:27,  2 users,  load average: 0.00, 0.02, 0.05
      brix-02:  17:33:21 up  2:31,  0 users,  load average: 0.00, 0.01, 0.05
      brix-01:  17:33:21 up  2:31,  0 users,  load average: 0.03, 0.02, 0.05
      brix-00:  17:33:21 up  2:33,  4 users,  load average: 0.08, 0.05, 0.09
      brix-03:  17:33:20 up  2:31,  0 users,  load average: 0.00, 0.01, 0.05
      brix-04:  17:33:21 up  2:31,  0 users,  load average: 0.01, 0.04, 0.05
      brix-08:  17:33:21 up  2:31,  0 users,  load average: 0.04, 0.06, 0.05
      brix-09:  17:33:20 up  2:31,  0 users,  load average: 0.10, 0.06, 0.06
      brix-07:  17:33:21 up  2:31,  0 users,  load average: 0.03, 0.06, 0.05
      brix-05:  17:33:21 up  2:31,  0 users,  load average: 0.08, 0.04, 0.05
      brix-06:  17:33:21 up  2:31,  0 users,  load average: 0.05, 0.04, 0.05
      pdsh -w ssh:brix-[00-09],lbt,gbt scp brix-00:~/HadoopInstall/test.txt ~/HadoopInstall/

      The example above copies test.txt from brix-00 to brix-01 through brix-09 as well as lbt and gbt: every target node runs scp and pulls the file from brix-00 into its own ~/HadoopInstall/.
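      To avoid retyping the node list and the rcmd alias each time, both can be made persistent in the shell startup file. A small sketch; the file path is just an assumption for illustration (WCOLL is pdsh's standard default-target-list variable):

      # in ~/.bashrc on the machine you run pdsh from (path illustrative)
      export PDSH_RCMD_TYPE=ssh            # always use ssh instead of the default rsh
      export WCOLL=~/cluster-nodes.txt     # file listing the target hosts, one per line
      # after that a bare "pdsh uptime" hits every host in the list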

    2. scp
      The pdsh example above already showed scp being driven across the whole cluster, so I will not go into detail here; below is scp's usage summary.
      usage: scp [-12346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]
                 [-l limit] [-o ssh_option] [-P port] [-S program]
                 [[user@]host1:]file1 ... [[user@]host2:]file2
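      Two typical stand-alone uses, with the host alias, port, and paths below being placeholders; note that scp takes the port with an uppercase -P, unlike ssh's lowercase -p:

      $ scp -P 2222 results.tar.gz hadoop@203.0.113.10:~/backup/   # push a local file to master through the forwarded port
      $ scp -r cluster-master:~/HadoopInstall/conf ./conf-backup   # pull a whole directory, using the ~/.ssh/config alias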

     How to check the health and block distribution of a file on HDFS from the command line

    $ hadoop fsck /ftTest/totalWiki  -files -blocks -locations
    Warning: $HADOOP_HOME is deprecated.
    
    FSCK started by hadoop from /192.168.1.230 for path /ftTest/totalWiki at Wed Nov 18 17:42:27 CST 2015
    /ftTest/totalWiki 3259108351 bytes, 25 block(s):  OK
    0. blk_-3539743872639772968_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.63:50010, 192.168.1.235:50010]
    1. blk_-7700661535252568451_1003 len=134217728 repl=3 [192.168.1.231:50010, 192.168.1.232:50010, 192.168.1.238:50010]
    2. blk_-3214646852454192434_1003 len=134217728 repl=3 [192.168.1.237:50010, 192.168.1.236:50010, 192.168.1.238:50010]
    3. blk_-8860437510624268282_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.239:50010, 192.168.1.235:50010]
    4. blk_-1765246693355320434_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.66:50010, 192.168.1.232:50010]
    5. blk_9063781070378080202_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.234:50010]
    6. blk_8687961040692226467_1003 len=134217728 repl=3 [192.168.1.234:50010, 192.168.1.237:50010, 192.168.1.239:50010]
    7. blk_-5717347662754027031_1003 len=134217728 repl=3 [192.168.1.236:50010, 192.168.1.232:50010, 192.168.1.63:50010]
    8. blk_-5624359065285533759_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.231:50010]
    9. blk_622948206607478459_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.63:50010, 192.168.1.236:50010]
    10. blk_-4154428280295153090_1003 len=134217728 repl=3 [192.168.1.232:50010, 192.168.1.235:50010, 192.168.1.63:50010]
    11. blk_6638201995439663469_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.63:50010, 192.168.1.237:50010]
    12. blk_-3282418422086241856_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.233:50010]
    13. blk_2802846523093904336_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.239:50010, 192.168.1.237:50010]
    14. blk_-7425405918846384842_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.66:50010, 192.168.1.234:50010]
    15. blk_-8997936298966969491_1003 len=134217728 repl=3 [192.168.1.237:50010, 192.168.1.235:50010, 192.168.1.238:50010]
    16. blk_-827035362476515573_1003 len=134217728 repl=3 [192.168.1.239:50010, 192.168.1.63:50010, 192.168.1.235:50010]
    17. blk_-5734389503841877028_1003 len=134217728 repl=3 [192.168.1.231:50010, 192.168.1.235:50010, 192.168.1.66:50010]
    18. blk_1446125973144404377_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.238:50010, 192.168.1.235:50010]
    19. blk_-7161959344923757995_1003 len=134217728 repl=3 [192.168.1.66:50010, 192.168.1.238:50010, 192.168.1.234:50010]
    20. blk_-2171786920309180709_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.66:50010, 192.168.1.237:50010]
    21. blk_7184760167274632839_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.66:50010, 192.168.1.233:50010]
    22. blk_1315507788295151463_1003 len=134217728 repl=3 [192.168.1.63:50010, 192.168.1.239:50010, 192.168.1.233:50010]
    23. blk_5923416026032542888_1003 len=134217728 repl=3 [192.168.1.238:50010, 192.168.1.239:50010, 192.168.1.236:50010]
    24. blk_-8960096699099874150_1003 len=37882879 repl=3 [192.168.1.234:50010, 192.168.1.233:50010, 192.168.1.63:50010]
    
    Status: HEALTHY
     Total size:    3259108351 B
     Total dirs:    0
     Total files:    1
     Total blocks (validated):    25 (avg. block size 130364334 B)
     Minimally replicated blocks:    25 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:    0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:    3
     Average block replication:    3.0
     Corrupt blocks:        0
     Missing replicas:        0 (0.0 %)
     Number of data-nodes:        11
     Number of racks:        2
    FSCK ended at Wed Nov 18 17:42:27 CST 2015 in 3 milliseconds
    
    
    The filesystem under path '/ftTest/totalWiki' is HEALTHY

    The output above lists where the file's blocks are stored and its overall health. To also see which rack each block replica lives on, add -racks as shown below

    $ hadoop fsck /ftTest/totalWiki  -files -blocks -locations -racks
    Warning: $HADOOP_HOME is deprecated.
    
    FSCK started by hadoop from /192.168.1.230 for path /ftTest/totalWiki at Wed Nov 18 17:43:08 CST 2015
    /ftTest/totalWiki 3259108351 bytes, 25 block(s):  OK
    0. blk_-3539743872639772968_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.235:50010]
    1. blk_-7700661535252568451_1003 len=134217728 repl=3 [/rack1/192.168.1.231:50010, /rack1/192.168.1.232:50010, /rack2/192.168.1.238:50010]
    2. blk_-3214646852454192434_1003 len=134217728 repl=3 [/rack1/192.168.1.237:50010, /rack1/192.168.1.236:50010, /rack2/192.168.1.238:50010]
    3. blk_-8860437510624268282_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.235:50010]
    4. blk_-1765246693355320434_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.232:50010]
    5. blk_9063781070378080202_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.234:50010]
    6. blk_8687961040692226467_1003 len=134217728 repl=3 [/rack1/192.168.1.234:50010, /rack1/192.168.1.237:50010, /rack2/192.168.1.239:50010]
    7. blk_-5717347662754027031_1003 len=134217728 repl=3 [/rack1/192.168.1.236:50010, /rack1/192.168.1.232:50010, /rack2/192.168.1.63:50010]
    8. blk_-5624359065285533759_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.231:50010]
    9. blk_622948206607478459_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.236:50010]
    10. blk_-4154428280295153090_1003 len=134217728 repl=3 [/rack1/192.168.1.232:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.63:50010]
    11. blk_6638201995439663469_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.237:50010]
    12. blk_-3282418422086241856_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.233:50010]
    13. blk_2802846523093904336_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.237:50010]
    14. blk_-7425405918846384842_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.234:50010]
    15. blk_-8997936298966969491_1003 len=134217728 repl=3 [/rack1/192.168.1.237:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.238:50010]
    16. blk_-827035362476515573_1003 len=134217728 repl=3 [/rack2/192.168.1.239:50010, /rack2/192.168.1.63:50010, /rack1/192.168.1.235:50010]
    17. blk_-5734389503841877028_1003 len=134217728 repl=3 [/rack1/192.168.1.231:50010, /rack1/192.168.1.235:50010, /rack2/192.168.1.66:50010]
    18. blk_1446125973144404377_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.238:50010, /rack1/192.168.1.235:50010]
    19. blk_-7161959344923757995_1003 len=134217728 repl=3 [/rack2/192.168.1.66:50010, /rack2/192.168.1.238:50010, /rack1/192.168.1.234:50010]
    20. blk_-2171786920309180709_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.237:50010]
    21. blk_7184760167274632839_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.66:50010, /rack1/192.168.1.233:50010]
    22. blk_1315507788295151463_1003 len=134217728 repl=3 [/rack2/192.168.1.63:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.233:50010]
    23. blk_5923416026032542888_1003 len=134217728 repl=3 [/rack2/192.168.1.238:50010, /rack2/192.168.1.239:50010, /rack1/192.168.1.236:50010]
    24. blk_-8960096699099874150_1003 len=37882879 repl=3 [/rack1/192.168.1.234:50010, /rack1/192.168.1.233:50010, /rack2/192.168.1.63:50010]
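    Beyond a single file, the overall state of the cluster can be checked from the same remote shell. A quick sketch using standard Hadoop 1.x commands (the generation matching the $HADOOP_HOME warning above); output is omitted here:

    $ hadoop dfsadmin -report      # capacity, remaining space, and liveness of every DataNode
    $ hadoop fsck / -openforwrite  # scan the whole namespace, including files still being written
    $ hadoop job -list             # list the MapReduce jobs currently running on the JobTracker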