  • Hadoop 0.21.0 deployment and installation, plus a MapReduce test

    This is needed for Hadoop, but it is not limited to that: whenever typing a password gets tedious, you can use the same approach to ssh from one node into another without one.

    Suppose we want the master to log in to the slave without a password. On the master (IP 192.168.169.9), do the following:

    Command: ssh-keygen -t rsa, then just press Enter through all the prompts (this command does not need to be run from any particular directory).

    cd into /root/.ssh/ and you will see the following (id_rsa and id_rsa.pub are the newly generated files):

    id_rsa  id_rsa.pub  known_hosts

    Then use scp to copy id_rsa.pub to the slave node (IP 192.168.169.10): scp id_rsa.pub 192.168.169.10:/root/.ssh/192.168.169.9

    You will then see a file named 192.168.169.9 in the slave's /root/.ssh/ directory.

    Then run the following command on the slave:
    cat 192.168.169.9 >> authorized_keys
    I had already done this step beforehand, so authorized_keys was already present in the slave's /root/.ssh/ directory.
    At this point you can ssh from the master into the slave without a password.
    If you also want to be able to go from the slave into the master, the procedure is exactly the same.
     
     1. Installation environment: one physical machine (namenode) plus two virtual machines (datanodes).

     Hostname     IP                Role
     namenode     192.168.169.9     NameNode, JobTracker
     datanode1    192.168.169.10    DataNode, TaskTracker
     datanode2    192.168.169.20    DataNode, TaskTracker

    On all three machines, add the following entries to /etc/hosts so that these hostnames resolve:

    192.168.169.9  namenode
    192.168.169.10 datanode1
    192.168.169.20 datanode2

    After the change, verify that name resolution works (test from the datanodes in the same way):

    -bash-3.1# ping -c 4 namenode
    PING namenode (192.168.169.9) 56(84) bytes of data.
    64 bytes from namenode (192.168.169.9): icmp_seq=1 ttl=64 time=0.020 ms
    64 bytes from namenode (192.168.169.9): icmp_seq=2 ttl=64 time=0.009 ms
    64 bytes from namenode (192.168.169.9): icmp_seq=3 ttl=64 time=0.009 ms
    64 bytes from namenode (192.168.169.9): icmp_seq=4 ttl=64 time=0.010 ms

    --- namenode ping statistics ---
    4 packets transmitted, 4 received, 0% packet loss, time 2997ms
    rtt min/avg/max/mdev = 0.009/0.012/0.020/0.004 ms

    Then set the hostname in /etc/sysconfig/network (the hostname there must match the machine name). Only the namenode's file is shown below; the datanodes' files are similar, with only HOSTNAME changed:

    NETWORKING=yes
    NETWORKING_IPV6=yes
    HOSTNAME=namenode

     
    2. Passwordless SSH authentication

    The details were covered in the previous post (and at the top of this one), so I won't repeat them here, but note that you need all of the following:

    1. passwordless authentication from the master to each slave;
    2. passwordless authentication from each slave to the master;
    3. passwordless authentication from the master to itself.

    For example, testing item 1:

    -bash-3.1# ssh datanode1
    Last login: Fri Feb 17 08:32:34 2012 from namenode
    -bash-3.1# exit
    logout
    Connection to datanode1 closed.
    -bash-3.1# ssh datanode2
    Last login: Fri Feb 17 08:32:42 2012 from namenode
    -bash-3.1# exit
    logout
    Connection to datanode2 closed.

    3. Install the JDK and configure the environment variables. Mine was already installed, so I only had to set the environment variables by adding the following to /etc/profile (remember to run source /etc/profile afterwards):
    export JAVA_HOME=/usr/java/jdk1.6.0_29
    export JRE_HOME=$JAVA_HOME/jre
    export PATH=$PATH:/usr/java/jdk1.6.0_29/bin
    export CLASSPATH=:/usr/java/jdk1.6.0_29/lib:/usr/java/jdk1.6.0_29/jre/lib
    Check the version:
    -bash-3.1# java -version
    java version "1.6.0_29"
    Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)
    Write a small concrete example and run it (test.java):

    class test
    {
        public static void main(String[] args)
        {
            System.out.println("Hello,World!");
        }
    }

    Compile and run it:
    -bash-3.1# javac test.java 

    -bash-3.1# java test
    Hello,World!

    That confirms the JDK is working.
    4. Hadoop installation and configuration
    Download hadoop-0.21.0.tar.gz, extract it, and put it somewhere permanent; I put it under /usr/local/hadoop. Take a look inside:
    -bash-3.1# cd /usr/local/hadoop/hadoop-0.21.0/
    -bash-3.1# ll
    drwxrwxr-x 2 huyajun huyajun    4096 02-17 10:30 bin
    drwxrwxr-x 5 huyajun huyajun    4096 2010-08-17 c++
    drwxr-xr-x 8 huyajun huyajun    4096 2010-08-17 common
    drwxrwxr-x 2 huyajun huyajun    4096 02-16 15:54 conf
    -rw-rw-r-- 1 huyajun huyajun 1289953 2010-08-17 hadoop-common-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun  622276 2010-08-17 hadoop-common-test-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun  934881 2010-08-17 hadoop-hdfs-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun  613332 2010-08-17 hadoop-hdfs-0.21.0-sources.jar
    -rw-rw-r-- 1 huyajun huyajun    6956 2010-08-17 hadoop-hdfs-ant-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun  688026 2010-08-17 hadoop-hdfs-test-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun  419671 2010-08-17 hadoop-hdfs-test-0.21.0-sources.jar
    -rw-rw-r-- 1 huyajun huyajun 1747897 2010-08-17 hadoop-mapred-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun 1182309 2010-08-17 hadoop-mapred-0.21.0-sources.jar
    -rw-rw-r-- 1 huyajun huyajun  252064 2010-08-17 hadoop-mapred-examples-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun 1492025 2010-08-17 hadoop-mapred-test-0.21.0.jar
    -rw-rw-r-- 1 huyajun huyajun  298837 2010-08-17 hadoop-mapred-tools-0.21.0.jar
    drwxr-xr-x 8 huyajun huyajun    4096 2010-08-17 hdfs
    drwxrwxr-x 4 huyajun huyajun    4096 2010-08-17 lib
    -rw-rw-r-- 1 huyajun huyajun   13366 2010-08-17 LICENSE.txt
    drwxr-xr-x 3 root    root       4096 02-17 08:54 logs
    drwxr-xr-x 9 huyajun huyajun    4096 2010-08-17 mapred
    -rw-rw-r-- 1 huyajun huyajun     101 2010-08-17 NOTICE.txt
    -rw-rw-r-- 1 huyajun huyajun    1366 2010-08-17 README.txt
    drwxrwxr-x 8 huyajun huyajun    4096 2010-08-17 webapps
    Now make the Hadoop-specific settings, as follows:
    a. Add the following line to hadoop-env.sh:

    export JAVA_HOME=/usr/java/jdk1.6.0_29

    b. core-site.xml after editing looks like this (the fs.default.name and hadoop.tmp.dir properties are the additions; hadoop.tmp.dir is the directory where the DFS data lives):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://namenode:9000</value>
        </property>

        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/wuyanzan/hadoop-1.0.1</value>
        </property>
    </configuration>

    c. hdfs-site.xml (the dfs.replication property is the added content):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
    </configuration>

    d. mapred-site.xml (the mapred.job.tracker property is the added content):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>namenode:9001</value>
        </property>
    </configuration>

    e. masters, whose entire content after editing is:

    namenode

    f. slaves, whose entire content after editing is:

    datanode1
    datanode2

    g. Add the Hadoop environment variables by appending the following to /etc/profile:

    HADOOP_HOME=/usr/local/hadoop/hadoop-0.21.0
    PATH=$PATH:$HADOOP_HOME/bin
    export PATH HADOOP_HOME

    Then run source /etc/profile.
    After that, the most important step is to scp the whole hadoop folder to the corresponding location on both datanodes, i.e. under /usr/local/.
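
    Before formatting and starting anything, it can be handy to confirm that a Hadoop client actually picks up these settings. Here is a minimal Java sketch (ConfCheck is a made-up name for illustration; compile it with the Hadoop jars and the conf directory on the classpath; it only reads the configuration files and does not need a running cluster):

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical sanity check: prints the values the Hadoop client will
    // read from core-site.xml on the classpath.
    public class ConfCheck
    {
        public static void main(String[] args)
        {
            Configuration conf = new Configuration();  // loads core-default.xml and core-site.xml
            System.out.println("fs.default.name = " + conf.get("fs.default.name"));
            System.out.println("hadoop.tmp.dir  = " + conf.get("hadoop.tmp.dir"));
        }
    }
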
    5. Starting Hadoop, troubleshooting, and monitoring
    First, format HDFS on the namenode:
    -bash-3.1# cd /usr/local/hadoop/hadoop-0.21.0/bin/
    -bash-3.1# ls
    hadoop            hadoop-daemon.sh   hdfs            mapred            rcc        start-all.sh       start-dfs.sh     stop-all.sh       stop-dfs.sh     test.class
    hadoop-config.sh  hadoop-daemons.sh  hdfs-config.sh  mapred-config.sh  slaves.sh  start-balancer.sh  start-mapred.sh  stop-balancer.sh  stop-mapred.sh  test.java
    -bash-3.1# hadoop namenode -format
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    12/02/17 13:58:17 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = namenode/192.168.169.9
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 0.21.0

    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 -r 985326; compiled by 'tomwhite' on Tue Aug 17 01:02:28 EDT 2010
    ************************************************************/
    Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) Y
    12/02/17 13:58:22 INFO namenode.FSNamesystem: defaultReplication = 2
    12/02/17 13:58:22 INFO namenode.FSNamesystem: maxReplication = 512
    12/02/17 13:58:22 INFO namenode.FSNamesystem: minReplication = 1
    12/02/17 13:58:22 INFO namenode.FSNamesystem: maxReplicationStreams = 2
    12/02/17 13:58:22 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false
    12/02/17 13:58:22 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
    12/02/17 13:58:22 INFO namenode.FSNamesystem: fsOwner=root
    12/02/17 13:58:22 INFO namenode.FSNamesystem: supergroup=supergroup
    12/02/17 13:58:22 INFO namenode.FSNamesystem: isPermissionEnabled=true
    12/02/17 13:58:22 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    12/02/17 13:58:23 INFO common.Storage: Image file of size 110 saved in 0 seconds.
    12/02/17 13:58:23 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
    12/02/17 13:58:23 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at namenode/192.168.169.9
    ************************************************************/

    After a successful format, run sh start-all.sh:

    -bash-3.1# sh start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
    starting namenode, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-namenode-namenode.out
    datanode2: starting datanode, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-datanode2.out
    datanode1: starting datanode, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-datanode1.out
    namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-secondarynamenode-namenode.out
    starting jobtracker, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-jobtracker-namenode.out
    datanode1: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-tasktracker-datanode1.out
    datanode2: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.21.0/bin/../logs/hadoop-root-tasktracker-datanode2.out

    Then go into the conf folder and check with jps:

    -bash-3.1# cd ../conf/
    -bash-3.1# ls
    capacity-scheduler.xml  core-site.xml       hadoop-env.sh              hadoop-policy.xml  log4j.properties   mapred-site.xml  slaves                  ssl-server.xml.example
    configuration.xsl       fair-scheduler.xml  hadoop-metrics.properties  hdfs-site.xml      mapred-queues.xml  masters          ssl-client.xml.example  taskcontroller.cfg
    -bash-3.1# jps
    1081 NameNode
    1532 JobTracker
    1376 SecondaryNameNode
    1690 Jps

    While we're at it, check on a datanode as well:

    -bash-3.1# cd /usr/local/hadoop/hadoop-0.21.0/conf/
    -bash-3.1# jps
    6146 DataNode
    6227 Jps
    6040 TaskTracker

    I ran into a problem here: sometimes the DataNode simply would not come up. Looking at the datanode's log, the error was:

    ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode namespaceID = 991936739; datanode namespaceID = 1787084289

    Someone more experienced pointed out that this comes from formatting the namenode more than once. So I deleted the hadoop-root folder under /tmp on both the namenode and the datanodes, reformatted, ran start-all again, and everything worked. The catch is that the namenode's data is lost when the tmp directory is wiped, so the advice was to back it up before deleting; I was in too much of a hurry to get the whole framework running to ask exactly how, so that's something to study later.
    At least the cluster is finally up.
     
    Next, a simple MapReduce wordcount test. The test file is word.txt, generated automatically by a small Python script.
    random_char.py is as follows:

    import random, sys

    # Usage: python random_char.py <count>
    # Appends <count> random lowercase letters, separated by spaces, to word.txt.
    if len(sys.argv) < 2:
        print sys.argv[0], "count"
        sys.exit()

    f = open('word.txt', 'a')
    i = 0
    while i < int(sys.argv[1]):
        i = i + 1
        f.write(chr(97 + random.randint(0, 25)))  # a random letter from a to z
        f.write(" ")
    f.close()
    print "word.txt has been created"

    Running it can generate a txt file of 70-odd MB.
    Next we need to move this file from the Linux filesystem into HDFS.
    The commands are:

    hadoop dfs -mkdir wuyanzan

    hadoop dfs -put word.txt /user/root/wuyanzan

    The commands above create a directory in HDFS dedicated to word.txt and put the file into it; we can check with ls:

    -bash-3.1# hadoop dfs -ls /user/root/
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    12/02/17 14:43:43 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
    12/02/17 14:43:43 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    Found 2 items
    drwxr-xr-x   - root supergroup          0 2012-02-17 14:26 /user/root/wuyanzan

    Then look inside the wuyanzan directory:

    -bash-3.1# hadoop dfs -ls /user/root/wuyanzan/
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    12/02/17 14:59:10 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
    12/02/17 14:59:11 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    Found 1 items
    -rw-r--r--   2 root supergroup    7625490 2012-02-17 14:45 /user/root/wuyanzan/word.txt
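
    Incidentally, the same upload can also be done programmatically through the HDFS FileSystem API. Here is a minimal Java sketch (the class name UploadWordTxt is made up for illustration; it assumes core-site.xml is on the classpath so the client talks to hdfs://namenode:9000):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical helper: does the same as
    // "hadoop dfs -mkdir wuyanzan" + "hadoop dfs -put word.txt /user/root/wuyanzan"
    public class UploadWordTxt
    {
        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();    // reads core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);        // the filesystem named by fs.default.name
            fs.mkdirs(new Path("/user/root/wuyanzan"));  // no-op if the directory already exists
            fs.copyFromLocalFile(new Path("word.txt"),
                                 new Path("/user/root/wuyanzan/word.txt"));
            fs.close();
        }
    }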

    Now we can run the MapReduce job as follows, with the results going into output:

    -bash-3.1# hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount wuyanzan output
    12/02/17 14:46:58 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
    12/02/17 14:46:58 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    12/02/17 14:46:58 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/02/17 14:46:58 INFO input.FileInputFormat: Total input paths to process : 1
    12/02/17 14:46:59 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    12/02/17 14:46:59 INFO mapreduce.JobSubmitter: number of splits:1
    12/02/17 14:46:59 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
    12/02/17 14:46:59 INFO mapreduce.Job: Running job: job_201202171400_0001
    12/02/17 14:47:00 INFO mapreduce.Job:  map 0% reduce 0%
    12/02/17 14:47:15 INFO mapreduce.Job:  map 66% reduce 0%
    12/02/17 14:47:18 INFO mapreduce.Job:  map 100% reduce 0%
    12/02/17 14:47:24 INFO mapreduce.Job:  map 100% reduce 100%
    12/02/17 14:47:26 INFO mapreduce.Job: Job complete: job_201202171400_0001
    12/02/17 14:47:26 INFO mapreduce.Job: Counters: 33
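
    For reference, the wordcount program inside hadoop-mapred-examples-0.21.0.jar is roughly equivalent to the following sketch written against the new org.apache.hadoop.mapreduce API (a simplified version for illustration, not the exact shipped source):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount
    {
        // Mapper: emit (word, 1) for every whitespace-separated token in a line.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
        {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context) throws IOException, InterruptedException
            {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer (also used as combiner): sum the counts for each word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
        {
            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
            {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. wuyanzan
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }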

    After the job completes, we can look at the contents of output:

    -bash-3.1# hadoop dfs -ls /user/root/output
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    12/02/17 14:49:21 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
    12/02/17 14:49:21 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    Found 2 items
    -rw-r--r--   2 root supergroup          0 2012-02-17 14:47 /user/root/output/_SUCCESS
    -rw-r--r--   2 root supergroup        234 2012-02-17 14:47 /user/root/output/part-r-00000

    The final counts are in part-r-00000; view it:

    -bash-3.1# hadoop dfs -cat /user/root/output/part-r-00000
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    12/02/17 14:49:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
    12/02/17 14:49:47 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    a    146072
    b    146313
    c    147031
    d    147121
    e    146570
    f    146807
    g    146539
    h    147319
    i    146891
    j    146895
    k    146233
    l    146308
    m    146839
    n    146945
    o    147098
    p    146992
    q    146203
    r    146640
    s    146276
    t    146411
    u    146667
    v    146083
    w    146287
    x    146480
    y    146392
    z    147333

    One drawback I have to point out: the hadoop dfs command has no tab completion, which is quite painful; you have to remember the full paths, otherwise you keep getting errors.
    Errors I have run into:

    1. After sshing into a datanode and running jps, the TaskTracker was up but the DataNode was not, with the message:
    Could not synchronize with target
    It turned out the datanode's /etc/hosts had no entry resolving its own localhost.

    2.could only be replicated to 0 nodes, instead of 1

    I stumbled on this one by accident. Everything had been working fine, but after I reformatted for some reason, putting a file suddenly threw this error and gave me a scare. It turns out that if you try to upload a file to HDFS immediately after the cluster comes up (for example right after getting an HOD cluster), the DFSClient will complain like this; all you have to do is wait a little while after starting before doing the put, and the error goes away. (A rather absurd cause, really.) Of course, other causes have been described by plenty of people: either HDFS is out of space, or the datanodes never started.
