  • Pseudo-distributed installation of CDH Hadoop 2.6

    1. Basic environment setup

    Hostname:     centos05
    IP address:   192.168.48.105
    Roles:        NameNode, ResourceManager, SecondaryNameNode, DataNode, NodeManager
    Hadoop user:  hadoop

    1.1. Disable the firewall and SELinux

    1.1.1. Disable the firewall

        Omitted.

    1.1.2. Disable SELinux

        Omitted.

        Note: the steps above must be run as root.

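    1.2. hosts configuration

    Add the host entry to /etc/hosts:

    [root@centos05 ~]# vim /etc/hosts
    ##hadoop host####
    192.168.48.105  centos05

    Set the hostname in /etc/sysconfig/network:

    [root@centos05 ~]# vim /etc/sysconfig/network
    HOSTNAME=centos05

      Note: the steps above must be run as root. The configuration is correct if pinging the hostname returns the matching IP address.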
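    Besides ping, the mapping can be checked directly against the hosts file. The following is a minimal sketch, not part of the original procedure: lookup_host_ip is a hypothetical helper name, and it assumes the file uses the standard /etc/hosts layout (address first, then one or more names, with # starting a comment).

```shell
#!/bin/sh
# lookup_host_ip FILE NAME  (hypothetical helper, for illustration only)
# Prints the first address that FILE, in /etc/hosts format, maps NAME to.
# Returns non-zero when NAME is not listed.
lookup_host_ip() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                       # strip comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Check the entry added in this section.
hosts_file=${HOSTS_FILE:-/etc/hosts}
if ip=$(lookup_host_ip "$hosts_file" centos05); then
    echo "centos05 -> $ip"
else
    echo "centos05 is not listed in $hosts_file" >&2
fi
```

    With the entry from this section in place, the lookup should report 192.168.48.105 for centos05.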

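    1.3. Create the host account and configure passwordless access

    Create the new user; the adduser command is recommended:

    sudo adduser hadoop
    sudo passwd hadoop

    After entering the password, keep pressing Enter, then type y to confirm. Creating the hadoop user also creates the hadoop group. Next, add the hadoop user to the hadoop group:

    sudo usermod -a -G hadoop hadoop

    The first hadoop is the group name and the second is the user name. When done, check the result:

    cat /etc/group

    Then give the hadoop user root privileges so that it can use sudo. Switch to a user with root access and open /etc/sudoers with one of:

    sudo gedit /etc/sudoers
    sudo vi /etc/sudoers

    Use the first command in a graphical session (gedit is a text editor that ships with Ubuntu) and the second in a terminal-only session. Look up vi usage on your own if needed. Edit the file as follows:

    # User privilege specification
    root    ALL=(ALL) ALL
    hadoop  ALL=(ALL) ALL

    Save and exit; the hadoop user now has root privileges.

    Generate the private and public keys:

    ssh-keygen -t rsa

    Copy the public key to the host (the password is required):

    ssh-copy-id hadoop@centos05

    Note: the key steps above must be run as the hadoop user. The setup is correct if the hadoop user can ssh to the local host without a password.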

    1.4. Java environment setup

    1.4.1. Download the JDK

      Omitted.

    1.4.2. Install Java

      Omitted.

    2. Install Hadoop

    2.1. Download the CDH build of Hadoop

      Download link: http://archive-primary.cloudera.com/cdh5/cdh/5/

    2.2. Install and configure Hadoop

      Install and configure Hadoop as the hadoop user.

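    • Create the directories that will hold the Hadoop data. They must match the dfs.namenode.name.dir and dfs.datanode.data.dir values set in hdfs-site.xml (section 2.2.2):
    [hadoop@centos05 ~]$ mkdir -p /opt/hadoop/hdfs/{name,data}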

    2.2.1. Configure core-site.xml

    [hadoop@centos05 ~]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9090</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/opt/hadoop/tmp</value>
        </property>
    </configuration>

    2.2.2. Configure hdfs-site.xml

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/opt/hadoop/hdfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/opt/hadoop/hdfs/data</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>
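    A mismatch between the directories created earlier and the ones named in hdfs-site.xml is easy to miss. The sketch below reads a property value back out of the file and checks that the directory exists. get_prop is a hypothetical helper, and it assumes the file is laid out with one tag per line as shown above; it is not a general XML parser. Once the Hadoop commands are on the PATH, hdfs getconf -confKey <name> is the more reliable way to read a setting.

```shell
#!/bin/sh
# get_prop FILE NAME  (hypothetical helper, for illustration only)
# Prints the <value> that follows <name>NAME</name> in a Hadoop *-site.xml.
# Assumes each <name> and <value> tag sits on its own line.
get_prop() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

# Report any storage directory named in hdfs-site.xml that does not exist yet.
conf=/opt/hadoop/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
if [ -r "$conf" ]; then
    for key in dfs.namenode.name.dir dfs.datanode.data.dir; do
        dir=$(get_prop "$conf" "$key")
        [ -d "$dir" ] || echo "missing directory for $key: $dir"
    done
fi
```

    The directory test only works for plain paths such as the ones used here; a value written with a file: prefix would need that prefix stripped first.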

    2.2.3. Configure mapred-site.xml

    [hadoop@centos05 hadoop]$ cd /opt/hadoop/hadoop-2.6.0/etc/hadoop

    [hadoop@centos05 hadoop]$ cp mapred-site.xml.template mapred-site.xml

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/mapred-site.xml
    
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    2.2.4. Configure yarn-site.xml

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/yarn-site.xml
    
    <configuration>
    <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>

    2.2.5. Configure slaves

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/slaves
    
    centos05

    2.2.6. Configure hadoop-env

      Set the JAVA_HOME environment variable in hadoop-env.sh:

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
    
    export JAVA_HOME=/opt/java/jdk1.8.0_191

    2.2.7. Configure yarn-env

      Set the JAVA_HOME environment variable in yarn-env.sh:

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/yarn-env.sh
    
    export JAVA_HOME=/opt/java/jdk1.8.0_191

    2.2.8. Configure mapred-env

      Set the JAVA_HOME environment variable in mapred-env.sh:

    [hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/mapred-env.sh
    
    export JAVA_HOME=/opt/java/jdk1.8.0_191

    2.2.9. Configure HADOOP_PREFIX

      Set the Hadoop environment variables for the hadoop user:

    [hadoop@centos05 ~]$ vim .bash_profile

    ####HADOOP_PREFIX####
    export HADOOP_PREFIX=/opt/hadoop/hadoop-2.6.0
    export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

      Load the environment variables:

    [hadoop@centos05 ~]$ source .bash_profile

      Note: echo $HADOOP_PREFIX should now print the Hadoop installation directory.

    3. Start Hadoop in pseudo-distributed mode

    3.1. Start HDFS and YARN

    • Format HDFS

      [hadoop@centos05 ~]$ hdfs namenode -format
    • Start HDFS

      [hadoop@centos05 ~]$ start-dfs.sh
    • Start YARN

      [hadoop@centos05 ~]$ start-yarn.sh

    • Check the running processes
      [hadoop@centos05 ~]$ jps
      18265 DataNode
      18615 ResourceManager
      18463 SecondaryNameNode
      31343 Jps
      18728 NodeManager
      18152 NameNode

      Note: to stop the daemons, run stop-dfs.sh and stop-yarn.sh.

    3.2. Start the cluster

      HDFS and YARN can also be started and stopped with a single command:

    Start: start-all.sh
    Stop:  stop-all.sh
    • Output of start-all.sh:

    [hadoop@centos05 ~]$ start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [centos05]
    centos05: starting namenode, logging to /opt/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-namenode-centos05.out
    centos05: starting datanode, logging to /opt/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-datanode-centos05.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-secondarynamenode-centos05.out
    starting yarn daemons
    starting resourcemanager, logging to /opt/hadoop/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-centos05.out
    centos05: starting nodemanager, logging to /opt/hadoop/hadoop-2.6.0/logs/yarn-hadoop-nodemanager-centos05.out
    [hadoop@centos05 ~]$

     

    • All processes after startup:

    [hadoop@centos05 ~]$ jps
    32640 NodeManager
    529 Jps
    32057 NameNode
    32526 ResourceManager
    32356 SecondaryNameNode
    32172 DataNode
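    Reading the jps listing by eye works for a single node, but the check is easy to script. The sketch below is an illustration only: check_daemons is a hypothetical helper that reads jps output on standard input and reports any of the five daemons from the role list in section 1 that is absent. It assumes the two-column "PID Name" format shown above.

```shell
#!/bin/sh
# check_daemons  (hypothetical helper, for illustration only)
# Reads `jps` output on stdin, prints one line per expected daemon that is
# not running, and returns non-zero if any is missing.
check_daemons() {
    jps_out=$(cat)
    missing=0
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so that NameNode does not
        # match SecondaryNameNode.
        if ! printf '%s\n' "$jps_out" |
                awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "not running: $d"
            missing=1
        fi
    done
    return "$missing"
}
```

    A typical call would be jps | check_daemons && echo "all daemons are up". If a daemon is reported as missing, its log file under /opt/hadoop/hadoop-2.6.0/logs is the first place to look.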

    4. HDFS shell operations and a WordCount demo

    4.1. Basic HDFS shell operations

    • Create directories

      [hadoop@centos05 ~]$ hadoop fs -mkdir /input_test
      [hadoop@centos05 ~]$ hadoop fs -mkdir /output_test
    • List a directory

      [hadoop@centos05 ~]$ hadoop fs -ls /
      Found 3 items
      drwxr-xr-x   - hadoop supergroup          0 2018-11-27 23:04 /input_test
      drwxr-xr-x   - hadoop supergroup          0 2018-11-27 23:27 /output_test
      drwx------   - hadoop supergroup          0 2018-11-27 23:08 /tmp
    • Upload a file

      [hadoop@centos05 /]$ hadoop fs -put /opt/hadoop/hadoop-2.6.0/share/doc/index.html /input_test
    • Check the uploaded file

      [hadoop@centos05 /]$ hadoop fs -ls /input_test/index.html
      -rw-r--r--   1 hadoop supergroup      19968 2018-11-28 10:08 /input_test/index.html

    • View the contents of a text file
      [hadoop@centos05 /]$ hadoop fs -cat /input_test/index.html

    4.2. WordCount

      Use the WordCount example from Hadoop's bundled examples jar to count the words in /input_test/index.html on HDFS.

    • Run the test

      [hadoop@centos05 /]$ hadoop jar /opt/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar \
          wordcount /input_test/index.html /output_test/runcount
      18/11/28 10:18:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
      18/11/28 10:18:54 INFO input.FileInputFormat: Total input paths to process : 1
      18/11/28 10:18:54 INFO mapreduce.JobSubmitter: number of splits:1
      18/11/28 10:18:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543369969234_0002
      18/11/28 10:18:56 INFO impl.YarnClientImpl: Submitted application application_1543369969234_0002
      18/11/28 10:18:56 INFO mapreduce.Job: The url to track the job: http://centos05:8088/proxy/application_1543369969234_0002/
      18/11/28 10:18:56 INFO mapreduce.Job: Running job: job_1543369969234_0002
      18/11/28 10:19:16 INFO mapreduce.Job: Job job_1543369969234_0002 running in uber mode : false
      18/11/28 10:19:16 INFO mapreduce.Job: map 0% reduce 0%
      18/11/28 10:19:31 INFO mapreduce.Job: map 100% reduce 0%
      18/11/28 10:19:43 INFO mapreduce.Job: map 100% reduce 100%
      18/11/28 10:19:44 INFO mapreduce.Job: Job job_1543369969234_0002 completed successfully
      18/11/28 10:19:45 INFO mapreduce.Job: Counters: 49
          File System Counters
              FILE: Number of bytes read=13728
              FILE: Number of bytes written=313427
              FILE: Number of read operations=0
              FILE: Number of large read operations=0
              FILE: Number of write operations=0
              HDFS: Number of bytes read=20075
              HDFS: Number of bytes written=11719
              HDFS: Number of read operations=6
              HDFS: Number of large read operations=0
              HDFS: Number of write operations=2
          Job Counters
              Launched map tasks=1
              Launched reduce tasks=1
              Data-local map tasks=1
              Total time spent by all maps in occupied slots (ms)=12498
              Total time spent by all reduces in occupied slots (ms)=9428
              Total time spent by all map tasks (ms)=12498
              Total time spent by all reduce tasks (ms)=9428
              Total vcore-milliseconds taken by all map tasks=12498
              Total vcore-milliseconds taken by all reduce tasks=9428
              Total megabyte-milliseconds taken by all map tasks=12797952
              Total megabyte-milliseconds taken by all reduce tasks=9654272
          Map-Reduce Framework
              Map input records=383
              Map output records=1087
              Map output bytes=18860
              Map output materialized bytes=13728
              Input split bytes=107
              Combine input records=1087
              Combine output records=504
              Reduce input groups=504
              Reduce shuffle bytes=13728
              Reduce input records=504
              Reduce output records=504
              Spilled Records=1008
              Shuffled Maps =1
              Failed Shuffles=0
              Merged Map outputs=1
              GC time elapsed (ms)=174
              CPU time spent (ms)=0
              Physical memory (bytes) snapshot=0
              Virtual memory (bytes) snapshot=5455101952
              Total committed heap usage (bytes)=165810176
          Shuffle Errors
              BAD_ID=0
              CONNECTION=0
              IO_ERROR=0
              WRONG_LENGTH=0
              WRONG_MAP=0
              WRONG_REDUCE=0
          File Input Format Counters
              Bytes Read=19968
          File Output Format Counters
              Bytes Written=11719
      [hadoop@centos05 /]$
    • View the results
      [hadoop@centos05 /]$ hadoop fs -ls /output_test/runcount/
      
      Found 2 items
      -rw-r--r--   1 hadoop supergroup          0 2018-11-28 10:19 /output_test/runcount/_SUCCESS
      -rw-r--r--   1 hadoop supergroup      11719 2018-11-28 10:19 /output_test/runcount/part-r-00000
      
      [hadoop@centos05 /]$ hadoop fs -cat  /output_test/runcount/part-r-00000
      2018-08-09      2
      <!--    2
      <!DOCTYPE       1
      </a>    3
      </body> 1
      </div>  13
      </head> 1
      </html> 1
      </li>   84
      </style>        1
      </ul>   12
      <a      94
      <body   1
      <div    15
      ...... (remaining output omitted)
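    The output explains what the example job does: it splits each line on whitespace, so HTML fragments such as <a and </li> are counted as words. For a small file the result can be cross-checked locally. The sketch below is an approximation under stated assumptions, not the job's own code: local_wordcount is a hypothetical helper, it assumes the example tokenizes on space, tab, carriage return, form feed and newline, and it sorts in byte order to mirror the ordering of part-r-00000.

```shell
#!/bin/sh
# local_wordcount  (hypothetical helper, for illustration only)
# Reads text on stdin and prints "word<TAB>count", one word per line,
# sorted by byte value.
local_wordcount() {
    awk -F'[ \t\r\f]+' '
        { for (i = 1; i <= NF; i++) if ($i != "") count[$i]++ }
        END { for (w in count) printf "%s\t%d\n", w, count[w] }
    ' | LC_ALL=C sort
}
```

    One way to compare is to save both sides to files and diff them: hadoop fs -cat /input_test/index.html | local_wordcount > local.txt, then hadoop fs -cat /output_test/runcount/part-r-00000 > job.txt, then diff local.txt job.txt. Small differences are possible if the input contains unusual whitespace characters.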

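    5. Problems encountered

    5.1. WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Fix: in this build the warning appears because ${HADOOP_PREFIX}/lib/native contains no native libraries. Download the 2.6 package from the official Hadoop site and copy the contents of its lib/native directory into that directory.

    5.2. openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!

    Fix: create a libcrypto.so symlink in /usr/lib64/:

    cd /usr/lib64/
    ln -s /usr/lib64/libcrypto.so.1.0.1e libcrypto.so
    • Running export HADOOP_ROOT_LOGGER=DEBUG,console prints more detailed logs to the terminal, which helps with troubleshooting.
    • To confirm that both problems are fixed, run hadoop checknative and check that the entries report true.

    Note: ${HADOOP_PREFIX} is the Hadoop installation directory, also known as ${HADOOP_HOME}.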
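    The checknative report can also be filtered so that only the failing libraries are shown. This is a small sketch under an assumption: it expects each report line to have the form "name: true|false detail", as in the openssl line quoted in 5.2. failed_native_libs is a hypothetical helper name.

```shell
#!/bin/sh
# failed_native_libs  (hypothetical helper, for illustration only)
# Reads `hadoop checknative` output on stdin and prints the name of every
# library whose status column is "false".
failed_native_libs() {
    awk '$2 == "false" { sub(/:$/, "", $1); print $1 }'
}
```

    For example, hadoop checknative | failed_native_libs prints nothing once both fixes above have been applied.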

    6. References

    http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.5/hadoop-project-dist/hadoop-common/SingleCluster.html

  • Original article: https://www.cnblogs.com/fameg/p/10030658.html