  • Quick setup of a Hadoop pseudo-distributed environment

    Hadoop distributions

    • Apache
    • Cloudera
    • Hortonworks

    This article uses Cloudera's Hadoop distribution (CDH).

    Download the CDH 5.3.6 release

    Download URL: http://archive.cloudera.com/cdh5/cdh/5/

    All components must come from the same CDH release:

    • cdh5.3.6-snappy-lib-natirve.tar.gz
    • hadoop-2.5.0-cdh5.3.6.tar.gz
    • hive-0.13.1-cdh5.3.6.tar.gz
    • sqoop-1.4.5-cdh5.3.6.tar.gz

    Installation and configuration

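    • Install and configure the JDK
    • Upload the tarballs to /opt/software/cdh on the Ubuntu machine
    • mkdir -p /opt/cdh-5.3.6 && cd /opt/software/cdh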
    • tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/cdh-5.3.6
    • tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz -C /opt/cdh-5.3.6

    Set JAVA_HOME in etc/hadoop/hadoop-env.sh, yarn-env.sh and mapred-env.sh.

    Configure core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hp-expert.tianpo.com:8020</value>
        </property> 
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
        </property>
    </configuration>
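    If you script the setup, you can write the same core-site.xml from a heredoc so the hostname and install path are set in one place. This is a minimal sketch. The function name write_core_site and its arguments are hypothetical, and the function overwrites any existing core-site.xml in the target directory.

```shell
#!/usr/bin/env bash
# Sketch: generate core-site.xml for a pseudo-distributed setup.
# write_core_site and its argument order are hypothetical, not part of Hadoop.
write_core_site() {
  local conf_dir="$1" host="$2" hadoop_home="$3"
  mkdir -p "$conf_dir"
  # Unquoted EOF: ${host} and ${hadoop_home} are expanded inside the heredoc.
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${host}:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>${hadoop_home}/data/tmp</value>
    </property>
</configuration>
EOF
}
```

    Usage, assuming the paths from this article: write_core_site /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/etc/hadoop hp-expert.tianpo.com /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6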
    

    Configure hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hp-expert.tianpo.com:50090</value>
        </property>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hp-expert.tianpo.com:50070</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    </configuration>    
    

    Configure mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hp-expert.tianpo.com:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hp-expert.tianpo.com:19888</value>
        </property>
    </configuration>
    

    Configure yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hp-expert.tianpo.com</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>640800</value>
        </property>
    </configuration>
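    yarn.log-aggregation.retain-seconds is measured in seconds. A quick shell calculation shows that 640800 works out to 7 days 10 hours. If exactly one week is intended, the value would be 604800. The helper name here is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a retention period in seconds into whole days and remaining hours.
# secs_to_days_hours is a hypothetical helper name.
secs_to_days_hours() {
  local s="$1"
  echo "$(( s / 86400 ))d $(( (s % 86400) / 3600 ))h"
}

secs_to_days_hours 640800   # prints: 7d 10h
secs_to_days_hours 604800   # prints: 7d 0h
```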
    

    Configure slaves (etc/hadoop/slaves, one hostname per line)

    hp-expert.tianpo.com
    

    Format the NameNode

    cd /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6

    bin/hdfs namenode -format
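    Formatting is meant to be done once. Re-running it on an existing NameNode generates a new clusterID. The DataNode's existing data directory then no longer matches, and the DataNode typically refuses to start. A minimal guard is sketched here. It assumes dfs.namenode.name.dir is left at its default (${hadoop.tmp.dir}/dfs/name), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: format the NameNode only if it has not been formatted yet.
# Assumption: dfs.namenode.name.dir keeps its default, ${hadoop.tmp.dir}/dfs/name.
# Function names are hypothetical, not part of Hadoop.

# Returns 0 (needs format) when <tmp_dir>/dfs/name/current/VERSION is absent.
namenode_needs_format() {
  local tmp_dir="$1"
  [ ! -f "$tmp_dir/dfs/name/current/VERSION" ]
}

format_if_needed() {
  local home="${1:-/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6}"
  if namenode_needs_format "$home/data/tmp"; then
    "$home/bin/hdfs" namenode -format
  else
    echo "NameNode already formatted, skipping"
  fi
}
```

    Usage: format_if_needed /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6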

    Start the daemons (from the same directory)

    sbin/hadoop-daemon.sh start namenode

    sbin/hadoop-daemon.sh start datanode

    sbin/yarn-daemon.sh start resourcemanager

    sbin/yarn-daemon.sh start nodemanager

    sbin/mr-jobhistory-daemon.sh start historyserver
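    The five start commands can also be run in order from one function. This is a minimal sketch. The function name and the DRY_RUN switch are hypothetical, and DRY_RUN exists so the commands can be printed without starting anything.

```shell
#!/usr/bin/env bash
# Sketch: start HDFS, YARN and the JobHistory server in the order listed above.
# start_pseudo_cluster and DRY_RUN are hypothetical names.
start_pseudo_cluster() {
  local home="${1:-/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6}"
  local cmds=(
    "sbin/hadoop-daemon.sh start namenode"
    "sbin/hadoop-daemon.sh start datanode"
    "sbin/yarn-daemon.sh start resourcemanager"
    "sbin/yarn-daemon.sh start nodemanager"
    "sbin/mr-jobhistory-daemon.sh start historyserver"
  )
  local c
  for c in "${cmds[@]}"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$home/$c"
    else
      # $c is intentionally unquoted so it splits into script path and arguments.
      (cd "$home" && $c) || return 1
    fi
  done
}
```

    Usage: DRY_RUN=1 start_pseudo_cluster prints the commands; start_pseudo_cluster without DRY_RUN runs them and stops at the first failure.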

    Check the running processes with jps:

    • 1905 NameNode
    • 2354 NodeManager
    • 2499 JobHistoryServer
    • 2084 ResourceManager
    • 1991 DataNode
    • 2538 Jps
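    Instead of checking the list by eye, a small function can compare jps output against the five daemons started above. This is a sketch. The function name is hypothetical, and the PIDs will differ on your machine.

```shell
#!/usr/bin/env bash
# Sketch: read jps-style lines ("<pid> <name>") on stdin and report any of the
# expected daemons that are missing. check_daemons is a hypothetical name.
check_daemons() {
  local out missing=0 d
  out=$(awk '{print $2}')   # keep only the process names
  for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    if ! grep -qx "$d" <<<"$out"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

    Usage: jps | check_daemons && echo "all daemons running"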

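    Open http://hp-expert.tianpo.com:50070/ in a browser. If the page does not load, check whether anything is listening on that port: netstat -ant | grep 50070 (or ss -lnt | grep 50070).

    Check the hosts configuration (/etc/hosts). Each entry has the form "IP hostname", and the hostname must map to the machine's real IP, not 127.0.0.1.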
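    On Ubuntu, /etc/hosts often contains a default line of the form 127.0.1.1 <hostname>, which causes exactly this problem. The sketch here reports the IP that the first matching entry maps the hostname to, and it fails on loopback addresses. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which IP a hosts file maps a hostname to, and fail if it is a
# loopback address (127.x.x.x). check_host_mapping is a hypothetical name.
check_host_mapping() {
  local hosts_file="$1" host="$2" ip
  # Skip comment lines; print the IP of the first line listing $host as a name.
  ip=$(awk -v h="$host" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
  ' "$hosts_file")
  if [ -z "$ip" ]; then
    echo "no entry for $host in $hosts_file"
    return 1
  fi
  case "$ip" in
    127.*) echo "$host -> $ip (loopback address)"; return 1 ;;
  esac
  echo "$host -> $ip"
}
```

    Usage: check_host_mapping /etc/hosts hp-expert.tianpo.com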

    Configure Hive

    Configure hive-env.sh (copy it from conf/hive-env.sh.template)

    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=/opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6
    
    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/conf
    

    Configure hive-log4j.properties (copy it from conf/hive-log4j.properties.template)

    hive.log.threshold=ALL
    hive.root.logger=WARN,DRFA
    hive.log.dir=/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs
    hive.log.file=hive.log
    

    Configure hive-site.xml (create it with touch conf/hive-site.xml)

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://host:3306/metadata?createDatabaseIfNotExist=true</value>
        </property>
        <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>  
          <name>javax.jdo.option.ConnectionUserName</name>  
          <value>***</value>  
        </property>  
    
        <property>  
          <name>javax.jdo.option.ConnectionPassword</name>  
          <value>***</value>  
        </property>
        <property>
          <name>hive.cli.print.header</name>
          <value>true</value>
        </property>
    
        <property>
          <name>hive.cli.print.current.db</name>
          <value>true</value>
        </property>
        <property>
          <name>hive.fetch.task.conversion</name>
          <value>more</value>
        </property>
    </configuration>
    

    Copy the MySQL JDBC driver (mysql-connector-java-5.1.27.jar) into hive/lib, and make sure the driver version matches your MySQL server.

    Create the Hive warehouse directory in HDFS

    bin/hdfs dfs -mkdir -p /user/hive/warehouse
    
    bin/hdfs dfs -chmod g+w /user/hive/warehouse
    

    Start Hive (from the Hive installation directory): bin/hive

    Test Hive

    create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    

    Load data from a local file:

    load data local inpath '/opt/datas/student.txt' into table student;
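    If you do not have a student.txt yet, you can first generate a small tab-separated file matching the (id int, name string) schema. The rows are made-up sample data, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a tab-separated file matching student(id int, name string).
# make_student_file is a hypothetical name; the rows are sample data only.
make_student_file() {
  local path="$1"
  mkdir -p "$(dirname "$path")"
  # printf reuses the format for each pair of arguments: "<id>\t<name>\n".
  printf '%s\t%s\n' 1 alice 2 bob 3 carol > "$path"
}
```

    Usage: make_student_file /opt/datas/student.txt, then run the load data statement and check the result in Hive with select * from student;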
    

    Web UIs

    • HDFS NameNode: http://hp-expert.tianpo.com:50070
    • YARN ResourceManager: http://hp-expert.tianpo.com:8088/cluster
    • MapReduce JobHistory: http://hp-expert.tianpo.com:19888
  • Original article: https://www.cnblogs.com/tianboblog/p/9097364.html