  • Hadoop pseudo-distributed single-node deployment, for practicing Hive

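    Step 1: Prepare the environment

    Install the JDK, then create the user and group:

    useradd -m hadoop

    passwd hadoop    # set the password

    Add the user hadoop to the hadoop group (the usermod command follows the download steps):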

    wget   https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

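    tar -xvf hadoop-3.2.1.tar.gz -C /data/projects    # unpacks to /data/projects/hadoop-3.2.1; the later steps assume /data/projects/hadoop, so rename or symlink the directory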

    sudo chown -R hadoop:hadoop /data/projects 

    usermod -a -G hadoop hadoop    # the first hadoop is the group name; -a appends the group, so the user keeps its existing supplementary groups
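
    To confirm that the group membership took effect, a check along these lines may help. It is only a sketch: the function name user_in_group is made up for illustration, and it relies on id -nG, which prints the names of all groups a user belongs to.

    ```shell
    # Hypothetical helper (not from the original post): succeeds when user $1
    # is a member of group $2, based on the group names printed by `id -nG`.
    user_in_group() {
        local user="$1" group="$2" g
        for g in $(id -nG "$user" 2>/dev/null); do
            [ "$g" = "$group" ] && return 0
        done
        return 1
    }
    ```

    For example: user_in_group hadoop hadoop && echo "hadoop is in group hadoop"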

    Passwordless SSH for the single-node pseudo-distributed setup

    ssh-keygen -t rsa 

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    chmod 600 ~/.ssh/authorized_keys
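
    The chmod matters because sshd, with its default StrictModes setting, typically ignores keys whose files or directories are writable by group or others. The sketch below checks that condition; the helper names are hypothetical, and stat -c assumes GNU coreutils (Linux).

    ```shell
    # Hypothetical helper: succeeds only if path $1 exists and is not
    # writable by group or others (GNU stat prints the octal mode with %a).
    not_group_or_world_writable() {
        local mode
        mode=$(stat -c '%a' "$1" 2>/dev/null) || return 1
        [ $(( 0$mode & 022 )) -eq 0 ]
    }

    # Check the SSH directory $1 (default ~/.ssh) and its authorized_keys file.
    check_ssh_dir() {
        local dir="${1:-$HOME/.ssh}"
        not_group_or_world_writable "$dir" &&
            not_group_or_world_writable "$dir/authorized_keys"
    }
    ```

    If check_ssh_dir succeeds and the public key has been appended, ssh localhost should no longer ask for a password.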

    Change the hostname without rebooting (this lasts only until the next reboot)

    hostname hadoop 
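
    The configuration files later refer to this machine by the name hadoop, so that name has to resolve to the machine's address. The post does not show how this was done; one common way, assumed here, is an /etc/hosts entry such as "192.168.110.151 hadoop". The sketch below (hosts_file_has is a hypothetical name) checks whether a hosts file contains a given name.

    ```shell
    # Hypothetical helper: succeeds if hostname $1 appears as a name in the
    # hosts file $2 (default /etc/hosts). Comments are ignored.
    hosts_file_has() {
        local name="$1" file="${2:-/etc/hosts}"
        awk -v name="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file"
    }
    ```

    For example: hosts_file_has hadoop || echo "add an entry for hadoop to /etc/hosts"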

    Configure the Hadoop environment variables, the same way as for the JDK:

    # hadoop home
    export HADOOP_HOME=/data/projects/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
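
    After re-sourcing whichever profile file holds these exports, it is worth checking that both directories really ended up on PATH. A minimal sketch, with path_contains as a hypothetical helper name:

    ```shell
    # Hypothetical helper: succeeds if directory $1 is one of the
    # colon-separated entries of $PATH.
    path_contains() {
        case ":$PATH:" in
            *":$1:"*) return 0 ;;
            *)        return 1 ;;
        esac
    }
    ```

    For example: path_contains "$HADOOP_HOME/sbin" && hadoop version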

    Edit the Hadoop configuration files in /data/projects/hadoop/etc/hadoop

    1. Edit hadoop-env.sh and add the following:

    [hadoop@hadoop hadoop]$ grep JAVA_HOME hadoop-env.sh
    export JAVA_HOME=/usr/local/java/jdk1.8.0_221

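    2. Edit core-site.xml

    1. Configure the default file system.
    (Because the storage layer and the compute layer are loosely coupled, they must be told to use Hadoop's native distributed file system, HDFS. The value is a URI: the address of the cluster's master node, followed by the port.)

    2. Configure Hadoop's common working directory.
    (This is where running Hadoop processes store the data they produce; NameNode, DataNode and the others create subdirectories under it. In a production system, however, NameNode, DataNode and the other processes should each be given their own directory, and these should be disk mount points so that more disks can be mounted to expand capacity.)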

    <configuration>
      <property>
    	<name>fs.defaultFS</name>
    	<value>hdfs://hadoop:9000</value>
      </property>
      <property>
    	<name>hadoop.tmp.dir</name>
    	<value>/data/projects/hadoop/tmp</value>
      </property>
    </configuration>
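
    The fs.defaultFS value is simply hdfs://<host>:<port>. To make that composition explicit, here is a sketch that prints the same file from parameters. render_core_site is a hypothetical helper, and its defaults are the values used in this post.

    ```shell
    # Hypothetical helper: print a core-site.xml for host $1, port $2 and
    # working directory $3. Defaults mirror the values used in this post.
    render_core_site() {
        local host="${1:-hadoop}" port="${2:-9000}"
        local tmp_dir="${3:-/data/projects/hadoop/tmp}"
        printf '%s\n' \
            '<configuration>' \
            '  <property>' \
            '    <name>fs.defaultFS</name>' \
            "    <value>hdfs://${host}:${port}</value>" \
            '  </property>' \
            '  <property>' \
            '    <name>hadoop.tmp.dir</name>' \
            "    <value>${tmp_dir}</value>" \
            '  </property>' \
            '</configuration>'
    }
    ```

    For example: render_core_site hadoop 9000 /data/projects/hadoop/tmp > core-site.xml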
    

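    3. Edit hdfs-site.xml to configure the replica count

    1. Expose the NameNode web UI on port 50070. (In Hadoop 3.x the default port is 9870, and dfs.http.address is the deprecated name of dfs.namenode.http-address.)

    2. (When a client stores a file in HDFS, it is kept as multiple replicas. The value is normally 3, but a pseudo-distributed setup has only one machine, so it has to be 1.)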

    <configuration>
       <property>
    	<name>dfs.replication</name>
    	<value>1</value>
       </property>
      <property>
       <name>dfs.http.address</name>
       <value>192.168.110.151:50070</value>
      </property>
    </configuration>
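
    To double-check what a site file actually sets, a small reader like the one below can help. It is a sketch, not a real XML parser: get_prop is a hypothetical helper that assumes each <name> is followed by its <value>, as in the files shown in this post.

    ```shell
    # Hypothetical helper: print the value of property $2 in Hadoop site file $1.
    # Assumes <name>...</name> precedes its <value>...</value>, as in this post.
    get_prop() {
        awk -v key="$2" '
            /<name>/ {
                n = $0
                sub(/.*<name>/, "", n)
                sub(/<\/name>.*/, "", n)
                found = (n == key)
            }
            /<value>/ && found {
                v = $0
                sub(/.*<value>/, "", v)
                sub(/<\/value>.*/, "", v)
                print v
                exit
            }
        ' "$1"
    }
    ```

    For example: get_prop hdfs-site.xml dfs.replication. Once Hadoop is on PATH, hdfs getconf -confKey dfs.replication reports the value Hadoop itself resolves.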
    

    4. Edit mapred-site.xml

    Specify which resource-scheduling framework MapReduce programs run on. If this is not set to yarn, MapReduce jobs run only locally instead of across the whole cluster.

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    

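    5. Edit yarn-site.xml

    1. Specify the master of the YARN cluster, the ResourceManager (here, this machine).

    2. Configure the worker nodes (NodeManagers) of the YARN cluster: intermediate results produced by map tasks are passed to reduce tasks through the shuffle mechanism.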

    <configuration>
    
       <property>
    	<name>yarn.resourcemanager.hostname</name>
    	<value>hadoop</value>
       </property>
       <property>
    	<name>yarn.nodemanager.aux-services</name>
    	<value>mapreduce_shuffle</value>
       </property>
    </configuration>
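
    With all four site files edited, a quick scan can catch a missing or misspelled property name before anything is started. The sketch below is an assumption-laden convenience, not part of Hadoop: check_site_files is a hypothetical helper, and it only looks for the property names used in this post.

    ```shell
    # Hypothetical helper: report every expected "file -> property" pair that is
    # missing from configuration directory $1; succeed only if none are missing.
    check_site_files() {
        local conf_dir="$1" pair file key missing=0
        for pair in \
            core-site.xml:fs.defaultFS \
            hdfs-site.xml:dfs.replication \
            mapred-site.xml:mapreduce.framework.name \
            yarn-site.xml:yarn.resourcemanager.hostname \
            yarn-site.xml:yarn.nodemanager.aux-services
        do
            file=${pair%%:*}
            key=${pair#*:}
            if ! grep -qF "<name>${key}</name>" "$conf_dir/$file" 2>/dev/null; then
                echo "missing: $file -> $key"
                missing=1
            fi
        done
        return "$missing"
    }
    ```

    For example: check_site_files /data/projects/hadoop/etc/hadoop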
    

     

    6. Turn off the firewall

    Format the HDFS NameNode:

    Run hdfs namenode -format

    2020-05-27 19:18:49,081 INFO util.GSet: 0.029999999329447746% max memory 839.5 MB = 257.9 KB
    2020-05-27 19:18:49,081 INFO util.GSet: capacity = 2^15 = 32768 entries
    2020-05-27 19:18:49,112 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1667952246-192.168.110.151-1590578329102
    2020-05-27 19:18:49,131 INFO common.Storage: Storage directory /data/projects/hadoop/tmp/dfs/name has been successfully formatted.
    2020-05-27 19:18:49,184 INFO namenode.FSImageFormatProtobuf: Saving image file /data/projects/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    2020-05-27 19:18:49,367 INFO namenode.FSImageFormatProtobuf: Image file /data/projects/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
    2020-05-27 19:18:49,399 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    2020-05-27 19:18:49,416 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
    2020-05-27 19:18:49,416 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.110.151
    ************************************************************/
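
    Formatting is meant to be done once. Running it again generates a new cluster ID, and DataNodes that still hold data from the old one will usually refuse to start. A guard like the following sketch can prevent an accidental re-format; namenode_formatted is a hypothetical helper, and the dfs/name/current path is taken from the log output above.

    ```shell
    # Hypothetical guard: succeeds if the NameNode storage directory under the
    # hadoop.tmp.dir given as $1 already holds metadata (a non-empty current/).
    namenode_formatted() {
        local current="${1:-/data/projects/hadoop/tmp}/dfs/name/current"
        [ -d "$current" ] && [ -n "$(ls -A "$current" 2>/dev/null)" ]
    }
    ```

    For example: namenode_formatted || hdfs namenode -format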

    Start the services: cd /data/projects/hadoop/sbin, then run:

    [hadoop@hadoop sbin]$ start-dfs.sh
    Starting namenodes on [hadoop]
    hadoop: Warning: Permanently added 'hadoop' (ECDSA) to the list of known hosts.
    Starting datanodes
    localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
    Starting secondary namenodes [hadoop]
    [hadoop@hadoop sbin]$ start-yarn.sh
    Starting resourcemanager
    Starting nodemanagers
    [hadoop@hadoop sbin]$ jps
    57681 NameNode
    58020 SecondaryNameNode
    57800 DataNode
    58712 Jps
    58380 NodeManager
    58255 ResourceManager

    If all six processes are listed (the five Hadoop daemons plus Jps itself), the deployment has succeeded.
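
    Rather than counting by eye, the jps output can be checked mechanically. The sketch below uses a hypothetical helper, missing_daemons, that reads jps output on standard input and compares the process names against the five daemons listed above.

    ```shell
    # Hypothetical helper: read `jps` output on stdin and print each expected
    # daemon that is absent; succeed only if all five are present.
    missing_daemons() {
        local listing name missing=0
        listing=$(cat)
        for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
            # Compare the second column exactly, so that SecondaryNameNode
            # is not mistaken for NameNode.
            if ! printf '%s\n' "$listing" |
                awk -v n="$name" '$2 == n { ok = 1 } END { exit (ok ? 0 : 1) }'
            then
                echo "not running: $name"
                missing=1
            fi
        done
        return "$missing"
    }
    ```

    For example: jps | missing_daemons && echo "all daemons are up"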

  • Original article: https://www.cnblogs.com/SunshineKimi/p/12975981.html