  • Installing Hadoop + Spark on CentOS 7 without Cloudera Manager (CM)

    Set the kernel parameter (it takes effect after a reboot):
    # echo 'vm.swappiness=10'>> /etc/sysctl.conf
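    The change can also be applied immediately, without waiting for a reboot (standard sysctl usage):
    # sysctl -p                      # reload /etc/sysctl.conf right away
    # cat /proc/sys/vm/swappiness    # should now print 10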

    Install JDK 8
    # rpm -ivh jdk-8u211-linux-x64.rpm
    # vi /etc/profile
    export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    # source /etc/profile
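    A quick check that the RPM put java on the PATH (the version string should match the installed package):
    # java -version    # expect: java version "1.8.0_211"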

    Summary
    1. Installing CDH 6.2.0 from RPMs without CM is basically no different from the earlier CDH 5.10.0 installation.
    2. This approach requires downloading all the related RPM packages to the server and building a local yum repository; the download totals about 4.3 GB.
    3. As before, Zookeeper must be installed first.

    ----------------------------------------------------------------------

    3.1 Zookeeper
    1. Install Zookeeper on all nodes
    # yum -y install zookeeper

    2. Create the data directory and change its owner
    # mkdir -p /var/lib/zookeeper
    # chown -R zookeeper /var/lib/zookeeper

    3. Edit the configuration file /etc/zookeeper/conf/zoo.cfg
    maxClientCnxns=60
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    dataLogDir=/var/lib/zookeeper
    minSessionTimeout=4000
    maxSessionTimeout=40000
    server.1=gp-mdw:3181:4181
    server.2=gp-data-1:3181:4181
    server.3=gp-data-2:3181:4181

    4. Create the myid file on all nodes and set its owner
    【gp-mdw】# echo 1 > /var/lib/zookeeper/myid
    【gp-mdw】# chown zookeeper:zookeeper /var/lib/zookeeper/myid
    # ssh gp-data-1 "echo 2 > /var/lib/zookeeper/myid && chown zookeeper:zookeeper /var/lib/zookeeper/myid"
    # ssh gp-data-2 "echo 3 > /var/lib/zookeeper/myid && chown zookeeper:zookeeper /var/lib/zookeeper/myid"

    5. Start Zookeeper on all nodes
    【gp-data-2】# /usr/lib/zookeeper/bin/zkServer.sh start
    【gp-data-1】# /usr/lib/zookeeper/bin/zkServer.sh start
    【gp-mdw】 # /usr/lib/zookeeper/bin/zkServer.sh start

    Check the startup status on all nodes; all three started successfully:
    # /usr/lib/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
    Mode: follower
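    To double-check that the ensemble answers client requests, zkCli.sh (shipped in the same bin directory) can run a one-off command against any server; the reply should list at least the zookeeper znode:
    # /usr/lib/zookeeper/bin/zkCli.sh -server gp-mdw:2181 ls /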

    -------------------------------------------------------------------------------

    3.2 HDFS
    1. Install the packages HDFS requires on all nodes; since there are only three nodes, every node also runs a DataNode
    yum -y install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-hdfs-datanode
    2. Install the NameNode and SecondaryNameNode on one node
    yum -y install hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode

    3. Create the data directories and set owner and permissions
    Create the DataNode directory on all nodes
    mkdir -p /data0/dfs/dn
    chown -R hdfs:hadoop /data0/dfs/dn
    chmod 700 /data0/dfs/dn
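    The same three commands can be pushed to the data nodes from gp-mdw in one loop (assuming passwordless SSH, as used for the myid files above):
    for h in gp-data-1 gp-data-2; do
        ssh $h "mkdir -p /data0/dfs/dn && chown -R hdfs:hadoop /data0/dfs/dn && chmod 700 /data0/dfs/dn"
    done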

    Create the data directories on the NameNode and SecondaryNameNode node
    mkdir -p /data0/dfs/nn
    chown -R hdfs:hadoop /data0/dfs/nn
    chmod 700 /data0/dfs/nn
    mkdir -p /data0/dfs/snn
    chown -R hdfs:hadoop /data0/dfs/snn
    chmod 700 /data0/dfs/snn

    4. Edit the configuration files
    # vi /etc/hadoop/conf/core-site.xml
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://gp-mdw:8020</value>
      </property>
      <property>
        <name>fs.trash.interval</name>
        <value>1</value>
      </property>
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
      </property>
    </configuration>

    # vi /etc/hadoop/conf/hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data0/dfs/nn</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data0/dfs/dn</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-address</name>
        <value>gp-mdw:8022</value>
      </property>
      <property>
        <name>dfs.https.address</name>
        <value>gp-mdw:9871</value>
      </property>
      <property>
        <name>dfs.secondary.http.address</name>
        <value>gp-mdw:50090</value>
      </property>
      <property>
        <name>dfs.https.port</name>
        <value>9871</value>
      </property>
      <property>
        <name>dfs.namenode.http-address</name>
        <value>gp-mdw:9870</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
      </property>
      <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///data0/dfs/snn</value>
      </property>
    </configuration>

    5. Save the modified configuration files and sync them to all nodes
    scp /etc/hadoop/conf/core-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/core-site.xml gp-data-2:/etc/hadoop/conf
    scp /etc/hadoop/conf/hdfs-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/hdfs-site.xml gp-data-2:/etc/hadoop/conf

    6. Format the NameNode
    sudo -u hdfs hdfs namenode -format

    7. Run the following commands on the appropriate nodes to start HDFS
    【gp-mdw】systemctl start hadoop-hdfs-namenode
    【gp-mdw】systemctl start hadoop-hdfs-secondarynamenode
    【all nodes】systemctl start hadoop-hdfs-datanode
    【gp-mdw】systemctl status hadoop-hdfs-namenode
    【gp-mdw】systemctl status hadoop-hdfs-secondarynamenode
    【all nodes】systemctl status hadoop-hdfs-datanode
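    Once all daemons are up, the NameNode should report three live DataNodes:
    sudo -u hdfs hdfs dfsadmin -report | grep -i "live datanodes"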

    8. Create the /tmp directory on HDFS and set its permissions, then use the hadoop command to confirm it was created
    sudo -u hdfs hadoop fs -mkdir /tmp
    sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
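    The verification the step refers to; /tmp should be listed with drwxrwxrwt permissions:
    sudo -u hdfs hadoop fs -ls /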

    9. Access the NameNode Web UI (the HTTP port from dfs.namenode.http-address, not the 8020 RPC port)
    http://gp-mdw:9870

    --------------------------------------------------------------------

    3.3 Yarn
    1. Install the Yarn packages: the ResourceManager and JobHistory Server on one node, the NodeManager on all nodes
    【gp-mdw】 # yum -y install hadoop-yarn hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-mapreduce
    【all nodes】 # yum -y install hadoop-yarn hadoop-yarn-nodemanager hadoop-mapreduce

    2. Create directories and set owner and permissions
    Create the local directories on all nodes
    mkdir -p /data0/yarn/nm
    chown yarn:hadoop /data0/yarn/nm
    mkdir -p /data0/yarn/container-logs
    chown yarn:hadoop /data0/yarn/container-logs

    Create the logs directory on HDFS
    sudo -u hdfs hdfs dfs -mkdir /tmp/logs
    sudo -u hdfs hdfs dfs -chown mapred:hadoop /tmp/logs
    sudo -u hdfs hdfs dfs -chmod 1777 /tmp/logs

    Create the /user/history directory on HDFS
    sudo -u hdfs hdfs dfs -mkdir -p /user
    sudo -u hdfs hdfs dfs -chmod 777 /user
    sudo -u hdfs hdfs dfs -mkdir -p /user/history
    sudo -u hdfs hdfs dfs -chown mapred:hadoop /user/history
    sudo -u hdfs hdfs dfs -chmod 1777 /user/history
    sudo -u hdfs hdfs dfs -mkdir -p /user/history/done
    sudo -u hdfs hdfs dfs -mkdir -p /user/history/done_intermediate
    sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /user/history
    sudo -u hdfs hdfs dfs -chmod 771 /user/history/done
    sudo -u hdfs hdfs dfs -chmod 1777 /user/history/done_intermediate
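    A quick listing to confirm the owners and modes took effect before the JobHistory Server starts:
    sudo -u hdfs hdfs dfs -ls /user/history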

    3. Edit the configuration files
    # vi /etc/hadoop/conf/yarn-site.xml
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///data0/yarn/nm</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///data0/yarn/container-logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
      </property>
      <property>
        <name>yarn.application.classpath</name>
        <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>gp-mdw:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>gp-mdw:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>gp-mdw:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>gp-mdw:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>gp-mdw:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>gp-mdw:8090</value>
      </property>
    </configuration>

    # vi /etc/hadoop/conf/mapred-site.xml
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>gp-mdw:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>gp-mdw:19888</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.https.address</name>
        <value>gp-mdw:19890</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.admin.address</name>
        <value>gp-mdw:10033</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
      </property>
    </configuration>

    # vi /etc/hadoop/conf/core-site.xml (only the properties added in this step are shown)
    <property>
      <name>hadoop.proxyuser.mapred.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.mapred.hosts</name>
      <value>*</value>
    </property>

    4. Save the modified configuration files and sync them to all nodes
    scp /etc/hadoop/conf/core-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/core-site.xml gp-data-2:/etc/hadoop/conf
    scp /etc/hadoop/conf/yarn-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/yarn-site.xml gp-data-2:/etc/hadoop/conf
    scp /etc/hadoop/conf/mapred-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/mapred-site.xml gp-data-2:/etc/hadoop/conf

    5. Start the Yarn services
    Start the mapred-historyserver on the JobHistory Server node
    【gp-mdw】/etc/init.d/hadoop-mapreduce-historyserver start

    Start the ResourceManager on the RM node
    【gp-mdw】systemctl start hadoop-yarn-resourcemanager
    【gp-mdw】systemctl status hadoop-yarn-resourcemanager

    Start the NodeManager on the NM nodes
    【all nodes】systemctl start hadoop-yarn-nodemanager
    【all nodes】systemctl status hadoop-yarn-nodemanager
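    After the NodeManagers register with the ResourceManager, all three nodes should show as RUNNING:
    yarn node -list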

    6. Access the Yarn web UIs
    Yarn management page: http://gp-mdw:8088 (yarn.resourcemanager.webapp.address)
    JobHistory management page: http://gp-mdw:19888 (mapreduce.jobhistory.webapp.address)
    The online nodes are listed on the Nodes page of the Yarn UI.

    7. Run the MR example program
    The example is run as the root user, so first create the root user's home directory on HDFS
    sudo -u hdfs hdfs dfs -mkdir /user/root
    sudo -u hdfs hdfs dfs -chown root:root /user/root
    Run the MR example program; it completes successfully:
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 5 5
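    The finished job should also be visible from the command line (and on the JobHistory page at gp-mdw:19888):
    yarn application -list -appStates FINISHED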

    -----------------------------------------------------------------------------

    3.4 Spark
    1. Install the packages Spark requires
    【gp-mdw】yum -y install spark-core spark-master spark-worker spark-history-server spark-python

    2. Create the directories and set owner and permissions
    sudo -u hdfs hadoop fs -mkdir /user/spark
    sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
    sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
    sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory

    3. Edit the configuration file /etc/spark/conf/spark-defaults.conf
    spark.eventLog.enabled=true
    spark.eventLog.dir=hdfs://gp-mdw:8020/user/spark/applicationHistory
    spark.yarn.historyServer.address=http://gp-mdw:18088

    4. Start the spark-history-server
    【gp-mdw】systemctl start spark-history-server
    【gp-mdw】systemctl status spark-history-server
    Access the Web UI: http://gp-mdw:18088 (from spark.yarn.historyServer.address)

    //5. Sync the modified configuration file to all nodes
    //scp /etc/spark/conf/spark-defaults.conf gp-data-1:/etc/spark/conf
    //scp /etc/spark/conf/spark-defaults.conf gp-data-2:/etc/spark/conf

    6. Test Spark
    spark-submit --class org.apache.spark.examples.SparkPi --master local /usr/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.3.2.jar
    2020-03-15 10:01:56 INFO DAGScheduler:57 - Job 0 finished: reduce at SparkPi.scala:38, took 1.052675 s
    Pi is roughly 3.143435717178586
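    The same example can also be submitted to the cluster rather than run in local mode; a sketch, assuming the Hadoop client configuration in /etc/hadoop/conf is in place so Spark can locate the ResourceManager:
    spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /usr/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.3.2.jar 10
    The driver log should again end with a "Pi is roughly ..." line, and the application will appear in the Yarn UI.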

    -----------------------------------------------------------------------------------

    CentOS 7 ships with Python 2 out of the box, and that Python 2 must not be removed: many system commands, yum among them, depend on it.
    Before installing Python 3.7, install the build dependencies and download the source tarball:
    yum -y install zlib-devel bzip2-devel openssl-devel openssl-static ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel lzma gcc
    wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tar.xz

    mkdir /usr/local/python3

    Then extract the archive, change into the source directory, and build and install Python 3:
    tar -xvJf Python-3.7.7.tar.xz
    cd Python-3.7.7
    ./configure --prefix=/usr/local/python3 --enable-shared
    make && make install

    Finally, create the symlinks:
    ln -s /usr/local/python3/bin/python3 /usr/bin/python3
    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3

    Test by running python3 at the command line. Because the build used --enable-shared, python3 cannot find libpython3.7m.so until the library directory is registered with the loader:
    # echo "/usr/local/python3/lib" > /etc/ld.so.conf.d/python3-x86_64.conf
    # ldconfig -v

    # vi /etc/profile
    export PYSPARK_PYTHON=/usr/bin/python3
    export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
    # source /etc/profile
    # pyspark now invokes Python 3.7
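    A quick sanity check of both links (expected values assume the 3.7.7 build above):
    # python3 -V    # should print Python 3.7.7
    # pyspark       # the startup banner should show "Using Python version 3.7.7"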

    ------------------------------------------------------------------------------------------

  • Original post: https://www.cnblogs.com/zsfishman/p/12500104.html