  • CentOS 7 HDFS/YARN/Spark setup notes

    1. Set up three virtual machines

    2. Create accounts and set up SSH trust between the nodes
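
    The notes don't record the commands; a minimal sketch, assuming a dedicated spark account on every node (the account name and key type are assumptions):

    useradd spark && passwd spark               # run as root on each of the 3 nodes
    su - spark
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # passwordless key
    ssh-copy-id spark@10.0.0.5                  # repeat for 10.0.0.6 and 10.0.0.7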

    3. Install Java

    wget jdk-xxx
    
    rpm -i jdk-xxx
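
    After the RPM installs, confirm the JDK is on the PATH (the version should match the jdk1.8.0_141 paths used below):

    java -version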

    4. Add environment variables (on all nodes)

    export JAVA_HOME=/usr/java/jdk1.8.0_141
    export JRE_HOME=$JAVA_HOME/jre
    export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
    export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export HADOOP_HOME=/data/spark/bin/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_HOME=/data/spark/bin/spark
    export PATH=$PATH:$SPARK_HOME/bin
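
    Assuming these exports were added to /etc/profile (the notes don't say which file), reload it and spot-check one variable:

    source /etc/profile
    echo $HADOOP_HOME    # should print /data/spark/bin/hadoop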

    5. Set up Hadoop

    1>vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/usr/java/jdk1.8.0_141

    2>vi $HADOOP_HOME/etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://10.0.0.5:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/spark/bin/hadoop/tmp</value>
        </property>
    </configuration>

    3>vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///data/spark/hdfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///data1/hdfs-ext,file:///data2/hdfs-ext,file:///data3/hdfs-ext</value>
        </property>
        <property>
            <name>dfs.namenode.checkpoint.dir</name>
            <value>/data/spark/hdfs/namesecondary</value>
        </property>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>0.0.0.0:50070</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>0.0.0.0:50090</value>
        </property>
        <property>
            <name>dfs.datanode.http.address</name>
            <value>0.0.0.0:50075</value>
        </property>
        <property>
          <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
          <value>false</value>
        </property>
    </configuration>

    4>vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>10.0.0.5</value>
        </property>
        <property>
            <name>yarn.nodemanager.local-dirs</name>
            <value>/data/spark/hdfs/nm-local-dir</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>8192</value>
        </property>
        <property>
            <name>yarn.nodemanager.resource.cpu-vcores</name>
            <value>4</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>0.0.0.0:8088</value>
        </property>
        <property>
            <name>yarn.nodemanager.webapp.address</name>
            <value>0.0.0.0:8042</value>
        </property>
        <property>
            <name>yarn.nodemanager.pmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>5</value>
        </property>
    </configuration>

     5>vi $HADOOP_HOME/etc/hadoop/slaves

    10.0.0.5
    10.0.0.6
    10.0.0.7

    6> Copy the Hadoop files to each slave and set PATH there, as sketched below
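
    A sketch of the copy, assuming the same /data/spark layout and spark account on every node:

    scp -r /data/spark/bin/hadoop spark@10.0.0.6:/data/spark/bin/
    scp -r /data/spark/bin/hadoop spark@10.0.0.7:/data/spark/bin/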

    7> Format the HDFS namenode

    hdfs namenode -format

    8> Start HDFS and check the logs

    start-dfs.sh
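
    One way to verify HDFS is healthy before moving on (the log file name varies with user and hostname):

    hdfs dfsadmin -report                                 # all 3 datanodes should register
    tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log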

    9> Start YARN and check the logs

    start-yarn.sh
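
    To confirm the NodeManagers registered with the ResourceManager:

    yarn node -list    # should list 3 RUNNING nodes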

    10> Check the processes on each node; be sure to read the logs

    jps

    Always read the logs. With the slaves file above, 10.0.0.5 should show NameNode, SecondaryNameNode, ResourceManager, DataNode, and NodeManager; 10.0.0.6 and 10.0.0.7 should show DataNode and NodeManager.

    11> Test and check the logs

    cd /xxx
    echo "this is a test for hdfs" > 1.txt
    hadoop fs -mkdir /spark
    hadoop fs -mkdir /spark/test
    hadoop fs -appendToFile 1.txt hdfs://10.0.0.5:9000/spark/test/1.txt
    hadoop fs -cat hdfs://10.0.0.5:9000/spark/test/1.txt

    6. Set up Spark

    1> Edit spark-env.sh

    mv $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
    vi $SPARK_HOME/conf/spark-env.sh
    export SPARK_HOME=/data/spark/bin/spark
    export JAVA_HOME=/usr/java/jdk1.8.0_141
    export HADOOP_HOME=/data/spark/bin/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_MASTER_IP=10.0.0.5
    export SPARK_LOCAL_DIRS=/data/spark/bin/spark
    export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
    export SPARK_LOG_DIR=/data/spark/bin/spark/logs

    2> Edit spark-defaults.conf (pointing spark.yarn.jars at HDFS saves re-uploading the Spark jars on every submit)

    mv $SPARK_HOME/conf/spark-defaults.conf.template $SPARK_HOME/conf/spark-defaults.conf
    vi $SPARK_HOME/conf/spark-defaults.conf
    spark.yarn.jars  hdfs://10.0.0.5:9000/spark/jars/*

    3> Upload the jars to HDFS

    cd $SPARK_HOME/jars
    hadoop fs -mkdir /spark/jars
    hadoop fs -put * hdfs://10.0.0.5:9000/spark/jars/
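
    A quick check that the upload landed where spark.yarn.jars points:

    hadoop fs -ls hdfs://10.0.0.5:9000/spark/jars/ | head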

    4> Edit slaves (little effect here: this file is only used by Spark's standalone scripts, and these tests run on YARN)

    mv $SPARK_HOME/conf/slaves.template $SPARK_HOME/conf/slaves
    vi $SPARK_HOME/conf/slaves
    10.0.0.5
    10.0.0.6
    10.0.0.7

    5> Single-node interactive test

    pyspark --master local[4]
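
    Once the >>> prompt appears, a one-line sanity check (sc is predefined in the pyspark shell):

    sc.parallelize(range(100)).sum()    # expect 4950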

    6> Cluster interactive test

    pyspark --master yarn --deploy-mode client
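
    While this shell is up, the application should show up in YARN:

    yarn application -list    # the shell appears as PySparkShell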

    7> Create a test script: vi test.py

    from __future__ import print_function

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession \
            .builder \
            .appName("PythonPi") \
            .getOrCreate()

        # read back the file written during the HDFS test above
        lines = spark.sparkContext.textFile("hdfs://10.0.0.5:9000/spark/test/1.txt")
        num = lines.count()
        p_str = lines.first()
        print("--------------------" + str(num) + "---------------------")
        print("--------------------" + p_str + "---------------------")

        spark.stop()

    8> Single-node job test

    spark-submit --master local[4] test.py

    9> Cluster job test

    spark-submit --master yarn --deploy-mode cluster test.py
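
    In cluster mode the driver's output goes to the YARN container logs rather than your terminal; fetch them with the application ID printed by spark-submit (the ID below is a placeholder):

    yarn logs -applicationId application_1500000000000_0001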