  • Building a Spark cluster with Docker using a Dockerfile

    1. Create a Dockerfile with the following contents:

    # Base image, includes JDK 8
    FROM openjdk:8u131-jre-alpine
    
    # Maintainer
    LABEL maintainer="tony@163.com"
    
    # User
    USER root
    
    # Locale and timezone
    ENV LANG=C.UTF-8 \
        TZ=Asia/Shanghai
    
    # Install the tools needed to download and unpack Spark
    RUN apk add --no-cache --update-cache bash curl tzdata wget tar \
        && cp /usr/share/zoneinfo/$TZ /etc/localtime \
        && echo $TZ > /etc/timezone
    
    # Working directory
    WORKDIR /usr/local
    
    # If the download is slow, fetch the tarball locally first and COPY it instead
    # COPY spark-2.4.0-bin-hadoop2.7.tgz /usr/local
    
    # Download and unpack Spark
    RUN wget "http://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz" \
        && tar -vxf spark-* \
        && mv spark-2.4.0-bin-hadoop2.7 spark \
        && rm -rf spark-2.4.0-bin-hadoop2.7.tgz
    
    # Set SPARK_HOME, JAVA_HOME and PATH
    ENV SPARK_HOME=/usr/local/spark \
        JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk \
        PATH=${PATH}:${JAVA_HOME}/bin:${SPARK_HOME}/bin
    
    # Exposed ports: 8080 6066 7077 4044 18080
    EXPOSE 6066 8080 7077 4044 18080
    
    # Working directory
    WORKDIR $SPARK_HOME
    
    CMD ["/bin/bash"]

    2. Build the image from the directory containing the Dockerfile:

    docker build -t spark:v2.4.0 .

    3. Start the master node:

    docker run -itd --name spark-master -p 6066:6066 -p 7077:7077 -p 8081:8080 -h spark-master spark:v2.4.0 ./bin/spark-class org.apache.spark.deploy.master.Master

    4. Start a worker node:

    docker run -itd --name spark-worker -P -h spark-worker --link spark-master spark:v2.4.0 ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
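    To attach more workers to the same master, the docker run command above can be repeated with distinct container names. A dry-run sketch follows: it only prints the commands (drop the leading `echo` to actually execute them), and `NUM_WORKERS` and the `spark-worker-$i` naming scheme are illustrative, not from the original post.

```shell
# Dry-run: print the docker run command for each extra worker.
# Remove the leading "echo" to actually start the containers.
# Assumes the spark:v2.4.0 image exists and spark-master is running (step 3).
NUM_WORKERS=3
for i in $(seq 1 "$NUM_WORKERS"); do
  echo docker run -itd --name "spark-worker-$i" -P -h "spark-worker-$i" \
    --link spark-master spark:v2.4.0 \
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
done
```

    Each worker needs a unique --name and hostname; all of them register against the same spark://spark-master:7077 URL.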

    5. Open the master web UI in a browser (port 8081 on the Docker host, mapped from container port 8080 in step 3):

    http://192.168.75.142:8081/  (replace 192.168.75.142 with your Docker host's IP)

    6. Start the spark-shell client:

    docker exec -it 99306051a9d4 ./bin/spark-shell

    Here 99306051a9d4 is the container ID of spark-master; the container name works as well: docker exec -it spark-master ./bin/spark-shell

    7. Test in spark-shell:

    scala> val file = sc.textFile("/usr/local/spark/bin/beeline")
    file: org.apache.spark.rdd.RDD[String] = /usr/local/spark/bin/beeline MapPartitionsRDD[1] at textFile at <console>:24
    
    scala> val words = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey((a,b) => a+b)
    words: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25
    
    scala> words.collect
    res0: Array[(String, Int)] = Array(($CLASS,1), (Software,1), (Unless,1), (this,3), (starting,1), (under,4), (The,1), (limitations,1),
    ("$0")"/find-spark-home,1), (Figure,1), (express,1), (contributor,1), (#,20), (WITHOUT,1), ("AS,1), (#!/usr/bin/env,1), (See,2), (License.,2),
    (for,4), (fi,1), (software,1), (IS",1), (obtain,1), (ANY,1), (SPARK_HOME,1), (out,1), (required,1), (2.0,1), (OR,1), (file,3), (the,9), (-o,1),
    (licenses,1), (not,1), (either,1), (if,2), (posix,2), (source,1), (Apache,2), (then,1), (
    "License");,1), (language,1), (License,3), (Enter,1),
    (permissions,1), (WARRANTIES,1), (license,1), (by,1), (];,1), (
    "$(dirname,1), (an,1), ([,1), (agreed,1), (Version,1), (implied.,1), (KIND,,1),
    (is,2), ((the,1), (exec,1), ("${SPARK_HOME}",1), (agreements.,1), (on,1), (You,2), (one,1)... scala>
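    The same flatMap/map/reduceByKey word count can be reproduced outside Spark with standard Unix tools, which is a quick way to sanity-check the RDD result. A minimal sketch on a small sample file (the sample text and the /tmp path are illustrative):

```shell
# Write a tiny sample file: words a=3, b=2.
printf 'a b a\nb a\n' > /tmp/sample.txt

# Split on spaces (tr), group identical words (sort), count each group
# (uniq -c), and order by count descending (sort -rn) -- the Unix
# equivalent of flatMap + map + reduceByKey.
tr ' ' '\n' < /tmp/sample.txt | sort | uniq -c | sort -rn
```

    For /usr/local/spark/bin/beeline this pipeline gives the same counts as the Array of (word, count) pairs printed by words.collect above.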
  • Original article: https://www.cnblogs.com/areyouready/p/10383783.html