Our k8s cluster doesn't have Zeppelin set up, so running ad-hoc queries with spark-shell/spark-sql is inconvenient, especially when the data volume is large. Below I describe how to run a pod on k8s and then launch spark-shell/spark-sql inside that pod, which makes it easy to query data.
(Of course, if your local machine has a fixed IP, or you can use a dynamic-DNS service such as 花生壳 (Oray Peanut Shell), you can run spark-shell/spark-sql in client mode directly from your machine and request resources from the k8s cluster.)
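In that case the invocation is essentially the same client-mode spark-shell command shown in step 2 below, only with spark.driver.host pointing at an address the executors can reach back to. A minimal sketch, assuming your local kubeconfig already points at the cluster; <api-server-host> and 203.0.113.10 are placeholders for your API server address and your fixed/public IP:

# sketch only: replace <api-server-host> and 203.0.113.10 with real values
/opt/spark/bin/spark-shell \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode client \
  --conf spark.kubernetes.namespace=spark-job \
  --conf spark.kubernetes.container.image=student2021/spark:301p \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=4 \
  --conf spark.driver.host=203.0.113.10 \
  --conf spark.driver.port=14040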
#step1 create a pod as spark-client
cat <<EOF >spark-client.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: spark-client
  name: spark-client
spec:
  containers:
  - name: spark-client
    image: student2021/spark:301p
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 0
    command:
    - sh
    - -c
    - "exec tail -f /dev/null"
  restartPolicy: Never
  serviceAccount: spark
EOF
kubectl apply -n spark-job -f spark-client.yaml

#step2 enter the spark-client pod and run spark-shell or spark-sql
kubectl -n spark-job exec -it spark-client -- sh

export SPARK_USER=spark
# the pod's own IP (from /etc/hosts) becomes the driver bind address
driver_host=$(cat /etc/hosts | grep spark-client | cut -f 1)
echo $driver_host
# note: inside a pod the API server is normally reachable at
# k8s://https://kubernetes.default.svc:443; adjust the master URL below
# unless you have a proxy listening on localhost:18080.
# spark.kubernetes.namespace must match this driver pod's namespace
# (spark-job) so that spark.kubernetes.driver.pod.name can be resolved.
/opt/spark/bin/spark-shell \
  --conf spark.jars.ivy=/tmp/.ivy \
  --master k8s://localhost:18080 \
  --deploy-mode client \
  --conf spark.kubernetes.namespace=spark-job \
  --conf spark.kubernetes.container.image=student2021/spark:301p \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.driver.pod.name=spark-client \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=4g \
  --conf spark.driver.host=${driver_host} \
  --conf spark.driver.port=14040

### If you want to use a headless service instead (executors then resolve the driver as spark-client.spark-job.svc.cluster.local), run:
kubectl -n spark-job expose pod spark-client --port=14040 --type=ClusterIP --cluster-ip=None
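To verify the setup, a quick smoke test can help: Spark on k8s labels executor pods with spark-role=executor, so you can watch them come up from another terminal, then run an arbitrary query inside the shell (the sum below is just an example):

# in another terminal: executor pods should appear in the namespace
kubectl -n spark-job get pods -l spark-role=executor

# inside spark-shell: confirm the session can run a distributed job
scala> spark.range(0, 1000000).selectExpr("sum(id)").show()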