Our k8s cluster doesn't have Zeppelin set up, so running ad-hoc queries with spark-shell/spark-sql is inconvenient, especially when the data volume is large. Below I describe how to run a pod on k8s and then launch spark-shell/spark-sql inside that pod, which makes it easy to query data.
(Of course, if your local machine has a fixed IP, or you can use a dynamic-DNS service such as 花生壳 (Oray Peanut Shell), you can run spark-shell/spark-sql in client mode directly from your machine and request resources from the k8s cluster.)
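In that case the invocation is essentially the same client-mode spark-shell command shown in step 2 below, only with spark.driver.host pointing at an address the executors can reach back to. A minimal sketch, assuming your local kubeconfig already points at the cluster; <api-server-host> and 203.0.113.10 are placeholders for your API server address and your fixed/public IP:

# sketch only: replace <api-server-host> and 203.0.113.10 with real values
/opt/spark/bin/spark-shell \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode client \
  --conf spark.kubernetes.namespace=spark-job \
  --conf spark.kubernetes.container.image=student2021/spark:301p \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=4 \
  --conf spark.driver.host=203.0.113.10 \
  --conf spark.driver.port=14040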
#step1 create a pod as spark-client
cat <<EOF >spark-client.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: spark-client
  name: spark-client
spec:
  containers:
  - name: spark-client
    image: student2021/spark:301p
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
      runAsUser: 0
    command:
    - sh
    - -c
    - "exec tail -f /dev/null"
  restartPolicy: Never
  serviceAccount: spark
EOF
kubectl apply -n spark-job -f spark-client.yaml

#step2 enter the spark-client pod and run spark-shell or spark-sql
kubectl -n spark-job exec -it spark-client -- sh

export SPARK_USER=spark
# the pod's own IP (from /etc/hosts) becomes the driver bind address
driver_host=$(cat /etc/hosts | grep spark-client | cut -f 1)
echo $driver_host
# note: inside a pod the API server is normally reachable at
# k8s://https://kubernetes.default.svc:443; adjust the master URL below
# unless you have a proxy listening on localhost:18080.
# spark.kubernetes.namespace must match this driver pod's namespace
# (spark-job) so that spark.kubernetes.driver.pod.name can be resolved.
/opt/spark/bin/spark-shell \
  --conf spark.jars.ivy=/tmp/.ivy \
  --master k8s://localhost:18080 \
  --deploy-mode client \
  --conf spark.kubernetes.namespace=spark-job \
  --conf spark.kubernetes.container.image=student2021/spark:301p \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.driver.pod.name=spark-client \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=4g \
  --conf spark.driver.host=${driver_host} \
  --conf spark.driver.port=14040

### If you want to use a headless service instead (executors then resolve the driver as spark-client.spark-job.svc.cluster.local), run:
kubectl -n spark-job expose pod spark-client --port=14040 --type=ClusterIP --cluster-ip=None
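To verify the setup, a quick smoke test can help: Spark on k8s labels executor pods with spark-role=executor, so you can watch them come up from another terminal, then run an arbitrary query inside the shell (the sum below is just an example):

# in another terminal: executor pods should appear in the namespace
kubectl -n spark-job get pods -l spark-role=executor

# inside spark-shell: confirm the session can run a distributed job
scala> spark.range(0, 1000000).selectExpr("sum(id)").show()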