  • Study notes: using pyspark JDBC to work with an Oracle database

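    # -*- coding:utf-8 -*-
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext
    import numpy as np


    appName = "jhl_spark_1"  # your application name
    master = "local"  # run in local (single-machine) mode
    conf = SparkConf().setAppName(appName).setMaster(master)  # configuration for the SparkContext
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    # The Oracle JDBC driver jar must be on Spark's classpath for this URL to work
    url='jdbc:oracle:thin:@127.0.0.1:1521:ORCL'
    tablename='V_JSJQZ'
    properties={"user": "Xho", "password": "sys"}
    df=sqlContext.read.jdbc(url=url,table=tablename,properties=properties)
    # Equivalent form using the generic reader:
    #df=sqlContext.read.format("jdbc").option("url",url).option("dbtable",tablename).option("user","Xho").option("password","sys").load()
    # Register the DataFrame as a temporary table so it can be used in SQL statements
    # (registerTempTable is deprecated since Spark 2.0 in favor of createOrReplaceTempView)
    df.registerTempTable("v_jsjqz")
    # SQL can be run against DataFrames that have been registered as tables
    df2=sqlContext.sql("select ZBLX,BS,JS,JG from v_jsjqz t order by ZBLX,BS")
    list_data=df2.toPandas()  # convert to a pandas DataFrame
    list_data = list_data.dropna()  # cleaning: drop rows that contain null values
    list_data = np.array(list_data).tolist()  # convert to a plain list of rows
    RDDv1=sc.parallelize(list_data)  # parallelize the data back into an RDD
    # Key each row by "ZBLX^BS"; the value is a one-element list holding the [JS, JG] pair
    RDDv2=RDDv1.map(lambda x:(x[0]+'^'+x[1],[[float(x[2]),float(x[3])]]))
    # Concatenate the lists so that each key collects all of its [JS, JG] pairs
    RDDv3=RDDv2.reduceByKey(lambda a,b:a+b)
    # Transformations are lazy: call an action such as RDDv3.collect() before sc.stop() to compute the result
    sc.stop()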
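
To make the last two transformations concrete, here is a minimal plain-Python sketch of what the `map` and `reduceByKey` steps compute, without Spark. The sample rows and the helper name `group_pairs` are made up for illustration; only the column order (ZBLX, BS, JS, JG) and the `^`-joined key come from the code above.

```python
from collections import OrderedDict


def group_pairs(rows):
    """Mimic RDDv2/RDDv3: key each row by 'ZBLX^BS' and collect its [JS, JG] pairs."""
    grouped = OrderedDict()
    for zblx, bs, js, jg in rows:
        key = zblx + '^' + bs          # same key as in the map step
        pair = [float(js), float(jg)]  # same float conversion as in the map step
        # Appending here plays the role of the list concatenation in reduceByKey
        grouped.setdefault(key, []).append(pair)
    return grouped


# Hypothetical sample rows in (ZBLX, BS, JS, JG) order
rows = [
    ['A', 'x', '1', '2.5'],
    ['A', 'x', '3', '4'],
    ['B', 'y', 5, 6],
]
result = group_pairs(rows)
print(dict(result))
```

Each key ends up with a list of `[JS, JG]` pairs. In this sketch the pairs keep their input order; in Spark, the order in which `reduceByKey` concatenates the lists is not guaranteed.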

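    The pyspark package used here is the one in the python folder of the Spark installation directory; it needs to be copied into site-packages under Anaconda's Lib directory.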

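    The code does not configure any environment variables. If you would rather not set them on your machine, look up how to configure Spark's environment variables from within Python.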
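
One common way to do this from inside a script is to set `SPARK_HOME` and extend `sys.path` before importing pyspark. The sketch below assumes the usual layout of a Spark distribution (a `python` folder with the py4j sources zipped under `python/lib`); the function name `configure_spark_env` and the example path are hypothetical.

```python
import glob
import os
import sys


def configure_spark_env(spark_home):
    """Point this Python process at a Spark installation without changing system settings."""
    os.environ["SPARK_HOME"] = spark_home
    spark_python = os.path.join(spark_home, "python")
    # The py4j zip name includes a version that differs between Spark releases, so glob for it
    py4j_zips = glob.glob(os.path.join(spark_python, "lib", "py4j-*-src.zip"))
    added = []
    for path in [spark_python] + py4j_zips:
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


# Example with a hypothetical install path; call this before `import pyspark`:
# configure_spark_env("/opt/spark")
```

The settings only affect the current process, so nothing has to be changed on the machine itself.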

  • Original article: https://www.cnblogs.com/ToDoNow/p/9542731.html