  • Using Spark in Local IDEA to Connect Directly to Hive on a Cluster

    Background

    I set up a Hadoop cluster with VMware, and components such as Spark and Hive are already installed. Now I want to use IDEA on my development machine to connect to Hive on the cluster and work with it directly.

    Configuration Changes

    Modify Hive's hive-site.xml

    In hive-site.xml, find the hive.metastore.uris property and set it as follows, so that clients such as Spark connect to the metastore through its Thrift service:

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://<master-node-ip>:9083</value>
      <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    

    Also in hive-site.xml, find the following property and set its value to false; otherwise the metastore may reject connections when the Hive client version bundled with Spark does not match the metastore schema version:

    <property>
      <name>hive.metastore.schema.verification</name>
      <value>false</value>
      <description>
          Enforce metastore schema version consistency.
          True: Verify that the version information stored in the metastore is compatible with that of the Hive jars.  Also disable automatic
                schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                proper metastore schema migration. (Default)
          False: Warn if the version information stored in the metastore doesn't match that of the Hive jars.
      </description>
    </property>
    

    Copy the Relevant Files

    1. Copy hive-site.xml into conf/ under the Spark installation directory.
    2. Copy mysql-connector-java-<version>.jar from Hive's lib/ directory into jars/ under the Spark installation directory, so Spark can reach the MySQL database backing the metastore; a sketch of both copy commands follows.
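
    For example, assuming Hive lives under /opt/hive and Spark under /opt/spark (these paths are an assumption; adjust them to your own layout):

    cp /opt/hive/conf/hive-site.xml /opt/spark/conf/
    cp /opt/hive/lib/mysql-connector-java-*.jar /opt/spark/jars/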

    Start the Services on the Cluster

    On the master node, run the following from the command line:

    hive --service metastore
    hive --service hiveserver2
    

    Both commands can be run in the background using nohup, as shown below.
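
    For example (a minimal sketch; the log file names are arbitrary):

    nohup hive --service metastore > metastore.log 2>&1 &
    nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &

    From the development machine you can then check that the metastore port is reachable before moving on, e.g. with nc -zv 172.16.74.128 9083.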

    Using It from Local IDEA

    Sample code:

    import org.apache.spark.sql.SparkSession
    
    object XgbPredict {
        def main(args: Array[String]): Unit = {
            // Build a SparkSession against the cluster master, pointing it at
            // the remote Hive metastore started above.
            val spark = SparkSession
              .builder()
              .master("spark://master:7077")
              .config("hive.metastore.uris", "thrift://172.16.74.128:9083")
              .config("hive.metastore.warehouse.dir", "hdfs://172.16.74.128:9000/user/hive/warehouse")
              .config("spark.sql.warehouse.dir", "hdfs://172.16.74.128:9000/user/hive/warehouse")
              .enableHiveSupport()
              .getOrCreate()
    
            // If the connection works, this lists the databases defined in the
            // cluster's Hive metastore.
            spark.sql("show databases").show()
            println("Done!")
    
            spark.stop()
        }
    }
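
    Once show databases succeeds, regular Hive queries go through the same session. Below is a minimal sketch; the database and table names (mydb.users) are hypothetical, so substitute ones that exist in your own metastore:

    import org.apache.spark.sql.SparkSession
    
    object HiveQueryExample {
        def main(args: Array[String]): Unit = {
            val spark = SparkSession
              .builder()
              .master("spark://master:7077")
              .config("hive.metastore.uris", "thrift://172.16.74.128:9083")
              .enableHiveSupport()
              .getOrCreate()
    
            // mydb.users is a hypothetical table, used for illustration only.
            val df = spark.sql("select * from mydb.users limit 10")
            df.printSchema() // column names and types come from the Hive metastore
            df.show()
    
            spark.stop()
        }
    }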
    

    pom.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.wangt</groupId>
        <artifactId>SparkTest</artifactId>
        <version>1.0-SNAPSHOT</version>
        <properties>
            <spark.version>2.4.3</spark.version>
            <scala.version>2.11</scala.version>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs-client</artifactId>
                <version>2.8.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-client</artifactId>
                <version>2.2.0</version>
            </dependency>
            <dependency>
                <groupId>ml.dmlc</groupId>
                <artifactId>xgboost4j-spark</artifactId>
                <version>0.72</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-mllib_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-catalyst_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <version>14.0.1</version>
            </dependency>
            <dependency><!-- Database driver: MySQL -->
                <groupId>mysql</groupId>
                <artifactId>mysql-connector-java</artifactId>
                <version>8.0.19</version>
            </dependency>
        </dependencies>
    
        <build>
            <plugins>
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <version>2.15.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
    
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.6.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
    
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.20</version>
                </plugin>
                <plugin>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <configuration>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>
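
    Since the assembly plugin above is not bound to a lifecycle phase, the jar-with-dependencies has to be built by invoking the goal explicitly, for example:

    mvn clean package assembly:single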
    