Part 1: Manually importing the data into Hive
(1) First upload the data files and the script to /home/hadoop with an upload tool.
(2) On the virtual machine, run ./hive -f /home/hadoop/createHiveTab.sql ; this imports the data into Hive manually.
(Note the difference between hive -f and hive -e here):
./hive -f /home/hadoop/createHiveTab.sql
hive -f takes a file as its argument; the file contains plain HiveQL, and Hive runs the statements in it.
./hive -e "show databases;use default;show tables;"
hive -e takes a HiveQL string in double quotes and executes it directly.
The script createHiveTab.sql:
-- allow reserved words such as `date` to be used as column names
set hive.support.sql11.reserved.keywords=false;

-- the tables below live in the traffic database, so make sure it exists
CREATE DATABASE IF NOT EXISTS traffic;

CREATE TABLE IF NOT EXISTS traffic.monitor_flow_action (
    date        string,
    monitor_id  string,
    camera_id   string,
    car         string,
    action_time string,
    speed       string,
    road_id     string,
    area_id     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

load data local inpath '/home/hadoop/monitor_flow_action' into table traffic.monitor_flow_action;

CREATE TABLE IF NOT EXISTS traffic.monitor_camera_info (
    monitor_id string,
    camera_id  string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

load data local inpath '/home/hadoop/monitor_camera_info' into table traffic.monitor_camera_info;
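Once the script has run, you can sanity-check the load with hive -e, using the database and table names from the script above:

./hive -e "use traffic; show tables; select count(*) from monitor_flow_action;"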
Part 2: Automatically importing the data into Hive:
(1) In the Maven project, add the Spark-Hive integration dependency to pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
(2) Copy hive-site.xml from the Hive conf directory /root/Downloads/apache-hive-1.2.0-bin/conf into the project's resources directory:
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hostname:port/dbname?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
</configuration>
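Because the metastore above is backed by MySQL, the driver class named in javax.jdo.option.ConnectionDriverName must be on the project's classpath as well. A minimal sketch of the extra Maven dependency (the version number is an assumption; pick one compatible with your MySQL server):

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <!-- assumed version; com.mysql.jdbc.Driver ships in the 5.1.x line -->
    <version>5.1.47</version>
</dependency>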
(3) Declare Hive support in Spark SQL, then run the code below in IDEA; the data will be generated in the Hive warehouse automatically!
import org.apache.spark.sql.SparkSession

object HiveAsDataSource {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("HiveAsDataSource")
      .master("local[*]")
      .enableHiveSupport()   // route spark.sql() through the Hive metastore
      .getOrCreate()

    // Automatically write the data into Hive
    spark.sql("CREATE DATABASE IF NOT EXISTS traffic")
    spark.sql("USE traffic")

    // Create the monitor_flow_action table in Hive
    spark.sql("DROP TABLE IF EXISTS monitor_flow_action")
    spark.sql("CREATE TABLE IF NOT EXISTS monitor_flow_action" +
      "(date STRING, monitor_id STRING, camera_id STRING, car STRING, action_time STRING, speed STRING, road_id STRING, area_id STRING)" +
      " row format delimited fields terminated by ' '")
    spark.sql("load data local inpath 'file:///D:/data/monitor_flow_action' into table monitor_flow_action")

    // Create the monitor_camera_info table in Hive
    spark.sql("DROP TABLE IF EXISTS monitor_camera_info")
    spark.sql("CREATE TABLE IF NOT EXISTS monitor_camera_info (monitor_id STRING, camera_id STRING) row format delimited fields terminated by ' '")
    spark.sql("LOAD DATA" +
      " LOCAL INPATH 'file:///D:/data/monitor_camera_info'" +
      " INTO TABLE monitor_camera_info")   // note the leading space before INTO; without it the concatenated SQL is malformed

    println("===========data2hive finish===========")
    spark.close()
  }
}
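To double-check the import from the Spark side, the tables can be read back. A minimal sketch (VerifyHiveImport is a hypothetical helper, not part of the original project; it assumes the same hive-site.xml on the classpath):

import org.apache.spark.sql.SparkSession

// Hypothetical verification helper (not in the original project): reads the
// freshly loaded tables back from the Hive warehouse and prints a sample.
object VerifyHiveImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("VerifyHiveImport")
      .master("local[*]")
      .enableHiveSupport()   // the hive-site.xml from resources applies here too
      .getOrCreate()

    spark.sql("USE traffic")
    spark.table("monitor_flow_action").show(5)                          // first 5 flow records
    spark.sql("SELECT count(*) AS cnt FROM monitor_camera_info").show() // row count of the camera table

    spark.close()
  }
}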
(4) Note the LOAD DATA paths above: there must be no space before file, and the URI needs all three slashes (file:///); otherwise you get:
Exception in thread "main" org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist
(5) Extra note: the command to drop an existing database in Hive is:
drop database <dbname> cascade;