Reading and Writing Hive Tables with Spark
Spark reads and writes Hive tables mainly through SparkSession.

Reading a table is simple: run a query just as you would write SQL, for example sparkSession.sql("select * from xx").
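As a minimal PySpark sketch of the read path, assuming a Hive metastore is reachable from the cluster: the helper names `build_hive_session` and `read_hive_table`, the default application name, and the table name in the usage note are all made up for illustration and are not from the original post.

```python
def build_hive_session(app_name="hive-demo"):
    """Create (or reuse) a SparkSession that can see the Hive metastore."""
    # Imported lazily so read_hive_table below works with any session object.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)          # hypothetical application name
        .enableHiveSupport()        # required for Hive table access
        .getOrCreate()
    )


def read_hive_table(spark, table_name):
    """Return a DataFrame with every row of the given Hive table."""
    # The table name is interpolated as-is, so pass trusted names only.
    return spark.sql(f"SELECT * FROM {table_name}")
```

Typical usage would be `df = read_hive_table(build_hive_session(), "some_db.xx")`, where `some_db.xx` stands in for a real table.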

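The main topic here is writing data. Data can be stored in many formats, such as ORC and Parquet, so you need to write it in the format the table requires.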

First, a specific storage format has to be specified explicitly, using

  dataFrame.write.format("orc")
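A hedged sketch of a complete write with an explicit format might look like the following. It assumes the target is a new, non-partitioned table; the function name `write_table_as` and its defaults are invented here for illustration.

```python
def write_table_as(df, table_name, fmt="orc", mode="errorifexists"):
    """Save a DataFrame as a table stored in the given format.

    fmt  : storage format, e.g. "orc" or "parquet"
    mode : "append", "overwrite", "ignore", or "errorifexists"
    """
    (
        df.write
        .format(fmt)      # choose the storage format explicitly
        .mode(mode)       # what to do if the table already exists
        .saveAsTable(table_name)
    )
```

The default mode fails when the table already exists, which is the safer choice for a sketch; pass `mode="append"` or `mode="overwrite"` deliberately.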

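Second, there are two ways to write to a partitioned table: insertInto and saveAsTable.

  a) insertInto does not need the partitions to be specified; the partitioning should already have been declared when you created the table. Combining it with partitionBy() fails with: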
```
  insertInto() can't be used together with partitionBy().Partition columns have already be defined for the table. It is not necessary to use partitionBy().
```
  b) saveAsTable threw an exception telling me to use insertInto instead. I forgot to save the log, so this is just a note for now.
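The two approaches can be sketched in PySpark as below. This is an illustration under assumptions, not the exact code from the post: the function names are hypothetical, the `hive.exec.dynamic.partition.mode` setting is a common requirement for dynamic-partition inserts but depends on your cluster configuration, and the saveAsTable variant assumes the table does not exist yet.

```python
def insert_into_partitioned(spark, df, table_name):
    """Append rows to an existing partitioned Hive table with insertInto."""
    # Assumption: dynamic partitioning must be allowed for this session.
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    # insertInto matches columns by position, not by name, so reorder the
    # DataFrame to the table's column order (partition columns come last).
    target_columns = spark.table(table_name).columns

    # No partitionBy() here: the table definition already has the partitions.
    df.select(*target_columns).write.mode("append").insertInto(table_name)


def create_partitioned_table(df, table_name, partition_cols, fmt="orc"):
    """Create a new partitioned table with saveAsTable and partitionBy."""
    (
        df.write
        .format(fmt)
        .partitionBy(*partition_cols)   # only valid with saveAsTable/save
        .mode("errorifexists")          # refuse to touch an existing table
        .saveAsTable(table_name)
    )
```

The column reordering matters because a DataFrame whose columns are in a different order than the table would otherwise be written into the wrong fields without any error.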

　　　

Similar issues:

http://blog.csdn.net/lc0817/article/details/78211695?utm_source=debugrun&utm_medium=referral

https://stackoverflow.com/questions/32362206/spark-dataframe-saveastable-with-partitionby-creates-no-orc-file-in-hdfs

Original article: https://www.cnblogs.com/parkin/p/7919866.html