  • Comparing the Hive storage formats ORC and PARQUET

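      Hive has three commonly used storage formats: TEXT, ORC, and PARQUET. TEXT is the default format, while ORC and PARQUET are columnar formats. They differ in disk usage and query performance, so I tested them and recorded the results here.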

    1: Differences in the CREATE TABLE statements

    -- Plain-text table (Hive's default storage format, so no STORED AS clause)
    create table if not exists text(
    a bigint
    ) partitioned by (dt string)
    row format delimited fields terminated by '\001'
    location '/hdfs/text/';

    -- ORC table
    create table if not exists orc(
    a bigint)
    partitioned by (dt string)
    row format delimited fields terminated by '\001'
    stored as orc
    location '/hdfs/orc/';

    -- Parquet table
    create table if not exists parquet(
    a bigint)
    partitioned by (dt string)
    row format delimited fields terminated by '\001'
    stored as parquet
    location '/hdfs/parquet/';

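    The only real difference is the STORED AS clause: it is omitted for text and set to orc or parquet for the other two tables.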

    2: HDFS storage comparison

    parquet   orc     text
    709M      275M    1G
    687M      249M    1G
    647M      265M    1G
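
    To make the size gap easier to read, the figures in the table above can be turned into fractions of the text size. This is only a rough sketch: it assumes the reported "1G" is rounded and treats it as 1024 MB, and the names `SizeRatios` and `meanRatio` are made up for illustration. Under that assumption Parquet comes out at roughly two thirds of the text size and ORC at roughly a quarter.

    ```scala
    // Relative on-disk size of each columnar format versus the text baseline.
    // Assumption: the "1G" text figure is rounded and is treated here as 1024 MB.
    object SizeRatios {
      val textMb = 1024.0 // assumed value for the reported "1G"

      // Sizes in MB, one entry per table row, copied from the comparison table
      val parquetMb = Seq(709.0, 687.0, 647.0)
      val orcMb     = Seq(275.0, 249.0, 265.0)

      // Fraction of the text size that a format occupies, averaged over the rows
      def meanRatio(sizesMb: Seq[Double]): Double =
        sizesMb.map(_ / textMb).sum / sizesMb.size

      def main(args: Array[String]): Unit = {
        println(f"parquet: ${meanRatio(parquetMb) * 100}%.1f%% of text")
        println(f"orc:     ${meanRatio(orcMb) * 100}%.1f%% of text")
      }
    }
    ```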

    3: Query time comparison

    parquet   orc       text
    36.451    26.133    42.574
    38.425    29.353    41.673
    36.647    27.825    43.938
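
    The three rows of timings above can be summarized as a mean per format and a speedup relative to text. The sketch below does just that arithmetic; the source does not state the time unit, so none is assumed, and the names `QueryTimes`, `mean`, and `speedupOverText` are hypothetical helpers. With only three measurements per format, the result is an indication rather than a benchmark.

    ```scala
    // Mean query time per format and speedup relative to the text table.
    object QueryTimes {
      // Timings copied from the comparison table (unit not stated in the source)
      val parquet = Seq(36.451, 38.425, 36.647)
      val orc     = Seq(26.133, 29.353, 27.825)
      val text    = Seq(42.574, 41.673, 43.938)

      def mean(xs: Seq[Double]): Double = xs.sum / xs.size

      // How many times faster a format is than text, based on mean time
      def speedupOverText(xs: Seq[Double]): Double = mean(text) / mean(xs)

      def main(args: Array[String]): Unit = {
        println(f"mean parquet: ${mean(parquet)}%.3f, speedup over text: ${speedupOverText(parquet)}%.2fx")
        println(f"mean orc:     ${mean(orc)}%.3f, speedup over text: ${speedupOverText(orc)}%.2fx")
      }
    }
    ```

    On these numbers ORC is the fastest of the three and text the slowest, which matches the ordering in the table.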

    4: How the files were generated

    import java.util
    import org.apache.spark.api.java.JavaSparkContext
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val sparkSession = SparkSession.builder().master("local").appName("pushFunnelV3").getOrCreate()
    val javasc = new JavaSparkContext(sparkSession.sparkContext)
    // Two sample JSON records, converted to a Scala RDD[String]
    val nameRDD = javasc.parallelize(util.Arrays.asList("{'name':'zhangsan','age':'18'}", "{'name':'lisi','age':'19'}")).rdd
    // Write the same data as CSV (text), ORC, and Parquet
    sparkSession.read.json(nameRDD).write.mode(SaveMode.Overwrite).csv("/data/aa")
    sparkSession.read.json(nameRDD).write.mode(SaveMode.Overwrite).orc("/data/bb")
    sparkSession.read.json(nameRDD).write.mode(SaveMode.Overwrite).parquet("/data/cc")

  • Original article: https://www.cnblogs.com/wuxiaolong4/p/11809291.html