zoukankan      html  css  js  c++  java
  • Hiv 语法

    HIV语法

    • 1、创建数据库
      create database hive_db;

      显示数据库中表

      show tables;

      切换数据库

      use hive_db;

      显示数据库中

      show databases;

      显示某表结构,两种方式 1:表结构 2:表结构和分区数据位置等

      desc 表名;
      show create table 表名

    • 2、创建表脚本
      2.1 内部表-直接建表
      create table IF NOT EXISTS orders
      (
      order_id string comment "订单编号",
      user_id string comment "用户ID",
      order_number string comment "下单顺序",
      order_dow string comment "下单日期周一到周日",
      order_hour_of_day string comment "下单时间",
      days_since_prior_order string comment "距离上一次购物时间"
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY ' '
      STORED AS TEXTFILE;

      create table IF NOT EXISTS order_product(
       order_id string comment "订单编号",
       product_id string comment "物品ID",
       add_to_cart_order string comment "加入购物车时间",
       reordered string comment "是否复购"
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '
      '
      STORED AS TEXTFILE;
      

      2.1 查询建表
      #场景:建立一个临时表,或者一个中间表
      create table movies_tem as select * from movies limit 100;
      2.2 like 建表
      CREATE TABLE IF NOT EXISTS default.weblog_20150923
      LIKE default.weblog ;

    • 3、导入数据
      3.1 本地导入
      load data local inpath ‘/user/root/custom.csv’ overwrite into table 表明
      #用overwrite 加载本地数据到hive数据仓库
      #local:加上local指本地的数据路径,也就是在linux系统下的文件路径
      #不加local:指文件在hdfs下的路径,文件上传到hdfs后的路径

    • 4、collect_list和collect_set
      它们都是将分组中的某列转为一个数组返回,不同的是collect_list不去重而collect_set去重。
      https://www.cnblogs.com/cc11001100/p/9043946.html

    • 5、split分割和数组转行
      select split("I Love you", " ")
      select explode(split("I Love you", " "));

    • 6、row_number()、rank()、dense_rank()三个函数区别

    • 7、时间窗口
      row_number() over()
      sum() over()

      从最早的时间距你当前的时间

      select *, sum(result) over (partition by user_name order by create_time) as result_sum
      from user_match_temp

      从你当前时间到当前时间的前三条数据,不包括本条数据

      select *, avg(result) over (partition by user_name order by create_time rows between 3 preceding and current row) as recenty_wins
      from user_match_temp

    • 8、case when 条件表达式 then 表达式为true返回值 else 表达式为false返回值 end
      select uid,iid,score,case when score<=1 then '0-1' when score>1 and score<=3 then '1-3' when score>3 then '3-5' else '-1' end as
      score_rank from movies limit 30;

    • 9、if(条件表达式,表达式为true返回值,表达式为false返回值)

    • 10、自定义函数(UDFUDAFUDTF)

    • 11、concat 拼接
      select concat("{",'aaa',"}");

    • 12、regexp_replace #正在表达式
      select regexp_replace("'course')}", "^W+|W+$","")

    • 13、 针对json格式进行解析处理并转换成多行处理

    a:shangdong,b:beijing,c:shanghai	1,2,3	[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"}]
    a:tianjing,b:beijing,c:shanghai	3,4,5	[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"}]
    
    
    #去掉开头和结尾的[{、}]
    select regexp_replace(sale_info,'\[\{|\}\]','') from explode_test
    

    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"
    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"

    # 拆分并转成多行
    select explode(split(regexp_replace(sale_info,'\[\{|\}\]',''),"\},\{")) from explode_test
    ~~~显示结果
    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"
    "source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"
    "source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"
    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"
    "source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"
    "source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"
    

    通过get_json_object函数转换成json并获取json中属性

    select
    get_json_object(concat("{",t.infos,"}"),"$.source"),
    get_json_object(concat("{",t.infos,"}"),"$.monthSales"),
    get_json_object(concat("{",t.infos,"}"),"$.score")
    from
    (
    select explode(split(regexp_replace(sale_info,'[{|}]',''),"},{")) as infos from explode_test
    )t

    7fresh	4900	9.9
    jdmart	7900	5.9
    yam	54900	4.9
    7fresh	4900	9.9
    jdmart	7900	5.9
    yam	54900	4.9
    

    扩展,如果显示area,如何处理,可以通过udtf即lateral view进行实现

    select
    area,
    get_json_object(concat("{",infos,"}"),"$.source"),
    get_json_object(concat("{",infos,"}"),"$.monthSales"),
    get_json_object(concat("{",infos,"}"),"$.score")
    from explode_test lateral view explode(split(regexp_replace(sale_info,'[{|}]',''),"},{")) g as infos

    
    
    针对hive一些配置说明
    1、当select * from xxx,不能显示列名的时候,可以到hive-site.xml里面添加
       <property>
           <name>hive.cli.print.current.db</name>
           <value>true</value>
           </property>
       <property>
           <name>hive.cli.print.header</name>
           <value>true</value>
       </property>
    或者进入hive之后 
       – set hive.cli.print.current.db=true;
       – set hive.cli.print.header=true;
  • 相关阅读:
    第7次实践作业 25组
    第6次实践作业 25组
    第5次实践作业
    第4次实践作业
    第3次实践作业
    第2次实践作业
    第1次实践作业
    软工实践个人总结
    2019 SDN大作业
    C语言Il作业01
  • 原文地址:https://www.cnblogs.com/sjkzy/p/15092253.html
Copyright © 2011-2022 走看看