zoukankan      html  css  js  c++  java
  • Hiv 语法

    HIV语法

    • 1、创建数据库
      create database hive_db;

      显示数据库中表

      show tables;

      切换数据库

      use hive_db;

      显示数据库中

      show databases;

      显示某表结构,两种方式 1:表结构 2:表结构和分区数据位置等

      desc 表名;
      show create table 表名

    • 2、创建表脚本
      2.1 内部表-直接建表
      create table IF NOT EXISTS orders
      (
      order_id string comment "订单编号",
      user_id string comment "用户ID",
      order_number string comment "下单顺序",
      order_dow string comment "下单日期周一到周日",
      order_hour_of_day string comment "下单时间",
      days_since_prior_order string comment "距离上一次购物时间"
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY ' '
      STORED AS TEXTFILE;

      create table IF NOT EXISTS order_product(
       order_id string comment "订单编号",
       product_id string comment "物品ID",
       add_to_cart_order string comment "加入购物车时间",
       reordered string comment "是否复购"
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '
      '
      STORED AS TEXTFILE;
      

      2.1 查询建表
      #场景:建立一个临时表,或者一个中间表
      create table movies_tem as select * from movies limit 100;
      2.2 like 建表
      CREATE TABLE IF NOT EXISTS default.weblog_20150923
      LIKE default.weblog ;

    • 3、导入数据
      3.1 本地导入
      load data local inpath ‘/user/root/custom.csv’ overwrite into table 表明
      #用overwrite 加载本地数据到hive数据仓库
      #local:加上local指本地的数据路径,也就是在linux系统下的文件路径
      #不加local:指文件在hdfs下的路径,文件上传到hdfs后的路径

    • 4、collect_list和collect_set
      它们都是将分组中的某列转为一个数组返回,不同的是collect_list不去重而collect_set去重。
      https://www.cnblogs.com/cc11001100/p/9043946.html

    • 5、split分割和数组转行
      select split("I Love you", " ")
      select explode(split("I Love you", " "));

    • 6、row_number()、rank()、dense_rank()三个函数区别

    • 7、时间窗口
      row_number() over()
      sum() over()

      从最早的时间距你当前的时间

      select *, sum(result) over (partition by user_name order by create_time) as result_sum
      from user_match_temp

      从你当前时间到当前时间的前三条数据,不包括本条数据

      select *, avg(result) over (partition by user_name order by create_time rows between 3 preceding and current row) as recenty_wins
      from user_match_temp

    • 8、case when 条件表达式 then 表达式为true返回值 else 表达式为false返回值 end
      select uid,iid,score,case when score<=1 then '0-1' when score>1 and score<=3 then '1-3' when score>3 then '3-5' else '-1' end as
      score_rank from movies limit 30;

    • 9、if(条件表达式,表达式为true返回值,表达式为false返回值)

    • 10、自定义函数(UDFUDAFUDTF)

    • 11、concat 拼接
      select concat("{",'aaa',"}");

    • 12、regexp_replace #正在表达式
      select regexp_replace("'course')}", "^W+|W+$","")

    • 13、 针对json格式进行解析处理并转换成多行处理

    a:shangdong,b:beijing,c:shanghai	1,2,3	[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"}]
    a:tianjing,b:beijing,c:shanghai	3,4,5	[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"}]
    
    
    #去掉开头和结尾的[{、}]
    select regexp_replace(sale_info,'\[\{|\}\]','') from explode_test
    

    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"
    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"},{"source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"

    # 拆分并转成多行
    select explode(split(regexp_replace(sale_info,'\[\{|\}\]',''),"\},\{")) from explode_test
    ~~~显示结果
    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"
    "source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"
    "source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"
    "source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"
    "source":"jdmart","monthSales":7900,"userCount":2900,"score":"5.9"
    "source":"yam","monthSales":54900,"userCount":12900,"score":"4.9"
    

    通过get_json_object函数转换成json并获取json中属性

    select
    get_json_object(concat("{",t.infos,"}"),"$.source"),
    get_json_object(concat("{",t.infos,"}"),"$.monthSales"),
    get_json_object(concat("{",t.infos,"}"),"$.score")
    from
    (
    select explode(split(regexp_replace(sale_info,'[{|}]',''),"},{")) as infos from explode_test
    )t

    7fresh	4900	9.9
    jdmart	7900	5.9
    yam	54900	4.9
    7fresh	4900	9.9
    jdmart	7900	5.9
    yam	54900	4.9
    

    扩展,如果显示area,如何处理,可以通过udtf即lateral view进行实现

    select
    area,
    get_json_object(concat("{",infos,"}"),"$.source"),
    get_json_object(concat("{",infos,"}"),"$.monthSales"),
    get_json_object(concat("{",infos,"}"),"$.score")
    from explode_test lateral view explode(split(regexp_replace(sale_info,'[{|}]',''),"},{")) g as infos

    
    
    针对hive一些配置说明
    1、当select * from xxx,不能显示列名的时候,可以到hive-site.xml里面添加
       <property>
           <name>hive.cli.print.current.db</name>
           <value>true</value>
           </property>
       <property>
           <name>hive.cli.print.header</name>
           <value>true</value>
       </property>
    或者进入hive之后 
       – set hive.cli.print.current.db=true;
       – set hive.cli.print.header=true;
  • 相关阅读:
    几个新角色:数据科学家、数据分析师、数据(算法)工程师
    人类投资经理再也无法击败电脑的时代终将到来了...
    Action Results in Web API 2
    Multiple actions were found that match the request in Web Api
    Routing in ASP.NET Web API
    how to create an asp.net web api project in visual studio 2017
    网站漏洞扫描工具
    How does asp.net web api work?
    asp.net web api history and how does it work?
    What is the difference between a web API and a web service?
  • 原文地址:https://www.cnblogs.com/sjkzy/p/15092253.html
Copyright © 2011-2022 走看看