zoukankan      html  css  js  c++  java
  • 大数据之sparksql常用函数

    1、桥接+子查询+交叉连接

    select

      p.project_name as project_name

    from

      ods_pms_t_project_work_1217 w

      left join ods_pms_t_project_user_1217 u on w.user_id = u.id

      and w.work_hour > 0

      left join (

        select

          t1.project_id as project_id,

          t1.project_name as project_name

        from

          ods_pms_t_project_1217 t1

      ) p on p.project_id = u.project_id

    cross join ods_pms_t_project_user_1217 u1 on 1=1

    2、when ... case语句

     SELECT CASE t1.STATUS WHEN 'Y' THEN t1.STATUS ELSE t2.city END

    两种写法:

    SELECT CASE name WHEN 'shanmin' THEN name when 'zhourong' then concat(name, '-',address) ELSE address END  as test111111 from liuyue_test;

    SELECT CASE  WHEN name='shanmin' THEN name when name='zhourong' then concat(name, '-',address) ELSE address END  as test111111 from liuyue_test;

    3、split语句

    4、多级子查询

    select t3.id from  (

    select t2.id, t2.name,t2.address, t2.band, t2.sex, t2.number, t2.phone_num from  (

    select t1.id, t1.name,t1.address, t1.band, t1.sex, t1.number, t1.phone_num from  

    (select * from ods_alm_liuyue_test3) t1) t2) t3;

    select  t1.name as name2 from  ods_alm_liuyue_test3  cross join (select  t0.sex from  ods_alm_liuyue_test3 t0 where  t0.name = 'Zhangsan' ) t1 on 1 = 1

    4、join、cross join、inner join、full join是一样的?

    5、cast语句:

    select cast(cast(t1.id as decimal) as string) as typecast from ods_alm_liuyue_0329_1 t1

    8、在会话中设置用户变量:

    select @var_name :=  task_id from de_task_conf where task_name='acc_edw_dim_test_inre_slowly_051404';

    之后在当前会话中就能引用var_name

    9、往spark表中插入数据,需要mr账号执行权限:

    set hive.exec.dynamic.partition.mode=nonstrict;

    insert into   `zxvmax.edw_dim_test_incre_slowly_0816_backup` PARTITION (`p_date`) values('10000000012','10000000012','1012','zhourong_prod','Y','2019-07-16 10:35:26',

    '2019-07-16 10:35:26','ods_alm_liuyue_test_0716','1900-01-01','3010-12-31','2019-07-16');

    10、sparksql中使用时间、日期函数,及日期函数加减运算:

    select now(),current_timestamp(),date_sub(current_date(),1),date_add(current_date(),1);

    11、spark任务使用split函数会运行失败

    可以用substr()、concat()替换。

    12、在代码编辑中使用系统变量的日期运算

     ‘$cur_date#-1$’ $[date]-1

    13、查看建表语句、表结构

    desc table zxvmax.edw_test_change_part_09104;

    Show create table zxvmax.edw_test_change_part_09104;

    14、insert语句

    MaxCompute的insert语法与通常使用的MySQL或Oracle的insert语法有差别,在insert overwrite/into后需要加入table关键字,而非直接使用tablename。insert into与insert overwrite的区别是:insert into会向表或表的分区中追加数据,而insert overwrite会在向表或分区中插入数据前清空表中的原有数据。

    15、lead()函数

    LEAD(scd_start_date,n=1,default=NULL) OVER ( PARTITION BY id  ORDER BY TO_DATE(scd_start_date)) AS scd_end_date

    表示将下一行的scd_start_date的值,作为当前行的scd_end_date值,根据id分组,按scd_start_date正序排列

    16、Lag()函数

    lag(scd_start_date,n=1,default=NULL) OVER ( PARTITION BY id  ORDER BY TO_DATE(scd_start_date)) AS scd_end_date

    表示将上一行的scd_start_date的值,作为当前行的scd_end_date值,根据id分组,按scd_start_date正序排列

    17、Row_num()函数

    row_number() OVER (PARTITION BY COL1 ORDER BY COL2) as order_num 表示根据COL1分组,在分组内部根据 COL2排序,而此函数计算的值就表示每组内部排序后的顺序编号(组内连续的唯一的)

    18、Explain语句

    explain select * from ods_alm_liuyue_test_0905

    19、Compute语句

    20、compute stats ods_alm_liuyue_test_0905

  • 相关阅读:
    PHP简单工厂模式、工厂方法模式和抽象工厂模式
    PHP的HashTable实现
    理解Hash
    PHP中Array的hash函数实现
    R-FCN论文讲解(转载链接)
    目标检测的发展历程
    K-means算法
    pytorch0.4.1安装
    反卷积(deconvolution)
    faster-rcnn自己的理解总结(包括它的前世今身R-CNN和fast R-CNN)
  • 原文地址:https://www.cnblogs.com/yahutiaotiao/p/12631807.html
Copyright © 2011-2022 走看看