zoukankan      html  css  js  c++  java
  • Hive学习笔记

    基本概念

    来一段官网的基本解释
    The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax.
    Built on top of Apache Hadoop™, Hive provides the following features:

    • Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/zzzform/load (ETL), reporting, and data analysis.
    • A mechanism to impose structure on a variety of data formats
    • Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
    • Query execution via Apache Tez™, Apache Spark™, or MapReduce
    • Procedural language with HPL-SQL
    • Sub-second query retrieval via Hive LLAP, Apache YARN and Apache Slider.
      Hive定义为基于Hadoop的分布式数据仓库,提供基于SQL的数据读写和数据管理管理功能,它的特色包括:支持数据的ETL、构建数据体系、基于HDFS/HBase等数据存储系统。Hive基于HDFS来做文件存储,需要使用Map-Reduce计算框架。

    基本使用方法

    1. #如果表xxx存在删除,为创建xxx做准备
    2. drop table if exists xxx;
    3. #创建表xxx,包含15个字段及其类型,并创建分区列index
    4. CREATE TABLE xxx(yyy_attr INT, yyy_bin STRING, iss_mmm_id_cd STRING, pri_acct_no_conv INT, zzz_chnl STRING, zzz_curr_cd INT, zzz_id STRING, settle_dt INT, acpt_mmm_id_cd STRING, yyy_accptr_nm_addr STRING, mchnt_cd STRING, mchnt_tp STRING, term_id STRING, count STRING) COMMENT 'xxx details' PARTITIONED BY (index STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
    5. #给index'0001'的分区导入数据
    6. LOAD DATA INPATH '/user/kejun.he/input/acquire_1.txt' OVERWRITE INTO TABLE xxx PARTITION (index='0001');
    7. #给index'0002'的分区导入数据
    8. LOAD DATA LOCAL INPATH '/home/kejun.he/hive/acquire_2.txt' OVERWRITE INTO TABLE xxx PARTITION (index='0002');
    9. #查询,并检查order by 和limit用法
    10. select a.yyy_accptr_nm_addr from xxx a where a.index='0001' order by a.yyy_accptr_nm_addr limit 20;
    11. #查询,并检查两个分区的查询,速度很慢
    12. select a.yyy_accptr_nm_addr from xxx a order by a.yyy_accptr_nm_addr limit 20;
    13. #根据查询结果建表
    14. CREATE TABLE mmmtitue AS select distinct iss_mmm_id_cSELEd from xxx where index='0001';
    15. #修改表名
    16. ALTER TABLE mmmtitue RENAME TO mmmtitute;
    17. #inner join
    18. select B.iss_mmm_id_cd ,A.yyy_bin from xxx A JOIN mmmtitute B ON (A.iss_mmm_id_cd=B.iss_mmm_id_cd) where A.index='0002';
    19. #left join
    20. select count(A.yyy_bin) from xxx A LEFT OUTER JOIN mmmtitute B ON (A.iss_mmm_id_cd=B.iss_mmm_id_cd) where A.index='0002';
    21. #right join
    22. select count(A.yyy_bin) from xxx A RIGHT OUTER JOIN mmmtitute B ON (A.iss_mmm_id_cd=B.iss_mmm_id_cd) where A.index='0002';
    23. #full join
    24. select count(A.yyy_bin) from xxx A FULL OUTER JOIN mmmtitute B ON (A.iss_mmm_id_cd=B.iss_mmm_id_cd) where A.index='0002';

    UDA用法

  • 相关阅读:
    使用异或解题 —— 序列中仅出现一次的两个数
    使用异或解题 —— 序列中仅出现一次的两个数
    希尔排序
    java中接口的定义与实现
    erlang工作前新手学习指引路线
    NAT的全然分析及其UDP穿透的全然解决方式
    hibernate官方新手教程 (转载)
    git与svn的不同
    C#调用GDAL算法进度信息传递
    atitit. access token是什么??微信平台公众号开发access_token and Web session保持状态机制
  • 原文地址:https://www.cnblogs.com/kendrick/p/6698759.html
Copyright © 2011-2022 走看看