  • Getting Started with Hive

    hive
    • This covers what I have used so far; I will add more as I need it.
    • Based on the official documentation.
    • HiveQL is case-insensitive.
    Create/drop a database
    CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database_name>;
    DROP DATABASE|SCHEMA [IF EXISTS] <database_name>;
    SHOW DATABASES;  
    SHOW TABLES;  # much like standard SQL
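    For instance (the database name `userdb` is hypothetical), a minimal round trip looks like this; `IF NOT EXISTS`/`IF EXISTS` suppress the error when the database already exists or is missing:

    ```sql
    -- Create a database only if it does not already exist
    CREATE DATABASE IF NOT EXISTS userdb;
    USE userdb;
    -- Drop it again without erroring if it is gone
    DROP DATABASE IF EXISTS userdb;
    ```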
    
    Create a table (example from the official documentation)
    # Syntax
    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
    
    [(col_name data_type [COMMENT col_comment], ...)]
    [COMMENT table_comment]
    [ROW FORMAT row_format]
    [STORED AS file_format]
    
    # Example
    CREATE TABLE page_view(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING COMMENT 'IP Address of the User')
    COMMENT 'This is the page view table'
    PARTITIONED BY(dt STRING, country STRING) 
    ROW FORMAT DELIMITED
            FIELDS TERMINATED BY '\044'
            LINES TERMINATED BY '\012'
    STORED AS SEQUENCEFILE;
    
    Create a new table from an existing one (copies only the table structure)
    create table new_table_name like old_table_name;
    

    Note: STORED AS defaults to TEXTFILE. If the data is compressed text (e.g. gzip), Hadoop cannot split the file into chunks/blocks and run multiple maps in parallel on it, which reduces efficiency. See the documentation.
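    The workaround described in the official documentation is to stage the compressed text in one table and rewrite it into a SequenceFile table, which is splittable; the sketch below follows that example (the table names and the log file path are illustrative):

    ```sql
    -- Staging table over the (unsplittable) gzipped text files
    CREATE TABLE raw (line STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

    -- Splittable, block-compressed target table
    CREATE TABLE raw_sequence (line STRING)
        STORED AS SEQUENCEFILE;

    LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;

    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;  -- NONE/RECORD/BLOCK
    INSERT OVERWRITE TABLE raw_sequence SELECT * FROM raw;
    ```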

    Show a table's structure
    desc table_name;
    show columns from table_name;
    show create table table_name;
    
    hive> desc xxx_log;                                                                                   
    OK                                                                                                                                                                           
    origin                  int                                                                           
    url                     string                                                                        
    host                    string                                                                        
    Time taken: 0.408 seconds, Fetched: 3 row(s)                                                          
    
    hive> show columns from xxx_log;                                                                      
    OK                                                                                                                                                                                                    
    origin                                                                                                
    url                                                                                                   
    host                                                                                                  
    Time taken: 0.208 seconds, Fetched: 3 row(s)
    
    hive> show create table url_log;
    OK
    CREATE TABLE `url_log`(
      `origin` int,
      `url` string,
      `host` string)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.SequenceFileInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
    LOCATION
      'hdfs://taishan/apps/hive/warehouse/sml_wswang.db/url_log'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true"}',
      'numFiles'='0',
      'numRows'='0',
      'rawDataSize'='0',
      'totalSize'='0',
      'transient_lastDdlTime'='1502700823')
    Time taken: 0.2 seconds, Fetched: 24 row(s)
    
    Rename a table
    alter table table_name rename to new_table_name;
    
    Alter/drop a table
    ALTER TABLE name RENAME TO new_name
    ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
    ALTER TABLE name DROP [COLUMN] column_name
    ALTER TABLE name CHANGE column_name new_name new_type
    ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
    
    DROP TABLE [IF EXISTS] table_name;
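    Concrete versions of the statements above (the table `employee` and its columns are hypothetical). Note that in practice Hive removes columns via REPLACE COLUMNS rather than a per-column DROP:

    ```sql
    ALTER TABLE employee RENAME TO emp;
    -- Append a new column to the end of the (non-partition) columns
    ALTER TABLE emp ADD COLUMNS (dept STRING COMMENT 'department name');
    -- Change a column's type (name kept the same here)
    ALTER TABLE emp CHANGE salary salary DOUBLE;
    -- Replace the column list, effectively dropping the ones left out
    ALTER TABLE emp REPLACE COLUMNS (name STRING, salary DOUBLE);
    DROP TABLE IF EXISTS emp;
    ```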
    
    Load data from files
    # Syntax
    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
    

    The path can be relative or absolute, and can also point to a file on HDFS. If the path is a directory, all files under it are loaded, but the directory must not contain subdirectories. Examples:

    LOAD DATA LOCAL INPATH '/home/test/hi.txt' OVERWRITE INTO TABLE test
    
    LOAD DATA INPATH '/hdfs_home/hh.txt' INTO TABLE test
    

    You can load plain text files as well as Gzip/Bzip2-compressed files.

    Insert data
    # Standard Syntax:
    INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
     
    Where values_row is:
    ( value [, value ...] )
    where a value is either null or any valid SQL literal
    
    
    # Examples
    CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
      CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
     
    INSERT INTO TABLE students
      VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
     
     
    CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
      PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
     
    INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
      VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
     
    INSERT INTO TABLE pageviews PARTITION (datestamp)
      VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
    
    Insert query results into a table
    insert into table test_sq select * from test_text;
    
    Write query results to files
    Standard syntax:
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1
      [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
      SELECT ... FROM ...
     
    Hive extension (multiple inserts):
    FROM from_statement
    INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
    [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
     
     
    row_format
      : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
            [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
            [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
            
    # Write to a local directory (omit LOCAL to write into HDFS)
    insert overwrite local directory "/tmp/out/"                                          
    select user, login_time from user_login;
    # OVERWRITE deletes any existing files in the target directory, i.e. it replaces them
    hive> insert overwrite directory "/tmp/out/"  
        > row format delimited fields terminated by "\t"   
        > select user, login_time from user_login;
    
    Delete/update data
    DELETE FROM tablename [WHERE expression]
    UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
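    A caveat worth adding here: UPDATE and DELETE only work on tables that support ACID transactions (typically bucketed ORC tables with 'transactional'='true', with the transaction manager enabled); plain tables reject these statements. A minimal sketch, with a hypothetical `accounts` table:

    ```sql
    -- UPDATE/DELETE require an ACID (transactional) table
    CREATE TABLE accounts (id INT, balance DECIMAL(10,2))
      CLUSTERED BY (id) INTO 4 BUCKETS
      STORED AS ORC
      TBLPROPERTIES ('transactional'='true');

    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    DELETE FROM accounts WHERE balance < 0;
    ```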
    
    Examples
    # A more comprehensive example
    select url, count(url) as nums from url_log group by url order by nums desc limit 10;
    
    # Usage patterns of count()
    SELECT
        type
      , count(*)
      , count(DISTINCT u)
      , count(CASE WHEN plat=1 THEN u ELSE NULL END)
      , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL END)
      , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)
      , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL END)
    FROM
        t
    WHERE
        dt in ("2012-1-12-02", "2012-1-12-03")
    GROUP BY
        type
    ORDER BY
        type
    ;
    
    Switching the Hive job queue

    There are three variants:

    set mapred.job.queue.name=queue3;
    SET mapreduce.job.queuename=queue3;
    set mapred.queue.names=queue3;
    

    Older versions generally use parameters starting with mapred.
    Newer versions use parameters starting with mapreduce.
    The new-style parameter corresponding to each old one can be looked up.

  • Original article: https://www.cnblogs.com/wswang/p/7718115.html