  • elephant-bird study notes

    elephant-bird is an open-source project from Twitter, hosted at https://github.com/twitter/elephant-bird

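    It is Twitter's library of Hadoop InputFormats, OutputFormats, Writables, Pig load functions, Hive SerDes, HBase secondary-index code, and related utilities for working with LZO, Thrift, and Protocol Buffers.

    Build it from source with Maven: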

    mvn clean install -U -Dprotobuf.version=2.5.0 -DskipTests=true
    

    Running mvn package requires signing the artifacts, so generate a GPG key first:

    gpg --gen-key
    

    Apache Thrift and Protocol Buffers also need to be installed.
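    Before starting the build, it can help to confirm that the tools mentioned above are actually on the PATH. The following is only a minimal sketch: the executable names (`mvn`, `gpg`, `thrift`, `protoc`) are assumptions about a typical installation, and the helper `missing_tools` is a hypothetical name, not part of elephant-bird.

```python
import shutil

# Assumed executable names: mvn (Maven), gpg (GnuPG),
# thrift (Apache Thrift compiler), protoc (Protocol Buffers compiler).
REQUIRED_TOOLS = ["mvn", "gpg", "thrift", "protoc"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All build prerequisites found on PATH.")
```

    This only checks that the executables exist. It does not check versions, so a protoc that differs from the `-Dprotobuf.version=2.5.0` passed to Maven would go unnoticed.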

    Type mapping when creating a Hive table with elephant-bird:

    CREATE EXTERNAL TABLE `xxxx`(
    	  `ts` string COMMENT 'from deserializer', 
    	  `schema` string COMMENT 'from deserializer', 
    	  `test_string` string COMMENT 'from deserializer', 
    	  `test_long` bigint COMMENT 'from deserializer', 
    	  `test_int` int COMMENT 'from deserializer', 
    	  `test_short` smallint COMMENT 'from deserializer', 
    	  `test_double` double COMMENT 'from deserializer', 
    	  `test_byte` tinyint COMMENT 'from deserializer', 
    	  `test_bool` boolean COMMENT 'from deserializer', 
    	  `test_list` array<string> COMMENT 'from deserializer', 
    	  `test_set` array<bigint> COMMENT 'from deserializer', 
    	  `test_map` map<string,int> COMMENT 'from deserializer')
    	COMMENT 'test_all_type'
    	PARTITIONED BY ( 
    	  `ds` string COMMENT 'date partition')
    	ROW FORMAT SERDE 
    	  'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' 
    	WITH SERDEPROPERTIES ( 
    	  'serialization.class'='com.xxx.xxx.xxx', 
    	  'serialization.format'='org.apache.thrift.protocol.TCompactProtocol') 
    	STORED AS INPUTFORMAT 
    	  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
    	OUTPUTFORMAT 
    	  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
    	LOCATION
    	  'hdfs://xxxxxxx'
    	TBLPROPERTIES (
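    The type correspondence shown by the columns above (the table definition is cut off in the source) can be written down as a small lookup. This is an illustrative sketch only: the source does not show the Thrift IDL, so the Thrift-side type names are inferred from the column names (`test_long`, `test_set`, and so on), and `thrift_to_hive` is a hypothetical helper, not an elephant-bird or Hive API.

```python
# Thrift primitive -> Hive column type, as suggested by the table above.
# The Thrift-side names are assumptions inferred from the column names.
PRIMITIVES = {
    "string": "string",
    "i64": "bigint",
    "i32": "int",
    "i16": "smallint",
    "double": "double",
    "byte": "tinyint",
    "bool": "boolean",
}


def split_top_level(args):
    """Split 'K, V' on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i].strip())
            start = i + 1
    parts.append(args[start:].strip())
    return parts


def thrift_to_hive(thrift_type):
    """Translate a Thrift type string into the matching Hive type string."""
    t = thrift_type.strip()
    if t in PRIMITIVES:
        return PRIMITIVES[t]
    if t.endswith(">") and "<" in t:
        container, inner = t[:-1].split("<", 1)
        container = container.strip()
        args = [thrift_to_hive(a) for a in split_top_level(inner)]
        # Both list and set appear as array columns in the table above.
        if container in ("list", "set") and len(args) == 1:
            return "array<{}>".format(args[0])
        if container == "map" and len(args) == 2:
            return "map<{},{}>".format(args[0], args[1])
    raise ValueError("unsupported Thrift type: {!r}".format(thrift_type))
```

    For example, `thrift_to_hive("set<i64>")` gives `array<bigint>` and `thrift_to_hive("map<string,i32>")` gives `map<string,int>`, matching the `test_set` and `test_map` columns.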
    
  • Original article: https://www.cnblogs.com/tonglin0325/p/9636641.html