
Custom Map/Reduce in Hive: A Java Example

Hive supports custom map and reduce scripts. The simple word-count example below shows how they work.

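If you write the scripts in Java yourself, you have to handle System.in, System.out, and all of the key/value logic by hand, which is tedious. Someone wrote a small framework that lets us write code in a style similar to Hadoop's map and reduce, so that we only have to focus on those two functions. The framework now ships with Hive as $HIVE_HOME/lib/hive-contrib-2.3.0.jar; the name of the contrib jar may differ between Hive versions.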
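
To make the "tedious" part concrete, here is a minimal sketch of a word-count mapper written without the framework. The class name RawWordCountMapper and its helper mapLine are hypothetical and are not part of Hive or hive-contrib. The sketch assumes Hive hands each row to the script as one line on standard input and expects tab-separated columns back on standard output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled mapper: not part of Hive or hive-contrib.
final class RawWordCountMapper {

    // Turns one input line into "word<TAB>1" records, one per word.
    static List<String> mapLine(String line) {
        List<String> records = new ArrayList<String>();
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty()) { // split() can yield an empty leading token
                records.add(word + "\t1");
            }
        }
        return records;
    }

    // The plumbing the framework would otherwise hide: read stdin, write stdout.
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            for (String record : mapLine(line)) {
                System.out.println(record);
            }
        }
        System.out.flush();
    }
}
```

A matching reducer would additionally have to detect where one key ends and the next begins in the sorted input, which is the bookkeeping the framework takes over.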

IDE: IntelliJ IDEA
JDK: 1.7
Hive: 2.3.0
Hadoop: 2.8.1

1. Writing the map and reduce classes

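map class
import org.apache.hadoop.hive.contrib.mr.GenericMR;
import org.apache.hadoop.hive.contrib.mr.Mapper;
import org.apache.hadoop.hive.contrib.mr.Output;

public class WordCountMap {
    public static void main(String[] args) throws Exception {
        new GenericMR().map(System.in, System.out, new Mapper() {
            @Override
            public void map(String[] columns, Output output) throws Exception {
                // columns holds the tab-separated fields of one input line
                for (String column : columns) {
                    // Split each field on non-word characters. If the source text file is
                    // already tab-separated into words, this extra split is unnecessary.
                    for (String word : column.split("\\W+")) {
                        // split() yields an empty first token when a field starts with punctuation
                        if (!word.isEmpty()) {
                            output.collect(new String[]{word, "1"});
                        }
                    }
                }
            }
        });
    }
}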
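reduce class
import java.util.Iterator;

import org.apache.hadoop.hive.contrib.mr.GenericMR;
import org.apache.hadoop.hive.contrib.mr.Output;
import org.apache.hadoop.hive.contrib.mr.Reducer;

public class WordCountReducer {
    public static void main(String[] args) throws Exception {
        new GenericMR().reduce(System.in, System.out, new Reducer() {
            @Override
            public void reduce(String word, Iterator<String[]> records, Output output) throws Exception {
                int sum = 0;
                while (records.hasNext()) {
                    // each record is {word, count}; add up the counts for this word
                    sum += Integer.parseInt(records.next()[1]);
                }
                output.collect(new String[]{word, String.valueOf(sum)});
            }
        });
    }
}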
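
The summation inside reduce() can be checked without a Hive installation. The sketch below repeats the same loop in a stand-alone class; the class name ReduceSumCheck and the sample records are hypothetical, and it assumes, as the reducer above does, that column 0 of each record is the word and column 1 is the count.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-alone check; it does not use hive-contrib.
final class ReduceSumCheck {

    // Same summation as the reduce() body: column 1 of each record is the count.
    static int sum(Iterator<String[]> records) {
        int total = 0;
        while (records.hasNext()) {
            total += Integer.parseInt(records.next()[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three map outputs for the same word, as the reducer would receive them.
        List<String[]> group = Arrays.asList(
                new String[]{"hive", "1"},
                new String[]{"hive", "1"},
                new String[]{"hive", "1"});
        System.out.println("hive\t" + sum(group.iterator()));
    }
}
```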

2. Exporting the jar

Next, export a jar that includes hive-contrib-2.3.0. Assume the exported jar is named wordcount.jar.

File -> Project Structure

Add Artifacts

Leave Main Class empty and click OK

Jar configuration

Build the jar

3. Writing the Hive SQL

drop table if exists raw_lines;

-- create table raw_lines and read all the lines under '/user/inputs' (a path on your HDFS)
create external table if not exists raw_lines(line string)
ROW FORMAT DELIMITED
stored as textfile
location '/user/inputs';

drop table if exists word_count;

-- create table word_count, the output table, stored as text files under '/user/outputs' (a path on your HDFS)

create external table if not exists word_count(word string, count int)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 LINES TERMINATED BY '\n'
 STORED AS TEXTFILE LOCATION '/user/outputs/';


-- add the jar containing the mapper and reducer as a resource; change your/local/path to the real path
-- use "add file", not "add jar"; otherwise Hive will not find the map and reduce main classes
add file your/local/path/wordcount.jar;

from (
        from raw_lines
        map raw_lines.line
        --call the mapper here
        using 'java -cp wordcount.jar WordCountMap'
        as word, count
        cluster by word) map_output
insert overwrite table word_count
reduce map_output.word, map_output.count
--call the reducer here
using 'java -cp wordcount.jar WordCountReducer'
as word,count;

Save this Hive SQL as wordcount.hql.

4. Running the Hive SQL

beeline -u [hiveserver] -n username -f wordcount.hql

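A brief note on how Hive's custom map and reduce work internally:
Hive reads the text file and feeds it line by line to standard input. The user-defined map program reads from standard input, processes each line, and writes its results to standard output in a fixed format (for example, "key value", with the columns separated by a tab). Hive then sorts the output lines and feeds them to the standard input of the reduce program, which processes them and writes its own results to standard output. Finally, Hive takes the reduce output and stores it in the table.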
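
The flow described above can be imitated in memory to see how the pieces fit together. This is only a sketch under stated assumptions: the class StreamingPipelineSketch and its methods are hypothetical, an in-memory sort stands in for the sorting Hive performs between the two stages, and the records use the tab-separated "key value" layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory imitation of the map -> sort -> reduce flow.
final class StreamingPipelineSketch {

    // Map stage: one "word<TAB>1" line per word in every input row.
    static List<String> map(List<String> rows) {
        List<String> out = new ArrayList<String>();
        for (String row : rows) {
            for (String word : row.split("\\W+")) {
                if (!word.isEmpty()) {
                    out.add(word + "\t1");
                }
            }
        }
        return out;
    }

    // Reduce stage: input is sorted, so equal keys arrive on consecutive lines.
    static List<String> reduce(List<String> sortedLines) {
        List<String> out = new ArrayList<String>();
        String currentKey = null;
        int sum = 0;
        for (String line : sortedLines) {
            String[] cols = line.split("\t");
            if (currentKey != null && !currentKey.equals(cols[0])) {
                out.add(currentKey + "\t" + sum); // key changed: emit the finished group
                sum = 0;
            }
            currentKey = cols[0];
            sum += Integer.parseInt(cols[1]);
        }
        if (currentKey != null) {
            out.add(currentKey + "\t" + sum); // emit the last group
        }
        return out;
    }

    // Whole flow: map, sort (stand-in for Hive's sort), reduce.
    static List<String> run(List<String> rows) {
        List<String> mapped = map(rows);
        Collections.sort(mapped);
        return reduce(mapped);
    }

    public static void main(String[] args) {
        for (String line : run(Arrays.asList("to be or not", "to be"))) {
            System.out.println(line);
        }
    }
}
```

For the two sample rows, the sketch prints be 2, not 1, or 1, and to 2, each as a tab-separated line. The reduce stage only works because the sort places identical keys next to each other, which is why the query uses cluster by word before calling the reducer.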


Original article: https://www.cnblogs.com/mycodingworld/p/hive_mapred_java.html