  • Packaging with Maven and Ant in IDEA on the Mac (loading HBase from MR)

    I have really taken to IDEA; Eclipse on my Mac kept failing with inexplicable errors, so I switched. A new tool has a learning curve, though, and my current project ran into quite a few problems. Here they are:

    Background

    When developing on Hadoop, I often load data cleaned in the MR stage into HBase. Each run takes four manual steps: compile, build the jar, upload it to the server, and run hadoop jar *.jar. I am naturally fond of shortcuts, so I spent the past few days working out how to streamline this.

    Approach

    1. I used to automate packaging and uploading with Ant, but that was under Eclipse on Windows. In IDEA on the Mac it failed over and over, always with this error:

    Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
            at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:287)
            at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:240)
            at java.util.jar.JarVerifier.processEntry(JarVerifier.java:317)
            at java.util.jar.JarVerifier.update(JarVerifier.java:228)
            at java.util.jar.JarFile.initializeVerifier(JarFile.java:348)
            at java.util.jar.JarFile.getInputStream(JarFile.java:415)
            at org.apache.hadoop.util.RunJar.unJar(RunJar.java:101)
            at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    

    I dug through a lot of material online, but every proposed solution failed. The root cause is that some of the jars the project depends on are signed: repacking their contents into one fat jar copies their META-INF signature files, whose digests no longer match the merged jar. (Exporting the jar from Eclipse never triggered this.)
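    For reference, the fix that comes up most often online is to strip those signature files, either by deleting them from the finished jar (zip -d app.jar 'META-INF/*.SF' 'META-INF/*.RSA' 'META-INF/*.DSA') or, when building with Maven, by excluding them during the fat-jar build. Below is a sketch of the commonly cited maven-shade-plugin configuration; the plugin and its filter syntax are standard Maven, but I never got this route working in my setup at the time, which is why I eventually took a different path:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.1</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <filters>
                        <!-- drop the signature files whose digests no longer match the merged jar -->
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
            </execution>
        </executions>
    </plugin>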

    2. Exporting a jar from IDEA instead works as follows:
    File -> Project Structure

    [Screenshot: the Project Structure dialog]

    The following dialog appears:

    [Screenshot: the Artifacts settings dialog]

     

    Pick the Manifest File and the Main Class.

     

    After that, Build -> Build Artifacts compiles and packages the jar; the result shows up under the project's out directory.

     


    A jar built this way fails at runtime, however, because the HBase lib jars are not bundled into it. One clumsy workaround is to copy HBase's lib jars into the lib directory of every Hadoop node; that has obvious downsides, so I did not take it. How to solve it properly? I packaged with Maven instead, uploaded the jar, ran it, and it worked.
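    For completeness, two other common ways to get the HBase classes onto the cluster classpath are exporting HADOOP_CLASSPATH=$(hbase classpath) on the submitting node, and relying on TableMapReduceUtil.addDependencyJars(job), which the job code at the end of this post calls, to ship the dependency jars with the job. I still preferred the fat jar, because it keeps the deployment down to a single file.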

    Final solution:

    Maven can package the jar correctly, so how do we automate the upload and execution? I package with Maven and use Ant for the upload and the remote run.

    The Maven pom.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>groupId</groupId>
        <artifactId>antdemo</artifactId>
        <version>1</version>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-client</artifactId>
                <version>0.98.15-hadoop2</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.6.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-core</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-common</artifactId>
                <version>0.98.15-hadoop2</version>
            </dependency>
    
    
    
            <dependency>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-server</artifactId>
                <version>0.98.15-hadoop2</version>
            </dependency>
    
            <dependency>
                <groupId>com.jcraft</groupId>
                <artifactId>jsch</artifactId>
                <version>0.1.51</version>
            </dependency>
            <dependency>
                <groupId>org.apache.ant</groupId>
                <artifactId>ant-jsch</artifactId>
                <version>1.9.5</version>
            </dependency>
    
        </dependencies>
        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <configuration>
                        <archive>
                            <manifest>
                                <!-- Replace this with the class that holds the jar's main method;
                                     the job class below is in package demo, so the fully qualified name is needed -->
                                <mainClass>demo.HBaseImport</mainClass>
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id> <!-- this is used for inheritance merges -->
                            <phase>package</phase> <!-- run the fat-jar assembly during the package phase -->
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
    </project>
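    With this pom, running mvn clean package produces target/antdemo-1-jar-with-dependencies.jar, which is exactly the file the Ant script below uploads and executes.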
    

    The Ant build.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    
    <project name="antdemo" basedir="." default="sshexec">
       <description>This build file is used by Ant to compile the project, run unit tests automatically, package it, and deploy it.</description>
       <description>The default action (just type: ant) compiles the sources and deploys them for execution.</description>
    
       <!-- Property settings -->
       <property environment="env" />
       <!--<property file="build.xml" />-->
       <property name="build.dir" location="build.xml"/>
       <property name="src.dir" value="${basedir}/src" />
       <!--<property name="java.lib.dir" location="lib"/>-->
       <property name="java.lib.dir" value="/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/lib" />
       <property name="classes.dir" value="${basedir}/classes" />
       <property name="dist.dir" value="${basedir}/dist" />
       <property name="third.lib.dir" value="/Users/zzy/Kclouds/01_mac/doc/workspace/other_workspace/hbase-0.98.8-hadoop2/lib" />
       <!--<property name="third.lib.dir" value="{basedir}/lib" />-->
    
       <property name="localpath.dir" value="${basedir}" />
       <property name="remote.host" value="192.168.122.211"/>
       <property name="remote.username" value="root"/>
       <property name="remote.password" value="lg"/>
       <property name="remote.home" value="/test"/>
       <!-- set the main class of the job to run here -->
       <property name="main.class" value="demo.HBaseImport"/>
    
       <!-- Compile classpath -->
    
    
       <path id="compile.classpath">
          <fileset dir="${java.lib.dir}">
             <include name="tools.jar" />
          </fileset>
          <fileset dir="${third.lib.dir}">
             <include name="*.jar"/>
          </fileset>
    
       </path>
       
       <!-- Runtime classpath -->
       <path id="run.classpath">
          <path refid="compile.classpath" />
          <pathelement location="${classes.dir}" />
       </path>
       <!-- Clean: delete temporary directories -->
       <target name="clean" description="Clean: delete temporary directories">
          <!--delete dir="${build.dir}" /-->
          <delete dir="${dist.dir}" />
          <delete dir="${classes.dir}" />
          <echo level="info">Clean finished</echo>
    
    
       </target>
       <!-- Init: create directories and copy files -->
       <target name="init" depends="clean" description="Init: create directories and copy files">
          <mkdir dir="${classes.dir}" />
          <mkdir dir="${dist.dir}" />
       </target>
       <!-- Compile the sources -->
       <target name="compile" depends="init" description="Compile the sources">
          <javac srcdir="${src.dir}" destdir="${classes.dir}" source="1.7" target="1.7"  includeAntRuntime="false" debug="false" verbose="false">
              <compilerarg line="-encoding UTF-8 "/> 
             <classpath refid="compile.classpath" />
          </javac>
       </target>
       <!-- Package the class files -->
       <target name="jar" depends="compile" description="Package the class files">
          <jar jarfile="${dist.dir}/jar.jar">
             <fileset dir="${classes.dir}" includes="**/*.*" />
          </jar>
       </target>
       <!-- Upload to the server.
       ** jsch-0.1.51.jar from the lib directory must be copied into $ANT_HOME/lib; for Eclipse's bundled Ant, add it instead under Window->Preferences->Ant->Runtime->Classpath.
       -->
       <!--<target name="ssh" depends="jar">-->
          <!--<scp file="${dist.dir}/jar.jar" todir="${remote.username}@${remote.host}:${remote.home}" password="${remote.password}" trust="true"/>-->
       <!--</target>-->
       <!-- -->
       <!--<target name="sshexec" depends="ssh">-->
             <!--<sshexec host="${remote.host}" username="${remote.username}"  password="${remote.password}" trust="true" command="source /etc/profile;hadoop jar ${remote.home}/jar.jar ${main.class}"/>-->
             <!--&lt;!&ndash;<sshexec host="${remote.host}" username="${remote.username}"  password="${remote.password}" trust="true" command="source /etc/profile;touch  what.text ${main.class}"/>&ndash;&gt;-->
       <!--</target>-->
    
    
       <target name="ssh" depends="jar">
          <scp file="${basedir}/target/antdemo-1-jar-with-dependencies.jar" todir="${remote.username}@${remote.host}:${remote.home}" password="${remote.password}" trust="true"/>
       </target>
    
       <target name="sshexec" depends="ssh">
          <sshexec host="${remote.host}" username="${remote.username}"  password="${remote.password}" trust="true" command="source /etc/profile;hadoop jar ${remote.home}/antdemo-1-jar-with-dependencies.jar ${main.class}"/>
          <!--<sshexec host="${remote.host}" username="${remote.username}"  password="${remote.password}" trust="true" command="source /etc/profile;touch  what.text ${main.class}"/>-->
       </target>
    
    </project>
     
     
    The key part is these two targets:
    <target name="ssh" depends="jar">
          <scp file="${basedir}/target/antdemo-1-jar-with-dependencies.jar" todir="${remote.username}@${remote.host}:${remote.home}" password="${remote.password}" trust="true"/>
       </target>
    
       <target name="sshexec" depends="ssh">
          <sshexec host="${remote.host}" username="${remote.username}"  password="${remote.password}" trust="true" command="source /etc/profile;hadoop jar ${remote.home}/antdemo-1-jar-with-dependencies.jar ${main.class}"/>
          <!--<sshexec host="${remote.host}" username="${remote.username}"  password="${remote.password}" trust="true" command="source /etc/profile;touch  what.text ${main.class}"/>-->
       </target>
    
    

    This step does the upload and the remote execution; it is worth studying closely.
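    A few notes on it: scp and sshexec are optional Ant tasks backed by JSch, which is why the pom pulls in jsch and ant-jsch and why the comment above mentions copying jsch into $ANT_HOME/lib. Since the project's default target is sshexec, a bare ant runs the whole chain clean -> init -> compile -> jar -> ssh -> sshexec. Note, though, that the file being uploaded comes from Maven's target directory, not from the Ant jar target's dist/jar.jar, so run mvn package before running ant.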

     

    The MR code that loads the cleaned data into HBase:

     

    package demo;
    
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    //import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    
    
    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    
    
    /**
     * Created by zzy on 15/11/17.
     */
    
    /**
     * Writing to HBase from an MR job.
     */
    public class HBaseImport {
        static class BatchImportMapper extends Mapper<LongWritable,Text,LongWritable,Text>{
            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                // note: no call to super.map() here - the default implementation would
                // re-emit the raw line without the rowkey prefix
                String line = value.toString();
                String[] splited = line.split("\t");
                SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
                String time =sdf.format(new Date(Long.parseLong(splited[0].trim())));
    
                String rowkey = splited[1]+"_"+time;
                Text v2s = new Text();
                v2s.set(rowkey + "\t" + line);
                context.write(key,v2s);
    
            }
        }
        static class BatchImportReducer extends TableReducer<LongWritable,Text,NullWritable> {
            private byte[] family = "cf".getBytes();
    
            @Override
            protected void reduce(LongWritable key, Iterable<Text> v2s, Context context) throws IOException, InterruptedException {
                for (Text v2 : v2s) {
                    String[] splited = v2.toString().split("\t");
                    String rowKey = splited[0];
                    Put put = new Put(rowKey.getBytes());
                    put.add(family, "raw".getBytes(), v2.toString().getBytes());
                    put.add(family, "reportTime".getBytes(), splited[1].getBytes());
                    put.add(family, "msisdn".getBytes(), splited[2].getBytes());
                    put.add(family, "apmac".getBytes(), splited[3].getBytes());
                    put.add(family, "acmac".getBytes(), splited[4].getBytes());
                    // emit the Put; without this write nothing ever reaches TableOutputFormat
                    context.write(NullWritable.get(), put);
                }
            }
        }
        private static final String tableName = "logs";
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            conf.set("hbase.zookeeper.quorum","192.168.122.213:2181");
            conf.set("hbase.rootdir","hdfs://192.168.122.211:9000/hbase");
            conf.set(TableOutputFormat.OUTPUT_TABLE, tableName);
    
            Job job = Job.getInstance(conf,HBaseImport.class.getSimpleName());
    
            TableMapReduceUtil.addDependencyJars(job);
            job.setJarByClass(HBaseImport.class);
            job.setMapperClass(BatchImportMapper.class);
            job.setReducerClass(BatchImportReducer.class);
    
            job.setMapOutputKeyClass(LongWritable.class);
            job.setMapOutputValueClass(Text.class);
    
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TableOutputFormat.class);
    
            FileInputFormat.setInputPaths(job, "hdfs://192.168.122.211:9000/user/hbase");
            job.waitForCompletion(true);
    //        FileInputFormat.setInputPaths(job,"");
    
        }
    }
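    One thing to check before the first run: TableOutputFormat writes to an existing table, so the logs table with the cf column family has to be created up front, e.g. create 'logs', 'cf' in the HBase shell. After that, the sshexec target effectively runs hadoop jar antdemo-1-jar-with-dependencies.jar demo.HBaseImport on the server.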
    

    If this article has been of help to you, I hope you will lend your support.

  • Original post: https://www.cnblogs.com/chaoren399/p/4984609.html