  • [Original] Uncle's Experience Sharing (18): no progress information when running SQL through beeline since Hive 2.0

    一 The Problem

    In Hive 1.2, running SQL through either hive or beeline showed progress information. After upgrading to Hive 2.0, only the hive CLI still shows it; beeline runs SQL in complete silence, so while waiting for the result you have no idea how far the query has progressed.

    1 Running SQL in the hive CLI (with progress information)

    hive> select count(1) from test_table;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = hadoop_20181227162003_bd82e3e2-2736-42b4-b1da-4270ead87e4d
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
    Starting Job = job_1544593827645_22873, Tracking URL = http://rm1:8088/proxy/application_1544593827645_22873/
    Kill Command = /export/App/hadoop-2.6.1/bin/hadoop job -kill job_1544593827645_22873
    2018-12-27 16:20:27,650 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 116.9 sec
    MapReduce Total cumulative CPU time: 1 minutes 56 seconds 900 msec
    Ended Job = job_1544593827645_22873
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 29 Reduce: 1 Cumulative CPU: 116.9 sec HDFS Read: 518497 HDFS Write: 197 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 56 seconds 900 msec
    OK
    104
    Time taken: 24.437 seconds, Fetched: 1 row(s)

    2 Running SQL in beeline (no progress information)

    0: jdbc:hive2://thrift1:10000> select count(1) from test_table;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    +------+--+
    | c0   |
    +------+--+
    | 104  |
    +------+--+
    1 row selected (23.965 seconds)

    二 Code Analysis

    For a detailed walkthrough of how hive executes SQL, see: https://www.cnblogs.com/barneywill/p/10185168.html

    SQL executed in hive ultimately goes through Driver.run, which in turn calls execute; let's go straight to the execute code:

    org.apache.hadoop.hive.ql.Driver

      public int execute(boolean deferClose) throws CommandNeedRetryException {
    ...
          if (jobs > 0) {
            logMrWarning(mrJobs);
            console.printInfo("Query ID = " + queryId);
            console.printInfo("Total jobs = " + jobs);
          }
    ...
      private void logMrWarning(int mrJobs) {
        if (mrJobs <= 0 || !("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE)))) {
          return;
        }
        String warning = HiveConf.generateMrDeprecationWarning();
        LOG.warn(warning);
        warning = "WARNING: " + warning;
        console.printInfo(warning);
        // Propagate warning to beeline via operation log.
        OperationLog ol = OperationLog.getCurrentOperationLog();
        if (ol != null) {
          ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
        }
      }

    As you can see, the progress information shown in the hive CLI is printed via console.printInfo.
    Note one detail: although beeline shows no progress information, it does print one WARNING message:

    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

    This warning is emitted by the following code:

        OperationLog ol = OperationLog.getCurrentOperationLog();
        if (ol != null) {
          ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
        }

    So for beeline to show progress information as well, the progress messages would have to be written out the same way, through the operation log.
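The mechanism can be sketched without Hive's classes: execution code writes into a per-thread operation log, and the client-facing thread drains that log and prints it. Below is a minimal stand-alone sketch of this pattern; the class and method names only loosely mirror Hive's OperationLog and are otherwise made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hive's per-operation log: execution code writes into a
// thread-local buffer, and the client-facing side drains it periodically.
public class OperationLogSketch {
    private static final ThreadLocal<OperationLogSketch> CURRENT = new ThreadLocal<>();

    private final List<String> lines = new ArrayList<>();

    public static void setCurrentOperationLog(OperationLogSketch log) {
        CURRENT.set(log);
    }

    public static OperationLogSketch getCurrentOperationLog() {
        return CURRENT.get();
    }

    // Called from execution code (mirrors ol.writeOperationLog(...)).
    public synchronized void writeOperationLog(String msg) {
        lines.add(msg);
    }

    // Called from the client's fetch loop (mirrors beeline polling logs).
    public synchronized List<String> drain() {
        List<String> out = new ArrayList<>(lines);
        lines.clear();
        return out;
    }

    public static void main(String[] args) {
        OperationLogSketch log = new OperationLogSketch();
        setCurrentOperationLog(log);

        // Execution side: write progress the same way the MR warning is written.
        OperationLogSketch ol = getCurrentOperationLog();
        if (ol != null) {
            ol.writeOperationLog("Stage-1 map = 50%,  reduce = 0%");
            ol.writeOperationLog("Stage-1 map = 100%,  reduce = 100%");
        }

        // Client side: drain and print what the user would see in beeline.
        for (String line : log.drain()) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is only the plumbing: anything written through the thread-local log is visible to the client, while plain console.printInfo output stays on the server.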

    三 Where the progress messages come from

    The familiar progress messages are printed here:

    org.apache.hadoop.hive.ql.Driver

      public int execute(boolean deferClose) throws CommandNeedRetryException {
    ...
            console.printInfo("Query ID = " + queryId);
            console.printInfo("Total jobs = " + jobs);
    
      private TaskRunner launchTask(Task<? extends Serializable> tsk, String queryId, boolean noName,
          String jobname, int jobs, DriverContext cxt) throws HiveException {
    ...
          console.printInfo("Launching Job " + cxt.getCurJobNo() + " out of " + jobs);

    org.apache.hadoop.hive.ql.exec.mr.MapRedTask

      private void setNumberOfReducers() throws IOException {
        ReduceWork rWork = work.getReduceWork();
        // this is a temporary hack to fix things that are not fixed in the compiler
        Integer numReducersFromWork = rWork == null ? 0 : rWork.getNumReduceTasks();
    
        if (rWork == null) {
          console
              .printInfo("Number of reduce tasks is set to 0 since there's no reduce operator");
        } else {
          if (numReducersFromWork >= 0) {
            console.printInfo("Number of reduce tasks determined at compile time: "
                + rWork.getNumReduceTasks());
          } else if (job.getNumReduceTasks() > 0) {
            int reducers = job.getNumReduceTasks();
            rWork.setNumReduceTasks(reducers);
            console
                .printInfo("Number of reduce tasks not specified. Defaulting to jobconf value of: "
                + reducers);
          } else {
            if (inputSummary == null) {
              inputSummary =  Utilities.getInputSummary(driverContext.getCtx(), work.getMapWork(), null);
            }
            int reducers = Utilities.estimateNumberOfReducers(conf, inputSummary, work.getMapWork(),
                                                              work.isFinalMapRed());
            rWork.setNumReduceTasks(reducers);
            console
                .printInfo("Number of reduce tasks not specified. Estimated from input data size: "
                + reducers);
    
          }
          console
              .printInfo("In order to change the average load for a reducer (in bytes):");
          console.printInfo("  set " + HiveConf.ConfVars.BYTESPERREDUCER.varname
              + "=<number>");
          console.printInfo("In order to limit the maximum number of reducers:");
          console.printInfo("  set " + HiveConf.ConfVars.MAXREDUCERS.varname
              + "=<number>");
          console.printInfo("In order to set a constant number of reducers:");
          console.printInfo("  set " + HiveConf.ConfVars.HADOOPNUMREDUCERS
              + "=<number>");
        }
      }
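The estimate in the last branch (Utilities.estimateNumberOfReducers) boils down to the input size divided by hive.exec.reducers.bytes.per.reducer, rounded up and clamped by hive.exec.reducers.max. The sketch below shows only that core arithmetic; the default values are assumptions based on Hive 2.x, and the real method considers additional factors:

```java
// Simplified reducer estimate: ceil(inputSize / bytesPerReducer),
// clamped to [1, maxReducers]. Mirrors the idea behind
// Utilities.estimateNumberOfReducers, not its exact code.
public class ReducerEstimateSketch {
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        // Ceiling division without floating point.
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        if (reducers < 1) reducers = 1;            // always at least one reducer
        if (reducers > maxReducers) reducers = maxReducers;
        return (int) reducers;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024; // hive.exec.reducers.bytes.per.reducer (2.x default, assumed)
        int maxReducers = 1009;                    // hive.exec.reducers.max (2.x default, assumed)
        // 1 GB of input at 256 MB per reducer -> 4 reducers.
        System.out.println(estimateReducers(1024L * 1024 * 1024, bytesPerReducer, maxReducers));
    }
}
```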

    Most of them are in the class below:

    org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper

      public void jobInfo(RunningJob rj) {
        if (ShimLoader.getHadoopShims().isLocalMode(job)) {
          console.printInfo("Job running in-process (local Hadoop)");
        } else {
          if (SessionState.get() != null) {
            SessionState.get().getHiveHistory().setTaskProperty(queryState.getQueryId(),
                getId(), Keys.TASK_HADOOP_ID, rj.getID().toString());
          }
          console.printInfo(getJobStartMsg(rj.getID()) + ", Tracking URL = "
              + rj.getTrackingURL());
          console.printInfo("Kill Command = " + HiveConf.getVar(job, HiveConf.ConfVars.HADOOPBIN)
              + " job  -kill " + rj.getID());
        }
      }
    
      private MapRedStats progress(ExecDriverTaskHandle th) throws IOException, LockException {
    ...
          StringBuilder report = new StringBuilder();
          report.append(dateFormat.format(Calendar.getInstance().getTime()));
    
          report.append(' ').append(getId());
          report.append(" map = ").append(mapProgress).append("%, ");
          report.append(" reduce = ").append(reduceProgress).append('%');
    ...
          String output = report.toString();
    ...
          console.printInfo(output);
    ...
      
      public static String getJobEndMsg(JobID jobId) {
        return "Ended Job = " + jobId;
      }
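The familiar line such as "2018-12-27 16:20:27,650 Stage-1 map = 100%, reduce = 100%" is assembled by the StringBuilder code in progress() above. The stand-alone sketch below reproduces the same string building; the exact date pattern is an assumption:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Stand-alone reproduction of how HadoopJobExecHelper.progress()
// builds its report line; the date pattern here is an assumption.
public class ProgressReportSketch {
    static String buildReport(String taskId, int mapProgress, int reduceProgress) {
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        StringBuilder report = new StringBuilder();
        report.append(dateFormat.format(Calendar.getInstance().getTime()));
        report.append(' ').append(taskId);
        report.append(" map = ").append(mapProgress).append("%, ");
        report.append(" reduce = ").append(reduceProgress).append('%');
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildReport("Stage-1", 100, 100));
    }
}
```

Like the warning in section 二, this string only reaches console.printInfo, which is why beeline never sees it.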

    It looks like that change would be a fair amount of work, haha.

  • Original post: https://www.cnblogs.com/barneywill/p/10185949.html