  • wormhole: a plan to speed up hivereader reads

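    Background:

    Recently, DW users reported that wormhole transfers were very slow. Some jobs took 3-4 hours to finish, which delays the daily delivery of online reports. When I looked into it, nearly all of these jobs moved data from Hive to some other destination, i.e. they used hivereader. The logs also showed a very low real-time transfer speed for hivereader, so the problem most likely lies in hivereader.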

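    First, a quick introduction to wormhole. wormhole is a high-speed data transfer tool we developed. It supports many heterogeneous data sources, and its architecture is shown below:

    [Figure: wormhole architecture diagram]
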
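    Problem description:

    Each wormhole run is a single-machine job. The user writes a wormhole job XML descriptor that defines the data source, the data destination and a series of other configuration parameters, and then submits the job. After wormhole receives the job XML file, it creates a job and runs two steps on the reader side and the writer side separately: preprocessing (Periphery) and job splitting (Splitter). It then starts a reader thread pool and a writer thread pool to read and write data concurrently, with a storage component acting as a buffer queue between them.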
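    The reader pool -> storage -> writer pool pipeline described above is essentially a bounded producer-consumer queue. The Java sketch below only illustrates that idea under assumptions. PipelineSketch, the END marker and the queue capacity are hypothetical names and choices, not wormhole's actual classes. When the queue is full, readers block in put(), and this is how such a buffer applies back-pressure when the writer side is slower.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: reader thread pool -> storage (bounded buffer queue) -> writer thread pool.
// Names and the end-of-data marker are illustrative, not wormhole's real classes.
class PipelineSketch {
    private static final Object END = new Object(); // end-of-data marker, one per writer

    /** Pushes every record of every split through the buffer queue; returns how many records were written. */
    static long run(List<List<String>> splits, int readerThreads, int writerThreads, int capacity)
            throws InterruptedException {
        BlockingQueue<Object> storage = new ArrayBlockingQueue<>(capacity);
        AtomicLong written = new AtomicLong();

        ExecutorService writers = Executors.newFixedThreadPool(writerThreads);
        for (int i = 0; i < writerThreads; i++) {
            writers.execute(() -> {
                try {
                    for (Object rec = storage.take(); rec != END; rec = storage.take()) {
                        written.incrementAndGet(); // a real writer would emit rec to the destination
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        for (List<String> split : splits) {
            readers.execute(() -> {
                try {
                    for (String rec : split) {
                        storage.put(rec); // blocks while the buffer is full (back-pressure)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        readers.shutdown();
        readers.awaitTermination(1, TimeUnit.MINUTES);

        // All data is already queued, so each writer drains it before seeing its END marker.
        for (int i = 0; i < writerThreads; i++) {
            storage.put(END);
        }
        writers.shutdown();
        writers.awaitTermination(1, TimeUnit.MINUTES);
        return written.get();
    }
}
```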

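    Back to the problem. In hive reader, the HQL written by the user is submitted to Hive Server over JDBC, and Hive Server executes it and returns the result set. This approach has several drawbacks:

    1. The HQL cannot be split, so only one reader thread can run and parallel reading brings no benefit.

    2. We run two Hive Server instances, and other products and queries also use them. Large-scale data pulls are therefore limited by the network throughput of the Hive Server and service nodes.

    3. After the HQL is submitted, the MapReduce job first writes the result data to a temporary directory. A fetch task then pulls it to Hive Server, which streams it on to the wormhole client. The data path is datanode -> hive server -> wormhole client, so Hive Server remains the bottleneck.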


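    Solution:

    The solution is a second hivereader execution mode. Reading data through hive server is the bottleneck, so this mode bypasses hive server and reads directly from the datanodes in parallel. Hive server is used only to submit the HQL. For example, the user's query "select * from bi.dpdm_device_permanent_city" can be automatically rewritten as "INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city". This writes the data into a temporary directory we specify. Note two points: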

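    1. Enable compression of the result files with set hive.exec.compress.output=true. This further reduces network IO when the wormhole client reads the data.

    2. Let the user choose the number of reducers with set mapred.reduce.tasks=N. Each reducer produces one file, and hive reader splits the work by file count. Users can therefore set the reducer count based on their estimate of the output size.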
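    The query rewrite and the two settings above could be assembled roughly as follows. This is a minimal sketch, not wormhole's code. HiveReaderPrep and its methods are hypothetical. Naming the temp directory with a random UUID is an assumption, since the original naming scheme is not stated. Sizing files by a target byte count is only one possible heuristic for choosing N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of what the new mode sends to hive server before reading from HDFS.
// Class/method names, temp-dir naming and the file-size heuristic are assumptions.
class HiveReaderPrep {
    /** A fresh temp directory under baseDir, named with 32 lowercase hex characters. */
    static String newTempDir(String baseDir) {
        String id = UUID.randomUUID().toString().replace("-", "");
        return baseDir.endsWith("/") ? baseDir + id : baseDir + "/" + id;
    }

    /** Wraps the user's SELECT so that Hive writes the result to tempDir instead of returning it. */
    static String rewrite(String userHql, String tempDir) {
        if (tempDir.contains("'")) {
            throw new IllegalArgumentException("temp dir must not contain a single quote");
        }
        String select = userHql.trim();
        if (select.endsWith(";")) {
            select = select.substring(0, select.length() - 1).trim(); // drop a trailing semicolon
        }
        return "INSERT OVERWRITE DIRECTORY '" + tempDir + "' " + select;
    }

    /** Ceiling of estimatedOutputBytes / targetBytesPerFile, and at least 1. */
    static int estimateReduceTasks(long estimatedOutputBytes, long targetBytesPerFile) {
        if (estimatedOutputBytes <= 0 || targetBytesPerFile <= 0) {
            return 1;
        }
        long n = estimatedOutputBytes / targetBytesPerFile
                + (estimatedOutputBytes % targetBytesPerFile == 0 ? 0 : 1);
        return (int) Math.min(n, Integer.MAX_VALUE);
    }

    /** The two session settings from the list above, followed by the rewritten query. */
    static List<String> statements(int reduceTasks, String insertHql) {
        List<String> stmts = new ArrayList<>();
        stmts.add("set hive.exec.compress.output=true");
        stmts.add("set mapred.reduce.tasks=" + reduceTasks);
        stmts.add(insertHql);
        return stmts;
    }
}
```

    Each string would then be executed in order through a JDBC java.sql.Statement against hive server. Suppose the test table's HDFS_BYTES_READ (10,149,072,324 bytes) is used as a rough stand-in for output size, with a 256 MB target per file. Then estimateReduceTasks returns 38. mapred.reduce.tasks only controls the file count when the query actually has a reduce stage. A map-only query writes roughly one file per map task.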

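    In the periphery stage, the HQL is submitted to hive server. Once it finishes, the data has landed on various datanodes. The splitter then generates one split per file and starts a Reader Thread Pool with as many threads as the concurrency setting. These threads fetch data in parallel from different datanodes. Each thread keeps its own DFSClient, which first talks to the Namenode via ClientProtocol and then reads block data directly from the datanodes. Finally, the temporary directory is deleted.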
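    Below is a dependency-free sketch of the split, parallel-read and cleanup steps. It assumes the caller supplies the file listing, the per-file read and the directory deletion. In a real reader these would go through Hadoop's FileSystem API (listStatus, open, delete). ParallelFileRead and readAll are hypothetical. Skipping files whose names start with "_" or "." (such as _SUCCESS) is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical sketch of the split -> parallel read -> cleanup flow.
// Listing, reading and deleting are passed in instead of calling Hadoop directly.
class ParallelFileRead {
    /** One split per output file; assumption: skip bookkeeping files like _SUCCESS or .crc files. */
    static List<String> split(List<String> files) {
        List<String> splits = new ArrayList<>();
        for (String path : files) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (!name.startsWith("_") && !name.startsWith(".")) {
                splits.add(path);
            }
        }
        return splits;
    }

    /**
     * Reads every split on a pool of `concurrency` threads and returns the total record count
     * reported by readSplit. The temp directory is deleted in all cases, including after a failed read.
     */
    static long readAll(List<String> files, int concurrency,
                        ToLongFunction<String> readSplit, Runnable deleteTempDir) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (String path : split(files)) {
                Callable<Long> task = () -> readSplit.applyAsLong(path);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // a failed read surfaces here as an ExecutionException
            }
            return total;
        } finally {
            pool.shutdownNow();
            deleteTempDir.run();
        }
    }
}
```

    The log below shows the directory being deleted in the doPost stage. Placing the deletion in a finally block, as here, is a design choice that also cleans up after a failed read.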


    Performance comparison:

    Test table: dpdm_device_permanent_city

    108,593,390 records in total, HDFS_BYTES_READ: 10,149,072,324

    Reading through hiveserver:

    2013-07-12 12:00:30,806 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 107373504 | Write 107372736 | speed 2.89MB/s 34163L/s|
    
    2013-07-12 12:00:40,809 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 107695040 | Write 107694912 | speed 2.84MB/s 32192L/s|
    
    2013-07-12 12:00:50,812 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 108027968 | Write 108027392 | speed 2.83MB/s 33254L/s|
    
    2013-07-12 12:01:00,815 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 108386624 | Write 108386560 | speed 2.93MB/s 35904L/s|
    
    2013-07-12 12:01:09,234 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
    2013-07-12 12:01:09,235 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
    2013-07-12 12:01:09,245 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO  core.Engine - Nebula wormhole Job is Completed successfully!
    2013-07-12 12:01:09,592 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO  core.Engine - 
    writer-id-0-hdfswriter:
    Wormhole starts work at   : 2013-07-12 11:01:19
    Wormhole ends work at     : 2013-07-12 12:01:09
    Total time costs          :            3590.01s
    Average byte speed        :            2.58MB/s
    Average line speed        :            30248L/s
    Total transferred records :           108593326

    Reading directly from the datanodes:

    2013-07-12 10:21:47,431 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:66) INFO  core.Engine - Nebula wormhole Start
    2013-07-12 10:21:47,458 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:100) INFO  core.Engine - Start Reader Threads
    2013-07-12 10:21:47,550 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO  common.DFSUtils - fs.default.name=hdfs://10.2.6.102:-1
    2013-07-12 10:21:49,246 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.createTempDir(HiveReaderPeriphery.java:86) INFO  hivereader.HiveReaderPeriphery - create data temp directory successfully hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67
    2013-07-12 10:21:50,685 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.processInsertQuery(HiveJdbcClient.java:65) INFO  hivereader.HiveJdbcClient - hive execute insert sql:INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city
    2013-07-12 10:24:10,943 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.printMetaDataInfoAndGetColumnCount(HiveJdbcClient.java:104) INFO  hivereader.HiveJdbcClient - selected column names: 
    string deviceid, int trainid, int cityid, string first_day, string last_day, double confidence_lower_bound, double confidence_upper_bound, bigint month_state
    2013-07-12 10:24:11,127 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderSplitter.split(HiveReaderSplitter.java:69) INFO  hivereader.HiveReaderSplitter - splitted files num:44
    2013-07-12 10:24:11,151 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000000_0
    2013-07-12 10:24:11,154 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000001_0
    2013-07-12 10:24:11,157 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000002_0
    2013-07-12 10:24:11,161 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000003_0
    2013-07-12 10:24:11,164 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000004_0
    2013-07-12 10:24:11,169 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000005_0
    2013-07-12 10:24:11,172 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000006_0
    2013-07-12 10:24:11,177 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000007_0
    2013-07-12 10:24:11,181 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000008_0
    2013-07-12 10:24:11,185 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000009_0
    log4j:WARN No appenders could be found for logger (com.hadoop.compression.lzo.GPLNativeCodeLoader).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    2013-07-12 10:24:11,296 [main] com.dp.nebula.wormhole.engine.core.ReaderManager.run(ReaderManager.java:125) INFO  core.ReaderManager - Nebula WormHole start to read data
    2013-07-12 10:24:11,297 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:105) INFO  core.Engine - Start Writer Threads
    2013-07-12 10:24:11,313 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO  common.DFSUtils - fs.default.name=file://null:-1
    2013-07-12 10:24:11,450 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsDirSplitter.split(HdfsDirSplitter.java:73) INFO  hdfswriter.HdfsDirSplitter - HdfsWriter splits file to 2 sub-files .
    2013-07-12 10:24:11,457 [main] com.dp.nebula.wormhole.engine.core.WriterManager.run(WriterManager.java:147) INFO  core.WriterManager - Writer: writer-id-0-hdfswriter start to write data
    2013-07-12 10:24:20,481 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 5116352 | Write 5115776 | speed 43.79MB/s 512748L/s|
    
    2013-07-12 10:24:30,501 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 10688896 | Write 10688320 | speed 47.99MB/s 556083L/s|
    
    2013-07-12 10:24:40,510 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 17341248 | Write 17340672 | speed 55.84MB/s 665222L/s|
    
    2013-07-12 10:24:50,584 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 22791040 | Write 22789824 | speed 46.90MB/s 544902L/s|
    
    2013-07-12 10:24:53,507 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000010_0
    2013-07-12 10:24:53,599 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:00,597 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 30125696 | Write 30124608 | speed 63.22MB/s 733472L/s|
    
    2013-07-12 10:25:08,345 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000011_0
    2013-07-12 10:25:08,582 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:09,263 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000012_0
    2013-07-12 10:25:09,291 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:10,131 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000013_0
    2013-07-12 10:25:10,199 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:10,685 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 36688002 | Write 36687106 | speed 55.07MB/s 656237L/s|
    
    2013-07-12 10:25:12,262 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000014_0
    2013-07-12 10:25:12,274 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:01,532 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 67816280 | Write 67815704 | speed 57.08MB/s 673481L/s|
    
    2013-07-12 10:26:03,898 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000025_0
    2013-07-12 10:26:03,908 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:06,370 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000026_0
    2013-07-12 10:26:06,415 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:10,864 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000027_0
    2013-07-12 10:26:10,889 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:11,539 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 73378191 | Write 73377295 | speed 47.58MB/s 556146L/s|
    2013-07-12 10:26:21,576 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:21,690 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 79406971 | Write 79405898 | speed 51.83MB/s 602846L/s|
    
    2013-07-12 10:26:29,739 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000031_0
    2013-07-12 10:26:29,940 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:32,031 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 85765697 | Write 85764545 | speed 53.87MB/s 635847L/s|
    
    2013-07-12 10:26:34,598 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000032_0
    2013-07-12 10:26:34,606 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:36,369 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000033_0
    2013-07-12 10:26:36,373 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:38,984 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000034_0
    2013-07-12 10:26:38,990 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:39,126 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000035_0
    2013-07-12 10:26:39,134 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:42,090 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 91872401 | Write 91872209 | speed 52.52MB/s 610760L/s|
    
    2013-07-12 10:26:50,914 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:52,096 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 97049556 | Write 97048852 | speed 43.83MB/s 517657L/s|
    
    2013-07-12 10:26:53,283 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000039_0
    2013-07-12 10:26:53,304 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:54,701 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000040_0
    2013-07-12 10:26:54,709 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:02,163 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 103048760 | Write 103047800 | speed 51.35MB/s 599869L/s|
    
    2013-07-12 10:27:03,159 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000041_0
    2013-07-12 10:27:03,170 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:03,266 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000042_1
    2013-07-12 10:27:03,281 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:03,742 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000043_0
    2013-07-12 10:27:03,754 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:11,188 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.doPost(HiveReaderPeriphery.java:106) INFO  hivereader.HiveReaderPeriphery - hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67 has been deleted at dopost stage
    2013-07-12 10:27:12,212 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
    2013-07-12 10:27:12,213 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
    2013-07-12 10:27:12,214 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO  core.Engine - Nebula wormhole Job is Completed successfully!
    2013-07-12 10:27:12,525 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO  core.Engine - 
    writer-id-0-hdfswriter:
    Wormhole starts work at   : 2013-07-12 10:21:47
    Wormhole ends work at     : 2013-07-12 10:27:12
    Total time costs          :             325.08s
    Average byte speed        :           28.55MB/s
    Average line speed        :           334046L/s
    Total transferred records :           108593262
    


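    Reading directly from the datanodes averaged about 53MB/s, compared with about 3MB/s through hiveserver, roughly 18 times faster. This ignores the extra stage added by INSERT OVERWRITE DIRECTORY. Including that stage, total job time still dropped from 3590.01s to 325.08s, about 11 times shorter, which is a clear improvement.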
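    The ratios quoted above can be rechecked from the figures in the two job logs. The small sketch below does this; SpeedupCheck is only an illustrative name.

```java
import java.util.Locale;

// Recomputes the speed-up ratios quoted above from figures in the two job logs.
class SpeedupCheck {
    /** How many times larger `larger` is than `smaller`. */
    static double times(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        // Throughput during the transfer phase, as quoted: ~53MB/s vs ~3MB/s.
        System.out.println(String.format(Locale.ROOT, "throughput: %.2fx", times(53, 3)));          // 17.67x
        // "Total time costs" from the two job summaries.
        System.out.println(String.format(Locale.ROOT, "total time: %.2fx", times(3590.01, 325.08))); // 11.04x
        // "Average byte speed" from the two job summaries.
        System.out.println(String.format(Locale.ROOT, "avg speed: %.2fx", times(28.55, 2.58)));      // 11.07x
    }
}
```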



  • Original article: https://www.cnblogs.com/javawebsoa/p/3214944.html