  • Speeding up hivereader reads in wormhole

    Background:

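    Recently, DW users have reported that wormhole transfers are very slow. Some jobs take 3-4 hours to finish, which delays the daily delivery of online reports. I looked into it: almost all of these jobs move data from Hive to some other destination, that is, they use hivereader. The logs also show a very low real-time transfer rate for hivereader, so the problem most likely lies in hivereader.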

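    First, a brief introduction: wormhole is a high-speed data transfer tool that we developed. It supports a variety of heterogeneous data sources. Its architecture is shown below:

    [Figure: wormhole architecture diagram]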


    Problem description:

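    Each wormhole run is a single-machine job. The user writes a wormhole job XML descriptor that defines the data source, the data destination, and a series of other configuration parameters, then submits the job. On receiving the job XML, wormhole creates a job, then runs preprocessing (Periphery) and job splitting (Splitter) for both the reader side and the writer side. After that it starts a reader thread pool and a writer thread pool that read and write data concurrently, with a storage component in between acting as a buffer queue.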
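
    The reader pool / storage / writer pool flow described above can be sketched with plain java.util.concurrent. This is only a minimal illustration under assumptions: the class name PipelineSketch, the in-memory splits, and the use of an ArrayBlockingQueue as the storage are all hypothetical and are not wormhole's actual classes.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the reader pool -> storage -> writer pool flow.
class PipelineSketch {
    // Unique sentinel object; compared by identity so no data line can collide with it.
    private static final String END = new String("END");

    /** Pushes every line of every split through a bounded buffer; returns the number of lines written. */
    static long run(List<List<String>> splits, int readers, int writers) throws InterruptedException {
        BlockingQueue<String> storage = new ArrayBlockingQueue<>(1024); // stands in for the storage buffer
        ExecutorService readerPool = Executors.newFixedThreadPool(readers);
        ExecutorService writerPool = Executors.newFixedThreadPool(writers);
        AtomicLong written = new AtomicLong();

        // Writer threads drain the buffer until they see the sentinel.
        for (int i = 0; i < writers; i++) {
            writerPool.submit(() -> {
                try {
                    while (true) {
                        String line = storage.take();
                        if (line == END) {
                            break;
                        }
                        written.incrementAndGet(); // a real writer would write to the destination here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // One reader task per split; the pool size bounds how many run at once.
        for (List<String> split : splits) {
            readerPool.submit(() -> {
                try {
                    for (String line : split) {
                        storage.put(line); // blocks when the buffer is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        readerPool.shutdown();
        readerPool.awaitTermination(1, TimeUnit.MINUTES);
        // All data is queued ahead of the sentinels, so each writer stops only after the data is drained.
        for (int i = 0; i < writers; i++) {
            storage.put(END);
        }
        writerPool.shutdown();
        writerPool.awaitTermination(1, TimeUnit.MINUTES);
        return written.get();
    }
}
```

    With only one split, as in the JDBC mode discussed next, a single reader task does all the work no matter how large the reader pool is.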

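    Back to the problem at hand. In hive reader, I submit the user's HQL to Hive Server over JDBC, which executes it and returns the result data. This approach has several drawbacks: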

    1. The HQL cannot be split, so only one reader thread can be started and the advantage of parallel reading is lost.

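    2. We have two hive server machines deployed. Because other products and queries also need to access hive server, large-scale data pulls are limited by the network throughput of hive server and the service nodes.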

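    3. After the HQL is submitted, the mapred job first writes the result data to a temporary directory. A fetch task then pulls it to hive server, which passes it on to the wormhole client. The data travels datanode -> hive server -> wormhole client, so the bottleneck is still hive server.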


    Solution:

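    Provide another execution mode for hivereader. Since reading data through hive server is the bottleneck, I can bypass hive server and read from the datanodes directly and in parallel, leaving hive server with the sole job of submitting the HQL. For example, if the user's original query is "select * from bi.dpdm_device_permanent_city", it can be rewritten automatically as "INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city", which inserts the data into a temporary directory that we specify. Two points to note: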

    1. Enable set hive.exec.compress.output=true to compress the result files, which further reduces network IO when exchanging data with the wormhole client.

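    2. Let the user choose the number of reducers with set mapred.reduce.tasks=N. Each reducer produces one file and hive reader splits by file count, so the user can estimate the output volume and set the number of reducers accordingly.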
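
    The rewrite and the two settings above could be assembled roughly as follows. This is a sketch under assumptions: HqlRewriter and its rewrite method are hypothetical names, not wormhole's actual code, and the handling of a trailing semicolon is my own addition for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates the rewrite only, not wormhole's real implementation.
class HqlRewriter {
    /**
     * Returns the statements to submit to hive server, in order.
     * reduceTasks <= 0 means "leave the reducer count to Hive".
     */
    static List<String> rewrite(String userHql, String tempDir, int reduceTasks) {
        String hql = userHql.trim();
        if (hql.endsWith(";")) {
            // Drop a trailing semicolon so the query can be embedded in the INSERT statement.
            hql = hql.substring(0, hql.length() - 1).trim();
        }
        List<String> statements = new ArrayList<>();
        statements.add("set hive.exec.compress.output=true"); // compress the result files
        if (reduceTasks > 0) {
            statements.add("set mapred.reduce.tasks=" + reduceTasks); // user-chosen reducer count
        }
        statements.add("INSERT OVERWRITE DIRECTORY '" + tempDir + "' " + hql);
        return statements;
    }
}
```

    Calling rewrite with the query and temporary directory from the example above yields the same INSERT OVERWRITE DIRECTORY statement as its last element.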

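    In the periphery stage the HQL is submitted to hiveserver. Once it finishes, the data has landed on different datanodes. The splitter then generates a split list with one split per file, and a Reader Thread Pool with as many threads as the configured concurrency is started to fetch from the different datanodes in parallel (each thread maintains its own DFSClient, which first talks to the Namenode via ClientProtocol and then reads block data directly from the datanodes). Finally, the temporary directory is deleted.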
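
    The split, parallel read, and cleanup steps can be sketched as below. To keep the example runnable with the JDK alone, a local directory stands in for the HDFS temporary directory and java.nio.file stands in for the HDFS client, which is an assumption of this sketch. The class and method names are hypothetical, and each "read" only counts lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: a local directory plays the role of the HDFS temp directory.
class DirectReadSketch {
    /** Splitter step: one split per regular file in the temporary directory. */
    static List<Path> split(Path tempDir) throws IOException {
        try (Stream<Path> files = Files.list(tempDir)) {
            return files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Reads one split; here it only counts lines. */
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    /** Reads all splits with a fixed pool of `concurrency` threads, then deletes the directory. */
    static long readAll(Path tempDir, int concurrency)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> splits = split(tempDir);
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path file : splits) {
                Callable<Long> task = () -> countLines(file);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            // Stop any remaining readers before cleaning up.
            pool.shutdownNow();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            // Cleanup step; assumes the directory contains only the split files.
            for (Path file : splits) {
                Files.deleteIfExists(file);
            }
            Files.deleteIfExists(tempDir);
        }
    }
}
```

    Because there are usually more files than threads, each thread picks up the next file as soon as it finishes the previous one, which matches the interleaved "start to read" lines in the log further down.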


    Performance comparison:

    Test table: dpdm_device_permanent_city

    108593390 records in total, HDFS_BYTES_READ: 10,149,072,324

    Reading through hiveserver:

    2013-07-12 12:00:30,806 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 107373504 | Write 107372736 | speed 2.89MB/s 34163L/s|
    
    2013-07-12 12:00:40,809 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 107695040 | Write 107694912 | speed 2.84MB/s 32192L/s|
    
    2013-07-12 12:00:50,812 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 108027968 | Write 108027392 | speed 2.83MB/s 33254L/s|
    
    2013-07-12 12:01:00,815 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 108386624 | Write 108386560 | speed 2.93MB/s 35904L/s|
    
    2013-07-12 12:01:09,234 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
    2013-07-12 12:01:09,235 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
    2013-07-12 12:01:09,245 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO  core.Engine - Nebula wormhole Job is Completed successfully!
    2013-07-12 12:01:09,592 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO  core.Engine - 
    writer-id-0-hdfswriter:
    Wormhole starts work at   : 2013-07-12 11:01:19
    Wormhole ends work at     : 2013-07-12 12:01:09
    Total time costs          :            3590.01s
    Average byte speed        :            2.58MB/s
    Average line speed        :            30248L/s
    Total transferred records :           108593326

    Reading directly from the datanodes:

    2013-07-12 10:21:47,431 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:66) INFO  core.Engine - Nebula wormhole Start
    2013-07-12 10:21:47,458 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:100) INFO  core.Engine - Start Reader Threads
    2013-07-12 10:21:47,550 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO  common.DFSUtils - fs.default.name=hdfs://10.2.6.102:-1
    2013-07-12 10:21:49,246 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.createTempDir(HiveReaderPeriphery.java:86) INFO  hivereader.HiveReaderPeriphery - create data temp directory successfully hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67
    2013-07-12 10:21:50,685 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.processInsertQuery(HiveJdbcClient.java:65) INFO  hivereader.HiveJdbcClient - hive execute insert sql:INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city
    2013-07-12 10:24:10,943 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.printMetaDataInfoAndGetColumnCount(HiveJdbcClient.java:104) INFO  hivereader.HiveJdbcClient - selected column names: 
    string deviceid, int trainid, int cityid, string first_day, string last_day, double confidence_lower_bound, double confidence_upper_bound, bigint month_state
    2013-07-12 10:24:11,127 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderSplitter.split(HiveReaderSplitter.java:69) INFO  hivereader.HiveReaderSplitter - splitted files num:44
    2013-07-12 10:24:11,151 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000000_0
    2013-07-12 10:24:11,154 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000001_0
    2013-07-12 10:24:11,157 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000002_0
    2013-07-12 10:24:11,161 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000003_0
    2013-07-12 10:24:11,164 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000004_0
    2013-07-12 10:24:11,169 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000005_0
    2013-07-12 10:24:11,172 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000006_0
    2013-07-12 10:24:11,177 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000007_0
    2013-07-12 10:24:11,181 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000008_0
    2013-07-12 10:24:11,185 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000009_0
    log4j:WARN No appenders could be found for logger (com.hadoop.compression.lzo.GPLNativeCodeLoader).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    2013-07-12 10:24:11,296 [main] com.dp.nebula.wormhole.engine.core.ReaderManager.run(ReaderManager.java:125) INFO  core.ReaderManager - Nebula WormHole start to read data
    2013-07-12 10:24:11,297 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:105) INFO  core.Engine - Start Writer Threads
    2013-07-12 10:24:11,313 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO  common.DFSUtils - fs.default.name=file://null:-1
    2013-07-12 10:24:11,450 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsDirSplitter.split(HdfsDirSplitter.java:73) INFO  hdfswriter.HdfsDirSplitter - HdfsWriter splits file to 2 sub-files .
    2013-07-12 10:24:11,457 [main] com.dp.nebula.wormhole.engine.core.WriterManager.run(WriterManager.java:147) INFO  core.WriterManager - Writer: writer-id-0-hdfswriter start to write data
    2013-07-12 10:24:20,481 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 5116352 | Write 5115776 | speed 43.79MB/s 512748L/s|
    
    2013-07-12 10:24:30,501 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 10688896 | Write 10688320 | speed 47.99MB/s 556083L/s|
    
    2013-07-12 10:24:40,510 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 17341248 | Write 17340672 | speed 55.84MB/s 665222L/s|
    
    2013-07-12 10:24:50,584 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 22791040 | Write 22789824 | speed 46.90MB/s 544902L/s|
    
    2013-07-12 10:24:53,507 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000010_0
    2013-07-12 10:24:53,599 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:00,597 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 30125696 | Write 30124608 | speed 63.22MB/s 733472L/s|
    
    2013-07-12 10:25:08,345 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000011_0
    2013-07-12 10:25:08,582 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:09,263 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000012_0
    2013-07-12 10:25:09,291 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:10,131 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000013_0
    2013-07-12 10:25:10,199 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:25:10,685 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 36688002 | Write 36687106 | speed 55.07MB/s 656237L/s|
    
    2013-07-12 10:25:12,262 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000014_0
    2013-07-12 10:25:12,274 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:01,532 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 67816280 | Write 67815704 | speed 57.08MB/s 673481L/s|
    
    2013-07-12 10:26:03,898 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000025_0
    2013-07-12 10:26:03,908 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:06,370 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000026_0
    2013-07-12 10:26:06,415 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:10,864 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000027_0
    2013-07-12 10:26:10,889 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:11,539 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 73378191 | Write 73377295 | speed 47.58MB/s 556146L/s|
    2013-07-12 10:26:21,576 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:21,690 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 79406971 | Write 79405898 | speed 51.83MB/s 602846L/s|
    
    2013-07-12 10:26:29,739 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000031_0
    2013-07-12 10:26:29,940 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:32,031 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 85765697 | Write 85764545 | speed 53.87MB/s 635847L/s|
    
    2013-07-12 10:26:34,598 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000032_0
    2013-07-12 10:26:34,606 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:36,369 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000033_0
    2013-07-12 10:26:36,373 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:38,984 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000034_0
    2013-07-12 10:26:38,990 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:39,126 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000035_0
    2013-07-12 10:26:39,134 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:42,090 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 91872401 | Write 91872209 | speed 52.52MB/s 610760L/s|
    
    2013-07-12 10:26:50,914 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:52,096 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 97049556 | Write 97048852 | speed 43.83MB/s 517657L/s|
    
    2013-07-12 10:26:53,283 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000039_0
    2013-07-12 10:26:53,304 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:26:54,701 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000040_0
    2013-07-12 10:26:54,709 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:02,163 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO  core.Engine - 
    writer-id-0-hdfswriter stat:  Read 103048760 | Write 103047800 | speed 51.35MB/s 599869L/s|
    
    2013-07-12 10:27:03,159 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000041_0
    2013-07-12 10:27:03,170 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:03,266 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000042_1
    2013-07-12 10:27:03,281 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:03,742 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO  hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000043_0
    2013-07-12 10:27:03,754 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO  hivereader.HiveReader - codec not found, using text file reader
    2013-07-12 10:27:11,188 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.doPost(HiveReaderPeriphery.java:106) INFO  hivereader.HiveReaderPeriphery - hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67 has been deleted at dopost stage
    2013-07-12 10:27:12,212 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
    2013-07-12 10:27:12,213 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO  hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
    2013-07-12 10:27:12,214 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO  core.Engine - Nebula wormhole Job is Completed successfully!
    2013-07-12 10:27:12,525 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO  core.Engine - 
    writer-id-0-hdfswriter:
    Wormhole starts work at   : 2013-07-12 10:21:47
    Wormhole ends work at     : 2013-07-12 10:27:12
    Total time costs          :             325.08s
    Average byte speed        :           28.55MB/s
    Average line speed        :           334046L/s
    Total transferred records :           108593262
    


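    Reading directly from the datanodes averages about 53MB/s, while reading through hiveserver averages about 3MB/s, a difference of roughly 18x. Even counting the execution time of the extra stage introduced by the INSERT OVERWRITE DIRECTORY rewrite, the total elapsed time still differs by about 11x (3590.01s versus 325.08s), so the improvement is substantial.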
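
    The two ratios quoted above can be checked against the figures in the logs with a few lines of arithmetic. SpeedupCheck is a hypothetical name used only for this check. The inputs are the approximate steady-state speeds from the summary sentence and the "Total time costs" values from the two job summaries.

```java
// Hypothetical helper for checking the ratios quoted in the text.
class SpeedupCheck {
    static double ratio(double slower, double faster) {
        return slower / faster;
    }

    public static void main(String[] args) {
        // Steady-state transfer speed: about 53MB/s direct versus about 3MB/s via hiveserver.
        System.out.printf("speed ratio: %.1f%n", ratio(53.0, 3.0));
        // Total elapsed time: 3590.01s via hiveserver versus 325.08s direct (includes the insert stage).
        System.out.printf("total time ratio: %.1f%n", ratio(3590.01, 325.08));
    }
}
```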



  • Original article: https://www.cnblogs.com/javawebsoa/p/3214944.html