  • Configuring HBase on Ubuntu 18.04

    HBase provides a highly reliable, column-oriented, scalable database system on top of HDFS. Data can only be retrieved by row key or by a range of row keys, so HBase is mainly used to store loosely structured (unstructured and semi-structured) data. Like Hadoop, HBase scales horizontally: capacity for both computation and storage grows by adding inexpensive commodity servers. Tables that fit HBase well have these characteristics:

    • Huge volume: a single table can hold hundreds of millions of rows and millions of columns
    • Column-oriented: storage and access control are organized by column, and each column family is retrieved independently
    • Sparse fields: null fields take up no storage space, so HBase suits very sparse tables

    Row Key
    The row key is the primary key used to retrieve records. There are only three ways to access rows in a table:

    • By a single row key
    • By a range of row keys
    • By a full table scan

    A row key can be an arbitrary string with a maximum length of 64KB; in practice 10 to 100 bytes is typical.
    Internally, HBase stores row keys as byte arrays, and rows are kept sorted by the row key's lexicographic (byte) order. Exploit this property when designing keys: store rows that are often read together next to each other. Note that lexicographic order sorts integers as 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21... To make row keys sort by integer value, left-pad them with zeros, as the sketch below shows.
    A read or write of one row is atomic, regardless of how many columns are involved. This design decision makes it easy to reason about the behavior of concurrent updates to the same row.
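
    A quick hbase shell illustration of the ordering (t1 is a hypothetical scratch table):

    create 't1', 'cf'
    put 't1', '2',  'cf:a', 'x'
    put 't1', '10', 'cf:a', 'x'
    scan 't1'                                      # byte order: row '10' comes before row '2'
    put 't1', '0002', 'cf:a', 'x'
    put 't1', '0010', 'cf:a', 'x'
    scan 't1', {STARTROW => '0', STOPROW => '1'}   # zero-padded keys scan in numeric order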

    Column Family (CF)
    Every column in an HBase table belongs to a column family. The column family is part of the table schema (individual columns are not) and must be defined before the table is used. Column names take the family as a prefix: cf:username and cf:code, for example, both belong to the family cf. Access control and disk and memory accounting are all performed at the column-family level. In practice, permissions at the family level help manage different kinds of applications: some may add new base data, some may read the base data and create derived column families, and some may only browse data (and perhaps, for privacy reasons, not even all of it).

    Timestamp
    The storage unit identified by a row key and a column is called a cell. Every cell holds multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. HBase can assign it automatically at write time, in which case it is the current system time in milliseconds, or the client can set it explicitly; an application that must avoid version conflicts then has to generate unique timestamps itself. Within a cell, versions are sorted in reverse chronological order, so the newest data comes first.
    To avoid the management burden (storage as well as indexing) of keeping too many versions, HBase provides two version-retention schemes: keep the last n versions, or keep the versions written within a recent window (say, the last ten days). Both can be configured per column family, as sketched below.
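
    Both retention policies are set with alter in the hbase shell; a sketch, assuming an existing table users with family cf:

    # keep at most 3 versions per cell
    alter 'users', NAME => 'cf', VERSIONS => 3
    # drop versions older than 10 days (TTL is in seconds)
    alter 'users', NAME => 'cf', TTL => 864000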

    Cell
    The unique unit identified by {row key, column (= <family> + <label>), version}. Cell data has no type; it is stored entirely as raw bytes.

    System settings

    Install ntp

    This keeps the servers' clocks from drifting apart.
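
    On Ubuntu 18.04 a minimal setup sketch:

    sudo apt-get install -y ntp
    ntpq -p    # verify that peers are being polled and the clock is syncing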

    Set ulimit

    Based on https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_ig_hbase_config.html

    For the users running HDFS and HBase, the open-file limit and the process limit can be inspected and set with ulimit -n and ulimit -u. To apply them automatically at login, the settings can be written into that user's .bashrc, as sketched below.
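
    A sketch of the .bashrc approach (the values mirror the limits.conf example below):

    # in ~/.bashrc of the hdfs / hbase user
    ulimit -n 32768    # max open files
    ulimit -u 2048     # max user processes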

    Another way to configure this is through PAM (Pluggable Authentication Modules).
    Edit /etc/security/limits.conf and add two lines for every user that needs adjusting, for example:

    hdfs  -       nofile  32768
    hdfs  -       nproc   2048
    hbase -       nofile  32768
    hbase -       nproc   2048

    For the configuration to take effect, edit /etc/pam.d/common-session and add one line:

    session required  pam_limits.so

    ZooKeeper configuration

    Number and configuration of ZooKeeper nodes
    In terms of count, a single node works, but production deployments usually run 3 to 7 nodes (an odd number); the more nodes, the higher the tolerance for single-node failures. An odd count is used because an even count raises the quorum needed for election: 4 nodes and 5 nodes both require a quorum of 3. In terms of configuration, give each node about 1GB of memory and, if possible, a dedicated disk. For heavily loaded clusters, run the ZooKeeper nodes on machines separate from the RegionServers (DataNodes and TaskTrackers).
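
    A minimal zoo.cfg sketch for a three-node ensemble, assuming the hosts vm151-vm153 and the non-standard client port 2222 used later in hbase-site.xml:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2222
    server.1=vm151:2888:3888
    server.2=vm152:2888:3888
    server.3=vm153:2888:3888
    # each node also needs a myid file (containing 1, 2 or 3) in dataDir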

    HBase setup

    Master node

    Unpack the distribution and edit conf/regionservers: remove localhost and add the hostnames of the slave nodes. These hosts will start and stop together with the master.

    vm149
    vm150

    If a backup master is needed, add a configuration file backup-masters under conf/ and list the corresponding hostnames, for example as below.
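
    A one-line sketch (vm149 is only an illustration; use whichever host should take over the master role):

    echo vm149 > conf/backup-masters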

    Edit conf/hbase-env.sh

    export JAVA_HOME=/opt/jdk/latest
    export HBASE_MANAGES_ZK=false
    export HBASE_LOG_DIR=/home/tomcat/run/hbase/logs

    HBASE_MANAGES_ZK=false means an externally managed ZooKeeper is used.
    HBASE_LOG_DIR: if logs should not live in the logs directory under the installation, specify the log path here; otherwise HBase may be unable to write its logs at startup.

    Edit conf/hbase-site.xml

    <configuration>
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://vm148:9000/hbase</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.clientPort</name>
          <value>2222</value>
          <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>  
          <value>vm151,vm152,vm153</value>
          <description>For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.</description>
        </property>
    </configuration>

    The default port is 2181; if a non-standard port is used, it must be reflected in the configuration.
    The hbase.zookeeper.quorum parameter is required; it lists all nodes of the ZooKeeper cluster. Because an independently managed ZooKeeper cluster is used here, no other ZooKeeper parameters are needed.

    After startup, run ./zkCli.sh and issue ls /hbase inside it to check that HBase is connected correctly.
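
    A sketch of that check, assuming the client port 2222 configured above:

    ./zkCli.sh -server vm151:2222
    # inside the client, HBase's znodes should be present:
    ls /hbase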

    Slave nodes

    Copy the configured directory from the master node directly to the slave nodes.
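
    For example with rsync (the /opt/hbase path is an assumption based on the layout used elsewhere in this post):

    for host in vm149 vm150; do
        rsync -a /opt/hbase/ $host:/opt/hbase/
    done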

    Startup

    Startup order

    start-dfs.sh (master node)
    start-yarn.sh (master node)
    zkServer.sh start (each ZooKeeper node)
    start-hbase.sh (master node)

    After startup, the HBase web UI is reachable on port 16010 of the master node: http://vm148:16010/
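
    To confirm the daemons actually came up, jps on each host should show the expected processes (host roles follow the layout above):

    # on the master (vm148)
    jps    # expect NameNode, ResourceManager, HMaster
    # on each regionserver (vm149, vm150)
    jps    # expect DataNode, NodeManager, HRegionServer
    # on each ZooKeeper node (vm151 - vm153)
    jps    # expect QuorumPeerMain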

    Other configuration

    Set dfs.datanode.max.transfer.threads

    dfs.datanode.max.transfer.threads is an HDFS parameter that replaces the deprecated dfs.datanode.max.xcievers. It caps the number of files an HDFS DataNode serves at any one time. Edit etc/hadoop/conf/hdfs-site.xml and add the following entry:

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>4096</value>
    </property>

    Configure the HBase BlockCache

    In the default configuration, HBase uses a single on-heap cache. If a BucketCache is configured, the on-heap cache is used only for Bloom filters and indexes, while the off-heap BucketCache holds the data blocks. This arrangement is known as the combined BlockCache configuration. It permits a much larger cache and avoids the impact of JVM garbage collection.
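
    A minimal sketch of enabling an off-heap BucketCache (the sizes are illustrative, not recommendations):

    <!-- hbase-site.xml: put data blocks in an off-heap BucketCache -->
    <property>
      <name>hbase.bucketcache.ioengine</name>
      <value>offheap</value>
    </property>
    <property>
      <name>hbase.bucketcache.size</name>
      <value>4096</value><!-- capacity in MB -->
    </property>

    # hbase-env.sh: reserve the matching off-heap memory for the JVM
    export HBASE_OFFHEAPSIZE=5G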

    Command line reference

    Enter the shell environment

    ./bin/hbase shell
    

    List all tables: list (appending a 'table name' after list checks whether that table exists)

    hbase(main):001:0> list
    TABLE                                                                                           
    users                                                                                     
    1 row(s)
    Took 0.5786 seconds                                                                             
    => ["users"]
    

    Show table details: describe 'table name'

    hbase(main):003:0> describe 'users'
    Table users is ENABLED                                                                    
    users                                                                                     
    COLUMN FAMILIES DESCRIPTION                                                                     
    {NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false
    ', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
     TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_IN
    DEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS
    _ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}         
    1 row(s)
    Took 0.2581 seconds          
    

    Disable / enable a table: disable / enable 'table name'

    hbase(main):004:0> disable 'users'
    Took 0.5861 seconds                                                                             
    hbase(main):005:0> enable 'users'
    Took 0.7949 seconds
    

    Create a table: create 'table name', 'cf name', ... (multiple column families are allowed)

    hbase(main):006:0> create 'test','cf'
    Created table test
    Took 1.3285 seconds                                                                             
    => Hbase::Table - test
    
    hbase(main):008:0> create 'test2','cf1','cf2','cf3'
    Created table test2
    Took 1.2728 seconds 
    => Hbase::Table - test2
    

    Delete a table: drop 'table name'. The table must be disabled before it can be dropped.
    Note: disk space is not freed immediately after the drop; by default it is released after 5 minutes.

    hbase(main):011:0> disable 'test2'
    Took 0.4568 seconds                                                                             
    hbase(main):012:0> drop 'test2'
    Took 0.5034 seconds 
    

    List a table's records: scan 'table name'

    hbase(main):013:0> scan 'test'
    ROW                       COLUMN+CELL                                                           
    0 row(s)
    Took 0.1512 seconds 
    

    Insert a record: put 'table name', 'row id', 'cf field', 'value'
    Each put writes a single field value for a row id; when different fields of the same row id are put separately, the scan output actually shows them as separate lines.

    hbase(main):026:0> put 'test','row001','cf:a','001'
    Took 0.0884 seconds                                                                             
    hbase(main):027:0> put 'test','row002','cf:a','002'
    Took 0.0076 seconds                                                                             
    hbase(main):028:0> put 'test','row003','cf:b','001'
    Took 0.0086 seconds                                                                             
    hbase(main):029:0> scan 'test'
    ROW                       COLUMN+CELL                                                           
     row001                   column=cf:a, timestamp=1548510719243, value=001                       
     row002                   column=cf:a, timestamp=1548510724943, value=002                       
     row003                   column=cf:b, timestamp=1548510733680, value=001                       
    3 row(s)
    Took 0.0477 seconds    
    

    Read all field records of a row id: get 'table name', 'row id'

    hbase(main):032:0> get 'test', 'row001'
    COLUMN                    CELL                                                                  
     cf:a                     timestamp=1548510719243, value=001                                    
     cf:b                     timestamp=1548510892749, value=003                                    
    1 row(s)
    Took 0.0491 seconds               
    

    Delete a row id's record in a given field: delete 'table name', 'row id', 'cf field'

    hbase(main):033:0> delete 'test', 'row001', 'cf:b'
    Took 0.0298 seconds                                                                             
    hbase(main):034:0> get 'test', 'row001'
    COLUMN                    CELL                                                                  
     cf:a                     timestamp=1548510719243, value=001                                    
    1 row(s)
    Took 0.0323 seconds
    

    To delete an entire row id, use deleteall:

    hbase(main):045:0> deleteall 'test', 'row004'
    Took 0.0081 seconds
    

    Count the row ids: count 'table name'

    hbase(main):039:0> scan 'test'
    ROW                       COLUMN+CELL                                                           
     row001                   column=cf:a, timestamp=1548510719243, value=001                       
     row001                   column=cf:b, timestamp=1548511393583, value=003                       
     row002                   column=cf:a, timestamp=1548510724943, value=002                       
     row002                   column=cf:b, timestamp=1548511400007, value=002                       
     row003                   column=cf:b, timestamp=1548510733680, value=001                       
    3 row(s)
    Took 0.0409 seconds                                                                             
    hbase(main):040:0> count 'test'
    3 row(s)
    Took 0.0178 seconds                                                                             
    => 3
    

    Truncate a table: truncate 'table name'
    This command actually performs three steps: disable, drop, and recreate.

    hbase(main):047:0> truncate 'test'
    Truncating 'test' table (it may take a while):
    Disabling table...
    Truncating table...
    Took 2.1415 seconds
    

    Import a CSV file into HBase

    Assume the CSV file lives in the local filesystem (not HDFS) and is comma-separated, and that it is to be imported into the target table test:

    $ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv
    

    Because the default input format is TSV, the separator has to be set explicitly to ',' for CSV files.
    The target fields are specified with the importtsv.columns parameter; the columns of the CSV file are mapped, in order, to the cf fields of the HBase table.
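
    After the import, a quick spot check in the hbase shell confirms the mapping (row001 is a hypothetical row key taken from the first CSV column):

    get 'test', 'row001'
    # each CSV column should appear under its cf: qualifier,
    # e.g. cf:interest, cf:future_interest, cf:quota_amount, ...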

    The complete output of the import process:

    $ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv
    2019-01-26 14:35:52,566 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:host.name=vm148
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_192
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.home=/opt/jdk/jdk1.8.0_192/jre
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: opt/hbase/latest/bin/../lib/protobuf-java-2.5.0.jar:/opt/hbase/latest/bin/../lib/snappy-java-1.0.5.jar:/opt/hbase/latest/bin/../lib/spymemcached-2.12.2.jar:/opt/hbase/latest/bin/../lib/validation-api-1.1.0.Final.jar:/opt/hbase/latest/bin/../lib/xmlenc-0.52.jar:/opt/hbase/latest/bin/../lib/xz-1.0.jar:/opt/hbase/latest/bin/../lib/zookeeper-3.4.10.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.name=Linux
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.arch=amd64
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.version=4.15.0-43-generic
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.name=tomcat
    2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.home=/home/tomcat
    2019-01-26 14:35:52,950 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.dir=/home/tomcat
    2019-01-26 14:35:52,951 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
    2019-01-26 14:35:52,969 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
    2019-01-26 14:35:52,974 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
    2019-01-26 14:35:52,986 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0002, negotiated timeout = 40000
    2019-01-26 14:35:54,071 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Session: 0x3002261518a0002 closed
    2019-01-26 14:35:54,074 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0002
    2019-01-26 14:35:54,095 INFO  [main] Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    2019-01-26 14:35:54,096 INFO  [main] jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    2019-01-26 14:35:54,126 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
    2019-01-26 14:35:54,130 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
    2019-01-26 14:35:54,134 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
    2019-01-26 14:35:54,138 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0003, negotiated timeout = 40000
    2019-01-26 14:35:54,416 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Session: 0x3002261518a0003 closed
    2019-01-26 14:35:54,416 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0003
    2019-01-26 14:35:54,579 INFO  [main] input.FileInputFormat: Total input paths to process : 1
    2019-01-26 14:35:54,615 INFO  [main] mapreduce.JobSubmitter: number of splits:1
    2019-01-26 14:35:54,752 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_local98574210_0001
    2019-01-26 14:35:55,026 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop2-compat-2.1.2.jar
    2019-01-26 14:35:55,084 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop2-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
    2019-01-26 14:35:55,686 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar <- /home/tomcat/jackson-core-2.9.2.jar
    2019-01-26 14:35:55,693 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-core-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
    2019-01-26 14:35:55,713 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar <- /home/tomcat/hbase-metrics-2.1.2.jar
    2019-01-26 14:35:55,722 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
    2019-01-26 14:35:55,744 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar <- /home/tomcat/hadoop-common-2.7.7.jar
    2019-01-26 14:35:55,746 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-common-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
    2019-01-26 14:35:55,746 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar <- /home/tomcat/zookeeper-3.4.10.jar
    2019-01-26 14:35:55,754 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/zookeeper-3.4.10.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
    2019-01-26 14:35:55,755 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar <- /home/tomcat/hbase-protocol-shaded-2.1.2.jar
    2019-01-26 14:35:55,758 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-shaded-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
    2019-01-26 14:35:55,758 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar <- /home/tomcat/hbase-client-2.1.2.jar
    2019-01-26 14:35:55,760 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-client-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
    2019-01-26 14:35:55,760 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar <- /home/tomcat/hadoop-mapreduce-client-core-2.7.7.jar
    2019-01-26 14:35:55,762 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-mapreduce-client-core-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
    2019-01-26 14:35:55,762 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar <- /home/tomcat/hbase-shaded-netty-2.1.0.jar
    2019-01-26 14:35:55,763 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-netty-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
    2019-01-26 14:35:55,763 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar <- /home/tomcat/commons-lang3-3.6.jar
    2019-01-26 14:35:55,766 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/commons-lang3-3.6.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
    2019-01-26 14:35:55,766 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar <- /home/tomcat/hbase-mapreduce-2.1.2.jar
    2019-01-26 14:35:55,768 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-mapreduce-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
    2019-01-26 14:35:55,768 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar <- /home/tomcat/metrics-core-3.2.1.jar
    2019-01-26 14:35:55,770 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/metrics-core-3.2.1.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
    2019-01-26 14:35:55,770 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar <- /home/tomcat/hbase-common-2.1.2.jar
    2019-01-26 14:35:55,771 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-common-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
    2019-01-26 14:35:55,771 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar <- /home/tomcat/htrace-core4-4.2.0-incubating.jar
    2019-01-26 14:35:55,775 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
    2019-01-26 14:35:55,775 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop-compat-2.1.2.jar
    2019-01-26 14:35:55,777 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
    2019-01-26 14:35:55,777 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar <- /home/tomcat/hbase-zookeeper-2.1.2.jar
    2019-01-26 14:35:55,778 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-zookeeper-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
    2019-01-26 14:35:55,779 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar <- /home/tomcat/hbase-shaded-miscellaneous-2.1.0.jar
    2019-01-26 14:35:55,780 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-miscellaneous-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
    2019-01-26 14:35:55,781 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar <- /home/tomcat/protobuf-java-2.5.0.jar
    2019-01-26 14:35:55,782 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/protobuf-java-2.5.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
    2019-01-26 14:35:55,782 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar <- /home/tomcat/jackson-annotations-2.9.2.jar
    2019-01-26 14:35:55,784 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-annotations-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
    2019-01-26 14:35:55,784 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar <- /home/tomcat/hbase-server-2.1.2.jar
    2019-01-26 14:35:55,786 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-server-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
    2019-01-26 14:35:55,786 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar <- /home/tomcat/hbase-metrics-api-2.1.2.jar
    2019-01-26 14:35:55,787 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-api-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
    2019-01-26 14:35:55,788 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar <- /home/tomcat/jackson-databind-2.9.2.jar
    2019-01-26 14:35:55,789 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-databind-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
    2019-01-26 14:35:55,790 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar <- /home/tomcat/hbase-protocol-2.1.2.jar
    2019-01-26 14:35:55,791 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
    2019-01-26 14:35:55,791 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar <- /home/tomcat/hbase-shaded-protobuf-2.1.0.jar
    2019-01-26 14:35:55,799 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-protobuf-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
    2019-01-26 14:35:55,852 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
    2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
    2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
    2019-01-26 14:35:55,858 INFO  [main] mapreduce.Job: The url to track the job: http://localhost:8080/
    2019-01-26 14:35:55,858 INFO  [main] mapreduce.Job: Running job: job_local98574210_0001
    2019-01-26 14:35:55,861 INFO  [Thread-55] mapred.LocalJobRunner: OutputCommitter set in config null
    2019-01-26 14:35:55,892 INFO  [Thread-55] mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.hbase.mapreduce.TableOutputCommitter
    2019-01-26 14:35:55,936 INFO  [Thread-55] mapred.LocalJobRunner: Waiting for map tasks
    2019-01-26 14:35:55,938 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Starting task: attempt_local98574210_0001_m_000000_0
    2019-01-26 14:35:55,995 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    2019-01-26 14:35:56,000 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: file:/home/tomcat/output.csv:0+1703
    2019-01-26 14:35:56,008 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
    2019-01-26 14:35:56,009 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
    2019-01-26 14:35:56,009 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
    2019-01-26 14:35:56,016 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420008, negotiated timeout = 40000
    2019-01-26 14:35:56,021 INFO  [LocalJobRunner Map Task Executor #0] mapreduce.TableOutputFormat: Created table instance for test
    2019-01-26 14:35:56,047 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
    2019-01-26 14:35:56,048 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
    2019-01-26 14:35:56,049 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
    2019-01-26 14:35:56,052 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420009, negotiated timeout = 40000
    2019-01-26 14:35:56,116 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Session: 0x200226284420009 closed
    2019-01-26 14:35:56,116 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420009
    2019-01-26 14:35:56,138 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: 
    2019-01-26 14:35:56,280 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task: Task:attempt_local98574210_0001_m_000000_0 is done. And is in the process of committing
    2019-01-26 14:35:56,289 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Session: 0x200226284420008 closed
    2019-01-26 14:35:56,289 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420008
    2019-01-26 14:35:56,296 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: map
    2019-01-26 14:35:56,296 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task: Task 'attempt_local98574210_0001_m_000000_0' done.
    2019-01-26 14:35:56,303 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task: Final Counters for attempt_local98574210_0001_m_000000_0: Counters: 16
    	File System Counters
    		FILE: Number of bytes read=37574934
    		FILE: Number of bytes written=38237355
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    	Map-Reduce Framework
    		Map input records=30
    		Map output records=30
    		Input split bytes=93
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=8
    		Total committed heap usage (bytes)=62849024
    	ImportTsv
    		Bad Lines=0
    	File Input Format Counters 
    		Bytes Read=1703
    	File Output Format Counters 
    		Bytes Written=0
    2019-01-26 14:35:56,304 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Finishing task: attempt_local98574210_0001_m_000000_0
    2019-01-26 14:35:56,304 INFO  [Thread-55] mapred.LocalJobRunner: map task executor complete.
    2019-01-26 14:35:56,860 INFO  [main] mapreduce.Job: Job job_local98574210_0001 running in uber mode : false
    2019-01-26 14:35:56,862 INFO  [main] mapreduce.Job:  map 100% reduce 0%
    2019-01-26 14:35:56,866 INFO  [main] mapreduce.Job: Job job_local98574210_0001 completed successfully
    2019-01-26 14:35:56,899 INFO  [main] mapreduce.Job: Counters: 16
    	File System Counters
    		FILE: Number of bytes read=37574934
    		FILE: Number of bytes written=38237355
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    	Map-Reduce Framework
    		Map input records=30
    		Map output records=30
    		Input split bytes=93
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=8
    		Total committed heap usage (bytes)=62849024
    	ImportTsv
    		Bad Lines=0
    	File Input Format Counters 
    		Bytes Read=1703
    	File Output Format Counters 
    		Bytes Written=0
    

    Import a TSV file into HBase. Here the file was imported directly from the local filesystem; 2.6GB with 50 million records took a full 29 minutes. Whether putting the file into HDFS first would speed up the import was not tested.

    /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5 worktable posts.txt
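
    A likely faster path (untested here) is to have ImportTsv write HFiles and then bulk-load them, instead of issuing puts through the RegionServers; a sketch, with /tmp/hfiles as an arbitrary staging directory:

    # step 1: generate HFiles instead of writing to the table
    /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5 -Dimporttsv.bulk.output=/tmp/hfiles worktable posts.txt
    # step 2: hand the HFiles over to the table
    /opt/hbase/latest/bin/hbase completebulkload /tmp/hfiles worktable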
    

    Update 2019-01-28: on double quotes in TSV files and tab characters inside field values:

    If mysql -e exports with OPTIONALLY ENCLOSED BY '"', every string field in the resulting TSV file is wrapped in double quotes, and after importing with the statement above, those quotes end up inside the field values in HBase. So when exporting with mysql -e, the OPTIONALLY ENCLOSED BY '"' option is not recommended.

    If a field value in a MySQL record contains a tab, it is escaped automatically on export, as below:

    40	2	,	[bot]	1528869876
    41	2	[bot],	1528869876
    42	2	t	[bot]"	1528869876
    43	2	t	[bot]'	1528869876
    44	2	't	[bot]'	1528869876
    45	2	"t	[bot]"	1528869876
    46	2	t	[bot]	1528869876
    47	2	tab		[bot]	1528869876

    This behavior depends on whether OPTIONALLY ENCLOSED BY '"' is used: the output above is without the option, the output below is with it:

    40	2	",	[bot]"	1528869876
    41	2	"[bot],"	1528869876
    42	2	"t	[bot]""	1528869876
    43	2	"t	[bot]'"	1528869876
    44	2	"'t	[bot]'"	1528869876
    45	2	""t	[bot]""	1528869876
    46	2	"t	[bot]"	1528869876
    47	2	"tab		[bot]"	1528869876

    As can be seen, with the option enabled, tabs are no longer escaped; double quotes are escaped instead.

    For importTSV, neither kind of TSV file imports these tab-containing rows correctly; they are counted as Bad Lines, because importTSV splits on the separator by simple single-character scanning and does not recognize escaped tabs. See the public ParsedLine parse(byte[] lineBytes, int length) method in the source: https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java

    So if field values may contain tabs, either switch to a separator that does not conflict with the data, or replace the tabs with something else (spaces, for example) when generating the TSV; see the sketch below.
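
    Both workarounds are one-liners; a sketch (file names are illustrative):

    # option 1: strip the MySQL escape (backslash + TAB) before importing
    sed -i 's/\\\t/ /g' posts.txt
    # option 2: export with a separator that never occurs in the data,
    # then import with the matching flag, e.g. -Dimporttsv.separator='|'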

    ==

    In the hbase shell, count 'worktable' is very slow; it took a full hour to finish counting:

    Current count: 49458000, row: 9999791                                                           
    49458230 row(s)
    Took 3684.2802 seconds                                                                          
    => 49458230
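
    For large tables, the distributed RowCounter MapReduce job is usually a better fit than the shell's count, and count itself can be sped up by raising the scanner cache; a sketch:

    # MapReduce-based row count
    /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter worktable
    # or, inside the hbase shell, report less often and fetch more rows per RPC
    count 'worktable', INTERVAL => 1000000, CACHE => 10000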
    

    get is fast by comparison:

    hbase(main):056:0> get 'smth','1995'
    COLUMN                    CELL                                                                  
     cf:post_time             timestamp=1548515983185, value=876546980                              
     cf:user_id               timestamp=1548515983185, value=554                                    
     cf:username              timestamp=1548515983185, value="aaa"                                 
    1 row(s)
    Took 0.0882 seconds                                                                             
    hbase(main):057:0> get 'smth','49471229'
    COLUMN                    CELL                                                                  
     cf:post_time             timestamp=1548515983185, value=1546941261                             
     cf:user_id               timestamp=1548515983185, value=161838                                 
     cf:username              timestamp=1548515983185, value="bbb"                          
    1 row(s)
    Took 0.0873 seconds
    

