  • [Original] Uncle's Experience Notes (26): Reading and Writing Elasticsearch Data from Hive through an External Table

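    Reading and writing Elasticsearch data from Hive through an external table works much like reading and writing HBase data. The difference is that you need to download elasticsearch-hadoop-hive-6.6.2.jar and use the EsStorageHandler it provides.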

    Connect the massive data storage and deep processing power of Hadoop with the real-time search and analytics of Elasticsearch. The Elasticsearch-Hadoop (ES-Hadoop) connector lets you get quick insight from your big data and makes working in the Hadoop ecosystem even better.

    Official site: https://www.elastic.co/products/hadoop
    Download: https://www.elastic.co/downloads/hadoop

    At the time of writing, the latest version is 6.6.2.

    # wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-6.6.2.zip
    # unzip elasticsearch-hadoop-6.6.2.zip

    From the unpacked archive, use elasticsearch-hadoop-6.6.2/dist/elasticsearch-hadoop-hive-6.6.2.jar:

    add jar /path/to/elasticsearch-hadoop-hive-6.6.2.jar;

    CREATE EXTERNAL TABLE hive_elasticsearch_table (
    id string,
    name string,
    desc string
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES(
    'es.nodes' = '$es_server1:9200,$es_server2:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'testdoc/testtype',
    'es.read.metadata' = 'true',
    'es.mapping.names' = 'id:_metadata._id, name:name, desc:desc');

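    The main settings are es.nodes, es.resource and es.mapping.names: es.nodes lists the Elasticsearch server addresses, es.resource gives the index name and type name (index/type), and es.mapping.names maps each Hive column to its Elasticsearch field one-to-one. In addition, 'es.index.auto.create' = 'false' means the index must already exist, and 'es.read.metadata' = 'true' exposes document metadata such as _id under the _metadata field, which is what id:_metadata._id reads. With this in place you can read and write ES data from Hive: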
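To make the es.mapping.names format concrete, here is a minimal Python sketch (not the connector's own code) that parses the mapping string into a Hive-column to ES-field dictionary and then uses it to turn one Elasticsearch document into a Hive row. It assumes that, with es.read.metadata enabled, the document's _id is reachable under a _metadata object, which is the connector's default metadata field name. The function names and the sample document are hypothetical.

```python
from __future__ import annotations


def parse_mapping_names(spec: str) -> dict[str, str]:
    """Parse an es.mapping.names-style string such as
    'id:_metadata._id, name:name, desc:desc' into {hive_column: es_field}.
    Simplified sketch: comma-separated 'hive:es' pairs, whitespace trimmed."""
    mapping = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        hive_col, sep, es_field = pair.partition(":")
        if not sep or not hive_col.strip() or not es_field.strip():
            raise ValueError(f"malformed mapping entry: {pair!r}")
        mapping[hive_col.strip()] = es_field.strip()
    return mapping


def to_hive_row(doc: dict, mapping: dict[str, str]) -> dict:
    """Look up each Hive column's ES field in a document.
    Dotted paths such as '_metadata._id' walk into nested objects;
    a missing field yields None."""
    row = {}
    for hive_col, es_field in mapping.items():
        value = doc
        for part in es_field.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        row[hive_col] = value
    return row


mapping = parse_mapping_names("id:_metadata._id, name:name, desc:desc")
# Hypothetical document shape as read with es.read.metadata = 'true'
doc = {"name": "testname", "desc": "testdesc",
       "_metadata": {"_id": "6mpoc2gBohlnD12tvBoF"}}
print(to_hive_row(doc, mapping))
# {'id': '6mpoc2gBohlnD12tvBoF', 'name': 'testname', 'desc': 'testdesc'}
```

The dictionary direction mirrors the property itself: the left side of each pair is the Hive column, the right side is where that value lives in Elasticsearch.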

    select * from hive_elasticsearch_table limit 10;
    insert into table hive_elasticsearch_table select '2', 'testname', 'testdesc';

    However, the id read back is not the '2' that was inserted; it looks as if it has been hashed:

    +-----------------------------+-------------------------------+-------------------------------+
    | hive_elasticsearch_table.id | hive_elasticsearch_table.name | hive_elasticsearch_table.desc |
    +-----------------------------+-------------------------------+-------------------------------+
    | 6mpoc2gBohlnD12tvBoF        | testname                      | testdesc                      |
    +-----------------------------+-------------------------------+-------------------------------+

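    The fix is to also set es.mapping.id, which specifies which field holds the document id. Without it the connector sends no _id, so Elasticsearch auto-generates one; the value above is such a generated id rather than a hash of '2'.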

    CREATE EXTERNAL TABLE hive_elasticsearch_table (
    id string,
    name string,
    desc string
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES(
    'es.nodes' = '$es_server1:9200,$es_server2:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'testdoc/testtype',
    'es.read.metadata' = 'true',
    'es.mapping.id' = 'id',
    'es.mapping.names' = 'id:_metadata._id, name:name, desc:desc');

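    This time it works: a row written after the change keeps its own id (4 below), while the row written earlier still carries its auto-generated _id.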

    +-----------------------------+-------------------------------+-------------------------------+
    | hive_elasticsearch_table.id | hive_elasticsearch_table.name | hive_elasticsearch_table.desc |
    +-----------------------------+-------------------------------+-------------------------------+
    | 6mpoc2gBohlnD12tvBoF        | testname                      | testdesc                      |
    | 4                           | hello                         | world                         |
    +-----------------------------+-------------------------------+-------------------------------+
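The effect of es.mapping.id can be modelled with a small Python sketch. ES-Hadoop writes through Elasticsearch's bulk API, where each document is preceded by an action line, and a document _id is only sent when the action line carries one. The sketch below is a simplification under that assumption: it builds bulk-style lines (omitting index and type, which the connector takes from es.resource) and applies them to an in-memory dict standing in for the index. The helper names are hypothetical, and the random 20-character id only imitates the shape of a real auto-generated Elasticsearch id.

```python
from __future__ import annotations

import json
import secrets
import string

# Hypothetical stand-in for ES id generation (real ES ids are time-based)
_ALPHABET = string.ascii_letters + string.digits + "-_"


def bulk_lines(rows: list[dict], id_field: str | None = None) -> list[str]:
    """Build bulk-API style (action, source) line pairs for rows.
    If id_field is given (the role es.mapping.id plays), its value becomes
    the document _id; otherwise the action carries no _id at all."""
    lines = []
    for row in rows:
        action = {"index": {}}
        if id_field is not None:
            action["index"]["_id"] = str(row[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    return lines


def apply_bulk(lines: list[str], store: dict) -> dict:
    """Toy 'index': store each source under its _id, generating a random
    20-character id when the action line has none."""
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        meta = json.loads(action_line)["index"]
        doc_id = meta.get("_id") or "".join(
            secrets.choice(_ALPHABET) for _ in range(20))
        store[doc_id] = json.loads(source_line)
    return store


store = {}
# Without es.mapping.id: the id column is just data, _id is generated
apply_bulk(bulk_lines([{"id": "2", "name": "testname", "desc": "testdesc"}]), store)
# With es.mapping.id = 'id': the id column becomes the document _id
apply_bulk(bulk_lines([{"id": "4", "name": "hello", "desc": "world"}], id_field="id"), store)
print(list(store))  # first key is random (20 chars), second is '4'
```

This matches the pattern in the output above: only rows written with an id field configured come back with their own id.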

    For details on field type mapping, see: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/mapping.html

  • Original article: https://www.cnblogs.com/barneywill/p/10300951.html