  • [Original] Experience Notes (26): Reading and Writing Elasticsearch Data from Hive via External Tables

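    Reading and writing Elasticsearch data from Hive through an external table works much like reading and writing HBase data. The difference is that you need to download elasticsearch-hadoop-hive-6.6.2.jar and use the EsStorageHandler it provides.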

    Connect the massive data storage and deep processing power of Hadoop with the real-time search and analytics of Elasticsearch. The Elasticsearch-Hadoop (ES-Hadoop) connector lets you get quick insight from your big data and makes working in the Hadoop ecosystem even better.

    Official site: https://www.elastic.co/products/hadoop
    Download: https://www.elastic.co/downloads/hadoop

    At the time of writing, the latest version is 6.6.2:

    # wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-6.6.2.zip
    # unzip elasticsearch-hadoop-6.6.2.zip

    From the unpacked archive, use elasticsearch-hadoop-6.6.2/dist/elasticsearch-hadoop-hive-6.6.2.jar:

    add jar /path/to/elasticsearch-hadoop-hive-6.6.2.jar;

    CREATE EXTERNAL TABLE hive_elasticsearch_table (
    id string,
    name string,
    desc string
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES(
    'es.nodes' = '$es_server1:9200,$es_server2:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'testdoc/testtype',
    'es.read.metadata' = 'true',
    'es.mapping.names' = 'id:_metadata._id, name:name, desc:desc');

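    The key settings are es.nodes, es.resource and es.mapping.names: respectively, the Elasticsearch server addresses, the index name and type name, and the one-to-one mapping between Hive columns and Elasticsearch fields. With these in place you can read and write Elasticsearch data from Hive: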

    select * from hive_elasticsearch_table limit 10;
    insert into table hive_elasticsearch_table select '2', 'testname', 'testdesc';

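    However, the id that comes back looks hashed: it is an auto-generated document id rather than the value that was inserted.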

    +------------------------------+--------------------------------+--------------------------------+--+
    | hive_elasticsearch_table.id  | hive_elasticsearch_table.name  | hive_elasticsearch_table.desc  |
    +------------------------------+--------------------------------+--------------------------------+--+
    | 6mpoc2gBohlnD12tvBoF         | testname                       | testdesc                       |
    +------------------------------+--------------------------------+--------------------------------+--+

    To fix this, also set es.mapping.id, which tells the connector which field holds the document id:

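    -- Drop the earlier definition first; for an external table this removes only the Hive metadata.
    DROP TABLE IF EXISTS hive_elasticsearch_table;
    CREATE EXTERNAL TABLE hive_elasticsearch_table (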
    id string,
    name string,
    desc string
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES(
    'es.nodes' = '$es_server1:9200,$es_server2:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'testdoc/testtype',
    'es.read.metadata' = 'true',
    'es.mapping.id' = 'id',
    'es.mapping.names' = 'id:_metadata._id, name:name, desc:desc');

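    This time it works as expected: the newly inserted row keeps the id it was given, while the document written earlier still has its auto-generated id.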

    +------------------------------+--------------------------------+--------------------------------+--+
    | hive_elasticsearch_table.id  | hive_elasticsearch_table.name  | hive_elasticsearch_table.desc  |
    +------------------------------+--------------------------------+--------------------------------+--+
    | 6mpoc2gBohlnD12tvBoF         | testname                       | testdesc                       |
    | 4                            | hello                          | world                          |
    +------------------------------+--------------------------------+--------------------------------+--+
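
    The post does not show the statement that produced the row with id 4. Presumably it was an insert along the following lines; this is an assumed reconstruction using the values visible in the output. The second statement sketches the more common case of copying rows from an existing Hive table, where source_table is a hypothetical name that does not appear in the original.

    ```sql
    -- Assumed reconstruction: the values are taken from the result table above.
    insert into table hive_elasticsearch_table select '4', 'hello', 'world';

    -- Hypothetical bulk copy: source_table is an ordinary Hive table with id, name and desc columns.
    -- With es.mapping.id set to 'id', each source id would be used as the document id.
    insert into table hive_elasticsearch_table
    select s.id, s.name, s.`desc`
    from source_table s;

    -- Re-run the query to check the stored ids.
    select * from hive_elasticsearch_table limit 10;
    ```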

    For details on field type mapping, see: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/mapping.html
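
    As a rough illustration of that type mapping, here is a sketch of an external table with several column types. The table name, index name and columns are all hypothetical, and the Elasticsearch types listed in the comments are my reading of the mapping page linked above, so check them against the documentation for your ES-Hadoop version.

    ```sql
    -- Hypothetical example, not from the original post.
    -- Expected Hive -> Elasticsearch type conversions (see the mapping page):
    --   string    -> string
    --   bigint    -> long
    --   double    -> double
    --   boolean   -> boolean
    --   timestamp -> date
    --   array     -> array
    CREATE EXTERNAL TABLE hive_es_typed_example (
    id string,
    views bigint,
    score double,
    active boolean,
    created timestamp,
    tags array<string>
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES(
    'es.nodes' = '$es_server1:9200,$es_server2:9200',
    'es.resource' = 'typed_example/doc',
    'es.mapping.id' = 'id');
    ```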

  • Original post: https://www.cnblogs.com/barneywill/p/10300951.html