  • Hive Learning Series, Part 5: Hive and Elasticsearch Interaction (very detailed; yes, I'm bragging again)

    Working with Elasticsearch from Hive

    I. Importing data from a Hive table into Elasticsearch

    1. First, create the Elasticsearch index as follows:

    curl -XPUT '10.81.179.209:9200/zebra_info_demo?pretty' -H 'Content-Type: application/json' -d'
    {
        "settings": {
            "number_of_shards":5,
            "number_of_replicas":2
        },
        "mappings": {
             "zebra_info": {
                  "properties": {
                        "name" : {"type" : "text"},
                        "type": {"type": "text"},
                        "province": {"type": "text"},
                        "city": {"type": "text"},
                        "citycode": {"type": "text", "index": false},
                        "district": {"type": "text"},
                        "adcode": {"type": "text", "index": false},
                        "township": {"type": "text"},
                        "business_circle": {"type": "text"},
                        "formatted_address": {"type": "text"},
                        "location": {"type": "geo_point"},
                        "extensions": {
                          "type": "nested",
                          "properties": {
                            "map_lat": {"type": "double", "index": false},
                            "map_lng": {"type": "double", "index": false},
                            "avg_price": {"type": "double", "index": false},
                            "shops": {"type": "short", "index": false},
                            "good_comments": {"type": "short", "index": false},
                            "lvl": {"type": "short", "index": false},
                            "leisure_type": {"type": "text", "index": false},
                            "fun_type": {"type": "text", "index": false},
                            "numbers": {"type": "short", "index": false}
                           }
                       }
                 }
            }
        }
    }
    '
    

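    2. Check the Elasticsearch version and download the matching elasticsearch-hadoop-hive jar

    The Elasticsearch version can be checked from the command line; this article uses version 5.6.9.
    [Screenshot: image.png]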
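
    One way to do this check is to query the node's root endpoint and read the version.number field from the JSON it returns. The snippet below is only a sketch: es_version is a hypothetical helper name, and it assumes curl, tr and sed are available and that the node address is the one used throughout this article.

```shell
# Hypothetical helper: read the JSON returned by the Elasticsearch root
# endpoint on stdin and print the value of version.number.
es_version() {
  # Join the pretty-printed JSON into one line, then pull out "number".
  tr -d '\n' | sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (assumes the node address from this article):
#   curl -s 'http://10.81.179.209:9200' | es_version
```

    A real JSON tool such as jq would be more robust; sed is used here only to keep the sketch dependency-free.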

    Download the jar from the Maven repository:
    https://repo.maven.apache.org/maven2/org/elasticsearch/elasticsearch-hadoop-hive/
    Pick the release that matches your Elasticsearch version.
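
    Since the repository follows the standard Maven layout (<artifact>/<version>/<artifact>-<version>.jar), the download URL can be derived from the version number. A small sketch, where es_hive_jar_url is a hypothetical helper name and not part of any tool:

```shell
# Hypothetical helper: build the download URL of the elasticsearch-hadoop-hive
# jar for a given version, following the standard Maven repository layout.
es_hive_jar_url() {
  version="$1"
  base='https://repo.maven.apache.org/maven2/org/elasticsearch/elasticsearch-hadoop-hive'
  printf '%s/%s/elasticsearch-hadoop-hive-%s.jar\n' "$base" "$version" "$version"
}

# Usage: download the jar that matches Elasticsearch 5.6.9
#   curl -fLO "$(es_hive_jar_url 5.6.9)"
```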

    3. Upload the downloaded jar to HDFS.

    The jar path used in this article is hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar
    [Screenshot: image.png]
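
    The upload itself can be done with the hdfs dfs command. The following is a sketch under assumptions: upload_jar and run_cmd are hypothetical helper names, the jar is assumed to sit in the current directory, and the DRY_RUN switch is added here only so the commands can be inspected without touching the cluster.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create the target HDFS directory if needed and
# upload a local jar into it, overwriting any existing copy.
upload_jar() {
  jar="$1"
  dir="$2"
  run_cmd hdfs dfs -mkdir -p "$dir"
  run_cmd hdfs dfs -put -f "$jar" "$dir/"
}

# Usage:
#   upload_jar elasticsearch-hadoop-hive-5.6.9.jar /udf
```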

    4. All set. Create the table and start using it

    -- Clear any jars registered earlier in this session, then load the connector.
    DELETE jars;
    add jar hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar;
    DROP TABLE IF EXISTS zebra_info_demo;
    CREATE EXTERNAL TABLE zebra_info_demo(
    name string,
    `type` string,
    province string,
    city string,
    citycode string,
    district string,
    adcode string,
    township string,
    business_circle string,
    formatted_address string,
    location string,
    extensions STRUCT<map_lat:double, map_lng:double, avg_price:double, shops:smallint, good_comments:smallint, lvl:smallint, leisure_type:STRING, fun_type:STRING, numbers:smallint>
    )
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
    TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'zebra_info_demo/zebra_info',
    'es.read.metadata' = 'true',
    'es.mapping.names' = 'name:name, type:type, province:province, city:city, citycode:citycode, district:district, adcode:adcode, township:township, business_circle:business_circle, formatted_address:formatted_address, location:location, extensions:extensions');  
    
    
    

    5. Populate it with data and you're done.

    INSERT INTO TABLE zebra_info_demo
    SELECT 
    a.name,
    a.brands,
    a.province,
    a.city,
    null as citycode,
    null as district,
    null as adcode,
    null as township,
    a.business_circle,
    null as formatted_address,
    concat(a.map_lat, ', ', a.map_lng) as `location`,
    named_struct(
        'map_lat', cast(a.map_lat as double),
        'map_lng', cast(a.map_lng as double),
        'avg_price', cast(0 as DOUBLE),
        'shops', 0S,
        'good_comments', 0S,
        'lvl', cast(a.lv1 as SMALLINT),
        'leisure_type', '',
        'fun_type', '',
        'numbers', 0S
    ) as extensions
    from medicalsite_childclinic a;
    

    Result:
    [Partial screenshot]
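
    Besides looking at the documents, a quick sanity check is to ask Elasticsearch how many documents the index now holds through the _count endpoint. A rough sketch, where es_count is a hypothetical helper name and the node address is the one assumed throughout this article:

```shell
# Hypothetical helper: read a _count response on stdin and print the
# document count it reports.
es_count() {
  tr -d '\n' | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (compare the number with the row count of the Hive source table):
#   curl -s 'http://10.81.179.209:9200/zebra_info_demo/_count' | es_count
```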

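    II. Given existing Elasticsearch indices, create Hive tables that interact with them. You can even join them. In a word: awesome.

    1. First, look at the indices and the data

    The existing indices are defined as follows: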

    curl -XPUT '10.81.179.209:9200/join_tests?pretty' -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "cities": {
          "properties": {
            "province": {
              "type": "text"
            },
            "city": {
              "type": "text"
            }
          }
        }
      }
    }
    '
    
    curl -XPUT '10.81.179.209:9200/join_tests1?pretty' -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "shop": {
          "properties": {
            "name": {
              "type": "text"
            },
            "city": {
              "type": "text"
            }
          }
        }
      }
    }
    '
    
    
    
    
    
    

    The data looks like this:
    join_tests

    join_tests1

    2. Create the tables and write some wicked SQL.

    -- Clear any jars registered earlier in this session, then load the connector.
    DELETE jars;
    add jar hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar;
    CREATE EXTERNAL TABLE join_tests(
        province string,
        city string
    )STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
    TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'join_tests/cities',
    'es.read.metadata' = 'true',
    'es.mapping.names' = 'province:province, city:city');
    
    CREATE EXTERNAL TABLE join_tests1(
        name string,
        city string
    )STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
    TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
    'es.index.auto.create' = 'false',
    'es.resource' = 'join_tests1/shop',
    'es.read.metadata' = 'true',
    'es.mapping.names' = 'name:name, city:city');
    
    
    
    
    SELECT 
        a.province,
        b.city,
        b.name
    from join_tests a LEFT JOIN join_tests1 b on a.city = b.city;
    

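    3. Result

    [Screenshot: query result]

    Closing remarks

    One useful tool to recommend: Apache Hue, which can be used to manage HDFS files, run Hive queries, work with MySQL, and more.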

  • Original article: https://www.cnblogs.com/unnunique/p/9362112.html