  • Python Elasticsearch environment setup

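    Setup on Windows and Linux

    Windows: download the zip archive
    Linux: download the tar archive
    Download page: https://www.elastic.co/downloads/elasticsearch

    After extracting, run: bin/elasticsearch (or bin\elasticsearch.bat on Windows)
    To check that it started, open http://localhost:9200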
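
    The same check can be scripted. The sketch below is only an illustration: the function name fetch_node_info and its opener parameter are made up for this example, and it assumes a node without authentication on the default port. With a username and password enabled, the unauthenticated request is rejected and the function returns None.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def fetch_node_info(url="http://localhost:9200", opener=urlopen, timeout=3):
    """Return the parsed JSON from the node's root URL, or None if it does not answer.

    `opener` is a hypothetical hook so the function can be exercised without a server.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        # Connection refused, HTTP error status, timeout, or a non-JSON body.
        return None


if __name__ == "__main__":
    info = fetch_node_info()
    print("Elasticsearch is up" if info else "No response from Elasticsearch")
```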

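    On Linux, Elasticsearch cannot be run as root.
    Running it as a normal user may fail with:
    java.nio.file.AccessDeniedException

    Cause: the current user lacks the required permissions on the installation directory
    Fix: chown -R <linux user> <elasticsearch install dir>
    Example: chown -R elasticsearch /data/wwwroot/elasticsearch-6.2.4
    PS: AccessDeniedException errors from other Java software can be fixed the same way: give the user that runs the program permissions on the relevant directories.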
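
    Before starting the node, a quick check can show whether the current user is likely to hit this error. This is a rough sketch: the helper missing_permissions is invented for this example, and it inspects only the directory itself, not every file beneath it.

```python
import os


def missing_permissions(path):
    """Return the names of the permissions the current user lacks on `path`."""
    checks = (("read", os.R_OK), ("write", os.W_OK), ("execute", os.X_OK))
    return [name for name, flag in checks if not os.access(path, flag)]


if __name__ == "__main__":
    # Install directory taken from the chown example above.
    es_home = "/data/wwwroot/elasticsearch-6.2.4"
    missing = missing_permissions(es_home)
    if missing:
        print("Missing on %s: %s" % (es_home, ", ".join(missing)))
    else:
        print("Current user can read, write and traverse %s" % es_home)
```

    A path that does not exist is reported as missing all three permissions, so a typo in the directory name shows up the same way.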

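    Code example

    The code below implements something like the residential community search on Lianjia (链家网).
    It reads community names and addresses from files, writes them to ES, and then matches communities by city code and search keyword.
    The code has three main parts:
    1. Create the index
    2. Store the data in ES in batches with bulk
    3. Search the data
    Note:
    The code targets an older ES version (2.x). Newer versions use different field data types when creating an index.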
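
    To make the version note concrete: from 5.x on, the "string" field type was split into "text" (analyzed) and "keyword" (exact value), and from 7.x on the mapping type level is gone from the index body. The sketch below builds a body per major version. The function build_index_body is hypothetical, and making city_code a "keyword" is an assumption chosen here so that exact-match filtering works; it is not taken from the original code.

```python
def build_index_body(major_version, index_type="community"):
    """Build an index-creation body for the three fields used in this post."""
    if major_version < 5:
        # 2.x: one "string" type, analyzed by default.
        properties = {
            "city_code": {"type": "string"},
            "name": {"type": "string"},
            "address": {"type": "string"},
        }
    else:
        # 5.x and later: "text" for full-text fields, "keyword" for exact values.
        # Using "keyword" for city_code is an assumption of this sketch.
        properties = {
            "city_code": {"type": "keyword"},
            "name": {"type": "text"},
            "address": {"type": "text"},
        }
    if major_version >= 7:
        # 7.x and later: no mapping type level under "mappings".
        return {"mappings": {"properties": properties}}
    return {"mappings": {index_type: {"properties": properties}}}


if __name__ == "__main__":
    for version in (2, 6, 7):
        print(version, build_index_body(version))
```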

    # coding: utf8
    from __future__ import print_function, unicode_literals
    import io
    import os

    import config  # project-local module that provides datamining_dir
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk


    class ElasticSearch(object):
        def __init__(self, index_name, index_type, ip="127.0.0.1"):
            '''
            :param index_name: index name
            :param index_type: index (mapping) type
            :param ip: Elasticsearch host
            '''
            self.index_name = index_name
            self.index_type = index_type
            # Without username/password:
            # self.es = Elasticsearch([ip])
            # With username/password:
            self.es = Elasticsearch([ip], http_auth=('elastic', 'password'), port=9200)

        def create_index(self):
            '''
            Create the index self.index_name with a mapping for self.index_type.
            An existing index with the same name is deleted first.
            '''
            # Field mapping (ES 2.x field types)
            _index_mappings = {
                "mappings": {
                    self.index_type: {
                        "properties": {
                            "city_code": {
                                "type": "string",
                                # "index": "not_analyzed"
                            },
                            "name": {
                                "type": "string",
                                # "index": "not_analyzed"
                            },
                            "address": {
                                "type": "string",
                                # "index": "not_analyzed"
                            }
                        }
                    }
                }
            }
            if self.es.indices.exists(index=self.index_name):
                self.es.indices.delete(index=self.index_name)
            res = self.es.indices.create(index=self.index_name, body=_index_mappings)
            print(res)

        def build_data_dict(self):
            '''
            Read the tab-separated data files.
            :return: ({community_code: (name, city_code)}, {community_code: address})
            '''
            data_dir = os.path.join(config.datamining_dir, 'data_output')

            name_dict = {}
            # io.open decodes the file as UTF-8 on both Python 2 and 3
            with io.open(os.path.join(data_dir, 'house_community.dat'), encoding='utf-8') as f:
                for line in f:
                    # Strip the line ending so the last field does not carry a newline
                    line_list = line.rstrip('\r\n').split('\t')
                    community_code = line_list[6]
                    name = line_list[7]
                    city_code = line_list[0]
                    name_dict[community_code] = (name, city_code)

            address_dict = {}
            with io.open(os.path.join(data_dir, 'house_community_detail.dat'), encoding='utf-8') as f:
                for line in f:
                    line_list = line.rstrip('\r\n').split('\t')
                    community_code = line_list[6]
                    address = line_list[10]
                    address_dict[community_code] = address

            return name_dict, address_dict

        def bulk_index_data(self, name_dict, address_dict):
            '''
            Store the data in ES in batches with bulk.
            '''
            actions = []
            for community_code, (name, city_code) in name_dict.items():
                action = {
                    "_index": self.index_name,
                    "_type": self.index_type,
                    "_id": community_code,  # use the community code as _id
                    "_source": {
                        "city_code": city_code,
                        "name": name,
                        # Communities without a detail record get an empty address
                        "address": address_dict.get(community_code, '')
                    }
                }
                actions.append(action)
            # Batch write
            success, _ = bulk(self.es, actions, index=self.index_name, raise_on_error=True)
            # Writing documents one at a time is very slow:
            # self.es.index(index=self.index_name, doc_type=self.index_type, body=action)

            print('Performed %d actions' % success)

        def delete_index_data(self, id):
            '''
            Delete one document from the index.
            :param id: document id (community code)
            '''
            res = self.es.delete(index=self.index_name, doc_type=self.index_type, id=id)
            print(res)

        def get_data_id(self, id):
            '''
            Fetch one document by id and print its fields.
            '''
            res = self.es.get(index=self.index_name, doc_type=self.index_type, id=id)
            print(res['_source']['city_code'], res['_id'], res['_source']['name'], res['_source']['address'])

        def get_data_by_body(self, name, city_code):
            '''
            Search communities in one city by keyword.
            :param name: search keyword, matched against name (boosted x3) and address
            :param city_code: city code used as an exact filter
            :return: list of hits
            '''
            # doc = {'query': {'match_all': {}}}
            doc = {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "city_code": city_code
                            }
                        },
                        "must": {
                            "multi_match": {
                                "query": name,
                                "type": "phrase_prefix",
                                "fields": ['name^3', 'address'],
                                "slop": 1
                            }
                        }
                    }
                }
            }
            _searched = self.es.search(index=self.index_name, doc_type=self.index_type, body=doc)
            return _searched['hits']['hits']


    if __name__ == '__main__':
        # Write the data to ES
        obj = ElasticSearch("ftech360", "community")
        obj.create_index()
        name_dict, address_dict = obj.build_data_dict()
        obj.bulk_index_data(name_dict, address_dict)

        # Read data back from ES
        obj2 = ElasticSearch("ftech360", "community")
        for hit in obj2.get_data_by_body(u'保利', '510100'):
            print(hit['_id'], hit['_source']['name'], hit['_source']['address'])
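
    The bulk helper accepts any iterable of actions, so the actions do not have to be collected in a list first. A minimal sketch of the same action format as a generator follows; iter_actions is a name invented for this example, and the sample dictionaries in the demo are placeholder data, not real records.

```python
def iter_actions(name_dict, address_dict, index_name, index_type):
    """Yield one bulk action per community instead of building a full list."""
    for community_code, (name, city_code) in name_dict.items():
        yield {
            "_index": index_name,
            "_type": index_type,
            "_id": community_code,
            "_source": {
                "city_code": city_code,
                "name": name,
                # Communities without a detail record get an empty address.
                "address": address_dict.get(community_code, ""),
            },
        }


if __name__ == "__main__":
    # Placeholder data for illustration only.
    names = {"c1": ("Example Garden", "510100"), "c2": ("Sample Court", "510100")}
    addresses = {"c1": "1 Example Road"}
    for action in iter_actions(names, addresses, "ftech360", "community"):
        print(action)
    # With a live client this would be passed straight to the helper:
    # success, _ = bulk(es, iter_actions(names, addresses, "ftech360", "community"))
```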
    
    
  • Original article: https://www.cnblogs.com/i-love-python/p/11443978.html