zoukankan      html  css  js  c++  java
  • Django-DRF中使用Elasticsearch ,使用IK分词

    一.安装依赖

    django-haystack==2.8.1
    drf-haystack==1.8.6
    Django==2.0.5
    djangrestframework==3.8.2
    elasticsearch==6.4.0

    二.安装JAVA SDK

    先到官网下载安装包:

    下载链接:https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    因为我装的Elasticsearch的版本是2.4.1,安装的JDK==1.8,ES 2.x后的版本使用haystack会有不兼容问题.

    安装步骤:

    # 首先:
    cd /usr/local/
    mkdir javajdk
    # 将下载的文件上传到:
    /usr/local/javajdk
    # 将文件解压到此文件夹
    tar -xzvf jdk-8u231-linux-i586.tar.gz 
    mv jdk1.8.0_231 java
    # 配置环境变量:
    vim /etc/profile

    # 在文件最后添加这几行:

    export JAVA_HOME=/usr/local/javajdk/java
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH

     # 然后

     source /etc/profile

    出现下面的提示则代表安装成功:

    三.安装Elasticsearch

    下载地址:https://www.elastic.co/cn/downloads/past-releases#elasticsearch

    要注意的是Elasticsearch在root用户下启动是会报错的!

    首先要新建用户:

    useradd -g elastic elastic
    # 在/home新建用户目录
    mkdir elastic
    # 将下载的安装包上传到 elastic 目录下
    tar -xzvf elasticsearch-2.4.1.tar.gz -C /home/elastic/
    # 给此目录授权
    chown -R elastic:elastic elastic
    # 切换用户
    su - elastic
    # 修改配置文件:
    vim /home/elastic/elasticsearch-2.4.1/config/elasticsearch.yml
    # 修改内容
    path.data: /home/elastic/elasticsearch-2.4.1/data
    path.logs: /home/elastic/elasticsearch-2.4.1/logs
    network.host: 172.xxx.xxx.xxx
    http.cors.allow-origin: "*"
    # 如果没有data与logs在相关目录下建立

    # 启动ES,在elasticsearch的bin目录下:
    ./elasticsearch

    如果在浏览器中看到上面的内容,则表示安装成功!

    如果出错解决方法:

    1.最大文件描述符太少了,至少要65536,修改/etc/security/limits.conf文件
    命令:vim /etc/security/limits.conf
    内容修改为:* hard nofile 65536
    
    2.一个进程可以拥有的VMA(虚拟内存区域)的数量太少了,至少要262144,修改文件  
    命令:vim /etc/sysctl.conf
    增加内容为:vm.max_map_count=262144
    
    3.最大线程太少了,至少要4096,修改/etc/security/limits.conf文件
    命令:vim /etc/security/limits.conf
    增加内容为:* hard nproc 65536

    四.安装IK分词插件

    下载安装包:

    下载地址:https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v5.0.0

    所选版本应于ES版本对应:

    ES 2.4.1 对应 IK 版本是 1.10.1

    将安装包解压到es的安装目录/plugin/ik

    如果/plugin下面没有ik目录需要自己手动创建

    五.可视化插件安装。

    1.插件安装方式(推荐)
    #在Elasticsearch目录下
    elasticsearch/bin/plugin install mobz/elasticsearch-head
    
    2.下载安装方式
    从https://github.com/mobz/elasticsearch-head下载ZIP包。
    
    在 elasticsearch  目录下创建目录/plugins/head/_site 并且将刚刚解压的elasticsearch-head-master目录下所有内容COPY到当前创建的/plugins/head/_site/目录下即可。
    
    需要注意的是在5.xx后的版本,安装方法与这个不一样!
    
    3.重启elasticsearch访问:
     访问地址是http://{你的ip地址}:9200/_plugin/head/
     http  端口默认是9200  

    六.集群搭建

    Elasticsearch集群搭建:

    1. 准备三台elasticsearch服务器

      创建elasticsearch-cluster文件夹,在内部复制三个elasticsearch服务

    2. 修改每台服务器配置

      修改elasticsearch-cluster ode*configelasticsearch.yml

    如果在现有单机版本的基础上节点进行复制,需要注意的是,在当前节点的安装目录/elasticsearch/data中不能有数据,否则搭建集群会失败.需要删除data目录

    # 节点1的配置信息
    # 集群名称,保证唯一
    cluster.name:my-elasticsearch
    # 节点名称,必须不一样
    node.name:node-1
    # 必须为本机的ip地址
    network.host:172.xxx.xxx.xxx
    # 服务器端口号,在同一机器下必须不一样
    http:port:9200
    # 集群间通信端口号,在同一机器下必须不一样
    transport.tcp.port:9300
    # 设置集群自动发现机器ip集合
    discovery.zen.ping.unicast.host:["172.xxx.xxx.xxx:9300",'172.xxx.xxx.xxx:9301',"172.xxx.xxx.xxx:9303"]

     将服务启动即可

    七.在Django中配置

    首先要在app中创建一个 search_indexes.py 文件这是这django-haystack规定的 

    django-haystack:文档地址:https://django-haystack.readthedocs.io/en/master/tutorial.html#configuration

    drf-haystack:文档地址:https://drf-haystack.readthedocs.io/en/latest/07_faceting.html#serializing-faceted-results

    创建模型类:

    from django.db import models
    
    class Article(models.Model):
        title = models.CharField(max_length=128)
        files = models.FileField(upload_to='%Y/%m/')
        content = models.TextField(default='')

    创建索引类:

    from haystack import indexes
    from app001.models import Article
    
    class DocsIndex(indexes.SearchIndex, indexes.Indexable):
        # 1.构建的索引字段
        text = indexes.CharField(document=True, use_template=True)
        files = indexes.CharField(model_attr='files')
        content = indexes.CharField(model_attr='content')
    
        # 2.指定模型类
        def get_model(self):
            return Article
    
        # 3.提供数据集
        def index_queryset(self, using=None):
            """Used when the entire index for model is updated."""
            return self.get_model().objects.all()

    view视图:

    mport os
    import datetime
    import uuid
    
    from rest_framework.views import APIView
    from rest_framework import serializers
    from rest_framework.response import Response
    from django.conf import settings
    from drf_haystack.serializers import HaystackSerializer
    from drf_haystack.viewsets import HaystackViewSet
    
    from .models import Article
    from .search_indexes import DocsIndex
    
    
    class DemoSerializer(serializers.ModelSerializer):
        """
        序列化器
        """
        class Meta:
            model = Article
            fields = ('id', 'title','files')
    
    
    
    class LocationSerializer(HaystackSerializer):
        object = DemoSerializer(read_only=True)  # 只读,不可以进行反序列化
    
        class Meta:
            # The `index_classes` attribute is a list of which search indexes
            # we want to include in the search.
            index_classes = [DocsIndex]
    
            # The `fields` contains all the fields we want to include.
            # NOTE: Make sure you don't confuse these with model attributes. These
            # fields belong to the search index!
            fields = [
                 "text","files","id","title"
            ]
     
    class LocationSearchView(HaystackViewSet):
    
        # `index_models` is an optional list of which models you would like to include
        # in the search result. You might have several models indexed, and this provides
        # a way to filter out those of no interest for this particular view.
        # (Translates to `SearchQuerySet().models(*index_models)` behind the scenes.
        index_models = [Article]
    
        serializer_class = LocationSerializer

    setting配置:

    INSTALLED_APPS = [
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'rest_framework',
        'silk',
        'debug_toolbar',
        'haystack',
        'app001',
    ]


    # 搜索引擎配置:
    # haystack配置
    HAYSTACK_CONNECTIONS = {
    'default': {
    # 'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
    'ENGINE': 'app001.elasticsearch_ik_backend.IKSearchEngine', # 如果配置分词需要重新制定引擎,下面会写到
    'URL': 'http://172.16.xxx.xxx:9200/',   # elasticseach 服务地址
    'INDEX_NAME': 'haystack', # 索引名称
    },
    }
    # 保持索引都是最新的
    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
    # 搜索显示的最多条数
    HAYSTACK_SEARCH_RESULTS_PER_PAGE = 50

    重写ik分词配置引擎:

    在app中建立 elasticsearch_ik_backend.py 文件:

    from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend
    from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine
    class IKSearchBackend(ElasticsearchSearchBackend):
        DEFAULT_ANALYZER = "ik_max_word" # 这里将 es 的 默认 analyzer 设置为 ik_max_word
    
        def __init__(self, connection_alias, **connection_options):
            super().__init__(connection_alias, **connection_options)
    
        def build_schema(self, fields):
            content_field_name, mapping = super(IKSearchBackend, self).build_schema(fields)
            for field_name, field_class in fields.items():
                field_mapping = mapping[field_class.index_fieldname]
                if field_mapping["type"] == "string" and field_class.indexed:
                    if not hasattr(
                        field_class, "facet_for"
                    ) and not field_class.field_type in ("ngram", "edge_ngram"):
                        field_mapping["analyzer"] = getattr(
                            field_class, "analyzer", self.DEFAULT_ANALYZER
                        )
                mapping.update({field_class.index_fieldname: field_mapping})
            return content_field_name, mapping
    
    
    class IKSearchEngine(ElasticsearchSearchEngine):
        backend = IKSearchBackend

    在django中使用drf-haystack对查询还不是很全:

    在这我使用python 的 elasticsearch 进行查询:def-haystack的查询我觉得并不是很好用:

    class EsSearch(APIView):
        def get(self,request):
            es = Elasticsearch(["http://xxx.xxx.xxx.xxx:9200"])
            query = request.GET.get("query")
         # 这里面的搜索方式可以定制你自己想要用的查询:
          
         # https://www.elastic.co/guide/cn/elasticsearch/guide/current/match-query.html
    body = { "query":{ "multi_match": { "query": "%s" % query, "fields": [ "text", "content" ] } }, "highlight":{ "fields":{ "content":{}, "text":{} } } } result = es.search(index="haystack", doc_type="modelresult", body=body) return Response(result)

    url配置:

    """tool_bar URL Configuration
    
    The `urlpatterns` list routes URLs to views. For more information please see:
        https://docs.djangoproject.com/en/2.0/topics/http/urls/
    Examples:
    Function views
        1. Add an import:  from my_app import views
        2. Add a URL to urlpatterns:  path('', views.home, name='home')
    Class-based views
        1. Add an import:  from other_app.views import Home
        2. Add a URL to urlpatterns:  path('', Home.as_view(), name='home')
    Including another URLconf
        1. Import the include() function: from django.urls import include, path
        2. Add a URL to urlpatterns:  path('blog/', include('blog.urls'))
    """
    from django.contrib import admin
    from django.urls import path
    from django.conf import settings
    from django.conf.urls import url,include
    from django.conf.urls.static import static
    from django.conf import settings
    
    
    from app001.views import Index,Uploads
    from rest_framework import routers
    
    from app001.views import LocationSearchView,EsSearch
    from app002.views import BlogView
    
    # drf-haystack查询
    router = routers.DefaultRouter()
    router.register("search", LocationSearchView,base_name="location-search")
    
    urlpatterns = [
        # 使用自定义查询
        url(r'elastic_search/',EsSearch.as_view()),
      
    url(r"api/", include(router.urls)),
    ] 

    查询展示:

  • 相关阅读:
    BFS visit tree
    Kth Largest Element in an Array 解答
    Merge k Sorted Lists 解答
    Median of Two Sorted Arrays 解答
    Maximal Square 解答
    Best Time to Buy and Sell Stock III 解答
    Best Time to Buy and Sell Stock II 解答
    Best Time to Buy and Sell Stock 解答
    Triangle 解答
    Unique Binary Search Trees II 解答
  • 原文地址:https://www.cnblogs.com/zhaijihai/p/12167923.html
Copyright © 2011-2022 走看看