Using Elasticsearch with the IK analyzer in Django-DRF

    I. Install the dependencies

    django-haystack==2.8.1
    drf-haystack==1.8.6
    Django==2.0.5
    djangorestframework==3.8.2
    elasticsearch==6.4.0

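    II. Install the Java JDK

    First download the installer from Oracle's website:

    Download: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    I installed Elasticsearch 2.4.1 with JDK 1.8, because Elasticsearch versions after 2.x have compatibility problems with haystack.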

    Installation steps:

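    # First:
    cd /usr/local/
    mkdir javajdk
    # Upload the downloaded archive to /usr/local/javajdk, then extract it there
    # (the i586 archive is the 32-bit build; on a 64-bit system use jdk-8u231-linux-x64.tar.gz)
    cd /usr/local/javajdk
    tar -xzvf jdk-8u231-linux-i586.tar.gz
    mv jdk1.8.0_231 java
    # Configure the environment variables:
    vim /etc/profile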

    # Append these lines to the end of the file:

    export JAVA_HOME=/usr/local/javajdk/java
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH

    # Then reload the profile:

    source /etc/profile

    Then run java -version; if it prints the installed version (1.8.0_231), the installation succeeded.
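If you want to script this check (for example in a provisioning script), here is a minimal Python sketch. It assumes java is on the PATH after sourcing /etc/profile; java -version writes its banner to stderr, which is why the code reads that stream. The function names are illustrative, not part of any tool.

```python
import re
import shutil
import subprocess


def parse_java_version(output):
    """Extract the quoted version (e.g. 1.8.0_231) from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no version string in: %r" % output)
    return match.group(1)


def installed_java_version():
    """Run `java -version` and return the parsed version string."""
    if shutil.which("java") is None:
        raise RuntimeError("java is not on PATH; check /etc/profile and re-source it")
    proc = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The JVM prints its version banner to stderr, not stdout.
    return parse_java_version(proc.stderr or proc.stdout)
```

With the JDK above installed, installed_java_version().startswith("1.8") should be true.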

    III. Install Elasticsearch

    Download: https://www.elastic.co/cn/downloads/past-releases#elasticsearch

    Note that Elasticsearch refuses to start as root and exits with an error!

    So first create a dedicated user:

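    # useradd -g requires the group to exist, so create it first
    groupadd elastic
    useradd -g elastic elastic
    # Create the user's directory under /home
    mkdir -p /home/elastic
    # Upload the downloaded archive, then extract it into /home/elastic
    tar -xzvf elasticsearch-2.4.1.tar.gz -C /home/elastic/
    # Give the elastic user ownership of the directory
    chown -R elastic:elastic /home/elastic
    # Switch to that user
    su - elastic
    # Edit the configuration file:
    vim /home/elastic/elasticsearch-2.4.1/config/elasticsearch.yml
    # Settings to change:
    path.data: /home/elastic/elasticsearch-2.4.1/data
    path.logs: /home/elastic/elasticsearch-2.4.1/logs
    network.host: 172.xxx.xxx.xxx
    # allow-origin only takes effect when CORS is enabled
    http.cors.enabled: true
    http.cors.allow-origin: "*"
    # Create the data and logs directories if they don't exist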

    # Start ES from Elasticsearch's bin directory:
    ./elasticsearch

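    Open http://172.xxx.xxx.xxx:9200 in a browser; if it returns a JSON document with the node name, cluster name and version number, the installation succeeded!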
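The same check can be done without a browser. This is a minimal standard-library sketch; ES_URL is a placeholder for the network.host address configured above, and the helper names are hypothetical. It reads the cluster_name, name and version.number keys that the root endpoint returns.

```python
import json
import urllib.request

# Placeholder address: use the network.host value from elasticsearch.yml.
ES_URL = "http://127.0.0.1:9200/"


def fetch_es_info(url=ES_URL, timeout=5.0):
    """GET the root endpoint; raises urllib.error.URLError if ES is unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def describe(info):
    """Summarize the root-endpoint JSON as 'cluster/node version'."""
    return "%s/%s %s" % (info["cluster_name"], info["name"], info["version"]["number"])
```

Calling describe(fetch_es_info()) against the node started above should end in 2.4.1.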

    If startup fails, fix the following:

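    1. The maximum number of open file descriptors is too low; it must be at least 65536. Edit /etc/security/limits.conf
    Command: vim /etc/security/limits.conf
    Set: * soft nofile 65536
         * hard nofile 65536

    2. The maximum number of VMAs (virtual memory areas) a process may own is too low; it must be at least 262144. Edit /etc/sysctl.conf
    Command: vim /etc/sysctl.conf
    Add: vm.max_map_count=262144
    Then apply it with: sysctl -p

    3. The maximum number of threads is too low; it must be at least 4096. Edit /etc/security/limits.conf
    Command: vim /etc/security/limits.conf
    Add: * soft nproc 65536
         * hard nproc 65536

    Changes to limits.conf take effect at the user's next login.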
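To confirm the three fixes before restarting Elasticsearch, the values can be read from Python. This sketch assumes Linux (it reads /proc/sys/vm/max_map_count) and should be run as the elastic user after logging in again; it checks the soft limits of the current process, and the thresholds are the minimums listed above. The function names are hypothetical.

```python
import resource

# Minimums from the troubleshooting list above.
MIN_NOFILE = 65536
MIN_NPROC = 4096
MIN_MAP_COUNT = 262144


def _effective(value):
    """Treat an unlimited rlimit as infinitely large."""
    return float("inf") if value == resource.RLIM_INFINITY else value


def limit_problems(nofile, nproc, map_count):
    """Return one message per setting that is below the required minimum."""
    checks = [
        ("nofile (limits.conf)", nofile, MIN_NOFILE),
        ("nproc (limits.conf)", nproc, MIN_NPROC),
        ("vm.max_map_count (sysctl.conf)", map_count, MIN_MAP_COUNT),
    ]
    return ["%s is %s, need >= %d" % (name, value, minimum)
            for name, value, minimum in checks
            if _effective(value) < minimum]


def current_values():
    """Soft limits of this process plus the kernel's vm.max_map_count (Linux only)."""
    nofile = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    nproc = resource.getrlimit(resource.RLIMIT_NPROC)[0]
    with open("/proc/sys/vm/max_map_count") as f:
        map_count = int(f.read().strip())
    return nofile, nproc, map_count
```

print(limit_problems(*current_values())) printing an empty list means all three settings are high enough.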

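    IV. Install the IK analyzer plugin

    Download the release package:

    Download: https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v5.0.0

    The IK version must match your ES version:

    ES 2.4.1 corresponds to IK 1.10.1

    Unzip the package into <ES install dir>/plugins/ik

    If there is no ik directory under plugins/, create it manually.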
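Plugins are loaded at startup, so restart Elasticsearch after unpacking IK. One way to confirm the plugin works is to send text to the _analyze API with analyzer set to ik_max_word. The sketch below assumes a node at a placeholder address, and the helper names are hypothetical. If the plugin is missing, Elasticsearch should reject the request because it does not know the analyzer. With IK loaded, Chinese text should come back as multi-character words rather than the single characters the standard analyzer produces.

```python
import json
import urllib.request

# Placeholder address; change to your node.
ES_URL = "http://127.0.0.1:9200"


def analyze_body(text, analyzer="ik_max_word"):
    """JSON body for the _analyze API with an explicit analyzer."""
    return {"analyzer": analyzer, "text": text}


def tokens_from(response):
    """List the token strings in an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(text, analyzer="ik_max_word", url=ES_URL):
    """POST to /_analyze and return the tokens the analyzer produced."""
    req = urllib.request.Request(
        url + "/_analyze",
        data=json.dumps(analyze_body(text, analyzer)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return tokens_from(json.loads(resp.read().decode("utf-8")))
```

Comparing analyze("中华人民共和国") with analyze("中华人民共和国", "standard") shows the difference IK makes.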

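    V. Install the visualization plugin (elasticsearch-head)

    1. Install it as a plugin (recommended)
    # from the Elasticsearch installation directory
    bin/plugin install mobz/elasticsearch-head

    2. Install it from a download
    Download the ZIP package from https://github.com/mobz/elasticsearch-head.

    Inside the elasticsearch directory, create plugins/head/_site and copy everything from the unzipped elasticsearch-head-master directory into plugins/head/_site/.

    Note that from 5.x onwards the installation method is different!

    3. Restart Elasticsearch and open:
     http://{your IP address}:9200/_plugin/head/
     The HTTP port defaults to 9200.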

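    VI. Build a cluster

    Setting up an Elasticsearch cluster:

    1. Prepare three Elasticsearch servers

      Create an elasticsearch-cluster folder and copy three Elasticsearch instances into it.

    2. Modify the configuration of each server

      Edit elasticsearch-cluster/node*/config/elasticsearch.yml

    If you create the nodes by copying an existing single-node installation, its data directory (elasticsearch/data) must be empty, otherwise the cluster will fail to form. Delete the data directory first.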

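    # Configuration of node 1
    # Cluster name: the same on every node, unique on your network
    cluster.name: my-elasticsearch
    # Node name: must differ between nodes
    node.name: node-1
    # Must be this machine's IP address
    network.host: 172.xxx.xxx.xxx
    # HTTP port: must differ between nodes on the same machine
    http.port: 9200
    # Transport port for inter-node communication: must differ between nodes on the same machine
    transport.tcp.port: 9300
    # Addresses used for unicast discovery of the cluster's nodes
    discovery.zen.ping.unicast.hosts: ["172.xxx.xxx.xxx:9300", "172.xxx.xxx.xxx:9301", "172.xxx.xxx.xxx:9302"]

    Then start each node.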
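Because the three nodes differ only in name and ports, their elasticsearch.yml files can be generated rather than edited by hand. This sketch assumes all nodes run on one machine with HTTP ports 9200 to 9202 and transport ports 9300 to 9302, following the port scheme in the node 1 example. The node_config helper is hypothetical.

```python
def node_config(index, host, nodes=3, cluster_name="my-elasticsearch"):
    """Render elasticsearch.yml for node `index` (0-based) of a one-machine cluster.

    Each node gets HTTP port 9200+index and transport port 9300+index, and every
    node lists all transport addresses for unicast discovery.
    """
    hosts = ", ".join('"%s:%d"' % (host, 9300 + i) for i in range(nodes))
    lines = [
        "cluster.name: %s" % cluster_name,  # identical on every node
        "node.name: node-%d" % (index + 1),  # unique per node
        "network.host: %s" % host,
        "http.port: %d" % (9200 + index),
        "transport.tcp.port: %d" % (9300 + index),
        "discovery.zen.ping.unicast.hosts: [%s]" % hosts,
    ]
    return "\n".join(lines) + "\n"
```

For i in 0, 1 and 2, write node_config(i, "<your IP>") into the config/elasticsearch.yml of the matching node directory. The directory names are whatever you used under elasticsearch-cluster.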

    VII. Configure Django

    First create a search_indexes.py file in the app. django-haystack requires this file name, because it discovers the index classes from it.

    django-haystack documentation: https://django-haystack.readthedocs.io/en/master/tutorial.html#configuration

    drf-haystack documentation: https://drf-haystack.readthedocs.io/en/latest/07_faceting.html#serializing-faceted-results

    Create the model class:

    from django.db import models
    
    class Article(models.Model):
        title = models.CharField(max_length=128)
        files = models.FileField(upload_to='%Y/%m/')
        content = models.TextField(default='')

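    Create the index class. Because its text field uses use_template=True, haystack also expects a data template at templates/search/indexes/app001/article_text.txt listing what goes into the document text (for example {{ object.title }} and {{ object.content }}):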

    from haystack import indexes
    from app001.models import Article
    
    class DocsIndex(indexes.SearchIndex, indexes.Indexable):
        # 1. Fields to index
        text = indexes.CharField(document=True, use_template=True)
        files = indexes.CharField(model_attr='files')
        content = indexes.CharField(model_attr='content')
    
        # 2. The model this index covers
        def get_model(self):
            return Article
    
        # 3. The queryset to index
        def index_queryset(self, using=None):
            """Used when the entire index for model is updated."""
            return self.get_model().objects.all()

    The view:

    import os
    import datetime
    import uuid
    
    from rest_framework.views import APIView
    from rest_framework import serializers
    from rest_framework.response import Response
    from django.conf import settings
    from drf_haystack.serializers import HaystackSerializer
    from drf_haystack.viewsets import HaystackViewSet
    
    from .models import Article
    from .search_indexes import DocsIndex
    
    
    class DemoSerializer(serializers.ModelSerializer):
        """
        Serializer for the Article model
        """
        class Meta:
            model = Article
            fields = ('id', 'title','files')
    
    
    
    class LocationSerializer(HaystackSerializer):
        object = DemoSerializer(read_only=True)  # read-only: never used for deserialization
    
        class Meta:
            # The `index_classes` attribute is a list of which search indexes
            # we want to include in the search.
            index_classes = [DocsIndex]
    
            # The `fields` contains all the fields we want to include.
            # NOTE: Make sure you don't confuse these with model attributes. These
            # fields belong to the search index!
            fields = [
                 "text","files","id","title"
            ]
     
    class LocationSearchView(HaystackViewSet):
    
        # `index_models` is an optional list of which models you would like to include
        # in the search result. You might have several models indexed, and this provides
        # a way to filter out those of no interest for this particular view.
        # (Translates to `SearchQuerySet().models(*index_models)` behind the scenes.)
        index_models = [Article]
    
        serializer_class = LocationSerializer

    settings.py configuration:

    INSTALLED_APPS = [
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'rest_framework',
        'silk',
        'debug_toolbar',
        'haystack',
        'app001',
    ]


    # Search engine configuration
    # haystack settings
    HAYSTACK_CONNECTIONS = {
        'default': {
            # 'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
            # To use IK analysis, point ENGINE at the custom engine described below
            'ENGINE': 'app001.elasticsearch_ik_backend.IKSearchEngine',
            'URL': 'http://172.16.xxx.xxx:9200/',  # Elasticsearch server address
            'INDEX_NAME': 'haystack',  # index name
        },
    }
    # Keep the index up to date: re-index a model instance whenever it is saved or deleted
    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
    # Maximum number of results shown per page
    HAYSTACK_SEARCH_RESULTS_PER_PAGE = 50

    Override the search engine to use the IK analyzer:

    Create an elasticsearch_ik_backend.py file in the app:

    from haystack.backends.elasticsearch_backend import (
        ElasticsearchSearchBackend,
        ElasticsearchSearchEngine,
    )


    class IKSearchBackend(ElasticsearchSearchBackend):
        # Use ik_max_word as the default ES analyzer instead of haystack's default
        DEFAULT_ANALYZER = "ik_max_word"

        def build_schema(self, fields):
            content_field_name, mapping = super().build_schema(fields)
            for field_name, field_class in fields.items():
                # The entries are dicts, so editing field_mapping updates mapping in place
                field_mapping = mapping[field_class.index_fieldname]
                if field_mapping["type"] == "string" and field_class.indexed:
                    # Facet fields and n-gram fields keep their own analyzers
                    is_facet = hasattr(field_class, "facet_for")
                    is_ngram = field_class.field_type in ("ngram", "edge_ngram")
                    if not is_facet and not is_ngram:
                        field_mapping["analyzer"] = getattr(
                            field_class, "analyzer", self.DEFAULT_ANALYZER
                        )
            return content_field_name, mapping


    class IKSearchEngine(ElasticsearchSearchEngine):
        backend = IKSearchBackend
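To see what the build_schema override changes without running Elasticsearch, here is a plain-dict sketch of the same rule. Assumption: each index field is described by a small dict with hypothetical keys field_type, indexed, facet and analyzer, standing in for haystack's field attributes. haystack's stock backend gives analyzed string fields the snowball analyzer, and the override swaps that for ik_max_word. Elasticsearch fixes a mapping when the index is created, so an index built before switching engines would likely need python manage.py rebuild_index before the new analyzer applies.

```python
import copy

DEFAULT_ANALYZER = "ik_max_word"
NGRAM_TYPES = ("ngram", "edge_ngram")


def apply_ik(mapping, fields):
    """Return a copy of `mapping` with the IK analyzer applied like IKSearchBackend.

    `mapping`: index field name -> ES field mapping (as build_schema returns it).
    `fields`: index field name -> dict with hypothetical keys 'field_type',
    'indexed', optional 'facet' (bool) and optional 'analyzer' override.
    """
    result = copy.deepcopy(mapping)
    for name, field in fields.items():
        field_mapping = result[name]
        if field_mapping["type"] != "string" or not field["indexed"]:
            continue  # non-string or non-indexed fields are left alone
        if field.get("facet") or field["field_type"] in NGRAM_TYPES:
            continue  # facets and n-gram fields keep their own analyzers
        field_mapping["analyzer"] = field.get("analyzer", DEFAULT_ANALYZER)
    return result
```

A field that declares its own analyzer (for example IK's ik_smart) keeps it, mirroring the getattr fallback in the backend.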

    drf-haystack's querying in Django is still fairly limited, and I didn't find it convenient, so here I query with the Python elasticsearch client instead:

    from elasticsearch import Elasticsearch  # add to the imports at the top of views.py

    class EsSearch(APIView):
        def get(self, request):
            es = Elasticsearch(["http://xxx.xxx.xxx.xxx:9200"])
            query = request.GET.get("query", "")
            # Customize this body for whatever kind of query you need:
            # https://www.elastic.co/guide/cn/elasticsearch/guide/current/match-query.html
            body = {
                "query": {
                    "multi_match": {"query": query, "fields": ["text", "content"]}
                },
                "highlight": {"fields": {"content": {}, "text": {}}},
            }
            result = es.search(index="haystack", doc_type="modelresult", body=body)
            return Response(result)
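The raw search response is fairly verbose. As an optional sketch, the view could reduce it to the parts a frontend usually needs. The summarize_hits helper is hypothetical; the keys it reads (hits.hits, _id, _score, _source, highlight) are the standard fields of an Elasticsearch search response.

```python
def summarize_hits(result):
    """Flatten an Elasticsearch search response into a list of small dicts."""
    rows = []
    for hit in result.get("hits", {}).get("hits", []):
        rows.append({
            "id": hit.get("_id"),
            "score": hit.get("_score"),
            "source": hit.get("_source", {}),
            # Each highlighted field maps to a list of fragments wrapped in <em> tags.
            "highlight": hit.get("highlight", {}),
        })
    return rows
```

In EsSearch.get, this would mean returning Response(summarize_hits(result)) instead of Response(result).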

    URL configuration:

    """tool_bar URL Configuration
    
    The `urlpatterns` list routes URLs to views. For more information please see:
        https://docs.djangoproject.com/en/2.0/topics/http/urls/
    Examples:
    Function views
        1. Add an import:  from my_app import views
        2. Add a URL to urlpatterns:  path('', views.home, name='home')
    Class-based views
        1. Add an import:  from other_app.views import Home
        2. Add a URL to urlpatterns:  path('', Home.as_view(), name='home')
    Including another URLconf
        1. Import the include() function: from django.urls import include, path
        2. Add a URL to urlpatterns:  path('blog/', include('blog.urls'))
    """
    from django.contrib import admin
    from django.urls import path
    from django.conf import settings
    from django.conf.urls import url, include
    from django.conf.urls.static import static

    from app001.views import Index, Uploads
    from rest_framework import routers

    from app001.views import LocationSearchView, EsSearch
    from app002.views import BlogView

    # drf-haystack search
    router = routers.DefaultRouter()
    router.register("search", LocationSearchView, base_name="location-search")

    urlpatterns = [
        # Custom Elasticsearch query view
        url(r'^elastic_search/$', EsSearch.as_view()),
        # drf-haystack routes, e.g. /api/search/
        url(r'^api/', include(router.urls)),
    ]
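To try both endpoints from a script or a browser, the URLs can be built as below. The host and port (Django's development server on 127.0.0.1:8000) are assumptions, and the helper names are hypothetical. The EsSearch view reads the query parameter. The drf-haystack viewset should treat query parameters as filters on index fields, so text=... searches the document field.

```python
from urllib.parse import urlencode

# Assumed address of the Django development server.
BASE = "http://127.0.0.1:8000"


def es_search_url(query, base=BASE):
    """URL for the custom EsSearch view, which reads the `query` parameter."""
    return "%s/elastic_search/?%s" % (base, urlencode({"query": query}))


def haystack_search_url(filters, base=BASE):
    """URL for the drf-haystack viewset; each parameter filters an index field."""
    return "%s/api/search/?%s" % (base, urlencode(filters))
```

urlencode percent-encodes non-ASCII text, so Chinese search terms can be passed directly.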

    [Screenshot: query results]

Original article: https://www.cnblogs.com/zhaijihai/p/12167923.html