Solr Documentation, Part 2


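SOLR 5.5.5 Documentation
Reference post:
http://blog.csdn.net/matthewei6/article/details/50620600
Author: Mao Ping (毛平)
Date: 2018-01-15 17:36:22
Environment Setup
Solr 5.5.5 can be deployed standalone and is started with its bundled Jetty by default.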
1. Prerequisites
Environment: JDK 1.7 or later is required; 1.8 is recommended.
Download the package:
Use the Tsinghua University mirror:
https://mirrors.tuna.tsinghua.edu.cn/apache/lucene/solr/5.5.5/solr-5.5.5.tgz
Command: curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/lucene/solr/5.5.5/solr-5.5.5.tgz
or: wget https://mirrors.tuna.tsinghua.edu.cn/apache/lucene/solr/5.5.5/solr-5.5.5.tgz
2. Extract
tar -zxvf solr-5.5.5.tgz
3. Start the server (run from the extracted solr-5.5.5 directory)
bin/solr start
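Once the server has started, one way to confirm that it responds is Solr's CoreAdmin STATUS endpoint. The following is a minimal Python sketch that only builds the URL and parses a response body. The host localhost, the port 8983 (Solr's default), and the sample response are assumptions for illustration, not values taken from a real run.

```python
import json
from urllib.parse import urlencode


def core_status_url(host="localhost", port=8983):
    """Build the CoreAdmin STATUS URL for a standalone Solr node."""
    query = urlencode({"action": "STATUS", "wt": "json"})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def core_names(status_json):
    """Return the sorted core names found in a CoreAdmin STATUS response body."""
    return sorted(json.loads(status_json).get("status", {}).keys())


# Hypothetical, trimmed-down response body used only to show the parsing step.
sample_body = '{"responseHeader": {"status": 0}, "status": {"maopcore": {"name": "maopcore"}}}'

print(core_status_url())
print(core_names(sample_body))
```

Opening the printed URL in a browser, or fetching it with curl, should return a JSON document whose status object lists the cores that currently exist.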
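SOLR Basics
1. Create a core
Note: This guide uses the Jetty container. Creating a core is like creating a new project in the container: a self-contained search engine project.
bin/solr create -c maopcore
bin/solr delete -c maopcore    # deletes the core created above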
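2. Add a Chinese tokenizer
Note: Add the IK tokenizer to the newly created core so that the core can segment Chinese text.
1. Edit the configuration file
Add the following to managed-schema (relative path: ${PATH}/server/solr/mycore/conf/managed-schema):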

<fieldType name="text_ik" class="solr.TextField">
<analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
<analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
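2. Add the IK tokenizer jar
Make sure the jar version matches the Solr version.
The install path is ${PATH}/server/solr-webapp/webapp/WEB-INF/lib.
3. Verify that IK is installed correctly
4. Check the tokenization result
(The newly created core now supports Chinese tokenization.)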
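Besides the Analysis screen of the admin UI, the tokenization result can also be requested over HTTP from Solr's field analysis handler. The Python sketch below only builds such a request URL. It assumes the core is named maopcore, that Solr listens on localhost:8983, and that the /analysis/field handler is enabled for the core; the sample text is arbitrary.

```python
from urllib.parse import urlencode


def field_analysis_url(core, field_type, text, host="localhost", port=8983):
    """Build a field analysis request for one field type and one input text."""
    params = urlencode({
        "analysis.fieldtype": field_type,  # the fieldType name from managed-schema
        "analysis.fieldvalue": text,       # text to run through the index analyzer
        "wt": "json",
    })
    return f"http://{host}:{port}/solr/{core}/analysis/field?{params}"


url = field_analysis_url("maopcore", "text_ik", "中文分词")
print(url)
```

If IK is installed correctly, the response should list the tokens produced for text_ik; an error mentioning org.wltea.analyzer.lucene.IKAnalyzer usually means the jar is missing or does not match the Solr version.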
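3. Add a database connection
Note: see http://blog.csdn.net/u011518678/article/details/51871925
1. Create the connection configuration data-config
This file configures the data connection for the current core. Create data-config.xml under {current core}/conf with the following content: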
<dataConfig>
<dataSource name="testSource1" driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@192.168.4.229:1521:orcl" password="hermes" user="hermes_rc" />
<document>
<entity name="goods1" pk="BID" transformer="DateFormatTransformer" dataSource="testSource1"
query="select id,name,url,price,to_date(to_char(addtime,'yyyy-MM-dd HH24:mi:ss'),'yyyy-MM-dd HH24:mi:ss') addtime from lksolrtest"
deltaQuery="select id BID from lksolrtest where to_char(addtime,'yyyy-MM-dd HH24:mi:ss')>'${dataimporter.last_index_time}'"
deltaImportQuery="select id,name,url,price,to_date(to_char(addtime,'yyyy-MM-dd HH24:mi:ss'),'yyyy-MM-dd HH24:mi:ss') addtime from lksolrtest where id = '${dataimporter.delta.BID}'">
<field column="BID" name="id"/>
<field column="price" name="price" />
<field column="name" name="name" />
<field column="url" name="url" />
<field column="addtime" name="addtime" dateTimeFormat="yyyy-MM-dd HH:mm:ss" />
</entity>
</document>
</dataConfig>
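2. Add the database driver jar
This example uses an Oracle database. Place the JDBC driver jar in {Solr absolute path}/server/solr-webapp/webapp/WEB-INF/lib.
3. Link data-config
Add the following at the appropriate place in {Solr absolute path}/server/solr/{#core}/conf/solrconfig.xml:
<lib dir="./lib" regex=".*\.jar" />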
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Next, comment out the following content so that id does not default to the String type:

4. Mapping between fields and database columns
Configuration file path: {core absolute path}/conf/managed-schema
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" type="text_ik" indexed="true" stored="true" />
<field name="price" type="float" indexed="true" stored="true" />
<field name="url" type="text_ik" indexed="true" stored="true" />
<field name="addtime" type="date" indexed="true" stored="true" />
<uniqueKey>id</uniqueKey>
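5. Jars for index data import
Add the dependency jars (solr-dataimporthandler-5.5.5.jar, solr-dataimporthandler-extras-5.5.5.jar, and mydataimportscheduler.jar) to {relative path}/solr-5.5.5/server/solr-webapp/webapp/WEB-INF/lib. The first two are in Solr's dist directory; mydataimportscheduler.jar must be obtained separately.
6. Verify the index import manually
1. In the admin UI, select the core (project), open DataImport (index import), choose the update mode (full import here), select the entity, and click Execute.
2. Query from the UI
Select the core, open the Query menu, and run the query to see the index data that was just imported.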
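The two manual steps above correspond to plain HTTP requests, which is useful when no browser is at hand. The Python sketch below builds the full-import request for the /dataimport handler registered in solrconfig.xml and a match-all query against /select. The base URL http://localhost:8983/solr and the core name maopcore are assumptions; adjust them to the actual deployment.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed host and port


def dataimport_url(core, command, clean, commit=True):
    """Build a DataImportHandler request such as full-import or delta-import."""
    params = urlencode({
        "command": command,
        "clean": str(clean).lower(),    # true wipes the existing index first
        "commit": str(commit).lower(),  # true commits once the import finishes
    })
    return f"{BASE}/{core}/dataimport?{params}"


def select_url(core, query="*:*", rows=10):
    """Build a search request; the default query matches every document."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{BASE}/{core}/select?{params}"


print(dataimport_url("maopcore", "full-import", clean=True))
print(select_url("maopcore"))
```

Requesting the first URL starts the import; requesting the second afterwards should return the imported rows as JSON documents.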
7. Entity configuration in detail
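4. Dynamic index import
Note: Solr is a web application. Register a listener in the web.xml file under webapp to start periodic scheduled tasks that call the delta-import handler, so that new index data is added continuously.
1. Set the time zone to Beijing
Note: Solr's default time zone is UTC; change it to GMT+08:00 (Beijing time).
In {solr}/bin/solr.in.cmd, find the line that sets SOLR_TIMEZONE and change it to:
set SOLR_TIMEZONE=GMT+08:00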
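The time zone matters because the delta query compares database timestamps with the time of the last import. The short Python sketch below shows the eight-hour gap between UTC and GMT+08:00 for one timestamp; the input value is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # GMT+08:00


def to_beijing(utc_text):
    """Convert a 'yyyy-MM-dd HH:mm:ss' UTC timestamp to GMT+08:00."""
    utc_time = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M:%S")


print(to_beijing("2018-01-15 09:36:22"))  # 2018-01-15 17:36:22
```

If Solr recorded its import time in UTC while the database stored addtime in Beijing time, the comparison in deltaQuery would be off by this amount.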
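2. Add the listener configuration
Note: Add a listener. ApplicationListener is a class in mydataimportscheduler.jar. It automatically reads the configuration file {solr}/server/solr/conf/dataimport.properties and starts two timer tasks, Timer-0 and Timer-1. Timer-0 runs the scheduled delta import; Timer-1 runs the scheduled full import.
Add the listener to {Solr absolute path}/server/solr-webapp/webapp/WEB-INF/web.xml: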
<listener>
<listener-class>
org.apache.solr.handler.dataimport.scheduler.ApplicationListener
</listener-class>
</listener>
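3. Create the scheduler configuration file
Note: The file contains settings for both the scheduled full rebuild and the scheduled delta import; in practice, pick one of them. The listener reads this file.
Create a conf folder under server/solr and create dataimport.properties in it with the following content: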
#################################################
# #
# dataimport scheduler properties #
# #
#################################################
# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1
# which cores to schedule
# in a multi-core environment you can decide which cores you want synchronized
# leave empty or comment it out if using single-core deployment
#syncCores=liukuncore,liukuncore1
syncCores=maopcore001
# solr server name or IP address
# [defaults to localhost if empty]
server=localhost
# solr server port
# [defaults to 80 if empty]
port=8983
# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solr
# URL params [mandatory]
# remainder of URL
#params=/deltaimport?command=delta-import&clean=false&commit=true
params=/dataimport?command=delta-import&clean=false&commit=true
# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=1
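# interval between index rebuilds, in minutes; defaults to 1440 (1 day)
# empty, 0, or commented out: never rebuild the index
reBuildIndexInterval=1440
# parameters for the index rebuild
#reBuildIndexParams=/deltaimport?command=full-import&clean=true&commit=true
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true
# start time for counting the rebuild interval; the first actual run happens at
# reBuildIndexBeginTime + reBuildIndexInterval*60*1000
# two formats: 2012-04-11 03:10:00 or 03:10:00; the latter is completed with the date on which the service started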
reBuildIndexBeginTime=2018-01-14 15:14:00
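4. Configuration parameters in detail
syncCores: the cores to call; separate multiple cores with commas
server: server IP or name, for example localhost
port: server port
Delta import parameters:
params: delta-import URL
interval: delta-import interval (in minutes)
Full import parameters:
reBuildIndexParams: full-import URL
reBuildIndexInterval: full-import interval (in minutes)
reBuildIndexBeginTime: start time for the first full import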
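To make the parameters concrete, the Python sketch below parses a trimmed copy of dataimport.properties, composes the URLs the scheduler would request, and computes the first rebuild time using the formula from the file's own comment. How mydataimportscheduler.jar actually assembles its URLs is an assumption here (server, port, webapp, core, then params); the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split at the first '=' only
        props[key.strip()] = value.strip()
    return props


def scheduled_urls(props, params_key="params"):
    """Compose one URL per core (assumed layout: server, port, webapp, core, params)."""
    cores = [c.strip() for c in props.get("syncCores", "").split(",") if c.strip()]
    server = props.get("server") or "localhost"
    port = props.get("port") or "80"
    webapp = props.get("webapp") or "solr"
    return [f"http://{server}:{port}/{webapp}/{core}{props[params_key]}" for core in cores]


def first_rebuild_time(props):
    """reBuildIndexBeginTime plus reBuildIndexInterval minutes."""
    begin = datetime.strptime(props["reBuildIndexBeginTime"], "%Y-%m-%d %H:%M:%S")
    return begin + timedelta(minutes=int(props["reBuildIndexInterval"]))


SAMPLE = """
# trimmed copy of the file above
syncCores=maopcore001
server=localhost
port=8983
webapp=solr
params=/dataimport?command=delta-import&clean=false&commit=true
interval=1
reBuildIndexInterval=1440
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true
reBuildIndexBeginTime=2018-01-14 15:14:00
"""

props = parse_properties(SAMPLE)
print(scheduled_urls(props))                        # delta-import URL(s)
print(scheduled_urls(props, "reBuildIndexParams"))  # full-import URL(s)
print(first_rebuild_time(props))                    # 2018-01-15 15:14:00
```

With the values above, the delta import would run every minute and the first full rebuild would happen one day after the configured start time.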
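5. Data nodes in detail
Note: Index import requires a data configuration node.
For example, in the data-config.xml created earlier, the configuration node contains a dataSource node and a document node.
dataSource holds the database settings; the main attributes are url, user, and password.
The entity node inside document carries the SQL statements:
1. query: SQL that fetches all data (used for full import)
2. deltaImportQuery: SQL that fetches the changed rows during a delta import
3. deltaQuery: SQL that fetches the pks of changed rows
4. parentDeltaQuery: SQL that fetches the pk of the parent entity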
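The split between deltaQuery and deltaImportQuery can be illustrated with a small simulation. The Python sketch below is not Solr code: it uses an in-memory SQLite table named after the lksolrtest table from data-config.xml, with made-up rows, and mimics the two steps of a delta import.

```python
import sqlite3


def delta_import(conn, last_index_time):
    """Two-step delta import: collect changed pks, then fetch each changed row."""
    # Step 1 (role of deltaQuery): pks of rows changed since the last import.
    changed = conn.execute(
        "SELECT id FROM lksolrtest WHERE addtime > ? ORDER BY id", (last_index_time,)
    ).fetchall()
    docs = []
    for (pk,) in changed:
        # Step 2 (role of deltaImportQuery): the full row for one changed pk.
        row = conn.execute(
            "SELECT id, name, price FROM lksolrtest WHERE id = ?", (pk,)
        ).fetchone()
        docs.append({"id": row[0], "name": row[1], "price": row[2]})
    return docs


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lksolrtest (id INTEGER PRIMARY KEY, name TEXT, price REAL, addtime TEXT)"
)
conn.executemany(
    "INSERT INTO lksolrtest VALUES (?, ?, ?, ?)",
    [
        (1, "old item", 9.5, "2018-01-14 10:00:00"),  # added before the last import
        (2, "new item", 12.0, "2018-01-15 18:00:00"),  # added after the last import
    ],
)

print(delta_import(conn, "2018-01-15 17:36:22"))
```

Only the row added after the last import time is returned, which is the behaviour the ${dataimporter.last_index_time} and ${dataimporter.delta.BID} placeholders provide in the real configuration.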
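SOLR Intermediate
Multi-table joins
How Full Import works:
Run this entity's query to fetch all data;
For each row, take the pk and assemble the child entity's query;
Run the child entity's query to fetch the child entity's data.
How Delta Import works:
Descend through child entities until there are none left;
Run the entity's deltaQuery to get the pks of the changed data;
Merge in the pks obtained from the child entities' parentDeltaQuery;
For each pk row, assemble the parent entity's parentDeltaQuery;
Run parentDeltaQuery to get the parent entity's pks;
Run deltaImportQuery to fetch the entity's own data;
If there is no deltaImportQuery, assemble one from query.
Constraints:
A child entity's query must reference the parent entity's pk
A child entity's parentDeltaQuery must reference its own pk
A child entity's parentDeltaQuery must return the parent entity's pk
deltaImportQuery must reference the entity's own pk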
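The Full Import steps above can be sketched as a nested loop. The Python simulation below is not Solr code: the goods and goods_tag tables and their rows are hypothetical, and an in-memory SQLite database stands in for the real data source.

```python
import sqlite3


def full_import(conn):
    """Run the parent query, then one child query per parent row."""
    docs = []
    # Parent entity query: fetch every parent row.
    parents = conn.execute("SELECT id, name FROM goods ORDER BY id").fetchall()
    for goods_id, name in parents:
        # Child entity query: it must reference the parent's pk.
        child_rows = conn.execute(
            "SELECT tag FROM goods_tag WHERE goods_id = ? ORDER BY tag", (goods_id,)
        ).fetchall()
        docs.append({"id": goods_id, "name": name, "tags": [tag for (tag,) in child_rows]})
    return docs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE goods_tag (goods_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO goods VALUES (?, ?)", [(1, "phone"), (2, "laptop")])
conn.executemany(
    "INSERT INTO goods_tag VALUES (?, ?)",
    [(1, "mobile"), (1, "android"), (2, "office")],
)

for doc in full_import(conn):
    print(doc)
```

Each parent row becomes one document and the child rows are folded into it, which is why the child query has to reference the parent's pk. The one-query-per-row pattern also explains why full imports over large parent tables can be slow.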
File indexing
