


       Notes on Setting Up a Site Search Engine with Solr, Built on Apache Lucene
       [ 预备警员.10078 @ 2009-03-23 17:15:30 ]


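    For work, I spent about a week, on and off, testing the installation and configuration of Solr. This enterprise-grade site-wide search tool is best seen as a useful complement to general-purpose search engines. Such a tool probably exists because even the best search engine has trouble indexing all of a site's valuable content promptly, effectively, and completely, and then organizing and presenting it to callers according to a given set of rules.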

    1. A first try with Lucene. Lucene can be downloaded from the Apache site: http://lucene.apache.org/

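    After downloading a recent package and unpacking it, you will find a bundled example that is easy to get running. It covers Lucene's index and search services: through the Indexer and Searcher objects you can build an index and run queries from the command line. The other interfaces are fairly rich as well. Since the rest of these notes focuses on configuring Solr, which is built on Lucene, I will skip over how to configure the underlying Lucene.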
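
    To make the index/search split concrete, here is a minimal Python sketch of an inverted index. It is not Lucene's API: the Indexer and Searcher classes below only borrow the names used above, and the tokenizer, the AND-only query semantics, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Indexer:
    """Toy indexer (not Lucene's): maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)


class Searcher:
    """Toy searcher (not Lucene's): returns documents containing every query term."""

    def __init__(self, indexer):
        self.postings = indexer.postings

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the posting sets of all query terms (AND semantics).
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return sorted(hits)


# Hypothetical sample documents.
indexer = Indexer()
indexer.add("doc1", "Solr is a search server built on Lucene")
indexer.add("doc2", "Lucene is a Java search library")

searcher = Searcher(indexer)
print(searcher.search("lucene search"))
print(searcher.search("solr"))
```

    A real engine adds analysis, scoring, and on-disk index segments on top of this idea; the sketch only shows why indexing and searching are separate steps.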

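    Lucene has fairly extensive documentation that can be browsed online. Its contributors also hold several patents in the search field and are experts in this area, so I trust that the underlying design is sound.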

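    2. Installing and configuring Solr

    2.1 Existing platform environment
    openSuSE Linux 10 on a DELL PE 2950, with an Apache+Resin+MySQL application stack deployed on it.

    Bringing Solr onto the existing platform mainly involved changes in the following places:

    2.2.1
    Download the package from http://www.apache.org/dyn/closer.cgi/lucene/solr/
    into a directory named /opt/src/ (if it does not exist, create it first with mkdir -p /opt/src)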

    shell> cd /opt/src
    shell> wget "http://apache.mirror.phpchina.com/lucene/solr/1.3.0/apache-solr-1.3.0.tgz"
    shell> tar xzvf apache-solr-1.3.0.tgz
    shell> cd /opt/src/apache-solr-1.3.0

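    This unpacks the archive and leaves it ready for use. It includes Jetty, a web server that works well with Solr, so you can get started right away. Below is the Getting Started content copied from the Apache Solr wiki site, pasted here for reference (http://lucene.apache.org/solr/tutorial.html#Getting+Started):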

    Overview
    This document covers the basics of running Solr using an example schema, and some sample data.

    Requirements
    To follow along with this tutorial, you will need...

    1. Java 1.5 or greater. Some places you can get it are from Sun, IBM, or BEA.
       Running java -version at the command line should indicate a version number starting with 1.5.
    2. A Solr release.
    3. FireFox or Mozilla is the preferred browser to view the admin pages, as the current stylesheet doesn't look good on Internet Explorer.
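
    The version requirement in item 1 can also be checked from a script. The following is a rough sketch, assuming the banner contains text of the form version "1.5.0_22". The function names are hypothetical, and the regular expression is an assumption about the banner format, not something the tutorial specifies.

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from banner text such as: java version "1.5.0_22"."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    # A missing minor component is treated as 0.
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(banner, minimum=(1, 5)):
    """Return True when the banner reports at least the minimum version."""
    version = parse_java_version(banner)
    return version is not None and version >= minimum


def installed_java_banner():
    """Run `java -version`; the JVM writes its banner to stderr."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # No java executable on the PATH.
        return ""
    return result.stderr


if __name__ == "__main__":
    print("Java 1.5 or greater found:", meets_minimum(installed_java_banner()))
```

    Comparing tuples rather than strings avoids ordering mistakes such as "1.10" sorting before "1.5".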

    Getting Started
    Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.

    Begin by unzipping the Solr release and changing your working directory to be the "example" directory. (Note that the base directory name may vary with the version of Solr downloaded.)

    chrish@asimov:~solr$ ls
    solr-nightly.zip
    chrish@asimov:~solr$ unzip -q solr-nightly.zip
    chrish@asimov:~solr$ cd solr-nightly/example/

    Solr can run in any Java Servlet Container of your choice, but to simplify this tutorial, the example index includes a small installation of Jetty. In order to compile JSPs, this version of Jetty requires that you run "java" from a JDK, not from a JRE.

    To launch Jetty with the Solr WAR, and the example configs, just run the start.jar ...

    chrish@asimov:~/solr/example$ java -jar start.jar
    1 [main] INFO org.mortbay.log - Logging to org.slf4j.impl.SimpleLogger@1f436f5 via org.mortbay.log.Slf4jLog
    334 [main] INFO org.mortbay.log - Extract jar:file:/home/chrish/solr/example/webapps/solr.war!/ to /tmp/Jetty__solr/webapp
    Feb 24, 2006 5:54:52 PM org.apache.solr.servlet.SolrServlet init
    INFO: user.dir=/home/chrish/solr/example
    Feb 24, 2006 5:54:52 PM org.apache.solr.core.SolrConfig <clinit>
    INFO: Loaded Config solrconfig.xml

    ...

    1656 [main] INFO org.mortbay.log - Started SelectChannelConnector @ 0.0.0.0:8983

    This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.

    You can see that Solr is running by loading http://localhost:8983/solr/admin/ in your web browser. This is the main starting point for Administering Solr.
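
    The same check can be scripted instead of done in a browser. A minimal sketch, assuming the host and port shown above: admin_url and solr_is_up are hypothetical helper names, and treating an HTTP 200 response as "running" is an assumption.

```python
import urllib.error
import urllib.request


def admin_url(host="localhost", port=8983):
    """Build the admin page URL for a Solr instance running under Jetty."""
    return "http://%s:%d/solr/admin/" % (host, port)


def solr_is_up(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = admin_url()
    print(url, "is up:", solr_is_up(url))
```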

    Indexing Data
    Your Solr server is up and running, but it doesn't contain any data. You can modify a Solr index by POSTing XML Documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
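
    As an illustration of those instructions, here is a small sketch that builds the XML update messages with Python's standard library. The element names (add, doc, field, delete, id, commit, optimize) follow Solr's XML update format; the helper names and the sample field values are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_message(docs):
    """Build an <add> message from a list of {field name: value} dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


def delete_by_id_message(doc_id):
    """Build a <delete> message that removes one document by its unique id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def commit_message():
    """Make pending adds and deletes visible to searches."""
    return "<commit/>"


def optimize_message():
    """Ask Solr to optimize the index."""
    return "<optimize/>"


if __name__ == "__main__":
    # Hypothetical document; field names must exist in the schema in use.
    print(add_message([{"id": "DOC-1", "name": "Example document"}]))
    print(delete_by_id_message("DOC-1"))
    print(commit_message())
```

    Adds and deletes stay pending until a commit message is sent, which is why commit is a separate instruction.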

    The exampledocs directory contains samples of the types of instructions Solr expects, as well as a java utility for posting them from the command line (a post.sh shell script is also available, but for this tutorial we'll use the cross-platform Java client).

    To try this, open a new terminal window, enter the exampledocs directory, and run "java -jar post.jar" on some of the XML files in that directory, indicating the URL of the Solr server:

    chrish@asimov:~/solr/example/exampledocs$ java -jar post.jar solr.xml monitor.xml
    SimplePostTool: version 1.2 ..
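
    For comparison, what post.jar does can be approximated in a few lines of Python. This is a sketch under assumptions, not the tool's actual implementation: it assumes the update handler lives at http://localhost:8983/solr/update (the URL is not shown in the text above), that the files are UTF-8 XML, and that a commit is sent after the last file. The function names are hypothetical.

```python
import urllib.request

# Assumed update URL; adjust to match the server actually in use.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_text, url=UPDATE_URL):
    """Wrap an XML update message in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_files(paths, url=UPDATE_URL):
    """POST each XML file to the update URL, then send a commit."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            request = build_update_request(handle.read(), url)
        with urllib.request.urlopen(request) as response:
            response.read()
    # Without a commit the new documents are not visible to searches.
    with urllib.request.urlopen(build_update_request("<commit/>", url)) as response:
        response.read()


if __name__ == "__main__":
    # Requires a running Solr server and these files in the current directory.
    post_files(["solr.xml", "monitor.xml"])
```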
  • Original article: https://www.cnblogs.com/wycg1984/p/1580439.html