  • 1.6.9 UIMA Integration

    1. UIMA Integration

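      You can integrate Solr with Apache's Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of analysis engines that incrementally add metadata to your documents as annotations.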

      For more information about Solr UIMA integration, see https://wiki.apache.org/solr/SolrUIMA.

    1.1 Configuring UIMA

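     The SolrUIMA UpdateRequestProcessor is a custom update request processor that takes the documents being indexed, sends them to a UIMA pipeline, and returns them enriched with the specified metadata. To configure UIMA, follow these steps: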

      1. Copy /solr-4.x.y/dist/solr-uima-4.x.y.jar and the libraries under contrib/uima/lib into Solr's library directory, or load them with <lib/> directives in solrconfig.xml:

    <lib dir="../../contrib/uima/lib" />
    <lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />
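
The regex in the second <lib/> directive is meant to select the versioned solr-uima jar in dist/; the intended form escapes the digit class and the dot, as in `solr-uima-\d.*\.jar`. The following is a minimal sketch for checking such a pattern before putting it in solrconfig.xml. It assumes the pattern is applied to the whole file name; the class name and the sample file names are hypothetical.

```java
import java.util.regex.Pattern;

public class LibRegexCheck {
    // Intended pattern: "solr-uima-", one digit, any characters, then a literal ".jar".
    static final Pattern UIMA_JAR = Pattern.compile("solr-uima-\\d.*\\.jar");

    // Assumption: the regex must match the entire file name.
    static boolean selects(String fileName) {
        return UIMA_JAR.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only to exercise the pattern.
        String[] names = {"solr-uima-4.10.2.jar", "solr-core-4.10.2.jar", "solr-uima-d.jar"};
        for (String name : names) {
            System.out.println(name + " -> " + selects(name));
        }
    }
}
```

Without the backslashes, the pattern would require a literal `d` after `solr-uima-` and would not select a jar whose name continues with a version number.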

      2. In schema.xml, add the metadata fields:

    <field name="language" type="string" indexed="true" stored="true"  required="false" />
    <field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false" />
    <field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />

      3. Add the following snippet to solrconfig.xml:

    <updateRequestProcessorChain name="uima">
        <processor
            class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
            <lst name="uimaConfig">
                <lst name="runtimeParameters">
                    <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
                    <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
                    <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
                    <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
                    <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
                    <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
                </lst>
                <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
                    <!-- Set to true if you want to continue indexing even if text processing 
                        fails. Default is false, in which case Solr throws a RuntimeException and 
                        none of the documents in your session are indexed. -->
                    <bool name="ignoreErrors">true</bool>
                    <!-- This is optional. It is used for logging when text processing fails. 
                        If logField is not specified, uniqueKey will be used as logField. <str name="logField">id</str> -->
                    <lst name="analyzeFields">
                        <bool name="merge">false</bool>
                        <arr name="fields">
                            <str>text</str>
                        </arr>
                    </lst>
                    <lst name="fieldMappings">
                        <lst name="type">
                            <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
                            <lst name="mapping">
                                <str name="feature">text</str>
                                <str name="field">concept</str>
                            </lst>
                        </lst>
                        <lst name="type">
                            <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
                            <lst name="mapping">
                                <str name="feature">language</str>
                                <str name="field">language</str>
                            </lst>
                        </lst>
                        <lst name="type">
                            <str name="name">org.apache.uima.SentenceAnnotation</str>
                            <lst name="mapping">
                                <str name="feature">coveredText</str>
                                <str name="field">sentence</str>
                            </lst>
                        </lst>
                    </lst>
            </lst>
        </processor>
        <processor class="solr.LogUpdateProcessorFactory" />
        <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

       4. In solrconfig.xml, replace the existing UpdateRequestHandler or create a new one that uses the uima chain:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">uima</str>
      </lst>
    </requestHandler>
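
Once the handler is in place, documents posted to /update pass through the uima chain, which analyzes the `text` field listed under analyzeFields. The following is a rough sketch of building an XML update message for such a request using only the JDK. It assumes the schema's unique key is `id`; the class name, method names, and sample values are hypothetical, and actually sending the message to Solr is left out.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class UimaAddMessage {
    // Builds <add><doc>...</doc></add> with an id field (assumed unique key)
    // and the text field that the uima chain is configured to analyze.
    static String buildAddMessage(String id, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("add");
            xml.writeStartElement("doc");
            writeField(xml, "id", id);
            writeField(xml, "text", text);
            xml.writeEndElement(); // closes doc
            xml.writeEndElement(); // closes add
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not build update message", e);
        }
        return out.toString();
    }

    // Writes one <field name="...">value</field> element; the writer escapes the value.
    private static void writeField(XMLStreamWriter xml, String name, String value)
            throws XMLStreamException {
        xml.writeStartElement("field");
        xml.writeAttribute("name", name);
        xml.writeCharacters(value);
        xml.writeEndElement();
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        System.out.println(buildAddMessage("doc-1", "Apache UIMA & Solr work together."));
    }
}
```

The resulting string can be sent as the body of an HTTP POST to the /update handler with any HTTP client. With the mappings configured in step 3, the indexed document should then carry values in the language, concept and sentence fields in addition to what was posted.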
  • Original article: https://www.cnblogs.com/a198720/p/4323208.html