  • Extracting file metadata with Apache Tika in SolrNet

    1. Add the jar files:

    tika-core-0.10.jar

    tika-parsers-0.10.jar

    .....

    2. Edit solrconfig.xml, then restart the Solr instance (replace SOLR_PATH with your Solr installation path):

      <lib dir="SOLR_PATH/dist/" regex="apache-solr-cell-\d.*\.jar" />
      <lib dir="SOLR_PATH/contrib/extraction/lib" regex=".*\.jar" />
      <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
        <lst name="defaults">
          <str name="map.Last-Modified">last_modified</str>
          <str name="uprefix">metadata_</str>
        </lst>
      </requestHandler>
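
    With uprefix set to metadata_, any Tika field that the schema does not define is renamed to metadata_<name>, so the schema needs fields that accept those names or indexing may fail with an unknown-field error. A minimal sketch of matching schema.xml entries follows. The field types "string" and "date" are assumptions taken from the stock Solr example schema, not from the original post, so adjust them to your own schema:

    ```xml
    <!-- Assumed schema.xml entries; the type names come from the Solr example schema -->
    <!-- Target of the map.Last-Modified mapping in the handler defaults -->
    <field name="last_modified" type="date" indexed="true" stored="true" />
    <!-- Catches every unknown Tika field renamed by uprefix=metadata_ -->
    <dynamicField name="metadata_*" type="string" indexed="true" stored="true" multiValued="true" />
    ```

    If you do not need the extra metadata, the dynamic field can instead be declared with indexed="false" and stored="false" so those values are silently dropped.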

    3. C# calling code:

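    // Resolve the SolrNet operations interface (assumes Startup.Init<IndexDocument>(...) has already run).
    var solr = ServiceLocator.Current.GetInstance<ISolrOperations<IndexDocument>>();

    // Sends a file to /update/extract so that Tika extracts its content and metadata and Solr indexes it.
    private void AddFile(ISolrOperations<IndexDocument> solr, string id, byte[] content, string resourceName)
    {
        using (MemoryStream stream = new MemoryStream(content))
        {
            var response = solr.Extract(new ExtractParameters(stream, id, resourceName)
            {
                ExtractFormat = ExtractFormat.Text,
                ExtractOnly = false, // false: index the document; true: only return the extracted text
                Fields = new[]
                {
                    // Extra literal fields indexed alongside the extracted ones
                    new ExtractField("name1", "value1"),
                    new ExtractField("name2", "value2")
                }
            });
            // Content is typically only populated when ExtractOnly = true.
            Console.WriteLine(response.Content);
            // The document is not searchable until a commit, e.g. solr.Commit().
        }
    }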

    Author: 协思
    Source: http://zeeman.cnblogs.com/

  • Original article: https://www.cnblogs.com/zeeman/p/2824640.html