  • OpenNLP Chunker

    6 Chunker

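    Overview: Text chunking divides a text into syntactically correlated groups of words, such as noun groups and verb groups, but specifies neither their internal structure nor their role in the main sentence.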

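    API: The chunker provides an API for training a new chunker model and for applying an existing one. The sample code below shows the second case: it loads pretrained POS tagger and chunker models and chunks a line of text.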

    Test code

    package package01;

    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.chunker.ChunkerModel;
    import opennlp.tools.cmdline.postag.POSModelLoader;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;
    import opennlp.tools.util.*;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.Charset;

    public class Test06 {

        public static void main(String[] args) throws IOException {
            Test06.chunk();
        }

        /**
         * 5. Sequence labeling: Chunker
         * The chunker partitions a sentence into a set of chunks, using the tokens
         * produced by the tokenizer and the tags produced by the POS tagger.
         *
         * Input:
         * Hi. How are you? This is Mike.
         */
        public static void chunk() throws IOException {
            // Backslashes in Windows paths must be escaped inside Java string literals.
            POSModel model = new POSModelLoader().load(new File("E:\\NLP_Practics\\models\\en-pos-maxent.bin"));
            //PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
            POSTaggerME tagger = new POSTaggerME(model);
            // ObjectStream<String> lineStream = new PlainTextByLineStream(new StringReader(str));

            Charset charset = Charset.forName("UTF-8");
            InputStreamFactory isf = new MarkableFileInputStreamFactory(new File("E:\\myText.txt"));
            ObjectStream<String> lineStream = new PlainTextByLineStream(isf, charset);

            //perfMon.start();
            String line;
            String[] whitespaceTokenizerLine = null;
            String[] tags = null;
            // Tokenize and POS-tag every line of the input file.
            while ((line = lineStream.read()) != null) {
                whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
                tags = tagger.tag(whitespaceTokenizerLine);
                POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
                System.out.println(sample.toString());
                //perfMon.incrementCounter();
            }
            lineStream.close();
            //perfMon.stopAndPrintFinalResult();

            // chunker
            // Only the last line read is chunked; the input file here holds a single line.
            try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-chunker.bin")) {
                ChunkerModel cModel = new ChunkerModel(is);
                ChunkerME chunkerME = new ChunkerME(cModel);
                // One chunk tag per token: B-XX begins a chunk, I-XX continues it, O is outside any chunk.
                String[] result = chunkerME.chunk(whitespaceTokenizerLine, tags);
                for (String s : result)
                    System.out.println(s);
                // The same chunks as token ranges [start..end) with their type.
                Span[] span = chunkerME.chunkAsSpans(whitespaceTokenizerLine, tags);
                for (Span s : span)
                    System.out.println(s.toString());
                System.out.println("--------------5-------------");
            }
        }
    }

      

    Output

    Loading POS Tagger model ... done (0.554s)
    Hi._NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP
    B-NP
    B-ADVP
    O
    B-NP
    I-NP
    B-VP
    O
    [0..1) NP
    [1..2) ADVP
    [3..5) NP
    [5..6) VP
    --------------5-------------
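
The output above shows the same chunks twice: first as one tag per token (B-NP, I-NP, O, ...), then as token ranges such as [3..5) NP. The relationship between the two forms can be illustrated with a small standalone sketch. It is not OpenNLP's implementation; the class name `BioSpans` and the method `decode` are hypothetical, and treating a stray I- tag as the start of a new chunk is an assumption made here for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class BioSpans {

    /** Decodes B-/I-/O chunk tags into "[start..end) TYPE" strings (end is exclusive). */
    public static List<String> decode(String[] tags) {
        List<String> spans = new ArrayList<>();
        int start = -1;      // index of the first token of the open chunk, -1 if none
        String type = null;  // type of the open chunk, e.g. "NP"
        for (int i = 0; i <= tags.length; i++) {
            // A sentinel "O" after the last tag closes any chunk that is still open.
            String tag = i < tags.length ? tags[i] : "O";
            boolean continues = start >= 0 && tag.equals("I-" + type);
            if (!continues) {
                if (start >= 0) {
                    // Close the open chunk; it covers tokens start .. i-1.
                    spans.add("[" + start + ".." + i + ") " + type);
                    start = -1;
                    type = null;
                }
                // Assumption: an I- tag with no matching open chunk starts a new one.
                if (tag.startsWith("B-") || tag.startsWith("I-")) {
                    start = i;
                    type = tag.substring(2);
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // The tag sequence printed by the chunker in the output above.
        String[] tags = {"B-NP", "B-ADVP", "O", "B-NP", "I-NP", "B-VP", "O"};
        for (String span : decode(tags)) {
            System.out.println(span);
        }
    }
}
```

Run on the seven tags from the output, this prints [0..1) NP, [1..2) ADVP, [3..5) NP and [5..6) VP, one per line, matching the span lines above. Tokens tagged O (here "are" and "Mike.") belong to no chunk and therefore appear in no span.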
    

      

    https://github.com/godmaybelieve
  • Original post: https://www.cnblogs.com/yuyu666/p/15029795.html