  • Food Log with Speech Recognition and NLP

    1. Word segmentation

    In China, jieba is a commonly used word segmentation tool.
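
    Chinese text has no spaces between words, so a sentence has to be split into tokens before NER can label them. In practice a library call such as `jieba.lcut(sentence)` would do this. The sketch below is not jieba's algorithm; it is a minimal forward-maximum-matching segmenter over a tiny hypothetical dictionary (`MEAL_DICT`), meant only to illustrate what the segmentation step produces.

```python
# Minimal forward-maximum-matching segmenter (illustration only, not jieba).
# MEAL_DICT is a hypothetical toy dictionary, not taken from the original post.
MEAL_DICT = {"今天", "晚上", "回锅肉"}


def segment(sentence, dictionary=MEAL_DICT):
    """Split `sentence` greedily into the longest dictionary words.

    Characters not covered by any dictionary word become one-character tokens.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    print(segment("我今天晚上吃了两盘回锅肉"))
```

    With this toy dictionary the sentence is split into 我 / 今天 / 晚上 / 吃 / 了 / 两 / 盘 / 回锅肉, the same eight tokens that appear in the test output later in this post.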

    2. Named Entity Recognition

    1. Train your own model

          

    See the Stanford NER FAQ entry "How can I train my own NER model?":

    https://nlp.stanford.edu/software/crf-faq.html#a
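
    Training is driven by a properties file passed with `-prop`. The original `chinese.meal.fpp.prop` is not shown in this post, so the sketch below only rebuilds an equivalent file from the property values that the training log echoes back. The helper names `render_props` and `write_prop_file` are hypothetical.

```python
# Property values copied from the training log in this post. The layout of the
# author's real chinese.meal.fpp.prop is unknown, so this is a reconstruction.
PROPS = {
    "trainFile": "chinese.meal.fpp.tsv",
    "serializeTo": "ner-model.ser.gz",
    "map": "word=0,answer=1",
    "useClassFeature": "true",
    "useWord": "true",
    "useNGrams": "true",
    "noMidNGrams": "true",
    "maxNGramLeng": "6",
    "usePrev": "true",
    "useNext": "true",
    "useSequences": "true",
    "usePrevSequences": "true",
    "maxLeft": "1",
    "useTypeSeqs": "true",
    "useTypeSeqs2": "true",
    "useTypeySequences": "true",
    "wordShape": "chris2useLC",
    "useDisjunctive": "true",
}


def render_props(props):
    """Render a dict as Java-style `key=value` properties lines."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_prop_file(path, props=PROPS):
    """Write the properties to `path` (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_props(props))


if __name__ == "__main__":
    write_prop_file("chinese.meal.fpp.prop")
```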

    C:my_studyMLNLPstanford-ner-2018-02-27>java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop chinese.meal.fpp.prop
    Invoked on Thu Mar 22 16:34:06 CST 2018 with arguments: -prop chinese.meal.fpp.prop
    usePrevSequences=true
    useClassFeature=true
    useTypeSeqs2=true
    useSequences=true
    wordShape=chris2useLC
    useTypeySequences=true
    useDisjunctive=true
    noMidNGrams=true
    serializeTo=ner-model.ser.gz
    maxNGramLeng=6
    useNGrams=true
    usePrev=true
    useNext=true
    maxLeft=1
    trainFile=chinese.meal.fpp.tsv
    map=word=0,answer=1
    useWord=true
    useTypeSeqs=true
    numFeatures = 564
    Time to convert docs to feature indices: 0.0 seconds
    numClasses: 5 [0=O,1=TIME,2=QUANTITY,3=UNIT,4=FOOD]
    numDocuments: 1
    numDatums: 56
    numFeatures: 564
    Time to convert docs to data/labels: 0.0 seconds
    numWeights: 6460
    QNMinimizer called on double function of 6460 variables, using M = 25.
                   An explanation of the output:
    Iter           The number of iterations
    evals          The number of function evaluations
    SCALING        <D> Diagonal scaling was used; <I> Scaled Identity
    LINESEARCH     [## M steplength]  Minpack linesearch
                       1-Function value was too high
                       2-Value ok, gradient positive, positive curvature
                       3-Value ok, gradient negative, positive curvature
                       4-Value ok, gradient negative, negative curvature
                   [.. B]  Backtracking
    VALUE          The current function value
    TIME           Total elapsed time
    |GNORM|        The current norm of the gradient
    {RELNORM}      The ratio of the current to initial gradient norms
    AVEIMPROVE     The average improvement / current value
    EVALSCORE      The last available eval score
    
    Iter ## evals ## <SCALING> [LINESEARCH] VALUE TIME |GNORM| {RELNORM} AVEIMPROVE EVALSCORE
    
    Iter 1 evals 1 <D> [M 1.000E-1] 9.068E2 0.04s |4.550E1| {4.995E-1} 0.000E0 -
    Iter 2 evals 2 <D> [M 1.000E0] 6.222E2 0.05s |3.525E1| {3.870E-1} 2.287E-1 -
    Iter 3 evals 3 <D> [M 1.000E0] 2.386E2 0.07s |5.406E1| {5.935E-1} 9.334E-1 -
    Iter 4 evals 4 <D> [M 1.000E0] 9.082E1 0.08s |1.571E1| {1.724E-1} 2.246E0 -
    Iter 5 evals 5 <D> [M 1.000E0] 7.031E1 0.10s |1.181E1| {1.297E-1} 2.379E0 -
    Iter 6 evals 6 <D> [M 1.000E0] 5.308E1 0.11s |1.025E1| {1.125E-1} 2.681E0 -
    Iter 7 evals 7 <D> [1M 2.740E-1] 2.988E1 0.14s |7.586E0| {8.328E-2} 4.193E0 -
    Iter 8 evals 9 <D> [1M 1.292E-1] 2.234E1 0.16s |6.471E0| {7.105E-2} 4.949E0 -
    Iter 9 evals 11 <D> [1M 1.801E-1] 1.615E1 0.18s |5.573E0| {6.118E-2} 6.127E0 -
    Iter 10 evals 13 <D> [1M 1.815E-1] 1.218E1 0.24s |4.477E0| {4.915E-2} 7.346E0 -
    Iter 11 evals 15 <D> [1M 3.119E-1] 8.873E0 0.30s |4.694E0| {5.154E-2} 6.912E0 -
    Iter 12 evals 17 <D> [1M 4.760E-1] 6.621E0 0.31s |2.092E0| {2.296E-2} 3.504E0 -
    Iter 13 evals 19 <D> [M 1.000E0] 6.093E0 0.32s |1.906E0| {2.092E-2} 1.390E0 -
    Iter 14 evals 20 <D> [M 1.000E0] 5.844E0 0.33s |9.067E-1| {9.955E-3} 1.103E0 -
    Iter 15 evals 21 <D> [M 1.000E0] 5.721E0 0.33s |5.774E-1| {6.339E-3} 8.279E-1 -
    Iter 16 evals 22 <D> [M 1.000E0] 5.660E0 0.34s |3.535E-1| {3.881E-3} 4.279E-1 -
    Iter 17 evals 23 <D> [M 1.000E0] 5.640E0 0.35s |1.946E-1| {2.137E-3} 2.961E-1 -
    Iter 18 evals 24 <D> [M 1.000E0] 5.632E0 0.36s |7.832E-2| {8.599E-4} 1.868E-1 -
    Iter 19 evals 25 <D> [M 1.000E0] 5.631E0 0.38s |3.559E-2| {3.907E-4} 1.163E-1 -
    Iter 20 evals 26 <D> [M 1.000E0] 5.631E0 0.39s |2.149E-2| {2.359E-4} 5.758E-2 -
    Iter 21 evals 27 <D> [M 1.000E0] 5.631E0 0.41s |1.027E-2| {1.128E-4} 1.758E-2 -
    Iter 22 evals 28 <D> [M 1.000E0] 5.631E0 0.42s |3.631E-3| {3.986E-5} 8.218E-3 -
    Iter 23 evals 29 <D> [M 1.000E0] 5.631E0 0.44s |1.629E-3| {1.789E-5} 3.791E-3 -
    Iter 24 evals 30 <D> [M 1.000E0] 5.631E0 0.45s |9.548E-4| {1.048E-5} 1.596E-3 -
    Iter 25 evals 31 <D> [M 1.000E0] 5.631E0 0.45s |5.724E-4| {6.284E-6} 5.196E-4 -
    Iter 26 evals 32 <D> [M 1.000E0] 5.631E0 0.47s |1.578E-4| {1.732E-6} 1.686E-4 -
    QNMinimizer terminated due to average improvement: | newest_val - previous_val | / |newestVal| < TOL
    Total time spent in optimization: 0.49s
    CRFClassifier training ... done [0.6 sec].
    Serializing classifier to ner-model.ser.gz... done.
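
    The `map=word=0,answer=1` property tells the classifier that column 0 of the tab-separated training file holds the token and column 1 holds its label. The contents of `chinese.meal.fpp.tsv` are not shown in the post. The sketch below writes one sentence in that two-column layout, using the tokens and labels from the evaluation output in this post as stand-in data. The function name `to_tsv` and the output file name are illustrative, and the blank line between sentences is an assumption about how sentences are separated.

```python
# Stand-in labelled sentence (tokens and labels as in the evaluation output of
# this post); the author's real training data is not shown.
SENTENCE = [
    ("我", "O"), ("今天", "O"), ("晚上", "TIME"), ("吃", "O"), ("了", "O"),
    ("两", "QUANTITY"), ("盘", "UNIT"), ("回锅肉", "FOOD"),
]


def to_tsv(sentences):
    """Render sentences as `token<TAB>label` lines, blank line between sentences."""
    blocks = ["\n".join(f"{tok}\t{lab}" for tok, lab in sent) for sent in sentences]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical file name, chosen so the real training file is not overwritten.
    with open("meal.example.tsv", "w", encoding="utf-8") as handle:
        handle.write(to_tsv([SENTENCE]))
```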

     2. Evaluate the trained model to see how well it performs.

    C:my_studyMLNLPstanford-ner-2018-02-27>java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
    Invoked on Thu Mar 22 16:30:48 CST 2018 with arguments: -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
    testFile=chinese.meal.fpp.test.tsv
    loadClassifier=ner-model.ser.gz
    Loading classifier from ner-model.ser.gz ... done [0.1 sec].
    我      O       O
    今天    O       O
    晚上    TIME    TIME
    吃      O       O
    了      O       O
    两      QUANTITY        QUANTITY
    盘      UNIT    UNIT
    回锅肉  FOOD    FOOD
    
    CRFClassifier tagged 8 words in 1 documents at 88.89 words per second.
             Entity P       R       F1      TP      FP      FN
               FOOD 1.0000  1.0000  1.0000  1       0       0
           QUANTITY 1.0000  1.0000  1.0000  1       0       0
               TIME 1.0000  1.0000  1.0000  1       0       0
               UNIT 1.0000  1.0000  1.0000  1       0       0
             Totals 1.0000  1.0000  1.0000  4       0       0
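
    The test output lists each token with its gold label and its predicted label, followed by per-entity precision (P), recall (R) and F1 derived from true positives (TP), false positives (FP) and false negatives (FN). The sketch below recomputes such a table from the three-column rows. It assumes an entity is a maximal run of consecutive tokens sharing one non-`O` label, which may differ in detail from how CRFClassifier itself scores; the function names are illustrative.

```python
from collections import Counter

# (token, gold label, predicted label) rows as printed in the test output above.
ROWS = [
    ("我", "O", "O"), ("今天", "O", "O"), ("晚上", "TIME", "TIME"),
    ("吃", "O", "O"), ("了", "O", "O"), ("两", "QUANTITY", "QUANTITY"),
    ("盘", "UNIT", "UNIT"), ("回锅肉", "FOOD", "FOOD"),
]


def entities(labels):
    """Return {(start, end, label)} for maximal runs of one non-O label."""
    spans, start = set(), 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != "O":
                spans.add((start, i, labels[start]))
            start = i
    return spans


def score(rows):
    """Compute per-label (P, R, F1, TP, FP, FN) from entity-level matches."""
    gold = entities([g for _, g, _ in rows])
    pred = entities([p for _, _, p in rows])
    tp = Counter(label for _, _, label in gold & pred)
    fp = Counter(label for _, _, label in pred - gold)
    fn = Counter(label for _, _, label in gold - pred)
    table = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        table[label] = (p, r, f1, tp[label], fp[label], fn[label])
    return table


if __name__ == "__main__":
    for label, (p, r, f1, tp, fp, fn) in score(ROWS).items():
        print(f"{label:>10} {p:.4f} {r:.4f} {f1:.4f} {tp} {fp} {fn}")
```

    For the eight rows above, every label gets P = R = F1 = 1.0 with one true positive each, matching the per-entity rows of the table.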

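    Not bad! Keep in mind, though, that this test set is a single 8-token sentence, so the perfect scores say little about how the model generalizes.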

    Ref:

    1. Stanford NLP NER: https://nlp.stanford.edu/software/CRF-NER.html

    Please credit the source when reposting: http://www.cnblogs.com/mashuai-191/
  • Original article: https://www.cnblogs.com/mashuai-191/p/8621413.html