  Java WordNet Similarity

    WordNet Study, Part 7: Semantic Similarity Computation with JWS (Java WordNet Similarity)

     

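    JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity with Java and WordNet, developed by David Hope and colleagues at the University of Sussex. It implements many classic semantic similarity measures, including the Jiang & Conrath and Lin measures used in the examples in this post, and is a semantic similarity tool well worth studying.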
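    Both the Jiang & Conrath and Lin measures are built on information content (IC): IC(c) = -log p(c), where p(c) is the probability of concept c estimated from corpus counts such as those in the WordNet-InfoContent files. The Java sketch below shows only the textbook formulas, Lin = 2·IC(lcs) / (IC(c1) + IC(c2)) and Jiang & Conrath similarity = 1 / (IC(c1) + IC(c2) - 2·IC(lcs)), where lcs is the lowest common subsumer of the two concepts. It is not JWS code: the class name, the frequency counts, and returning infinity for a zero distance are my own assumptions, and finding the lcs in the WordNet hierarchy is left out.

```java
// Sketch of the textbook IC-based measures; NOT the JWS implementation.
// Frequencies in main() are made up, and the lowest common subsumer (lcs)
// is assumed to have been found elsewhere in the WordNet hierarchy.
final class IcMeasures {

    // IC(c) = -log(p(c)), with p(c) = freq(c) / freq(root).
    static double informationContent(double freq, double rootFreq) {
        if (freq <= 0 || rootFreq <= 0) {
            throw new IllegalArgumentException("frequencies must be positive");
        }
        return -Math.log(freq / rootFreq);
    }

    // Lin: 2 * IC(lcs) / (IC(c1) + IC(c2)), which lies in [0, 1].
    static double lin(double icC1, double icC2, double icLcs) {
        double denom = icC1 + icC2;
        return denom == 0 ? 0.0 : 2.0 * icLcs / denom;
    }

    // Jiang & Conrath: distance = IC(c1) + IC(c2) - 2 * IC(lcs); similarity = 1 / distance.
    // Assumption: a zero distance (identical concepts) is reported as +infinity here;
    // real implementations may substitute a large finite value instead.
    static double jcn(double icC1, double icC2, double icLcs) {
        double distance = icC1 + icC2 - 2.0 * icLcs;
        return distance <= 0 ? Double.POSITIVE_INFINITY : 1.0 / distance;
    }

    public static void main(String[] args) {
        double root = 1000.0;                              // hypothetical root count
        double icApple = informationContent(10, root);     // hypothetical counts
        double icBanana = informationContent(10, root);
        double icFruit = informationContent(100, root);    // hypothetical lcs
        System.out.println("lin = " + lin(icApple, icBanana, icFruit));
        System.out.println("jcn = " + jcn(icApple, icBanana, icFruit));
    }
}
```

    With these made-up counts Lin comes out at 0.5 and Jiang & Conrath at 1 / ln(100), roughly 0.217.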

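    JWS is a Java implementation of WordNet::Similarity (a Perl package for measuring word similarity with WordNet), which is good news for anyone who wants to compare word similarity with WordNet from Java! The setup steps, in brief: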

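    1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/;

    2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/;

    3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/;

    4. Install WordNet;

    5. Unzip WordNet-InfoContent-2.1 and copy the folder into the WordNet version directory, e.g. D:/Program Files/WordNet/2.1, so that it sits alongside the dict folder;

    6. Copy the two jar files that come with JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables (i.e. add both jars to the CLASSPATH);

    7. Run the JWS example program TestExamples in Eclipse.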

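         Note: because the WordNet downloaded here is version 2.1, a few lines in the program need to be changed:

         String dir = "C:/Program Files/WordNet";    // the WordNet installation path; change it to match where you actually installed WordNet

         JWS ws = new JWS(dir, "3.0");                   // simply change "3.0" to "2.1"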
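    Rather than editing the path and version string in the source every time, one option is to read them at run time and check the folder layout before constructing JWS. The sketch below is a hypothetical helper, not part of JWS: the system property names wordnet.dir and wordnet.version are my own invention, and the expected layout (<dir>/<version>/dict and <dir>/<version>/WordNet-InfoContent-<version>) is taken from the comments in the TestExamples program.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical setup check (not part of JWS). The property names
// "wordnet.dir" and "wordnet.version" are assumptions chosen for this sketch.
final class WordNetSetupCheck {

    static String dirOrDefault() {
        return System.getProperty("wordnet.dir", "C:/Program Files/WordNet");
    }

    static String versionOrDefault() {
        return System.getProperty("wordnet.version", "2.1");
    }

    // Returns the expected folders that do not exist; an empty list means the
    // layout matches <dir>/<version>/dict and <dir>/<version>/WordNet-InfoContent-<version>.
    static List<Path> missingFolders(String dir, String version) {
        Path base = Paths.get(dir, version);
        Path[] expected = {
            base.resolve("dict"),
            base.resolve("WordNet-InfoContent-" + version)
        };
        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.isDirectory(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String dir = dirOrDefault();
        String version = versionOrDefault();
        List<Path> missing = missingFolders(dir, version);
        if (missing.isEmpty()) {
            System.out.println("Layout OK: use new JWS(\"" + dir + "\", \"" + version + "\")");
        } else {
            missing.forEach(p -> System.out.println("Missing: " + p));
        }
    }
}
```

    For example, running it with -Dwordnet.dir="D:/Program Files/WordNet" -Dwordnet.version=2.1 would check the folders described in step 5; the same two strings could then be passed to new JWS(dir, version).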

    Example program:

    import java.util.TreeMap;
    import edu.sussex.nlp.jws.*;

    // 'TestExamples': how to use Java WordNet::Similarity
    // David Hope, 2008
    public class TestExamples
    {
        public static void main(String[] args)
        {
            // 1. SET UP:
            //   Let's make it easy for the user: rather than setting pointers in 'Environment Variables' etc., let the user define exactly where they have put WordNet(s)
            String dir = "E:/Commonly Application/WordNet/";
            //   That is, you may have version 3.0 sitting in the above directory, e.g. C:/Program Files/WordNet/3.0/dict
            //   The corresponding IC files folder should be in this same directory, e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

            //   Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it) and use the default IC file [ic-semcor.dat]
            JWS ws = new JWS(dir, "2.1");
            //   Option 2: specify the version of WordNet you want to use and the particular IC file that you wish to apply
            //JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");

            // 2. EXAMPLES OF USE:

            // 2.1 [JIANG & CONRATH MEASURE]
            JiangAndConrath jcn = ws.getJiangAndConrath();
            //System.out.println("Jiang & Conrath\n");
            // all senses
            TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n");            // all senses
            //TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n");       // fixed;all
            //TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n");       // all;fixed
            for (String s : scores1.keySet())
                System.out.println(s + "\t" + scores1.get(s));
            // specific senses
            //System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
            // max.
            //System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

            // 2.2 [LIN MEASURE]
            Lin lin = ws.getLin();
            //System.out.println("Lin\n");
            // all senses
            TreeMap<String, Double> scores2 = lin.lin("like", "love", "n");               // all senses
            //TreeMap<String, Double> scores2 = lin.lin("kid", 1, "child", "n");          // fixed;all
            //TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n");       // all;fixed
            //for (String s : scores2.keySet())
            //    System.out.println(s + "\t" + scores2.get(s));
            // specific senses
            System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
            // max.
            System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

            // ... and so on for any other measure
        }
    } // eof
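    The all-senses calls in TestExamples return a TreeMap from sense-pair labels to scores, while max(...) returns only a number. If you also want to know which sense pair scored highest, you can scan the map yourself, as in this sketch. I have not checked the JWS source to confirm that max(...) returns exactly the largest value in that map, and the keys in main() are made up for illustration; print a real result map to see the key format JWS actually uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: find the best-scoring sense pair in an all-senses result map, such as
// the TreeMap<String, Double> returned by jcn.jcn("apple", "banana", "n").
// The key strings used in main() are hypothetical.
final class BestSensePair {

    // Returns the entry with the highest score, or null if the map is empty.
    // On ties, the first entry in iteration order (sorted key order for a TreeMap) wins.
    static Map.Entry<String, Double> best(Map<String, Double> scores) {
        Map.Entry<String, Double> best = null;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new TreeMap<>();
        scores.put("apple#n#1,banana#n#1", 0.21);   // made-up scores
        scores.put("apple#n#2,banana#n#1", 0.08);
        Map.Entry<String, Double> top = best(scores);
        System.out.println(top.getKey() + "\t" + top.getValue());
    }
}
```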

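    A simple JWS-based program that computes the semantic similarity of two short phrases by averaging the Lin scores of every word pair, for example: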

    import edu.sussex.nlp.jws.JWS;
    import edu.sussex.nlp.jws.Lin;

    public class Similar {

        private final String str1;
        private final String str2;
        // WordNet installation directory (containing 2.1/dict and 2.1/WordNet-InfoContent-2.1)
        private final String dir = "E:/Commonly Application/WordNet/";
        private final JWS ws = new JWS(dir, "2.1");
        // Reuse one Lin measure object instead of fetching it again for every word pair
        private final Lin lin = ws.getLin();

        public Similar(String str1, String str2) {
            this.str1 = str1;
            this.str2 = str2;
        }

        // Average of the word-level Lin scores over every (word in str1, word in str2) pair
        public double getSimilarity() {
            String[] strs1 = splitString(str1);
            String[] strs2 = splitString(str2);
            if (strs1.length == 0 || strs2.length == 0) {
                return 0.0;   // avoid dividing by zero on empty input
            }
            double sum = 0.0;
            for (String s1 : strs1) {
                for (String s2 : strs2) {
                    double sc = maxScoreOfLin(s1, s2);
                    sum += sc;
                    System.out.println("Current pair: " + s1 + " vs " + s2 + ", similarity: " + sc);
                }
            }
            return sum / (strs1.length * strs2.length);
        }

        // Split on runs of whitespace so repeated spaces do not produce empty tokens
        private String[] splitString(String str) {
            String trimmed = str.trim();
            return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        // Highest Lin score over all sense pairs: try nouns first, fall back to verbs
        private double maxScoreOfLin(String str1, String str2) {
            double sc = lin.max(str1, str2, "n");
            if (sc == 0) {
                sc = lin.max(str1, str2, "v");
            }
            return sc;
        }

        public static void main(String[] args) {
            String s1 = "departure";
            String s2 = "leaving from";
            Similar sm = new Similar(s1, s2);
            System.out.println(sm.getSimilarity());
        }
    }
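    The Similar class averages the word-level scores over every word pair, so a word with no WordNet noun or verb senses, such as "from", likely scores 0 against everything and pulls the average down. The sketch below separates that aggregation from JWS by taking the word-level measure as a function, so it can be tested without WordNet. allPairsMean reproduces the Similar class's scheme; bestMatchMean is an alternative I am adding for comparison (it is not from the original post), in which each word keeps only its best match in the other phrase. The exact-match scorer in main() is a toy stand-in for the Lin measure, and all names here are hypothetical.

```java
import java.util.function.ToDoubleBiFunction;

// Sketch of phrase-level aggregation with a pluggable word scorer.
// In real use the scorer would wrap JWS, e.g. a lambda calling lin.max(a, b, "n").
final class PhraseSimilarity {

    // Whitespace tokenization; an empty or blank phrase yields no tokens.
    static String[] tokens(String phrase) {
        String t = phrase.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Scheme used by the Similar class: mean score over all |A| x |B| word pairs.
    static double allPairsMean(String a, String b, ToDoubleBiFunction<String, String> score) {
        String[] ta = tokens(a);
        String[] tb = tokens(b);
        if (ta.length == 0 || tb.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (String x : ta) {
            for (String y : tb) {
                sum += score.applyAsDouble(x, y);
            }
        }
        return sum / (ta.length * tb.length);
    }

    // Alternative (not from the original post): average of the two directional
    // best-match means. Assumes scores are non-negative, as Lin scores are.
    static double bestMatchMean(String a, String b, ToDoubleBiFunction<String, String> score) {
        String[] ta = tokens(a);
        String[] tb = tokens(b);
        if (ta.length == 0 || tb.length == 0) {
            return 0.0;
        }
        return (directional(ta, tb, score) + directional(tb, ta, score)) / 2.0;
    }

    // For each word in 'from', take its best score against 'to', then average.
    private static double directional(String[] from, String[] to,
                                      ToDoubleBiFunction<String, String> score) {
        double sum = 0.0;
        for (String x : from) {
            double best = 0.0;
            for (String y : to) {
                best = Math.max(best, score.applyAsDouble(x, y));
            }
            sum += best;
        }
        return sum / from.length;
    }

    public static void main(String[] args) {
        // Toy scorer standing in for Lin: 1.0 for identical words, otherwise 0.0.
        ToDoubleBiFunction<String, String> exact = (x, y) -> x.equalsIgnoreCase(y) ? 1.0 : 0.0;
        System.out.println(allPairsMean("leaving from", "leaving", exact));   // prints 0.5
        System.out.println(bestMatchMean("leaving from", "leaving", exact));  // prints 0.75
    }
}
```

    With the toy scorer, the all-pairs mean gives 0.5 because "from" contributes a zero for its only pair, while the best-match mean gives 0.75 because "leaving" in the shorter phrase finds a perfect match.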

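    I came across JWS when I wanted to do semantic analysis based on Protégé + WordNet, but I didn't have much time to study it in depth, which I really regret. If anyone has looked into it further, please share your blog URL so everyone can learn from it!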

     
     
    Category: WordNet
    Tags: WordNet
  Original post: https://www.cnblogs.com/Leo_wl/p/2874482.html