  • word2vec: from setup to use

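    (1) First, download word2vec from https://code.google.com/p/word2vec/. If the download fails because Google is unreachable, the package can be downloaded from CSDN instead.
    After unpacking, the directory looks like this: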
     
    w2v/
    `-- trunk
        |-- LICENSE
        |-- README.txt
        |-- compute-accuracy.c
        |-- demo-analogy.sh
        |-- demo-classes.sh
        |-- demo-phrase-accuracy.sh
        |-- demo-phrases.sh
        |-- demo-train-big-model-v1.sh
        |-- demo-word-accuracy.sh
        |-- demo-word.sh
        |-- distance.c
        |-- makefile
        |-- questions-phrases.txt
        |-- questions-words.txt
        |-- word-analogy.c
        |-- word2phrase.c
        `-- word2vec.c
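    (2) Enter the w2v/trunk folder and run make to compile the sources. As the makefile shows, the two main files to compile are word2vec.c and distance.c, which produce the word2vec and distance executables. If compilation fails, http://blog.csdn.net/zshunmiao/article/details/15339105 describes how to resolve the problems.
    (3) Then you can run a demo with ./demo-word.sh.
    The makefile from step (2) reads as follows: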
    CC = gcc
    #Using -Ofast instead of -O3 might result in faster code, but is supported only by newer GCC versions
    CFLAGS = -lm -pthread -O3 -march=native -Wall -funroll-loops -Wno-unused-result
    
    all: word2vec word2phrase distance word-analogy compute-accuracy
    
    word2vec : word2vec.c
            $(CC) word2vec.c -o word2vec $(CFLAGS)
    word2phrase : word2phrase.c
            $(CC) word2phrase.c -o word2phrase $(CFLAGS)
    distance : distance.c
            $(CC) distance.c -o distance $(CFLAGS)
    word-analogy : word-analogy.c
            $(CC) word-analogy.c -o word-analogy $(CFLAGS)
    compute-accuracy : compute-accuracy.c
            $(CC) compute-accuracy.c -o compute-accuracy $(CFLAGS)
            chmod +x *.sh
    
    clean:
            rm -rf word2vec word2phrase distance word-analogy compute-accuracy
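If make succeeds, each target on the makefile's `all:` line should exist as an executable in the build folder. A minimal sketch for checking that is given here; the helper name `missing_binaries` is hypothetical, and the path `w2v/trunk` assumes the archive was unpacked as in step (1).

```python
from pathlib import Path

# Targets listed on the makefile's `all:` line
EXPECTED_BINARIES = [
    "word2vec",
    "word2phrase",
    "distance",
    "word-analogy",
    "compute-accuracy",
]


def missing_binaries(build_dir):
    """Return the expected executables that are not present in build_dir."""
    root = Path(build_dir)
    return [name for name in EXPECTED_BINARIES if not (root / name).is_file()]


if __name__ == "__main__":
    # "w2v/trunk" is the folder from step (2); adjust it if you unpacked elsewhere
    missing = missing_binaries("w2v/trunk")
    print("missing:", missing if missing else "none")
```

An empty result means all five programs were built; any name that is reported points at the makefile rule that failed.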

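    Once the demo has trained its model, enter a word and the distance program lists the words nearest to it, ranked by descending cosine similarity (the column the program labels "Cosine distance").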
    Enter word or sentence (EXIT to break): china       
    
    Word: china  Position in vocabulary: 486
    
                                                  Word       Cosine distance
    ------------------------------------------------------------------------
                                                 japan              0.648631
                                                taiwan              0.630534
                                             manchuria              0.599535
                                                 tibet              0.583566
                                                   prc              0.560898
                                              kalmykia              0.558937
                                                xiamen              0.556037
                                                 jiang              0.553501
                                               chinese              0.547065
                                                  liao              0.543676
                                                 india              0.536273
                                                 korea              0.534758
                                                   roc              0.530741
                                              thailand              0.529334
                                                 hunan              0.527629
                                                 liang              0.527374
                                              shanghai              0.526314
                                             chongqing              0.525559
                                               nanjing              0.521342
                                                yunnan              0.518669
                                                 wuhan              0.516914
                                                  zhao              0.513246
                                              xinjiang              0.509939
                                                  tuva              0.507322
                                             guangdong              0.507288
                                                 hubei              0.505540
                                               guangxi              0.501068
                                                taipei              0.497673
                                                 macao              0.497303
                                                hainan              0.494808
                                              shandong              0.493323
                                              shenzhen              0.491871
                                              hangzhou              0.489323
                                                balhae              0.488846
                                             guangzhou              0.486907
                                                fujian              0.485473
                                              zhejiang              0.485011
                                                harbin              0.483171
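The ranking in the listing above can be illustrated with a small sketch: score every other word by the cosine of the angle between its vector and the query's vector, then sort from highest to lowest. The 3-dimensional vectors below are invented for illustration and are not taken from a trained model; the function names are hypothetical as well, and this is not the code in distance.c.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(query, vectors, top_n=3):
    """Rank every word except `query` by cosine similarity, highest first."""
    target = vectors[query]
    scored = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors, made up for this example only
toy_vectors = {
    "china": [0.9, 0.1, 0.3],
    "japan": [0.8, 0.2, 0.4],
    "taiwan": [0.7, 0.0, 0.5],
    "banana": [0.0, 1.0, 0.1],
}

for word, score in nearest_words("china", toy_vectors):
    print(f"{word:>10}  {score:.6f}")
```

With these toy vectors, "japan" and "taiwan" rank ahead of "banana" because their vectors point in nearly the same direction as the one for "china". A real model has hundreds of dimensions, but the ranking idea is the same.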
  • Original article: https://www.cnblogs.com/xiaodi914/p/4872180.html