  • [PYTHON-TSNE] Visualizing Word Vectors

    Files you need:

    1. wordList.txt, the list of words you want to convert to vectors:

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    2. label.txt, the labels shown in the plot. They may differ from the words in wordList.txt.

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit
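
    Because the labels may differ from the words, the only thing that ties a label to its vector is its position in the file. A minimal sketch of a sanity check follows, assuming one entry per line in both files; the helper names `read_nonempty_lines` and `pair_words_with_labels` are hypothetical and not part of the original scripts.

```python
def read_nonempty_lines(lines):
    """Strip each line and drop blank ones."""
    return [line.strip() for line in lines if line.strip()]


def pair_words_with_labels(word_lines, label_lines):
    """Pair the i-th word with the i-th label; fail if the counts differ."""
    words = read_nonempty_lines(word_lines)
    labels = read_nonempty_lines(label_lines)
    if len(words) != len(labels):
        raise ValueError(
            "wordList.txt has %d entries but label.txt has %d"
            % (len(words), len(labels))
        )
    return list(zip(words, labels))


# In-memory demo; with real files, pass the open file objects instead.
pairs = pair_words_with_labels(["spring\n", "jdk\n"], ["Spring\n", "JDK\n"])
print(pairs)
```

    With the real files this could be called as `pair_words_with_labels(open('wordList.txt'), open('label.txt'))` before plotting.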

    3. model, a word2vec model generated with gensim;

    4. Run buildWordVectorFromW2V.py to generate the word vector list:

    from gensim.models.word2vec import Word2Vec

    modelpath = 'XXX/model'

    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    vectorFilePath = 'word2vec.txt'

    lines = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            values = []
            for word in line.strip().split(' '):
                # model.wv is the vector lookup in current gensim releases.
                if word not in model.wv:
                    # Skip out-of-vocabulary words instead of crashing on lookup.
                    print('error! word not in vocabulary: ' + word)
                    continue
                # Append every component of this word's vector to the current line.
                values.extend(str(v) for v in model.wv[word])
            lines.append(' '.join(values))

    # One output line per input line, components separated by spaces.
    with open(vectorFilePath, 'w') as f:
        f.write('\n'.join(lines))
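
    The script above writes one line of space-separated numbers per input line. To read word2vec.txt back without loading the model, something like the following sketch could be used; `parse_vectors` is a hypothetical helper name, and the format assumption (one vector per line, whitespace-separated floats) is taken from the script above.

```python
def parse_vectors(lines):
    """Turn lines of whitespace-separated numbers into lists of floats.

    Blank lines are skipped. Trailing spaces are tolerated because
    str.split() with no argument ignores them.
    """
    vectors = []
    for line in lines:
        fields = line.split()
        if fields:
            vectors.append([float(x) for x in fields])
    return vectors


# In-memory demo; with the real file, pass the open file object instead.
demo = parse_vectors(["0.5 -1.25 3.0 \n", "\n", "1e-3 2 4\n"])
print(demo)
```

    With the real file: `with open('word2vec.txt') as f: vectors = parse_vectors(f)`.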

    5. Run visualization.py to generate the plot:

    import numpy as np
    from gensim.models.word2vec import Word2Vec
    import matplotlib.pyplot as plt

    modelpath = 'XXX/model'
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    labelFilePath = 'label.txt'

    # One vector per word, in file order. model.wv is the vector lookup
    # in current gensim releases.
    visualizeVecs = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            vec = model.wv[word.lower()]
            visualizeVecs.append(vec)

    # Labels are paired with the vectors by position.
    visualizeWords = []
    with open(labelFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            visualizeWords.append(word.lower())

    visualizeVecs = np.array(visualizeVecs).astype(np.float64)
    # Disabled t-SNE variant. tsne() is not defined or imported in this script.
    # Y = tsne(visualizeVecs, 2, 200, 20.0);
    # # Plot.scatter(Y[:,0], Y[:,1], 20,labels);
    # # ChineseFont1 = FontProperties('SimHei')
    # for i in range(len(visualizeWords)):
    #     # if i<len(visualizeWords)/2:
    #     #     color='green'
    #     # else:
    #     #     color='red'
    #     color = 'red'
    #     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i],bbox=dict(facecolor=color, alpha=0.1))
    # plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
    # plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
    # plt.show()

    # PCA: center the vectors, then project them onto the top two
    # singular vectors of the covariance matrix.
    # vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
    # temp = temp / vis_norm
    temp = (visualizeVecs - np.mean(visualizeVecs, axis=0))
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])
    for i in range(len(visualizeWords)):
        print(i, coord[i, 0], coord[i, 1])
        color = 'red'
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1),
                 fontsize=22)  # fontproperties = ChineseFont1
    plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
    plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
    plt.show()
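
    The title mentions t-SNE, but the active code path above projects with PCA; the `tsne(...)` call is commented out and its implementation is not included in the post. One possible substitute is `sklearn.manifold.TSNE`. The sketch below assumes scikit-learn is installed, which the original does not state; `tsne_coords` is a hypothetical helper name, and the perplexity of 5.0 is an illustrative choice, since scikit-learn expects perplexity to be smaller than the number of samples and the word list here has only 19 entries.

```python
import numpy as np
from sklearn.manifold import TSNE  # assumption: scikit-learn is available


def tsne_coords(vectors, perplexity=5.0, seed=0):
    """Project row vectors to 2-D with t-SNE (hypothetical helper)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # Perplexity has to stay below the number of samples.
    if perplexity >= vectors.shape[0]:
        raise ValueError("perplexity must be smaller than the number of samples")
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="random", random_state=seed)
    return tsne.fit_transform(vectors)


# Stand-in data: 19 random 50-dimensional vectors, not real word vectors.
demo_vectors = np.random.RandomState(0).rand(19, 50)
demo_coords = tsne_coords(demo_vectors)
print(demo_coords.shape)
```

    In visualization.py, `coord = tsne_coords(visualizeVecs)` could take the place of the PCA lines; the plotting loop would stay the same. Unlike PCA, t-SNE output depends on the random seed and the perplexity, so distances between far-apart clusters should be read with caution.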
    

      

    Result:

  • Original article: https://www.cnblogs.com/XBWer/p/6961960.html