zoukankan      html  css  js  c++  java
  • [PYTHON-TSNE]可视化Word Vector

    需要的几个文件:

    1.wordList.txt,即你要转化成vector的word list:

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    2.label.txt, 即图中显示的label,可以与wordlist.txt中的word不同。

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    3.model,用gensim生成的word2vec model;

    4.运行buildWordVectorFromW2V.py,用于生成wordvectorlist:

    from gensim.models.word2vec import Word2Vec
    from pathutil import get_base_path
    
    modelpath = 'XXX/model'
    
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    vectorFilePath = 'word2vec.txt'
    
    sentence = []
    writeStr = ''
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            sentWordList = line.strip().split(' ')
            for word in sentWordList:
                if word not in model:
                    print 'error!'
                vec = model[word]
                for vecTmp in vec:
                    writeStr += (str(vecTmp) + ' ')
            writeStr += '
    '
    
    f = open(vectorFilePath, "w")
    f.write(writeStr.strip())

    5.运行visualization.py,用于生成图片:

    import numpy as np
    from gensim.models.word2vec import Word2Vec
    import matplotlib.pyplot as plt
    from pathutil import get_base_path
    
    modelpath = 'XXX/model'
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordlist.txt'
    labelFilePath = 'wordlist.txt'
    
    visualizeVecs = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            vec = model[word.lower()]
            visualizeVecs.append(vec)
    
    visualizeWords = []
    with open(labelFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            visualizeWords.append(word.lower())
    
    visualizeVecs = np.array(visualizeVecs).astype(np.float64)
    # Y = tsne(visualizeVecs, 2, 200, 20.0);
    # # Plot.scatter(Y[:,0], Y[:,1], 20,labels);
    # # ChineseFont1 = FontProperties('SimHei')
    # for i in xrange(len(visualizeWords)):
    #     # if i<len(visualizeWords)/2:
    #     #     color='green'
    #     # else:
    #     #     color='red'
    #     color = 'red'
    #     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i],bbox=dict(facecolor=color, alpha=0.1))
    # plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
    # plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
    # plt.show()
    
    
    # vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
    # temp = temp / vis_norm
    temp = (visualizeVecs - np.mean(visualizeVecs, axis=0))
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])
    for i in xrange(len(visualizeWords)):
        print i
        print coord[i, 0]
        print coord[i, 1]
        color = 'red'
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1),
                 fontsize=22)  # fontproperties = ChineseFont1
    plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
    plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
    plt.show()
    

      

    运行结果:

  • 相关阅读:
    Oracle数据库中truncate命令和delete命令的区别
    数组中只出现一次的数字
    数对之差的最大值
    SQL Server: Difference Between Locking, Blocking and Dead Locking
    字符串处理
    Phpcms_V9任意文件上传
    最初的梦想
    陪你走过漫长岁月
    基于MitM的RDP降级攻击
    CVE-2017-0358
  • 原文地址:https://www.cnblogs.com/XBWer/p/6961960.html
Copyright © 2011-2022 走看看