zoukankan      html  css  js  c++  java
  • [PYTHON-TSNE]可视化Word Vector

    需要的几个文件:

    1.wordList.txt,即你要转化成vector的word list:

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    2.label.txt, 即图中显示的label,可以与wordlist.txt中的word不同。

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    3.model,用gensim生成的word2vec model;

    4.运行buildWordVectorFromW2V.py,用于生成wordvectorlist:

    from gensim.models.word2vec import Word2Vec
    from pathutil import get_base_path
    
    modelpath = 'XXX/model'
    
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    vectorFilePath = 'word2vec.txt'
    
    sentence = []
    writeStr = ''
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            sentWordList = line.strip().split(' ')
            for word in sentWordList:
                if word not in model:
                    print 'error!'
                vec = model[word]
                for vecTmp in vec:
                    writeStr += (str(vecTmp) + ' ')
            writeStr += '
    '
    
    f = open(vectorFilePath, "w")
    f.write(writeStr.strip())

    5.运行visualization.py,用于生成图片:

    import numpy as np
    from gensim.models.word2vec import Word2Vec
    import matplotlib.pyplot as plt
    from pathutil import get_base_path
    
    modelpath = 'XXX/model'
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordlist.txt'
    labelFilePath = 'wordlist.txt'
    
    visualizeVecs = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            vec = model[word.lower()]
            visualizeVecs.append(vec)
    
    visualizeWords = []
    with open(labelFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            visualizeWords.append(word.lower())
    
    visualizeVecs = np.array(visualizeVecs).astype(np.float64)
    # Y = tsne(visualizeVecs, 2, 200, 20.0);
    # # Plot.scatter(Y[:,0], Y[:,1], 20,labels);
    # # ChineseFont1 = FontProperties('SimHei')
    # for i in xrange(len(visualizeWords)):
    #     # if i<len(visualizeWords)/2:
    #     #     color='green'
    #     # else:
    #     #     color='red'
    #     color = 'red'
    #     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i],bbox=dict(facecolor=color, alpha=0.1))
    # plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
    # plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
    # plt.show()
    
    
    # vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
    # temp = temp / vis_norm
    temp = (visualizeVecs - np.mean(visualizeVecs, axis=0))
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])
    for i in xrange(len(visualizeWords)):
        print i
        print coord[i, 0]
        print coord[i, 1]
        color = 'red'
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1),
                 fontsize=22)  # fontproperties = ChineseFont1
    plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
    plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
    plt.show()
    

      

    运行结果:

  • 相关阅读:
    RHEL7管道与重定向
    RHEL7软件包管理
    RHEL7用户管理
    RHEL7文件管理
    RHEL7文件查找
    RHEL7文件权限
    RHEL7文件归档与压缩
    RHEL7进程管理
    博客园样式美化
    flask+python页面修改密码功能
  • 原文地址:https://www.cnblogs.com/XBWer/p/6961960.html
Copyright © 2011-2022 走看看