  • [PYTHON-TSNE] Visualizing Word Vectors

    Files needed:

    1. wordList.txt, the list of words you want to convert to vectors:

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    2. label.txt, the labels shown in the plot; these may differ from the words in wordList.txt.

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit
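
    Labels are matched to words by line position, so the two files need the same number of lines. A minimal sketch of a sanity check, assuming one entry per line in both files; `read_lines` and `check_alignment` are hypothetical helper names and not part of the original scripts:

```python
def read_lines(path):
    """Return the non-empty, stripped lines of a text file."""
    with open(path, 'r') as f:
        return [line.strip() for line in f if line.strip()]


def check_alignment(words, labels):
    """Pair each word with its label; raise if the lists differ in length."""
    if len(words) != len(labels):
        raise ValueError('%d words but %d labels' % (len(words), len(labels)))
    return list(zip(words, labels))
```

    It could be called as `check_alignment(read_lines('wordList.txt'), read_lines('label.txt'))` before running the later steps.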

    3. model, a word2vec model generated with gensim;

    4. Run buildWordVectorFromW2V.py to generate the word vector list:

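    from gensim.models.word2vec import Word2Vec

    modelpath = 'XXX/model'  # path to the saved gensim word2vec model

    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'
    vectorFilePath = 'word2vec.txt'

    lines = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            values = []
            for word in line.strip().split(' '):
                # model.wv holds the word vectors in current gensim releases;
                # direct model[word] indexing was removed in gensim 4.0.
                if word not in model.wv:
                    # Skip unknown words instead of failing on the lookup
                    print('error: %r is not in the model vocabulary' % word)
                    continue
                values.extend(str(v) for v in model.wv[word])
            lines.append(' '.join(values))

    # One line of space-separated vector components per input line
    with open(vectorFilePath, 'w') as f:
        f.write('\n'.join(lines).strip())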
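
    The script writes one line per input line, each holding vector components separated by spaces. A minimal sketch of reading that file back, assuming that layout and one word per input line; `load_vectors` is a hypothetical helper and not part of the original post:

```python
def load_vectors(path):
    """Parse a file with one space-separated float vector per line."""
    vectors = []
    with open(path, 'r') as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                vectors.append([float(x) for x in fields])
    return vectors
```

    For example, `load_vectors('word2vec.txt')` would return a list with one vector per word, in the same order as wordList.txt.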

    5. Run visualization.py to generate the image:

    import numpy as np
    from gensim.models.word2vec import Word2Vec
    import matplotlib.pyplot as plt

    modelpath = 'XXX/model'  # path to the saved gensim word2vec model
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'  # words to look up in the model
    labelFilePath = 'label.txt'        # labels drawn in the plot, one per word
    
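    # Look up the vector of every word, one word per line
    visualizeVecs = []
    with open(sentenceFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            vec = model.wv[word.lower()]  # model.wv holds the word vectors
            visualizeVecs.append(vec)

    # Read the labels; line i labels the vector from line i of the word list
    visualizeWords = []
    with open(labelFilePath, 'r') as f:
        for line in f:
            word = line.strip()
            visualizeWords.append(word.lower())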
    
    visualizeVecs = np.array(visualizeVecs).astype(np.float64)
    # t-SNE variant, left disabled; tsne() is not defined or imported in this script.
    # Y = tsne(visualizeVecs, 2, 200, 20.0);
    # # Plot.scatter(Y[:,0], Y[:,1], 20,labels);
    # # ChineseFont1 = FontProperties('SimHei')
    # for i in xrange(len(visualizeWords)):
    #     # if i<len(visualizeWords)/2:
    #     #     color='green'
    #     # else:
    #     #     color='red'
    #     color = 'red'
    #     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i],bbox=dict(facecolor=color, alpha=0.1))
    # plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
    # plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
    # plt.show()
    
    
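    # Optional L2 normalization of each row, left disabled:
    # vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
    # temp = temp / vis_norm
    # PCA: center the vectors, then project onto the top two principal directions
    temp = (visualizeVecs - np.mean(visualizeVecs, axis=0))
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])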
    for i in range(len(visualizeWords)):
        # Print the index and the 2-D coordinates of each word
        print(i)
        print(coord[i, 0])
        print(coord[i, 1])
        color = 'red'
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1),
                 fontsize=22)  # fontproperties = ChineseFont1
    plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
    plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
    plt.show()
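
    The projection used above is a PCA: center the vectors, take the SVD of the covariance matrix, and project onto the first two singular directions. A minimal sketch of the same steps as a reusable function, with random data standing in for real word vectors; `pca_2d` is a hypothetical name, and the 19 x 100 shape is only an assumption for the demo:

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=np.float64)
    centered = X - X.mean(axis=0)
    covariance = centered.T.dot(centered) / X.shape[0]
    # Columns of U are principal directions, ordered by decreasing variance
    U, S, Vt = np.linalg.svd(covariance)
    return centered.dot(U[:, :2])


if __name__ == '__main__':
    rng = np.random.RandomState(0)
    demo = rng.rand(19, 100)  # stand-in for 19 word vectors of dimension 100
    print(pca_2d(demo).shape)
```

    Keeping the projection in a function makes it easy to swap in another reduction method later without touching the plotting code.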
    

      

    Result (the plot image is not reproduced here):

  • Original post: https://www.cnblogs.com/XBWer/p/6961960.html