  • [PYTHON-TSNE] Visualizing Word Vectors

    What you need:

    1. wordList.txt: the words you want to convert into vectors, one per line:

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit

    2. label.txt: the labels shown in the plot, one per line and in the same order as wordList.txt. A label may differ from its word in wordList.txt.

    spring
    maven
    junit
    ant
    swing
    xml
    jre
    jdk
    jbutton
    jpanel
    swt
    japplet
    jdialog
    jcheckbox
    jlabel
    jmenu
    slf4j
    test
    unit
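visualization.py attaches labels to vectors purely by line position, so label.txt needs exactly one non-empty line per non-empty line of wordList.txt. A minimal sketch of a pre-flight check follows; the helper names `nonempty_lines` and `pair_words_with_labels` are hypothetical and not part of the original scripts.

```python
from typing import Iterable, List, Tuple


def nonempty_lines(lines: Iterable[str]) -> List[str]:
    """Strip each line and drop blank ones (e.g. a trailing empty line)."""
    return [line.strip() for line in lines if line.strip()]


def pair_words_with_labels(word_lines: Iterable[str],
                           label_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair the i-th word with the i-th label, matching by position.

    Raises ValueError when the two files have different numbers of entries,
    which would otherwise silently mislabel points in the plot.
    """
    words = nonempty_lines(word_lines)
    labels = nonempty_lines(label_lines)
    if len(words) != len(labels):
        raise ValueError(f"{len(words)} words but {len(labels)} labels")
    return list(zip(words, labels))
```

Usage: open both files (for example `open('wordList.txt', encoding='utf-8')` and `open('label.txt', encoding='utf-8')`) and pass the file objects directly, since a file iterates line by line.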

    3. model: a word2vec model trained with gensim and saved with model.save(), so that Word2Vec.load() can read it.

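    4. Run buildWordVectorFromW2V.py to write the vector of every word in wordList.txt to word2vec.txt (one line of space-separated numbers per input line):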

    # Python 3; model.wv is the word -> vector lookup in gensim 3.x and 4.x
    from gensim.models.word2vec import Word2Vec
    
    modelpath = 'XXX/model'
    sentenceFilePath = 'wordList.txt'
    vectorFilePath = 'word2vec.txt'
    
    model = Word2Vec.load(modelpath)
    
    outLines = []
    with open(sentenceFilePath, 'r', encoding='utf-8') as f:
        for line in f:
            sentWordList = line.split()
            if not sentWordList:
                continue  # skip blank lines
            vecStrs = []
            for word in sentWordList:
                if word not in model.wv:
                    print('error: %s is not in the model vocabulary' % word)
                    continue  # skip the word instead of raising KeyError
                vecStrs.extend(str(vecTmp) for vecTmp in model.wv[word])
            outLines.append(' '.join(vecStrs))  # one output line per input line
    
    with open(vectorFilePath, 'w', encoding='utf-8') as f:
        f.write('\n'.join(outLines))
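The export loop can also be written as a function that accepts any object supporting `word in vectors` and `vectors[word]`, so it can be tried without a trained model. gensim's `model.wv` supports both operations and can be passed in directly. The function name `export_vectors` and its choice to collect unknown words instead of stopping are assumptions of this sketch, not part of the original script.

```python
from typing import Iterable, List, Mapping, Sequence, Tuple


def export_vectors(lines: Iterable[str],
                   vectors: Mapping[str, Sequence[float]]) -> Tuple[str, List[str]]:
    """Build the text of word2vec.txt.

    Each non-blank input line becomes one output line holding the
    space-separated components of every known word on that line.
    Returns (text, missing_words).
    """
    out_lines: List[str] = []
    missing: List[str] = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # ignore blank lines such as a trailing newline
        parts: List[str] = []
        for word in words:
            if word not in vectors:
                missing.append(word)  # record it rather than raising KeyError
                continue
            parts.extend(str(x) for x in vectors[word])
        # keep a line even if every word was missing, so rows stay aligned
        out_lines.append(" ".join(parts))
    return "\n".join(out_lines), missing
```

With a real model, call it as `export_vectors(f, model.wv)` on the opened wordList.txt, write the returned text to word2vec.txt, and inspect the list of missing words.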

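    5. Run visualization.py to generate the plot. Although the title mentions t-SNE, the active code projects the vectors onto their first two principal components (PCA via an SVD of the covariance matrix); the t-SNE call is left commented out: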

    import numpy as np
    import matplotlib.pyplot as plt
    from gensim.models.word2vec import Word2Vec
    
    modelpath = 'XXX/model'
    model = Word2Vec.load(modelpath)
    sentenceFilePath = 'wordList.txt'  # words whose vectors are plotted
    labelFilePath = 'label.txt'        # text drawn for each word, same order
    
    visualizeVecs = []
    with open(sentenceFilePath, 'r', encoding='utf-8') as f:
        for line in f:
            word = line.strip()
            if word:  # skip blank lines
                visualizeVecs.append(model.wv[word.lower()])
    
    visualizeWords = []
    with open(labelFilePath, 'r', encoding='utf-8') as f:
        for line in f:
            word = line.strip()
            if word:
                visualizeWords.append(word.lower())
    
    visualizeVecs = np.array(visualizeVecs).astype(np.float64)
    
    # t-SNE variant: needs a tsne() implementation that is not included here.
    # Y = tsne(visualizeVecs, 2, 200, 20.0)
    # Y can then be plotted exactly like coord below.
    
    # Optional: scale every vector to unit length before projecting.
    # visualizeVecs /= np.linalg.norm(visualizeVecs, axis=1, keepdims=True)
    
    # PCA: center the vectors, build the covariance matrix, and project onto
    # its two leading eigenvectors (the first two columns of U).
    temp = visualizeVecs - np.mean(visualizeVecs, axis=0)
    covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
    U, S, V = np.linalg.svd(covariance)
    coord = temp.dot(U[:, 0:2])
    
    for i in range(len(visualizeWords)):
        print(i, coord[i, 0], coord[i, 1])
        # for Chinese labels, also pass fontproperties= with a CJK font
        plt.text(coord[i, 0], coord[i, 1], visualizeWords[i],
                 bbox=dict(facecolor='red', alpha=0.1), fontsize=22)
    plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
    plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
    plt.show()
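The projection in visualization.py is PCA. Writing the $n$ word vectors (here $n = 19$) as the rows $x_i^{\top}$ of $X \in \mathbb{R}^{n \times d}$, where $d$ is the model's vector size, the lines that compute `temp`, `covariance`, `U` and `coord` correspond to these steps:

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \tilde{X} = X - \mathbf{1}\mu^{\top} && \text{(temp)}\\
C &= \frac{1}{n}\,\tilde{X}^{\top}\tilde{X} = U S U^{\top} && \text{(covariance, SVD)}\\
Y &= \tilde{X}\,[\,u_1 \;\; u_2\,] \in \mathbb{R}^{n \times 2} && \text{(coord)}
\end{aligned}
```

Because $C$ is symmetric positive semi-definite, its SVD is also its eigendecomposition, and `np.linalg.svd` returns the singular values in descending order, so $u_1, u_2$ are the two directions of largest variance. t-SNE, by contrast, is a nonlinear method that tries to preserve local neighborhoods rather than global variance.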
    

      

    Result: [figure: the words from label.txt drawn at their 2-D coordinates]

  • Original article: https://www.cnblogs.com/XBWer/p/6961960.html