  • Loading embeddings with gensim

    Using the gensim package:

    
    from gensim.models.keyedvectors import KeyedVectors
    
    twitter_embedding_path = 'twitter_embedding.emb'
    twitter_vocab_path = 'twitter_model.vocab'
    foursquare_embedding_path = 'foursquare_embedding.emb'
    foursquare_vocab_path = 'foursquare_model.vocab'
    
    # Load the embedding vectors with gensim (text format, hence binary=False).
    # fvocab points to the optional vocabulary file holding per-word counts.
    x_vectors = KeyedVectors.load_word2vec_format(foursquare_embedding_path, binary=False, fvocab=foursquare_vocab_path)
    y_vectors = KeyedVectors.load_word2vec_format(twitter_embedding_path, binary=False, fvocab=twitter_vocab_path)
    
    # Note: the .vocab attribute exists in gensim 3.x only; gensim 4.0+ uses .key_to_index instead.
    print('type(x_vectors)', type(x_vectors))
    print('type(x_vectors.vocab)', type(x_vectors.vocab))
    print('type(x_vectors.vocab.keys())', type(x_vectors.vocab.keys()))
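
Because the `vocab` attribute used above was removed in gensim 4.0, a small helper can make the vocabulary lookup work on either major version. This is only a sketch: the function name `vocab_words` is hypothetical (not part of gensim), and it assumes the object exposes either `key_to_index` (gensim 4.0+) or `vocab` (gensim 3.x).

```python
def vocab_words(keyed_vectors):
    """Return the vocabulary words of a KeyedVectors-like object.

    Assumption: gensim 4.0+ objects expose `key_to_index`,
    while gensim 3.x objects expose `vocab`. Both are dict-like,
    so iterating over them yields the words.
    """
    if hasattr(keyed_vectors, "key_to_index"):
        return list(keyed_vectors.key_to_index)
    return list(keyed_vectors.vocab)
```

With the vectors loaded above, `vocab_words(x_vectors)` would then replace `x_vectors.vocab.keys()`.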
    

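    Content of 'twitter_embedding.emb' (the first line gives the number of words and the vector dimension; each following line is a word followed by its vector):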

    5120 64
    BarackObama -0.079930 0.106491 -0.075812 -0.026447 ...
    mashable 0.046692 -0.038019 -0.055519 ...
    ...

    Content of 'twitter_model.vocab' (one word and its count per line):

    BarackObama 3475971
    mashable 2668606
    JonahLupton 2515250
    instagram 2359886
    TheEllenShow 2292545
    cnnbrk 2157283
    nytimes 2141588
    foursquare 2021352

    ...
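
To make the two file layouts above concrete, here is a minimal pure-Python sketch that parses them without gensim. The function names `read_word2vec_text` and `read_vocab_counts` are hypothetical, and the sketch assumes space-separated fields and words that contain no whitespace.

```python
from typing import Dict, List, Tuple


def read_word2vec_text(emb_path: str) -> Tuple[int, int, Dict[str, List[float]]]:
    """Parse a word2vec text file: a 'n_words dim' header, then 'word v1 v2 ...' rows."""
    vectors: Dict[str, List[float]] = {}
    with open(emb_path, encoding="utf-8") as f:
        # Header line, e.g. "5120 64"
        n_words, dim = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, values = parts[0], [float(v) for v in parts[1:]]
            if len(values) != dim:
                raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
            vectors[word] = values
    return n_words, dim, vectors


def read_vocab_counts(vocab_path: str) -> Dict[str, int]:
    """Parse a vocabulary file with one 'word count' pair per line."""
    counts: Dict[str, int] = {}
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            word, count = line.split()
            counts[word] = int(count)
    return counts
```

For real work `KeyedVectors.load_word2vec_format` remains the better choice; the sketch is only meant to show what the loader has to read.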

    Writing the embeddings to a file

    Reference code snippet for writing the embeddings to a file:

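    from gensim.models import word2vec
    
    # data_path and walkList_x (the training sentences) are defined elsewhere
    embedding_path = data_path + 'embedding/'
    # ....
    # gensim 3.x parameter names; gensim 4.0+ renames size -> vector_size and iter -> epochs
    modelX = word2vec.Word2Vec(walkList_x, negative=10, sg=1, hs=0, size=100, window=4, min_count=0, workers=15, iter=30)
    # save the embedding vectors and, via fvocab, the vocabulary counts
    modelX.wv.save_word2vec_format(embedding_path + 'twitter.emb', fvocab=embedding_path + 'twitter.vocab')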
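
As a rough illustration of what ends up in the two output files, the following pure-Python sketch writes an embedding file and a vocabulary file in the layout shown earlier. It is not gensim's implementation: the function name `write_word2vec_text` is hypothetical, the six-decimal formatting is copied from the sample file, and the descending-count ordering is an assumption based on the sample vocabulary.

```python
from typing import Dict, Sequence


def write_word2vec_text(
    vectors: Dict[str, Sequence[float]],
    counts: Dict[str, int],
    emb_path: str,
    vocab_path: str,
) -> None:
    """Write 'word v1 v2 ...' rows to emb_path and 'word count' rows to vocab_path."""
    # Assumed ordering: most frequent word first (words without a count sort last).
    words = sorted(vectors, key=lambda w: -counts.get(w, 0))
    dim = len(vectors[words[0]]) if words else 0
    with open(emb_path, "w", encoding="utf-8") as emb, \
            open(vocab_path, "w", encoding="utf-8") as voc:
        emb.write(f"{len(words)} {dim}\n")  # header: number of words, dimension
        for w in words:
            emb.write(w + " " + " ".join(f"{v:.6f}" for v in vectors[w]) + "\n")
            voc.write(f"{w} {counts.get(w, 0)}\n")
```

Files written this way follow the same text layout that `KeyedVectors.load_word2vec_format(..., binary=False, fvocab=...)` reads.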
    
  • Original post: https://www.cnblogs.com/sonictl/p/11220479.html