  • Loading embeddings with gensim

    Using the gensim package, load the embedding files together with their vocabulary files:

    
    from gensim.models.keyedvectors import KeyedVectors
    
    twitter_embedding_path = 'twitter_embedding.emb'
    twitter_vocab_path = 'twitter_model.vocab'
    foursquare_embedding_path = 'foursquare_embedding.emb'
    foursquare_vocab_path = 'foursquare_model.vocab'
    
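    # Load the embedding vectors with gensim (text format, hence binary=False).
    # fvocab points to the optional vocabulary file that holds the per-token counts.
    x_vectors = KeyedVectors.load_word2vec_format(foursquare_embedding_path, binary=False, fvocab=foursquare_vocab_path)
    y_vectors = KeyedVectors.load_word2vec_format(twitter_embedding_path, binary=False, fvocab=twitter_vocab_path)
    
    print('type(x_vectors)', type(x_vectors))
    # Note: `.vocab` is available in gensim < 4.0; gensim 4.0+ removed it
    # in favor of `key_to_index`.
    print('type(x_vectors.vocab)', type(x_vectors.vocab))
    print('type(x_vectors.vocab.keys())', type(x_vectors.vocab.keys()))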
    

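    Content of 'twitter_embedding.emb' (the first line gives the number of vectors and their dimensionality; each following line is a token followed by its vector):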

    5120 64
    BarackObama -0.079930 0.106491 -0.075812 -0.026447 ...
    mashable 0.046692 -0.038019 -0.055519 ...
    ...

    Content of 'twitter_model.vocab' (one token and its count per line):

    BarackObama 3475971
    mashable 2668606
    JonahLupton 2515250
    instagram 2359886
    TheEllenShow 2292545
    cnnbrk 2157283
    nytimes 2141588
    foursquare 2021352

    ...
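
    To make the two file layouts above concrete, here is a minimal pure-Python sketch that parses them without gensim. The helper names `read_word2vec_text` and `read_vocab_counts` are hypothetical, and the toy data is a 3-dimensional excerpt made up for illustration; in practice `KeyedVectors.load_word2vec_format` does this work.

```python
import io


def read_word2vec_text(stream):
    """Parse word2vec text format: a 'count dim' header, then one vector per line."""
    count, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, found {len(vectors)}")
    return dim, vectors


def read_vocab_counts(stream):
    """Parse the vocabulary file: one 'token count' pair per line."""
    counts = {}
    for line in stream:
        if not line.strip():
            continue
        token, count = line.split()
        counts[token] = int(count)
    return counts


# Toy stand-ins for the .emb and .vocab files (assumption: 3 dimensions only).
emb_text = (
    "2 3\n"
    "BarackObama -0.079930 0.106491 -0.075812\n"
    "mashable 0.046692 -0.038019 -0.055519\n"
)
vocab_text = "BarackObama 3475971\nmashable 2668606\n"

dim, vectors = read_word2vec_text(io.StringIO(emb_text))
counts = read_vocab_counts(io.StringIO(vocab_text))
print(dim, sorted(vectors), counts["mashable"])
```

    The header check is a design choice of this sketch: it fails loudly when the file is truncated instead of silently returning fewer vectors.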

    Writing the embeddings to a file

    Reference code snippet for writing the embeddings to a file:

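    from gensim.models import word2vec

    embedding_path = data_path + 'embedding/'  # data_path is defined elsewhere
    # ....
    # Train skip-gram (sg=1) with negative sampling on the walk sequences in walkList_x.
    # Note: gensim 4.0+ renamed `size` to `vector_size` and `iter` to `epochs`.
    modelX = word2vec.Word2Vec(walkList_x, negative=10, sg=1, hs=0, size=100, window=4, min_count=0, workers=15, iter=30)
    # Save the vectors in word2vec text format, plus a vocabulary file with the counts.
    modelX.wv.save_word2vec_format(embedding_path + 'twitter.emb', fvocab=embedding_path + 'twitter.vocab')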
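
    As a rough illustration of what ends up in the two output files, the sketch below writes an embedding file and a vocabulary file by hand in the same text layout. The function `write_word2vec_text` and the toy vectors are hypothetical, and details such as float formatting are assumptions; gensim's own `save_word2vec_format` may format values differently.

```python
import io


def write_word2vec_text(vectors, counts, emb_stream, vocab_stream):
    """Write vectors as 'count dim' header plus 'token v1 v2 ...' lines,
    and counts as 'token count' lines, most frequent token first."""
    if not vectors:
        raise ValueError("no vectors to write")
    words = sorted(vectors, key=lambda w: -counts.get(w, 0))
    dim = len(vectors[words[0]])
    emb_stream.write(f"{len(words)} {dim}\n")
    for word in words:
        values = " ".join(f"{v:.6f}" for v in vectors[word])
        emb_stream.write(f"{word} {values}\n")
        vocab_stream.write(f"{word} {counts.get(word, 0)}\n")


# Toy data (assumption: 2-dimensional vectors invented for the example).
toy_vectors = {"mashable": [0.5, -0.25], "BarackObama": [-0.125, 1.0]}
toy_counts = {"BarackObama": 3475971, "mashable": 2668606}

emb_out, vocab_out = io.StringIO(), io.StringIO()
write_word2vec_text(toy_vectors, toy_counts, emb_out, vocab_out)
print(emb_out.getvalue())
print(vocab_out.getvalue())
```

    Writing to streams rather than paths keeps the sketch easy to test; passing objects returned by `open(path, 'w')` would write real files.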
    
  • Original article: https://www.cnblogs.com/sonictl/p/11220479.html