
Loading embeddings with gensim

gensim package


from gensim.models.keyedvectors import KeyedVectors

twitter_embedding_path = 'twitter_embedding.emb'
twitter_vocab_path = 'twitter_model.vocab'
foursquare_embedding_path = 'foursquare_embedding.emb'
foursquare_vocab_path = 'foursquare_model.vocab'

# load the embedding vector using gensim
x_vectors = KeyedVectors.load_word2vec_format(foursquare_embedding_path, binary=False, fvocab=foursquare_vocab_path)
y_vectors = KeyedVectors.load_word2vec_format(twitter_embedding_path, binary=False, fvocab=twitter_vocab_path)

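# inspect the loaded objects
# note: `.vocab` exists in gensim 3.x only; gensim 4.0+ replaced it with `.key_to_index`
print('type(x_vectors)', type(x_vectors))
print('type(x_vectors.vocab)', type(x_vectors.vocab))
print('type(x_vectors.vocab.keys())', type(x_vectors.vocab.keys()))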
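The `.vocab` attribute printed above is the gensim 3.x interface; gensim 4.0 removed it and exposes `key_to_index` instead. A minimal sketch of a helper that returns the vocabulary keys under either version follows. The name `vocab_keys` is hypothetical and not part of the original post or of gensim.

```python
def vocab_keys(kv):
    """Return the tokens stored in a KeyedVectors-like object as a list.

    Assumption: gensim >= 4.0 objects expose `key_to_index` (a dict),
    while gensim 3.x objects expose `vocab` (also a dict).
    """
    mapping = getattr(kv, "key_to_index", None)
    if mapping is None:
        # gensim 3.x fallback
        mapping = kv.vocab
    return list(mapping.keys())
```

With the objects loaded above, `vocab_keys(x_vectors)` would give the Foursquare tokens regardless of which gensim version is installed.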

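Content of 'twitter_embedding.emb' (the first line gives the number of vectors and their dimension; each following line is a token and its vector):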

5120 64
BarackObama -0.079930 0.106491 -0.075812 -0.026447 ...
mashable 0.046692 -0.038019 -0.055519 ...
...

Content of 'twitter_model.vocab' (each line is a token followed by its count):

BarackObama 3475971
mashable 2668606
JonahLupton 2515250
instagram 2359886
TheEllenShow 2292545
cnnbrk 2157283
nytimes 2141588
foursquare 2021352

...
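To make the two file layouts concrete, here is a minimal sketch that parses them by hand with plain Python. It is only an illustration of the format; gensim's `load_word2vec_format` does this for you. The function names are hypothetical, and the sample data is shortened to 2 tokens with 3 dimensions instead of 5120 tokens with 64 dimensions.

```python
import io

def read_word2vec_text(emb_file):
    """Parse the text format: a 'count dim' header, then 'token v1 ... v_dim' lines."""
    header = emb_file.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in emb_file:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError("expected %d values, got %d" % (dim, len(parts) - 1))
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError("header announced %d vectors, found %d" % (count, len(vectors)))
    return vectors

def read_vocab_counts(vocab_file):
    """Parse the vocab file: one 'token count' pair per line."""
    counts = {}
    for line in vocab_file:
        parts = line.split()
        if not parts:
            continue
        counts[parts[0]] = int(parts[1])
    return counts

# Shortened sample data (illustrative only), held in memory instead of on disk.
emb = io.StringIO(
    "2 3\n"
    "BarackObama -0.079930 0.106491 -0.075812\n"
    "mashable 0.046692 -0.038019 -0.055519\n"
)
vocab = io.StringIO("BarackObama 3475971\nmashable 2668606\n")

vectors = read_word2vec_text(emb)
counts = read_vocab_counts(vocab)
```

The header check mirrors why the first line matters: a reader needs the dimension up front to validate every following row.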

Write the embeddings to a file

To write the embeddings to a file, refer to the following code snippet:

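from gensim.models import word2vec

# data_path and walkList_x (the training sentences) are assumed to be defined earlier
embedding_path = data_path + 'embedding/'
# ....
# gensim 3.x argument names; gensim 4.0+ renamed size -> vector_size and iter -> epochs
modelX = word2vec.Word2Vec(walkList_x, negative=10, sg=1, hs=0, size=100, window=4, min_count=0, workers=15, iter=30)
# save the embedding results, plus the vocabulary counts via fvocab
modelX.wv.save_word2vec_format(embedding_path + 'twitter.emb', fvocab=embedding_path + 'twitter.vocab')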
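The `Word2Vec` call above uses the gensim 3.x keyword names `size` and `iter`, which gensim 4.0 renamed to `vector_size` and `epochs`. A minimal sketch of translating the keyword arguments follows. The helper `to_gensim4_kwargs` and the table `GENSIM4_RENAMES` are hypothetical names for illustration, and the table covers only these two renames.

```python
# Keyword renames for the Word2Vec constructor introduced in gensim 4.0
# (only the two used in the snippet above).
GENSIM4_RENAMES = {"size": "vector_size", "iter": "epochs"}

def to_gensim4_kwargs(kwargs):
    """Return a copy of kwargs with gensim 3.x names mapped to gensim 4.x names."""
    return {GENSIM4_RENAMES.get(name, name): value for name, value in kwargs.items()}

# The arguments from the snippet above, written gensim 3.x style.
old_kwargs = dict(negative=10, sg=1, hs=0, size=100, window=4,
                  min_count=0, workers=15, iter=30)
new_kwargs = to_gensim4_kwargs(old_kwargs)
```

Under gensim 4.x the model could then be built with `word2vec.Word2Vec(walkList_x, **new_kwargs)`; the `save_word2vec_format` call stays the same.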


Original article: https://www.cnblogs.com/sonictl/p/11220479.html