  • Computing node similarity in Python with the DeepWalk model

    Draft notes, still to be organized.
    GitHub: https://github.com/prateekjoshi565/DeepWalk
    Method:
    https://blog.csdn.net/gdh756462786/article/details/79108665/

    I. Installing the dependencies straight from requirements.txt fails with:

    ImportError: cannot import name 'Vocab' from 'gensim.models.word2vec' 

    The fix is to pin gensim to version 3.8.3.
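
A quick way to confirm the environment before running DeepWalk is to check the installed gensim version. The sketch below assumes that the `Vocab` class was dropped in the gensim 4.0 rewrite, so any 3.x release (such as the 3.8.3 used here) should be safe. The helper names are illustrative only.

```python
from importlib import metadata


def gensim_is_compatible(version: str) -> bool:
    """Return True for gensim releases older than 4.0.

    Assumption: `gensim.models.word2vec.Vocab` is importable only before 4.0.
    """
    major = int(version.split(".")[0])
    return major < 4


def installed_gensim_version():
    """Return the installed gensim version string, or None if it is absent."""
    try:
        return metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_gensim_version()
    if version is None:
        print("gensim is not installed; try: pip install gensim==3.8.3")
    elif gensim_is_compatible(version):
        print(f"gensim {version} should work with DeepWalk")
    else:
        print(f"gensim {version} is too new; try: pip install gensim==3.8.3")
```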

     

    II. Detailed steps

    Download the source code
    https://github.com/phanein/deepwalk

    Dataset definition
    http://leitang.net/social_dimension.html

    Core code

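    # Fragment of the training script: G (the loaded graph) and args (parsed CLI options) are defined earlier.
    # Assumed imports: random, gensim.models.Word2Vec, and deepwalk's graph module.
    # Build the corpus: number_walks random walks of length walk_length from every node.
    walks = graph.build_deepwalk_corpus(G, num_paths=args.number_walks, path_length=args.walk_length, alpha=0, rand=random.Random(args.seed))

    print("Training...")

    # Train Word2Vec on the walks; `size` is the gensim 3.x keyword for the embedding dimension.
    model = Word2Vec(walks, size=args.representation_size, window=args.window_size, min_count=0, workers=args.workers)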
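
To make the corpus-building step more concrete, here is a simplified, self-contained sketch of how truncated random walks might be generated. It is not the library's implementation: `random_walk` and `build_corpus` are hypothetical names, the graph is a plain adjacency dict, and `alpha` is treated as a restart probability.

```python
import random


def random_walk(adj, start, path_length, alpha=0.0, rng=None):
    """Walk up to path_length nodes from start, choosing neighbors uniformly."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < path_length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop the walk early
        if rng.random() >= alpha:
            walk.append(rng.choice(neighbors))
        else:
            walk.append(walk[0])  # restart from the first node
    # Word2Vec expects sequences of string tokens.
    return [str(node) for node in walk]


def build_corpus(adj, num_paths, path_length, alpha=0.0, rng=None):
    """Run num_paths passes over the graph, starting one walk at every node."""
    if rng is None:
        rng = random.Random(0)
    nodes = list(adj)
    walks = []
    for _ in range(num_paths):
        rng.shuffle(nodes)  # visit the start nodes in a fresh order each pass
        for node in nodes:
            walks.append(random_walk(adj, node, path_length, alpha, rng))
    return walks


# Tiny illustrative graph (made up for this example).
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
toy_walks = build_corpus(toy_graph, num_paths=3, path_length=5, rng=random.Random(42))
```

Each walk plays the role of a sentence, which is why the list of walks can be handed to Word2Vec as if it were a text corpus.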


    Installation

    cd deepwalk-master
    pip install -r requirements.txt
    python setup.py install


    Reproducing the experimental results
    1. BlogCatalog dataset

    Generate the embeddings

    deepwalk --format mat --input example_graphs/blogcatalog.mat --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/blogcatalog.embeddings


    Evaluation

    python example_graphs/scoring.py --emb example_graphs/blogcatalog.embeddings --network example_graphs/blogcatalog.mat --num-shuffle 10 --all


    2. Karate dataset

    Generate the embeddings

    --format defaults to adjlist, so a .adjlist input file needs no --format flag.

    deepwalk --input example_graphs/karate.adjlist --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/karate.embeddings
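
Once the embeddings file exists, node similarity can be computed from it. The following is a minimal sketch, assuming the output is in the word2vec text format (a header line with the node count and dimension, then one node ID and its vector per line). The function names are illustrative, and cosine similarity is one common choice rather than the only one.

```python
import math


def load_embeddings(lines):
    """Parse word2vec-style text lines into a {node_id: vector} dict."""
    it = iter(lines)
    header = next(it).split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {parts[0]} has {len(values)} values, expected {dim}")
        vectors[parts[0]] = values
    if len(vectors) != count:
        raise ValueError(f"header promised {count} nodes, found {len(vectors)}")
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, node, topn=3):
    """Return the topn (node_id, similarity) pairs closest to the given node."""
    scores = [
        (other, cosine_similarity(vectors[node], vec))
        for other, vec in vectors.items()
        if other != node
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]
```

With the Karate output, something like `vectors = load_embeddings(open("example_graphs/karate.embeddings"))` followed by `most_similar(vectors, "1")` would list the nodes closest to node 1, assuming the file contains a node with that ID.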


    Evaluation

    --network requires a .mat file.

    The options are as follows:

    usage: scoring [-h] --emb EMB --network NETWORK
                   [--adj-matrix-name ADJ_MATRIX_NAME]
                   [--label-matrix-name LABEL_MATRIX_NAME]
                   [--num-shuffles NUM_SHUFFLES] [--all]

    optional arguments:
      -h, --help            show this help message and exit
      --emb EMB             Embeddings file (default: None)
      --network NETWORK     A .mat file containing the adjacency matrix and node
                            labels of the input network. (default: None)
      --adj-matrix-name ADJ_MATRIX_NAME
                            Variable name of the adjacency matrix inside the .mat
                            file. (default: network)
      --label-matrix-name LABEL_MATRIX_NAME
                            Variable name of the labels matrix inside the .mat
                            file. (default: group)
      --num-shuffles NUM_SHUFFLES
                            Number of shuffles. (default: 2)
      --all                 The embeddings are evaluated on all training percents
                            from 10 to 90 when this flag is set to true. By
                            default, only training percents of 10, 50 and 90 are
                            used. (default: False)
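
The --num-shuffles and --all options describe a simple evaluation protocol: shuffle the nodes several times and, for each shuffle, train on a given percentage of them and test on the rest. A rough sketch of just the splitting step, with hypothetical helper names and no classifier, might look like this:

```python
import random


def training_percents(evaluate_all=False):
    """Percentages of nodes used for training, following the help text."""
    if evaluate_all:
        return list(range(10, 100, 10))  # 10, 20, ..., 90
    return [10, 50, 90]


def shuffled_splits(node_ids, num_shuffles=2, evaluate_all=False, seed=0):
    """Yield (percent, train_ids, test_ids) for every shuffle and percentage."""
    rng = random.Random(seed)
    for _ in range(num_shuffles):
        order = list(node_ids)
        rng.shuffle(order)
        for percent in training_percents(evaluate_all):
            cut = len(order) * percent // 100
            yield percent, order[:cut], order[cut:]
```

In a full evaluation each split would feed a classifier trained on the embeddings of `train_ids` and scored on `test_ids`, with the scores averaged over the shuffles.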





    Reference: https://blog.csdn.net/YizhuJiao/article/details/81095346

    GitHub: https://github.com/phanein/deepwalk

  • Original post: https://www.cnblogs.com/StarZhai/p/15545387.html