  • Computing node similarity in Python with the DeepWalk model

    (Draft, still to be tidied up)
    GitHub: https://github.com/prateekjoshi565/DeepWalk
    Method:
    https://blog.csdn.net/gdh756462786/article/details/79108665/

    1. Installing straight from requirements.txt runs into this error:

    ImportError: cannot import name 'Vocab' from 'gensim.models.word2vec' 

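    gensim 4.x no longer provides Vocab in gensim.models.word2vec, so gensim has to be downgraded to version 3.8.3 (for example, by pinning gensim==3.8.3 in requirements.txt).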
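    A small pre-install check, assuming (from the error above) that the problem is any gensim 4.x release. The helper names gensim_major_version and needs_downgrade are hypothetical; the script only reads the installed version through importlib.metadata and prints what to do.

```python
from importlib import metadata


def gensim_major_version(version: str) -> int:
    """Return the major component of a version string such as '3.8.3'."""
    return int(version.split(".")[0])


def needs_downgrade(version: str) -> bool:
    """Assumption: every gensim >= 4 lacks gensim.models.word2vec.Vocab."""
    return gensim_major_version(version) >= 4


if __name__ == "__main__":
    try:
        installed = metadata.version("gensim")
    except metadata.PackageNotFoundError:
        print("gensim is not installed")
    else:
        if needs_downgrade(installed):
            print(f"gensim {installed} found; run: pip install gensim==3.8.3")
        else:
            print(f"gensim {installed} found; the Vocab import should succeed")
```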

     

    2. Step-by-step procedure

    Download the source code
    https://github.com/phanein/deepwalk

    Dataset descriptions
    http://leitang.net/social_dimension.html

    Core code

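    import random
    from gensim.models import Word2Vec  # gensim 3.8.3 API: the dimension argument is size
    from deepwalk import graph

    # G (the loaded graph) and args (parsed CLI options) come from deepwalk's command-line entry point
    walks = graph.build_deepwalk_corpus(G, num_paths=args.number_walks, path_length=args.walk_length, alpha=0, rand=random.Random(args.seed))
    print("Training...")
    # DeepWalk trains SkipGram (sg=1) with hierarchical softmax (hs=1) on the walks
    model = Word2Vec(walks, size=args.representation_size, window=args.window_size, min_count=0, sg=1, hs=1, workers=args.workers)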
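    The walk step can be sketched in plain Python. This is a hypothetical re-implementation of the idea behind graph.build_deepwalk_corpus, not the library's code: the graph is assumed to be a dict mapping each node to a list of neighbours, each of the num_paths passes starts one walk from every node in shuffled order, and alpha is the probability of jumping back to the walk's start node (alpha=0, as in the core code, never restarts).

```python
import random


def random_walk(graph, start, path_length, rand, alpha=0.0):
    """One truncated random walk; with probability alpha it returns to start."""
    path = [start]
    while len(path) < path_length:
        neighbors = graph[path[-1]]
        if not neighbors:
            break  # dead end: stop this walk early
        if rand.random() >= alpha:
            path.append(rand.choice(neighbors))  # move to a uniform random neighbour
        else:
            path.append(path[0])  # restart at the start node
    return [str(node) for node in path]  # Word2Vec expects string tokens


def build_walk_corpus(graph, num_paths, path_length, rand, alpha=0.0):
    """num_paths passes; each pass starts one walk at every node, in shuffled order."""
    walks = []
    nodes = list(graph)
    for _ in range(num_paths):
        rand.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, path_length, rand, alpha))
    return walks
```

    Each walk plays the role of a sentence for Word2Vec, so nodes that often appear close together on walks end up with nearby vectors.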


    Installation

    cd deepwalk-master
    pip install -r requirements.txt
    python setup.py install


    Reproducing the experimental results
    1. BlogCatalog dataset

    Generate the embeddings

    deepwalk --format mat --input example_graphs/blogcatalog.mat --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/blogcatalog.embeddings
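    A quick way to see what the walk parameters imply for corpus size. The helpers are hypothetical, and the memory rule is an assumption about deepwalk's entry point: walks stay in memory only when nodes × number-walks × walk-length is below --max-memory-data-size, which would mean --max-memory-data-size 0 always writes the walks to disk.

```python
def walk_corpus_size(num_nodes, number_walks, walk_length):
    """Return (number of walks, total node tokens) for the walk corpus."""
    num_walks = num_nodes * number_walks  # one walk per node per pass
    return num_walks, num_walks * walk_length


def walks_fit_in_memory(data_size, max_memory_data_size):
    """Assumed rule: keep walks in memory only if data_size is below the limit."""
    return data_size < max_memory_data_size
```

    For a 34-node graph such as Karate with --number-walks 80 and --walk-length 40, this gives 2720 walks and 108800 tokens, and a limit of 0 sends them to disk.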


    Evaluate

    python example_graphs/scoring.py --emb example_graphs/blogcatalog.embeddings --network example_graphs/blogcatalog.mat --num-shuffles 10 --all
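    --num-shuffles and --all control how many train/test splits scoring.py evaluates. Below is a hedged sketch of that split schedule: the training percents come from the scoring help text listed under the Karate section, while the shuffling details (one random permutation of node ids per shuffle, with the first int(percent × n) nodes used for training) are assumptions, and the function names are hypothetical. The classifier that scoring.py then fits on each training split is not reproduced here.

```python
import random


def training_percents(use_all):
    """10%..90% with --all, otherwise only 10%, 50% and 90%."""
    if use_all:
        return [p / 10 for p in range(1, 10)]
    return [0.1, 0.5, 0.9]


def evaluation_splits(num_nodes, num_shuffles, use_all, seed=0):
    """Yield (train_percent, train_ids, test_ids) for every shuffle of every percent.

    Assumption: each shuffle is one random permutation of node ids, reused
    for every training percent.
    """
    rand = random.Random(seed)
    shuffles = []
    for _ in range(num_shuffles):
        ids = list(range(num_nodes))
        rand.shuffle(ids)
        shuffles.append(ids)
    for pct in training_percents(use_all):
        train_size = int(pct * num_nodes)  # truncate, like an int() cast
        for ids in shuffles:
            yield pct, ids[:train_size], ids[train_size:]
```

    With --num-shuffles 10 --all this schedule contains 9 × 10 = 90 splits, so the reported scores average over more runs than the default 3 × 2.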


    2. Karate dataset

    Generate the embeddings

    --format defaults to adjlist (an .adjlist file), so it can be omitted here:

    deepwalk --input example_graphs/karate.adjlist --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/karate.embeddings
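    Since the goal is node similarity, here is a hedged sketch of comparing nodes once the .embeddings file exists. It assumes the file is in word2vec text format (a header line "node_count dimension", then one line per node: its id followed by the vector), which is what gensim's save_word2vec_format writes; the helper names are hypothetical and the measure is plain cosine similarity.

```python
import math


def load_embeddings(lines):
    """Parse word2vec text format into {node_id: [floats]}."""
    it = iter(lines)
    count, dim = map(int, next(it).split())  # header: node count, dimension
    vectors = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        node, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {node}: expected {dim} values, got {len(values)}")
        vectors[node] = values
    if len(vectors) != count:
        raise ValueError(f"header says {count} nodes, found {len(vectors)}")
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, node, topn=5):
    """Rank all other nodes by cosine similarity to node, highest first."""
    target = vectors[node]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != node]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]
```

    For example, pass an open handle on example_graphs/karate.embeddings to load_embeddings, then call most_similar(emb, "1") to rank the other nodes (assuming a node with id "1" exists). With gensim 3.8.3 installed, gensim.models.KeyedVectors.load_word2vec_format(path, binary=False) followed by .similarity(...) or .most_similar(...) should give equivalent cosine scores.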


    Evaluate

    --network requires a .mat file containing both the adjacency matrix and the node labels.

    The options are:

    usage: scoring [-h] --emb EMB --network NETWORK
                   [--adj-matrix-name ADJ_MATRIX_NAME]
                   [--label-matrix-name LABEL_MATRIX_NAME]
                   [--num-shuffles NUM_SHUFFLES] [--all]

    optional arguments:
      -h, --help            show this help message and exit
      --emb EMB             Embeddings file (default: None)
      --network NETWORK     A .mat file containing the adjacency matrix and node
                            labels of the input network. (default: None)
      --adj-matrix-name ADJ_MATRIX_NAME
                            Variable name of the adjacency matrix inside the .mat
                            file. (default: network)
      --label-matrix-name LABEL_MATRIX_NAME
                            Variable name of the labels matrix inside the .mat
                            file. (default: group)
      --num-shuffles NUM_SHUFFLES
                            Number of shuffles. (default: 2)
      --all                 The embeddings are evaluated on all training percents
                            from 10 to 90 when this flag is set to true. By
                            default, only training percents of 10, 50 and 90 are
                            used. (default: False)





    Reference: https://blog.csdn.net/YizhuJiao/article/details/81095346

    GitHub: https://github.com/phanein/deepwalk

  • Original post: https://www.cnblogs.com/StarZhai/p/15545387.html