zoukankan      html  css  js  c++  java
  • Link-based Classification相关数据集

    Link-based Classification相关数据集

    Datasets

    Document Classification Datasets:

    •  CiteSeer: The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words. The README file in the dataset provides more details. Click here to download the tarball containing the dataset.
    •  Cora: The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words. The README file in the dataset provides more details. Click here to download the tarball containing the dataset.
    •  WebKB: The WebKB dataset consists of 877 scientific publications classified into one of five classes. The citation network consists of 1608 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1703 unique words. The README file in the dataset provides more details. Click here to download the tarball containing the dataset.

    Social Network Datasets:

    • Terrorists: This dataset contains information about terrorists and their relationships. Unlike the previous datasets, this dataset was designed for classification experiments aimed at classifying the relationships among terrorists. The dataset contains 851 relationships, each described by a 0/1-valued vector of attributes where each entry indicates the absence/presence of a feature. There are a total of 1224 distinct features. Each relationship can be assigned one or more labels out of a maximum of four labels making this dataset suitable for multi-label classification tasks. The README file provides more details. Click here to download the tarball containing the dataset.
    • Terrorist Attacks: This dataset consists of 1293 terrorist attacks each assigned one of 6 labels indicating the type of the attack. Each attack is described by a 0/1-valued vector of attributes whose entries indicate the absence/presence of a feature. There are a total of 106 distinct features. The files in the dataset can be used to create two distinct graphs. The README file in the dataset provides more details. Click here to download the tarball containing the dataset.

    更多  http://www.cs.umd.edu/~sen/lbc-proj/LBC.html

  • 相关阅读:
    [算法]位运算问题之二
    [算法]位运算问题之一
    [算法]海量数据问题之二
    [算法]海量数据问题之一
    [算法]旋转词问题
    [算法]去掉字符串中连续出现的k个0子串
    [算法]字符串中数字子串的求和
    [算法]字符串之变形词问题
    Linux常用命令
    数据库中的事物
  • 原文地址:https://www.cnblogs.com/celia01/p/4645761.html
Copyright © 2011-2022 走看看