  • Datasets (Part 2)

    1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

    2. Several useful sites for downloading test datasets

       Data for MATLAB hackers (Handwritten Digits, Faces, Text)

       http://www.cs.toronto.edu/~roweis/data.html

    3. UCI KDD Archive (datasets of various kinds)

       http://kdd.ics.uci.edu/summary.task.type.html

       http://kdd.ics.uci.edu/summary.data.type.html

    4. Machine learning datasets collected by UCI

       ftp://pami.sjtu.edu.cn/  

       http://www.ics.uci.edu/~mlearn//MLRepository.htm  

    5. Sample databases

       http://kdd.ics.uci.edu/

       WWW-pages were manually classified

       http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/

    6. CMU World Wide Knowledge Base (Web->KB) project (classified web pages, relational data describing pages and hyperlinks)

       http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

    7. Artificial intelligence and machine learning

       http://duch-links.wikispaces.com/

    8. Text classification: the datasets used with rainbow

       http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

    9. StatLib: a library of programs and data for mathematical statistics

       http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm

       http://lib.stat.cmu.edu/

       http://lib.stat.cmu.edu/datasets/

       http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=viewdownload&cid=2

    10. Cancer genes:

       http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

    11. Financial and medical data:

       http://lisp.vse.cz/pkdd99/Challenge/chall.htm

    12. Time series data

       http://www.stat.wisc.edu/~reinsel/bjr-data/  

    13. KDnuggets: links to a variety of datasets:

       http://www.kdnuggets.com/datasets/index.html

    14. German intelligent analysis and information systems

       http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html  

       http://dctc.sjtu.edu.cn/adaptive/datasets/  

       http://fimi.cs.helsinki.fi/data/  

    15. IBM intelligent information

       http://www-958.ibm.com/software/data/cognos/manyeyes/datasets

       http://www.almaden.ibm.com/software/quest/Resources/index.shtml

    16. Frequent Set Counting

       http://miles.cnuce.cnr.it/~palmeri/datam/DCI/datasets.php

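    17. Rating datasets

      MovieLens movie rating data

       Basic description: comprises the following three datasets:

       a. 100,000 ratings of 1,682 movies by 943 users

       b. 1 million ratings of 3,900 movies by 6,040 users

       c. 10 million ratings of 10,681 movies by 71,567 users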

       http://www.grouplens.org/  

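       Book-Crossing book rating data

       Basic description: contains 1,149,780 ratings of 271,379 books by 278,858 users. Cai-Nicolas Ziegler collected the dataset in a four-week crawl of the Book-Crossing community in August and September 2004.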

       http://www.informatik.uni-freiburg.de/~cziegler/BX/

      Jester Joke Data Set: joke ratings

       A dataset for recommender systems released by Ken Goldberg of UC Berkeley. It contains 4.1 million continuous ratings of 100 jokes from 73,496 users.

       http://www.ieor.berkeley.edu/~goldberg/jester-data/

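      Netflix dataset

       Also a movie rating dataset: 480,189 users, 17,770 movies, and 100,480,507 rating records. By comparison, the MovieLens datasets listed above are one to three orders of magnitude smaller (10 million, 1 million, and 100,000 ratings). MovieLens will probably be displaced gradually by the Netflix data as the field moves on.

       Note: all four of the above are user rating datasets.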
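The user, item, and rating counts quoted for these four collections are enough to compare how sparse each rating matrix is and how far apart the collections are in size. The following is a minimal sketch, not part of the original list: the dictionary simply restates the figures given above (the Jester rating count is the rounded "4.1 million"), and the function names `density` and `rating_gap` are made up for illustration.

```python
from math import log10

# (users, items, ratings) as quoted in the list above.
# The Jester rating count is the rounded figure "4.1 million".
DATASETS = {
    "MovieLens 100K": (943, 1_682, 100_000),
    "MovieLens 1M": (6_040, 3_900, 1_000_000),
    "MovieLens 10M": (71_567, 10_681, 10_000_000),
    "Book-Crossing": (278_858, 271_379, 1_149_780),
    "Jester": (73_496, 100, 4_100_000),
    "Netflix": (480_189, 17_770, 100_480_507),
}


def density(users, items, ratings):
    """Fraction of the users x items matrix that actually holds a rating."""
    return ratings / (users * items)


def rating_gap(larger, smaller):
    """Size difference between two datasets, in orders of magnitude (log10)."""
    return log10(DATASETS[larger][2] / DATASETS[smaller][2])


if __name__ == "__main__":
    for name, (users, items, ratings) in DATASETS.items():
        gap = rating_gap("Netflix", name)
        print(f"{name:15s} density={density(users, items, ratings):.4%} "
              f"orders of magnitude below Netflix={gap:.2f}")
```

With these figures, MovieLens 100K fills roughly 6.3% of its matrix and Netflix roughly 1.2%, while the three MovieLens sets sit about three, two, and one orders of magnitude below Netflix in rating count.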

    18. GPS trajectory data

       GeoLife GPS Trajectories

       http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/default.aspx  

       GPS Trajectories with transportation mode labels

       http://research.microsoft.com/apps/pubs/?id=141896

       Movebank animal trajectories

       http://www.movebank.org/

    19. Mobile phone, Wi-Fi, and Bluetooth data

    A Community Resource for Archiving Wireless Data At Dartmouth

       http://crawdad.cs.dartmouth.edu/

       crowdflow: mobile phone and Wi-Fi trajectories

       http://crowdflow.net/

    20. OpenStreetMap Data

       planet.openstreetmap.org or http://metro.teczno.com/

    21. OpenPaths: data upload plus API

       https://openpaths.cc/  

    22. FOURSQUARE

    23. GeoTime

       http://www.geotime.com/GeoTime(s)/January-2012/Cupid-Strikes-Again--Time-Series---GIS--Together-a.aspx  

    24. Datatang (数据堂)

       http://www.datatang.com/

    25. http://www.kdnuggets.com/datasets/

    26. http://appsrv.cse.cuhk.edu.hk/~kdd/data_collection.html

    IBM Almaden Research Center Data Mining Projects

    Data Sets:

    ·         Synthetic Data Generation Code for Associations and Sequential Patterns

    ·         Synthetic Data Generation Code for Classification

    ·         "Dense" Data-Sets (apriori binary format, 3.2Mb)

    ·         Enron Email Data Set

    Demos:

    ·         General Visualizations for Associations

    ·         Visualization Demo: Market Basket Analysis

    IBM Intelligent Miner:

    ·         IBM Intelligent Miner for Data

    ·         Video and image clips from IBM Data Mining T.V. Ad

    IBM Data Mining Resources:

    ·         Business Intelligence Solutions   Our colleagues offering data mining consultancy and services.

    ·         Data Abstraction Research Group   Our colleagues in IBM Thomas J. Watson Research Center.   Our colleagues in France.

    ·         Data Mining: Extending the Information Warehouse Framework   IBM White Paper on Data Mining.

    The Reuters dataset can be found at the following address

       http://www.research.att.com/~lewis/reuters21578.html

    Sites on data mining for investment funds

       http://www.gotofund.com/index.asp

       http://lans.ece.utexas.edu/~strehl/

    Reuters dataset

       http://www.research.att.com/~lewis/reuters21578.html

       http://www-2.cs.cmu.edu/webkb

       http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf

    Related:

       http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar

       http://www.phys.uni.torun.pl/~duch/software.html

    WEKA:

       http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar  

    1. A jarfile containing 37 classification problems, originally obtained from the UCI repository

       http://prdownloads.sourceforge.net/weka/datasets-UCI.jar  

    2. A jarfile containing 37 regression problems, obtained from various sources

       http://prdownloads.sourceforge.net/weka/datasets-numeric.jar  

    3. A jarfile containing 30 regression datasets collected by Luis Torgo

       http://prdownloads.sourceforge.net/weka/regression-datasets.jar  
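A jarfile is an ordinary zip archive, so once one of the three files above has been downloaded its contents can be inspected without Java. The sketch below uses only Python's standard `zipfile` module. It assumes the datasets inside are stored as `.arff` files (Weka's native format), which the list above does not state, and the helper names `list_arff_members` and `extract_arff` as well as the local path in the usage example are hypothetical.

```python
import zipfile


def list_arff_members(jar_path):
    """Return the sorted names of all .arff entries in a jar (zip) archive."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist() if name.lower().endswith(".arff")
        )


def extract_arff(jar_path, dest_dir):
    """Extract only the .arff entries into dest_dir and return their names."""
    members = list_arff_members(jar_path)
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(dest_dir, members=members)
    return members


if __name__ == "__main__":
    # Hypothetical local copy of one of the jarfiles listed above.
    for name in list_arff_members("datasets-UCI.jar"):
        print(name)
```

Filtering by extension skips the `META-INF` manifest entries that jar archives normally carry alongside the data files.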

    Data mining competitions and their datasets

    • 2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.

    • ILP 2005 Challenge, on the prediction of functional classes of genes.

    • KDD Cup 2005, on classifying internet user search queries, deadline July 8.

    • Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: How data mining can ascertain the risk of loss of payments and reduce this risk.

    • KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.

    •  InfoVis 2004 Contest, The History of InfoVis.

    • DATA MINING CUP 2004 (Chemnitz, Germany), for students.

    • InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.

    • KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs: http://www.cs.cornell.edu/projects/kddcup/index.html

    • DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.

    • KDD Cup 2002, focuses on data mining in molecular biology.

    •  Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.

  • Original article: https://www.cnblogs.com/codeOfLife/p/6773825.html