zoukankan      html  css  js  c++  java
  • Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud

    The Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure over the past few years. Through this process, we experienced the ubiquity of Jim Gray’s fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to them. This data deluge also demonstrated a clear need for curated and meaningful datasets in the research community, not only in computer science but also in interdisciplinary and domain sciences.

    Today we are excited to launch Microsoft Research Open Data – a new data repository in the cloud dedicated to facilitating collaboration across the global research community. Microsoft Research Open Data, in a single, convenient, cloud-hosted location, offers datasets representing many years of data curation and research efforts by Microsoft that were used in published research studies.

    Why we are investing in this

    The goal is to provide a simple platform to Microsoft researchers and collaborators to share datasets and related research technologies and tools. Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources and enable reproducibility of research. We will continue to shape and grow this repository and add features based on feedback from the community.
    We recognize that there are dozens of data repositories already in use by researchers and expect that the capabilities of this repository will augment existing efforts.

    Figure 1 – Dataset in Microsoft Research Open Data

    “This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing”
    -Sam Madden, Professor, Massachusetts Institute of Technology

    With data growing at an exponential rate, perceived to be over 150 ZB of data available by 2025, it is now recognized that we need to prioritize bringing processing to data versus relying on data movement through Internet bandwidth that is growing at a much slower pace. We believe that there is real utility in providing an option to bring the processing to the data. Therefore, in addition to providing an option to download the data assets, users can also copy datasets directly to an Azure based Data Science virtual machine, as shown in Figure 2.

    Figure 2 – Data copied from microsoftopendata.com to an Azure based Linux virtual machine

    The Data Science virtual machine comes preloaded with a variety of development tools popular with researchers and practitioners as can been seen in Figure 3.

    Figure 3 Linux Data Science virtual machine

    “I am often asked to share my research data and the public sharing I have done in the past has been popular. Coordinating and cataloging these datasets in one place with Azure will be helpful for both internal and external researchers, giving them easy access, encouraging collaboration, and providing convenient cloud-based access to the wealth of Microsoft Research shared data.” 
    -John Krumm, Principal Researcher, Microsoft Research AI

    Datasets in Microsoft Research Open Data are categorized by their primary research area, as shown in Figure 4. You can find links to research projects or publications with the dataset. You can browse available datasets and download them or copy them directly to an Azure subscription through an automated workflow. To the extent possible, the repository meets the highest standards for data sharing to ensure that datasets are findable, accessible, interoperable and reusable; the entire corpus does not contain personally identifiable information. The site will continue to evolve as we get feedback from users.

    Figure 4 – Dataset Categories

    Microsoft Research Open Data is an outcome of the Microsoft Research Outreach Data science program and was made possible by a collaboration between many teams at Microsoft, Microsoft researchers, our industry partners, and our academic advisors.

    We would love to hear your comments and feedback! Please send us a note via the Feedback feature on the sitehttp://microsoftopendata.com and tell us what you think.

  • 相关阅读:
    ES6 学习笔记(整理一遍阮一峰大神得入门文档,纯自己理解使用)
    怪异模式和标准模式
    计算机网络七层协议模型 “开放系统互联参考模型”,即著名的OSI/RM模型(Open System Interconnection/Reference Model)
    流行得前端构建工具比较,以及gulp配置
    谈谈刚接触sea.js框架得看法
    MAC终端安装grunt--javascript世界得构建工具
    js的数组与对象关系
    JavaScript中的setInterval用法
    每周一题:平方数之和((更新JS)
    每周一题:拿硬币(更新JS)
  • 原文地址:https://www.cnblogs.com/Javi/p/9227618.html
Copyright © 2011-2022 走看看