  • lecture 1

    1. There is no double-pass requirement, and all homework is submitted through give. All exam questions are short-answer.

    2. Consultation is on Fridays via Zoom, from 1 to 2.

    3. Major characteristics of big data: volume (size), variety, velocity (speed), value, visibility, variability, veracity (accuracy).

    4. volume

    The quantity of data being created from all sources.

    Problems it causes: insufficient storage space and growing time complexity.

    As volume increases, cost increases.
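
    The following is a minimal sketch, not from the lecture, that makes the time-complexity point concrete. It counts the operations needed for a single linear scan over n records. It then counts the comparisons needed for a naive all-pairs check, such as testing every pair of records for duplicates. The function names are hypothetical.

```python
def linear_ops(n: int) -> int:
    """A single pass over n records touches each record once: O(n)."""
    return n


def pairwise_ops(n: int) -> int:
    """Comparing every pair of n records needs n*(n-1)/2 comparisons: O(n^2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"n={n:,}  linear={linear_ops(n):,}  pairwise={pairwise_ops(n):,}")
```

    Growing the volume 1000-fold multiplies the linear work by 1000. It multiplies the pairwise work by roughly a million, which is why algorithms that are fine on small data can become infeasible at big-data volume.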

    5. variety

    1) different types:

    relational data (tables, transactions): structured, with a fixed schema

    text data (books, reports): unstructured

    semi-structured data (JSON, XML): has some organization as well, but no rigid schema

    graph data (social networks, RDF)

    image/video data (Instagram, YouTube)
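
    To illustrate how the same facts look in different data types, here is a sketch using a hypothetical user record, with an access function for each type. Relational access relies on a fixed schema. Semi-structured access navigates the document's own nesting. Graph access follows (subject, predicate, object) triples in the style of RDF. Plain text has no fields at all and would need information extraction, which is not shown. Images and video are omitted.

```python
import json

# Relational: a row whose meaning comes from a fixed schema (the column list).
user_columns = ("user_id", "name", "city")
user_row = (42, "Alice", "Sydney")

# Semi-structured: JSON carries its own field names; fields may be nested or missing.
user_json = '{"user_id": 42, "name": "Alice", "contact": {"city": "Sydney"}, "tags": ["student"]}'

# Unstructured text: the same facts with no explicit fields.
user_text = "Alice (user 42) lives in Sydney and is a student."

# Graph: relationships as (subject, predicate, object) triples, as in RDF.
user_graph = [
    ("user:42", "name", "Alice"),
    ("user:42", "livesIn", "city:Sydney"),
    ("user:42", "follows", "user:7"),
]


def city_from_row(row):
    # Relational access: look the value up through the fixed schema.
    return dict(zip(user_columns, row))["city"]


def city_from_json(doc):
    # Semi-structured access: walk the nesting, tolerating missing keys.
    return json.loads(doc).get("contact", {}).get("city")


def city_from_graph(triples, subject):
    # Graph access: follow the "livesIn" edge out of the subject node.
    for s, p, o in triples:
        if s == subject and p == "livesIn":
            return o.split(":", 1)[1]
    return None


if __name__ == "__main__":
    print(city_from_row(user_row), city_from_json(user_json), city_from_graph(user_graph, "user:42"))
```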

    2) different sources

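    In real applications, a single app often draws on multiple sources, or combines several different kinds of information.

    The main problem in data integration is heterogeneity. Data integration combines information from different sources into a single, unified view. Heterogeneity means that the schemas of the sources differ from one another. The traditional solution is schema mapping, and its difficulty and time complexity depend on the level of heterogeneity and on the data sources involved. A second problem is record linkage across varied data: deciding whether two records refer to the same entity or not. Doing this well requires using the different kinds of information from the different sources as thoroughly as possible.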
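
    Below is a minimal sketch of both ideas, built on two hypothetical sources with different schemas. Each source gets its own mapping function into a unified view (name, birth_date, location); this is schema mapping. A naive linker then decides whether two mapped records are the same person. It uses an exact birth-date match plus a fuzzy name match from the standard-library `difflib.SequenceMatcher`. The field names, the DD/MM/YYYY date format of source B and the 0.9 threshold are all assumptions. A real linker would combine more attributes, as the notes suggest.

```python
from difflib import SequenceMatcher

# Two hypothetical sources describing people with heterogeneous schemas.
source_a = [{"full_name": "Alice Smith", "dob": "1990-05-01", "suburb": "Kensington"}]
source_b = [{"first": "alice", "last": "smith", "birth_date": "01/05/1990", "city": "Sydney"}]


def map_a(rec):
    """Schema mapping for source A into the unified view."""
    return {"name": rec["full_name"], "birth_date": rec["dob"], "location": rec["suburb"]}


def map_b(rec):
    """Schema mapping for source B: join name parts and convert the
    (assumed) DD/MM/YYYY date to ISO YYYY-MM-DD."""
    day, month, year = rec["birth_date"].split("/")
    return {
        "name": f"{rec['first']} {rec['last']}",
        "birth_date": f"{year}-{month}-{day}",
        "location": rec["city"],
    }


def normalise(name):
    # Case-fold and collapse whitespace before comparing names.
    return " ".join(name.lower().split())


def same_entity(r1, r2, threshold=0.9):
    """Naive record linkage: exact birth date plus fuzzy name similarity.
    Location is not compared in this simplified sketch."""
    if r1["birth_date"] != r2["birth_date"]:
        return False
    score = SequenceMatcher(None, normalise(r1["name"]), normalise(r2["name"])).ratio()
    return score >= threshold


unified = [map_a(r) for r in source_a] + [map_b(r) for r in source_b]

if __name__ == "__main__":
    print(unified)
    print("same entity:", same_entity(unified[0], unified[1]))
```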

    Data curation refers to the organization and integration of data collected from various sources. A problem that can arise is the long tail of data variety.

    Data curation, i.e. making the data more organized, can reduce the long tail.
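
    Here is a sketch of how curation can shrink a long tail. It uses hypothetical raw values for one attribute collected from several sources. Before curation, the values spread across many rare spellings that each appear only once. A hypothetical table of curation rules maps known variants to canonical values. Unknown variants are kept unchanged for manual review.

```python
from collections import Counter

# Hypothetical raw values for one attribute, gathered from several sources.
raw = ["Australia", "australia", "AU", "Aus.", "Australia", "Commonwealth of Australia",
       "New Zealand", "NZ", "new zealand", "Australia"]

# Hypothetical curation rules: map each known (lower-cased) variant to a canonical value.
CANONICAL = {
    "australia": "Australia",
    "au": "Australia",
    "aus.": "Australia",
    "commonwealth of australia": "Australia",
    "new zealand": "New Zealand",
    "nz": "New Zealand",
}


def curate(value):
    # Unknown variants are returned unchanged so they can be reviewed manually.
    return CANONICAL.get(value.strip().lower(), value)


raw_counts = Counter(raw)
curated_counts = Counter(curate(v) for v in raw)

if __name__ == "__main__":
    print("distinct before:", len(raw_counts), "after:", len(curated_counts))
    print(curated_counts.most_common())
```

    Here 8 distinct raw values, 7 of them singletons, collapse to 2 canonical ones.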

    6. velocity (speed)

    Many applications need timely feedback.

    The issues to address are batch processing, real-time processing, and transmission.
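
    The sketch below contrasts batch and real-time processing, using a running mean as a stand-in workload. The lecture does not name a specific computation, so that choice is an assumption. The batch version waits until every value is stored and then computes once. The streaming version updates its answer as each event arrives and keeps only constant-size state. Transmission is not modelled.

```python
from typing import List


def batch_mean(values: List[float]) -> float:
    """Batch processing: compute once over the fully stored dataset."""
    return sum(values) / len(values)


class StreamingMean:
    """Real-time processing: update the result per event, storing only
    a count and the current mean instead of every value."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        self.mean += (x - self.mean) / self.count  # incremental mean update
        return self.mean


if __name__ == "__main__":
    events = [3.0, 5.0, 10.0, 2.0]
    stream = StreamingMean()
    for e in events:
        print("answer so far:", stream.update(e))  # available immediately
    print("batch answer:", batch_mean(events))      # only after all data is in
```

    Both arrive at the same final value. The streaming version, however, has an up-to-date answer after every event, which is what applications needing timely feedback require.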

  • Original article: https://www.cnblogs.com/eleni/p/13032018.html