  • Hadoop elementary course

    Introduction
    The two main questions:
    How to store massive amounts of data
    How to analyze massive amounts of data

    Hadoop refers to the whole family of Hadoop projects.
    It includes Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.

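    HDFS is suited to very large files with streaming data access (write once, read many times), running on commodity hardware.
    It is not suited to low-latency data access, multiple writers, arbitrary file modifications, or large numbers of small files.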

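    HDFS architecture
    Block: 64 MB by default in 1.x (128 MB from 2.x); large, because HDFS targets massive data sets
    NameNode: holds the file system directory tree, file metadata, and the block list for each file (crucial)
    DataNode: stores the data blocks themselves
    HA strategy: 1.x can have only one NameNode; from 2.x on, NameNodes can run in active-standby mode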

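    MapReduce
    A programming model for processing massive data sets in parallel.
    For example, to find the maximum of nine numbers:
    Step one, call the map function to get the maximum of each group of three numbers. The groups are stored in HDFS.
    Step two, use the reduce function to get the overall maximum.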

    To summarize, HDFS provides a way to store massive data across many hosts, along with the supporting strategies.
    MapReduce analyzes that data by divide and conquer.

    INTRODUCTION
    The two main questions:
    First, how to store massive amounts of data: HDFS
    Second, how to analyze massive amounts of data: MapReduce

    Hadoop = the Hadoop projects,
    including Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.

    HDFS is suited to very large files with streaming data access (write once, read many times), running on commodity hardware.
    It is not suited to low-latency data access, multiple writers, arbitrary file modifications, or large numbers of small files.


    HDFS Architecture
    Block: 64 MB by default in 1.x (128 MB from 2.x); large, because HDFS targets massive data sets.
    NameNode: holds the file system directory tree, file metadata, and the block list for each file. (crucial)
    DataNode: stores the data blocks themselves.
    HA strategy: 1.x has a single NameNode, which is a single point of failure; from 2.x on, NameNodes can run as an active-standby pair.
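
    The roles above can be sketched as a toy model in plain Python. This is not the HDFS API: `ToyNameNode`, `add_file`, the DataNode names, and the round-robin placement are hypothetical and exist only for illustration (real HDFS also replicates each block). Only the 64 MB block size comes from the notes.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the 1.x default mentioned in the notes


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold a file of file_size bytes."""
    return math.ceil(file_size / block_size)


class ToyNameNode:
    """Hypothetical stand-in for a NameNode: it keeps metadata only, no file data."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.block_map = {}  # path -> list of (block_id, datanode)
        self._next_block_id = 0

    def add_file(self, path: str, file_size: int):
        """Record which DataNode holds each block of the file."""
        placements = []
        for _ in range(block_count(file_size)):
            # Round-robin placement is an assumption made for illustration.
            node = self.datanodes[self._next_block_id % len(self.datanodes)]
            placements.append((self._next_block_id, node))
            self._next_block_id += 1
        self.block_map[path] = placements
        return placements


if __name__ == "__main__":
    namenode = ToyNameNode(["datanode-1", "datanode-2", "datanode-3"])
    # A 200 MB file needs 4 blocks: three full 64 MB blocks plus one 8 MB block.
    for block_id, node in namenode.add_file("/logs/big.log", 200 * 1024 * 1024):
        print(block_id, node)
```

    The point of the sketch is the split of responsibilities: the NameNode only answers "which blocks make up this file, and where are they", while the DataNodes hold the bytes.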


    MapReduce
    A programming model for processing massive data sets in parallel.
    For example, to find the maximum of nine numbers:
    First, the map function finds the maximum of each group of three numbers.
    The groups themselves are stored in HDFS.
    Second, the reduce function takes the three partial maxima and returns the overall maximum.
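
    The nine-number example can be sketched in plain Python. This only mimics the map and reduce steps in a single process; it does not use the Hadoop MapReduce API, where map and reduce work on key-value pairs and run on different machines. The function names and the sample numbers are assumptions.

```python
from functools import reduce


def map_max(split):
    """Map step: return the local maximum of one input split."""
    return max(split)


def reduce_max(left, right):
    """Reduce step: combine two partial maxima into one."""
    return left if left >= right else right


def find_max(splits):
    # Map phase: one call per split (on a cluster these would run in parallel).
    partial_maxima = [map_max(split) for split in splits]
    # Reduce phase: fold the partial results into a single answer.
    return reduce(reduce_max, partial_maxima)


if __name__ == "__main__":
    # Nine sample numbers, already divided into three splits of three.
    splits = [[3, 8, 1], [7, 2, 9], [4, 6, 5]]
    print(find_max(splits))  # prints 9
```

    Each split is processed without looking at the others, which is what lets the map phase scale out across many hosts.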


    In conclusion, HDFS provides a way to store massive data across many hosts, along with the supporting strategies, such as NameNode HA.
    MapReduce then analyzes that data by divide and conquer.

  • Original article: https://www.cnblogs.com/chuanlong/p/2822933.html