  • What Is Hadoop


    What is Hadoop?

    To answer this question, let us first look at how Hadoop came into being.

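    Doug Cutting, the central figure behind Hadoop, wanted the Nutch project (an open-source search engine meant as an alternative to the mainstream search products of the time) to be able to run its many web-scale algorithms at low cost. At first, Cutting ran into many challenges and difficulties. Fortunately, Google published two papers: The Google File System, which describes how to store massive amounts of data in a distributed way, and MapReduce: Simplified Data Processing on Large Clusters, which describes how to process large-scale distributed data. Inspired by these two papers, Cutting implemented their principles in the spirit of OSS (open source software), and Hadoop was born.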

    From this we can see that Hadoop is an open-source platform for the distributed storage and processing of big data.

    Hadoop is now a project of The Apache Software Foundation. Let us see how Apache itself describes Hadoop:

    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.

    That is, the Apache Hadoop project builds open-source distributed computing software that is both reliable and scalable.

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

    The project includes these modules:

    . Hadoop Common: The common utilities that support the other Hadoop modules.

    . Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.

    . Hadoop YARN: A framework for job scheduling and cluster resource management.

    . Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

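    The passage above conveys six key points: (1) Hadoop is an open-source framework; (2) Hadoop can process large data sets in a distributed fashion; (3) Hadoop can store massive amounts of data on a cluster of computers; (4) Hadoop can scale from a single server to thousands of servers, each of which offers local storage and computation; (5) Hadoop can detect and handle failures at the application layer; (6) Hadoop comprises four modules, Hadoop Common, HDFS, Hadoop YARN, and Hadoop MapReduce, each with its own responsibility.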
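
    To make the "simple programming model" behind Hadoop MapReduce more concrete, here is a minimal sketch of the map, shuffle, and reduce phases as a word count. It is plain Python running in a single process, not the Hadoop API: there is no cluster, no HDFS, and no YARN, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is just an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

    The point of the split is that each map call sees only one line and each reduce call sees only one key, so in principle both phases can be spread across many machines; the sketch above only imitates that structure on one machine.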

    On the question "What is Hadoop?", I would also like to borrow Chuck Lam's description from his book Hadoop in Action: "Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system called Hadoop" and "Hadoop, and large-scale distributed data processing in general, is rapidly becoming an important skill set for many programmers." These sentences tell us that Hadoop is an open-source version of Google's MapReduce system whose development was led by Doug Cutting, that it is a tool for large-scale distributed data processing, and that it is becoming an important skill for many programmers.



    Source:

    1  Wang Luqing's blog  http://www.wangluqing.com/2014/02/hadoop-what/

    2  The Apache Hadoop official website  http://hadoop.apache.org/

    3  The two Google papers:

           The Google File System

           http://static.googleusercontent.com/media/research.google.com/zh-CN//archive/gfs-sosp2003.pdf

           MapReduce: Simplified Data Processing on Large Clusters

           http://static.googleusercontent.com/media/research.google.com/zh-CN//archive/mapreduce-osdi04.pdf

    4  Hadoop in Action  http://www.manning.com/lam/

    5  Doug Cutting, the father of Hadoop  http://www.programmer.com.cn/15929/





  • Original article: https://www.cnblogs.com/mengfanrong/p/3802332.html