  • Realtime data analysis frameworks (or stream systems)

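    My recent work involves designing a system that monitors the state of other systems in real time, for example the execution status of Hadoop jobs or the health of servers. The system has to process the information produced by the monitored objects as it arrives and deliver it to users.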
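The flow described above (monitored objects emit status information, the system processes it as it arrives, and users receive it) can be sketched in a few lines. This is only an in-process illustration using the Python standard library, not a design taken from any of the frameworks listed in this post. The names `StatusEvent`, `Monitor`, and `demo`, and the sample job and host identifiers, are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class StatusEvent:
    source: str  # hypothetical identifier, e.g. a Hadoop job id or a hostname
    status: str  # hypothetical status label


class Monitor:
    """Receives events from producers and forwards them to subscribers."""

    def __init__(self):
        self._events = queue.Queue()  # thread-safe FIFO buffer
        self._subscribers = []
        self._worker = threading.Thread(target=self._run, daemon=True)

    def subscribe(self, callback):
        # Register a callable that is invoked once per event.
        self._subscribers.append(callback)

    def start(self):
        self._worker.start()

    def publish(self, event):
        # Called by monitored objects; returns without waiting for delivery.
        self._events.put(event)

    def stop(self):
        # A None sentinel tells the worker to exit after draining earlier events.
        self._events.put(None)
        self._worker.join()

    def _run(self):
        while True:
            event = self._events.get()
            if event is None:
                break
            for callback in self._subscribers:
                callback(event)


def demo():
    received = []
    monitor = Monitor()
    monitor.subscribe(received.append)
    monitor.start()
    monitor.publish(StatusEvent("job_001", "RUNNING"))
    monitor.publish(StatusEvent("server-a", "UNHEALTHY"))
    monitor.stop()
    return received


if __name__ == "__main__":
    for event in demo():
        print(event.source, event.status)
```

A single in-memory queue like this loses every buffered event if the process dies and cannot grow beyond one machine, which is why the requirements for the real system point toward distributed frameworks.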

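    Such a system clearly needs the following properties:

    1. Reliability
    2. Large-scale data processing
    3. Real-time behavior

    Clearly this will be a project built on top of Hadoop. The systems currently available for reference are: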

    Kafka: Kafka is a messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn’s activity stream processing pipeline.

    S4: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

    Hedwig: Hedwig is a publish-subscribe system designed to carry large amounts of data across the internet in a guaranteed-delivery fashion from those who produce it (publishers) to those who are interested in it (subscribers).

    Storm: Storm is a distributed, reliable, and fault-tolerant stream processing system. Its use cases are so broad that we consider it to be a fundamental new primitive for data processing.

    Flume: Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop’s HDFS.

    Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.

    I will keep updating this post as the project progresses.


  • Original article: https://www.cnblogs.com/ainima/p/6331304.html