  • Apache NiFi

    What?

    https://www.tutorialspoint.com/apache_nifi/index.htm

    An open source data ingestion platform.

    Apache NiFi is an open source data ingestion platform. It was originally developed by the NSA and is now maintained and further developed under the Apache Software Foundation. It is written in Java and serves its web interface through an embedded Jetty server. It is licensed under the Apache License, Version 2.0. In this tutorial, we will be explaining the basics of Apache NiFi and its features.

    http://nifi.apache.org/docs.html

    Put simply, NiFi was built to automate the flow of data between systems. While the term 'dataflow' is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data. The problems and solution patterns that emerged have been discussed and articulated extensively. A comprehensive and readily consumed form is found in the Enterprise Integration Patterns [eip].

    The core concepts of NiFi

    http://nifi.apache.org/docs.html

    NiFi’s fundamental design concepts closely relate to the main ideas of Flow Based Programming [fbp]. Here are some of the main NiFi concepts and how they map to FBP:

    NiFi Term | FBP Term | Description

    FlowFile | Information Packet | A FlowFile represents each object moving through the system and for each one, NiFi keeps track of a map of key/value pair attribute strings and its associated content of zero or more bytes.

    FlowFile Processor | Black Box | Processors actually perform the work. In [eip] terms a processor is doing some combination of data routing, transformation, or mediation between systems. Processors have access to attributes of a given FlowFile and its content stream. Processors can operate on zero or more FlowFiles in a given unit of work and either commit that work or rollback.

    Connection | Bounded Buffer | Connections provide the actual linkage between processors. These act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enable back pressure.

    Flow Controller | Scheduler | The Flow Controller maintains the knowledge of how processes connect and manages the threads and allocations thereof which all processes use. The Flow Controller acts as the broker facilitating the exchange of FlowFiles between processors.

    Process Group | Subnet | A Process Group is a specific set of processes and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow creation of entirely new components simply by composition of other components.
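The concepts above can be made concrete with a small model. The following is a minimal sketch in plain Python, not NiFi's actual API: the class names mirror the NiFi terms, but `max_queued`, `offer`, `poll`, `run_once`, and `to_upper` are hypothetical names chosen for illustration. It shows a FlowFile as attributes plus content, a Connection as a bounded queue, and a Processor that stops working when its outbound queue is full, which is the essence of back pressure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass
class FlowFile:
    """One object moving through the flow: string attributes plus content bytes."""
    content: bytes = b""
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """A queue between two processors with an upper bound on queued objects."""

    def __init__(self, max_queued: int) -> None:
        self.max_queued = max_queued  # hypothetical name for the upper bound
        self._queue: Deque[FlowFile] = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.max_queued

    def offer(self, flowfile: FlowFile) -> bool:
        """Enqueue unless the bound is reached; False signals back pressure."""
        if self.is_full():
            return False
        self._queue.append(flowfile)
        return True

    def poll(self) -> Optional[FlowFile]:
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


class Processor:
    """Pulls one FlowFile from `source`, transforms it, and pushes it to `dest`."""

    def __init__(self, transform: Callable[[FlowFile], FlowFile],
                 source: Connection, dest: Connection) -> None:
        self.transform = transform
        self.source = source
        self.dest = dest

    def run_once(self) -> bool:
        """Process one FlowFile; return False if no work was done."""
        if self.dest.is_full():
            return False  # back pressure: leave the input where it is
        flowfile = self.source.poll()
        if flowfile is None:
            return False  # nothing queued
        self.dest.offer(self.transform(flowfile))
        return True


def to_upper(flowfile: FlowFile) -> FlowFile:
    """Example transformation: upper-case the content and tag the FlowFile."""
    return FlowFile(flowfile.content.upper(),
                    {**flowfile.attributes, "transformed": "true"})


inbound = Connection(max_queued=10)
outbound = Connection(max_queued=1)
for i in range(3):
    inbound.offer(FlowFile(f"record {i}".encode(), {"index": str(i)}))

upper = Processor(to_upper, inbound, outbound)
results = [upper.run_once() for _ in range(3)]
print(results, len(inbound), len(outbound))  # [True, False, False] 2 1
```

Only the first run does any work: once the outbound queue holds one FlowFile it is full, so the remaining two FlowFiles stay queued upstream until something drains the outbound connection.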

    Architecture - cluster support

    NiFi Architecture

    [Figure: NiFi Architecture Diagram]

    NiFi executes within a JVM on a host operating system. The primary components of NiFi on the JVM are as follows:

    Web Server

    The purpose of the web server is to host NiFi’s HTTP-based command and control API.

    Flow Controller

    The flow controller is the brains of the operation. It provides threads for extensions to run on, and manages the schedule of when extensions receive resources to execute.

    Extensions

    There are various types of NiFi extensions which are described in other documents. The key point here is that extensions operate and execute within the JVM.

    FlowFile Repository

    The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow. The implementation of the repository is pluggable. The default approach is a persistent Write-Ahead Log located on a specified disk partition.

    Content Repository

    The Content Repository is where the actual content bytes of a given FlowFile live. The implementation of the repository is pluggable. The default approach is a fairly simple mechanism, which stores blocks of data in the file system. More than one file system storage location can be specified so as to get different physical partitions engaged to reduce contention on any single volume.

    Provenance Repository

    The Provenance Repository is where all provenance event data is stored. The repository construct is pluggable with the default implementation being to use one or more physical disk volumes. Within each location event data is indexed and searchable.
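The write-ahead log mentioned for the FlowFile Repository is a general technique: each state change is appended to a log on disk before it is applied in memory, so the state can be rebuilt after a restart. The following is a minimal sketch of that idea in Python, assuming a JSON-lines log format; the class `WriteAheadLog`, its methods, and the record layout are hypothetical and unrelated to NiFi's real on-disk format.

```python
import json
import os
import tempfile
from typing import Dict


class WriteAheadLog:
    """Append every state change to disk before applying it in memory."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.state: Dict[str, Dict[str, str]] = {}
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self) -> None:
        """Rebuild the in-memory state from the log, e.g. after a restart."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, record: dict) -> None:
        if record["op"] == "put":
            self.state[record["id"]] = record["attributes"]
        elif record["op"] == "delete":
            self.state.pop(record["id"], None)

    def _append(self, record: dict) -> None:
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # force the record to disk first
        self._apply(record)           # only then change the in-memory state

    def put(self, flowfile_id: str, attributes: Dict[str, str]) -> None:
        self._append({"op": "put", "id": flowfile_id, "attributes": attributes})

    def delete(self, flowfile_id: str) -> None:
        self._append({"op": "delete", "id": flowfile_id})

    def close(self) -> None:
        self._log.close()


log_path = os.path.join(tempfile.mkdtemp(), "flowfile.wal")
repo = WriteAheadLog(log_path)
repo.put("ff-1", {"filename": "a.csv", "queue": "to-convert"})
repo.put("ff-2", {"filename": "b.csv", "queue": "to-convert"})
repo.delete("ff-1")
repo.close()

recovered = WriteAheadLog(log_path)  # simulates a restart
print(recovered.state)
recovered.close()
```

The sketch leaves out what a production log needs, such as periodic checkpoints to keep the log short and tolerance for a partially written last record.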

    NiFi is also able to operate within a cluster.

    [Figure: NiFi Cluster Architecture Diagram]

    Starting with the NiFi 1.0 release, a Zero-Master Clustering paradigm is employed. Each node in a N

    Flow configuration examples

    https://github.com/xmlking/nifi-examples

    csv-to-json

    This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.
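The two-step idea behind this flow (pull fields out with a regular expression, then rewrite the content from those fields) can be imitated outside NiFi. The following is a rough Python sketch, not the flow's actual configuration: the three-column layout, the `csv.N` attribute names, and the JSON keys `id`, `name`, and `city` are assumptions made for illustration.

```python
import json
import re
from typing import Dict

# Hypothetical three-column layout; the real flow's regex and fields may differ.
CSV_PATTERN = re.compile(r"^([^,]*),([^,]*),([^,]*)$")


def extract_text(line: str) -> Dict[str, str]:
    """Imitate ExtractText: store each capture group as a named attribute."""
    match = CSV_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"line does not match the expected CSV shape: {line!r}")
    return {f"csv.{i}": value for i, value in enumerate(match.groups(), start=1)}


def replace_text(attributes: Dict[str, str]) -> str:
    """Imitate ReplaceText: replace the content with JSON built from attributes."""
    document = {
        "id": attributes["csv.1"],
        "name": attributes["csv.2"],
        "city": attributes["csv.3"],
    }
    return json.dumps(document)


attributes = extract_text("42,Alice,Berlin\n")
json_document = replace_text(attributes)
print(json_document)  # {"id": "42", "name": "Alice", "city": "Berlin"}
```

A regex of this kind cannot handle quoted fields that contain commas; for such data a real CSV parser is the safer choice.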

     

    decompression

    This flow takes an archive that has been compressed several times over and decompresses it repeatedly in a loop until the original archived file is extracted.
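The loop in this flow amounts to "decompress, check whether the result is still compressed, repeat". The following is a minimal Python sketch of that loop, assuming gzip for every layer. It detects compression by checking the gzip magic bytes, whereas a NiFi flow would presumably route on a detected MIME type; that comparison is an assumption and not taken from the example repository. The function names and the `max_rounds` guard are hypothetical.

```python
import gzip
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def compress_n_times(payload: bytes, levels: int) -> bytes:
    """Build a test archive by applying gzip `levels` times."""
    for _ in range(levels):
        payload = gzip.compress(payload)
    return payload


def decompress_fully(data: bytes, max_rounds: int = 32) -> Tuple[bytes, int]:
    """Decompress until the data no longer looks like gzip.

    Returns the final payload and the number of layers removed.
    """
    rounds = 0
    while data[:2] == GZIP_MAGIC:
        if rounds >= max_rounds:
            raise ValueError("too many nested compression layers")
        data = gzip.decompress(data)
        rounds += 1
    return data, rounds


original = b"id,name,city\n42,Alice,Berlin\n"
archive = compress_n_times(original, levels=3)
payload, rounds = decompress_fully(archive)
print(rounds, payload == original)  # 3 True
```

The `max_rounds` guard keeps a maliciously nested archive from looping without limit. A payload that happens to begin with the gzip magic bytes without being gzip data would make `gzip.decompress` raise an error, which the sketch does not handle.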

    Getting Started with Apache NiFi

    https://nifi.apache.org/docs/nifi-docs/html/getting-started.html#downloading-and-installing-nifi

    Apache NiFi User Guide

    https://nifi.apache.org/docs/nifi-docs/html/user-guide.html

    Tutorial

    Reading files and uploading them to MongoDB

    https://dzone.com/articles/gentle-introduction-to-apache-nifi-for-dataflow-an

    Processors

    https://nifichina.github.io/general/GettingStarted.html#%E6%9C%89%E5%93%AA%E4%BA%9B%E7%B1%BB%E5%88%AB%E7%9A%84%E5%A4%84%E7%90%86%E5%99%A8

    A simple hands-on example

    https://www.cnblogs.com/h--d/p/10079418.html

  • Original source: https://www.cnblogs.com/lightsong/p/12688924.html