  • StreamSets origin overview

    An origin is the source entry point of a StreamSets pipeline, and a pipeline can use only one origin.
    Pipelines running in different execution modes can use different origins:

    • Standalone mode
    • Cluster mode
    • Edge mode (agent)
    • Development mode (for convenient testing)

    Standalone mode origins

    In standalone pipelines, you can use the following origins:

    • Amazon S3 - Reads objects from Amazon S3.
    • Amazon SQS Consumer - Reads data from queues in Amazon Simple Queue Services (SQS).
    • Azure IoT/Event Hub Consumer - Reads data from Microsoft Azure Event Hub. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • CoAP Server - Listens on a CoAP endpoint and processes the contents of all authorized CoAP requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • Directory - Reads fully-written files from a directory. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • Elasticsearch - Reads data from an Elasticsearch cluster. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • File Tail - Reads lines of data from an active file after reading related archived files in the directory.
    • Google BigQuery - Executes a query job and reads the result from Google BigQuery.
    • Google Cloud Storage - Reads fully written objects from Google Cloud Storage.
    • Google Pub/Sub Subscriber - Consumes messages from a Google Pub/Sub subscription. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • Hadoop FS Standalone - Reads fully-written files from HDFS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • HTTP Client - Reads data from a streaming HTTP resource URL.
    • HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • HTTP to Kafka (Deprecated) - Listens on an HTTP endpoint and writes the contents of all authorized HTTP POST requests directly to Kafka.
    • JDBC Multitable Consumer - Reads database data from multiple tables through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • JDBC Query Consumer - Reads database data using a user-defined SQL query through a JDBC connection.
    • JMS Consumer - Reads messages from JMS.
    • Kafka Consumer - Reads messages from a single Kafka topic.
    • Kafka Multitopic Consumer - Reads messages from multiple Kafka topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • Kinesis Consumer - Reads data from Kinesis Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • MapR DB CDC - Reads changed MapR DB data that has been written to MapR Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • MapR DB JSON - Reads JSON documents from MapR DB JSON tables.
    • MapR FS - Reads files from MapR FS.
    • MapR FS Standalone - Reads fully-written files from MapR FS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • MapR Multitopic Streams Consumer - Reads messages from multiple MapR Streams topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • MapR Streams Consumer - Reads messages from MapR Streams.
    • MongoDB - Reads documents from MongoDB.
    • MongoDB Oplog - Reads entries from a MongoDB Oplog.
    • MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
    • MySQL Binary Log - Reads MySQL binary logs to generate change data capture records.
    • Omniture - Reads web usage reports from the Omniture reporting API.
    • OPC UA Client - Reads data from an OPC UA server.
    • Oracle CDC Client - Reads LogMiner redo logs to generate change data capture records.
    • PostgreSQL CDC Client - Reads PostgreSQL WAL data to generate change data capture records.
    • RabbitMQ Consumer - Reads messages from RabbitMQ.
    • Redis Consumer - Reads messages from Redis.
    • REST Service - Listens on an HTTP endpoint, parses the contents of all authorized requests, and sends responses back to the originating REST API. Creates multiple threads to enable parallel processing in a multithreaded pipeline. Use as part of a microservice pipeline.
    • Salesforce - Reads data from Salesforce.
    • SDC RPC - Reads data from an SDC RPC destination in an SDC RPC pipeline.
    • SDC RPC to Kafka (Deprecated) - Reads data from an SDC RPC destination in an SDC RPC pipeline and writes it to Kafka.
    • SFTP/FTP Client - Reads files from an SFTP or FTP server.
    • SQL Server CDC Client - Reads data from Microsoft SQL Server CDC tables. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • SQL Server Change Tracking - Reads data from Microsoft SQL Server change tracking tables and generates the latest version of each record. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • TCP Server - Listens at the specified ports and processes incoming data over TCP/IP connections. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • UDP Multithreaded Source - Reads messages from one or more UDP ports. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
    • UDP Source - Reads messages from one or more UDP ports.
    • UDP to Kafka (Deprecated) - Reads messages from one or more UDP ports and writes the data to Kafka.
    • WebSocket Client - Reads data from a WebSocket server endpoint.
    • WebSocket Server - Listens on a WebSocket endpoint and processes the contents of all authorized WebSocket client requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
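
    Several of the origins above (Directory, Hadoop FS Standalone, MapR FS Standalone, Google Cloud Storage) only pick up "fully-written" files. The sketch below illustrates one common way to implement that idea, checking that a file's size is stable across two polls. This is not StreamSets code; the size-stability check and the `settle_seconds` parameter are assumptions made for this example.

```python
import os
import time

def find_fully_written(directory, settle_seconds=0.1):
    """Return names of files whose size is stable across two polls.

    Illustrative sketch of the 'fully-written files' idea behind the
    Directory origin -- not StreamSets code. A file still being written
    usually grows between polls, so a stable size is used as a
    (heuristic) signal that the writer has finished.
    """
    first = {name: os.path.getsize(os.path.join(directory, name))
             for name in os.listdir(directory)}
    time.sleep(settle_seconds)
    stable = []
    for name, size in first.items():
        path = os.path.join(directory, name)
        # Keep only files that still exist and did not grow.
        if os.path.exists(path) and os.path.getsize(path) == size:
            stable.append(name)
    return sorted(stable)
```

    In a real deployment this check is usually combined with a naming convention (e.g. the writer renames `data.tmp` to `data.csv` when done), which is more reliable than size polling alone.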

    Cluster mode origins

    In cluster pipelines, you can use the following origins:

    • Hadoop FS - Reads data from HDFS, Amazon S3, or other file systems using the Hadoop FileSystem interface.
    • Kafka Consumer - Reads messages from Kafka. Use the cluster version of the origin.
    • MapR FS - Reads data from MapR FS.
    • MapR Streams Consumer - Reads messages from MapR Streams.

    Edge mode origins

    In edge pipelines, you can use the following origins:

    • Directory - Reads fully-written files from a directory.
    • File Tail - Reads lines of data from an active file after reading related archived files in the directory.
    • HTTP Client - Reads data from a streaming HTTP resource URL.
    • HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests.
    • MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
    • System Metrics - Reads system metrics from the edge device where SDC Edge is installed.
    • WebSocket Client - Reads data from a WebSocket server endpoint.
    • Windows Event Log - Reads data from a Microsoft Windows event log located on a Windows machine.
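
    The File Tail origin (available in both standalone and edge pipelines) reads new lines appended to an active file. The core trick is to remember a byte offset between reads and to return only complete lines, since the last line may still be mid-write. The sketch below shows that offset bookkeeping in plain Python; it is an illustration of the pattern, not SDC Edge code.

```python
def tail_lines(path, offset=0):
    """Read complete lines appended to a file since byte `offset`.

    Minimal sketch of the 'File Tail' idea: the caller stores the
    returned offset and passes it back on the next call, so repeated
    calls only see newly appended lines. Reads in binary mode so the
    offset arithmetic is valid.
    """
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    end = data.rfind(b"\n")
    if end == -1:
        # No complete new line yet; leave the offset unchanged.
        return [], offset
    complete = data[:end].decode().split("\n")
    return complete, offset + end + 1
```

    A partial trailing line (no newline yet) is deliberately left for the next call, which mirrors how tail-style origins avoid emitting half-written records.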

    Development mode origins

    To help create or test pipelines, you can use the following development origins:

    • Dev Data Generator
    • Dev Random Source
    • Dev Raw Data Source
    • Dev SDC RPC with Buffering
    • Dev Snapshot Replaying
    • Sensor Reader
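
    Development origins such as Dev Data Generator exist so you can exercise downstream stages without a real source: they fabricate records in configurable batches. The sketch below captures that idea in Python; the field names (`id`, `name`, `value`) and batch shape are made up for this example and are not the actual Dev Data Generator schema.

```python
import random
import string

def dev_data_generator(batch_size=3, seed=None):
    """Yield endless batches of synthetic records, Dev Data Generator style.

    A hedged sketch of what a development origin does: emit fabricated
    records so pipeline processors and destinations can be tested
    without connecting to a real system. Pass a seed for reproducible
    test data.
    """
    rng = random.Random(seed)
    counter = 0
    while True:
        batch = []
        for _ in range(batch_size):
            batch.append({
                "id": counter,  # monotonically increasing record id
                "name": "".join(rng.choices(string.ascii_lowercase, k=5)),
                "value": rng.randint(0, 100),
            })
            counter += 1
        yield batch
```

    Usage: `gen = dev_data_generator(batch_size=2, seed=42)` then `next(gen)` returns the next batch of two records, which is convenient for deterministic unit tests of downstream logic.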

    References

    https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Origins/Origins_overview.html#concept_hpr_twm_jq__section_tvn_4bc_f2b

  • Original post: https://www.cnblogs.com/rongfengliang/p/9505324.html