zoukankan      html  css  js  c++  java
  • [SAA] 32. Data Engineering

    AWS Batch Overview

    • Run batch jobs as Docker images
    • Dynamic provisioning of the instances (EC2 & Spot Instances) - in VPC
    • Optimal quantity and type based on volume and requirements
    • No need to manage clusters, fully serverless
    • You just pay for the underlying EC2 instance
    • Example: batch process of images, running thousands of concurrent jobs
    • Schedule Batch Jobs using CloudWatch Events
    • Orchestrate Batch Jobs using AWS Step Functions

     

    Lambda vs Batch

    Lambda

    • Time limit: 15 mins
    • Limted runtime
    • Limited temporary disk space
    • Serverless

    Batch

    • No time limit
    • Any runtinme as long as it's package as a Docker image
    • Rely on EBS / instance store for disk space
    • Relies on EC2 (can be managed by AWS)

    Compute Environments

    Managed Compute Environment

    • AWS Batch managed the capacity and instance types within the environment
    • You can choose On-Demand or Spot Instance
    • You can set a maximum price for Spot instance
    • Launched within your own VPC
      • If you launch within your own private subnet, make sure it has access to the ECS service
      • Either using a NAT Gateway / instance or using VPC Endpoint for ECS

    Unmanaged Compute Environment

    • You control and manage instance configuration, provisioning and scaling

     

    Kinesis

    CloudWatch cannot send to Kinesis Data Firehose or Kinesis Data Streams

    Near real-time: Kinesis Data Firehose

    Kinesis agent can directly configured to send data to Kinesis Data Firehose

    Firehose can connect to S3

    Kinesis Data Firehose is near real-time

    Using Lambda to send to ElasticSearch

    Athena

    • Quicksight for visiulization dashboard
    • CloudTrail can stream logs to CloudWatch

    • EMR can choose to use Spot Fleet to control the cost
    • Athena: data must stay in S3
    • Redshift Spectrum for serverless queries on S3

     

     

     

     

  • 相关阅读:
    如何查找并启动 Reporting Services 工具
    数据压缩技术
    压缩算法
    新版压缩库发布
    如何处理海量数据
    安卓手机获得Root权限
    安卓项目的源码
    压缩算法1
    ODBC, OLEDB, ADO, ADO.Net的演化简史
    C# 文件压缩与解压(ZIP格式)
  • 原文地址:https://www.cnblogs.com/Answer1215/p/15323895.html
Copyright © 2011-2022 走看看