  • [SAA] 32. Data Engineering

    AWS Batch Overview

    • Run batch jobs as Docker images
    • Dynamically provisions instances (EC2 On-Demand & Spot Instances) inside your VPC
    • Chooses the optimal quantity and type of instances based on job volume and requirements
    • No need to manage clusters, fully serverless
    • You pay only for the underlying EC2 instances
    • Example: batch process of images, running thousands of concurrent jobs
    • Schedule Batch jobs using CloudWatch Events (now Amazon EventBridge)
    • Orchestrate Batch Jobs using AWS Step Functions
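As a rough illustration of the image-processing example above, the sketch below builds one `SubmitJob` request per image. The queue name `image-queue`, the job definition `resize-image:1`, and the `IMAGE_KEY` environment variable are hypothetical placeholders, not names from the notes. The request builder is plain Python; only `submit_all` needs boto3, AWS credentials, and existing Batch resources.

```python
from typing import Dict, List


def build_submit_job_requests(
    image_keys: List[str],
    job_queue: str = "image-queue",          # hypothetical queue name
    job_definition: str = "resize-image:1",  # hypothetical job definition
) -> List[Dict]:
    """Build one SubmitJob request per image.

    Each job runs the Docker image referenced by the job definition and
    receives the image key through an environment variable.
    """
    requests = []
    for index, key in enumerate(image_keys):
        requests.append({
            "jobName": f"resize-{index}",
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                "environment": [{"name": "IMAGE_KEY", "value": key}],
            },
        })
    return requests


def submit_all(requests: List[Dict]) -> List[str]:
    """Submit the requests with boto3 and return the job IDs."""
    import boto3  # imported lazily so the builder works without boto3

    batch = boto3.client("batch")
    return [batch.submit_job(**request)["jobId"] for request in requests]
```

For thousands of near-identical jobs, a single array job (the `arrayProperties` parameter of `submit_job`) may be a better fit than one request per image.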

     

    Lambda vs Batch

    Lambda

    • Time limit: 15 mins
    • Limited runtimes
    • Limited temporary disk space
    • Serverless

    Batch

    • No time limit
    • Any runtime, as long as it's packaged as a Docker image
    • Relies on EBS / instance store for disk space
    • Relies on EC2 (can be managed by AWS)
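The comparison above can be read as a simple decision rule. The helper below is only a sketch of that rule: the function name and its flags are made up for illustration, and real workloads involve more factors (cost, cold starts, memory) than the three listed in the notes.

```python
LAMBDA_TIME_LIMIT_SECONDS = 15 * 60  # Lambda time limit from the notes above


def choose_service(
    expected_seconds: float,
    needs_unsupported_runtime: bool = False,  # runtime only available as a Docker image
    needs_large_disk: bool = False,           # more disk than Lambda's temporary space
) -> str:
    """Return "batch" when any Lambda limit from the notes is exceeded."""
    if expected_seconds > LAMBDA_TIME_LIMIT_SECONDS:
        return "batch"
    if needs_unsupported_runtime or needs_large_disk:
        return "batch"
    return "lambda"
```

For example, `choose_service(3600)` picks Batch because a one-hour job exceeds the 15-minute limit.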

    Compute Environments

    Managed Compute Environment

    • AWS Batch manages the capacity and instance types within the environment
    • You can choose On-Demand or Spot Instances
    • You can set a maximum price for Spot Instances
    • Launched within your own VPC
      • If you launch into a private subnet, make sure the instances can reach the ECS service
      • Use either a NAT gateway / NAT instance or a VPC endpoint for ECS
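A managed Spot environment along these lines might be described with the parameters below, which follow the shape of the boto3 `create_compute_environment` call. The function name, the default values, and every identifier passed in the usage example (subnet, security group, instance role) are placeholders. `bidPercentage` expresses the maximum Spot price as a percentage of the On-Demand price.

```python
from typing import Dict, List


def build_managed_spot_environment(
    name: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    instance_role: str,
    max_spot_price_percent: int = 60,  # assumed cap: 60% of the On-Demand price
    max_vcpus: int = 64,               # assumed upper bound on capacity
) -> Dict:
    """Build parameters for a MANAGED compute environment backed by Spot."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",  # AWS Batch manages capacity and instance types
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "bidPercentage": max_spot_price_percent,
            "minvCpus": 0,                 # scale to zero when the queue is empty
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],  # let Batch pick the instance types
            "subnets": subnet_ids,         # subnets of your own VPC
            "securityGroupIds": security_group_ids,
            "instanceRole": instance_role,
        },
    }


# Usage (requires boto3, credentials, and real resource IDs):
#   import boto3
#   params = build_managed_spot_environment(
#       "spot-env", ["subnet-..."], ["sg-..."], "ecsInstanceRole")
#   boto3.client("batch").create_compute_environment(**params)
```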

    Unmanaged Compute Environment

    • You control and manage instance configuration, provisioning and scaling

     

    Kinesis

    CloudWatch Logs can send log data to Kinesis Data Streams or Kinesis Data Firehose through subscription filters

    Near real-time: Kinesis Data Firehose

    The Kinesis agent can be configured to send data directly to Kinesis Data Firehose

    Firehose can connect to S3

    Use Lambda to send data to Elasticsearch
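One way the Lambda step could look is sketched below, assuming the function is triggered by a Kinesis Data Streams event source and that each record carries a JSON object. The index name `app-logs` is a placeholder. The function only builds the newline-delimited body expected by the Elasticsearch `_bulk` API; signing and sending the HTTP request is left out.

```python
import base64
import json


def build_bulk_body(event: dict, index_name: str = "app-logs") -> str:
    """Turn a Kinesis event into an Elasticsearch _bulk request body."""
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers the record payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        document = json.loads(payload)  # assumes each record is a JSON object
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(document))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


def handler(event, context):
    """Lambda entry point: builds the bulk body and reports the record count."""
    body = build_bulk_body(event)
    # A real function would POST `body` to the domain's /_bulk endpoint here.
    return {"documents": len(event.get("Records", []))}
```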

    Athena

    • QuickSight for visualization dashboards
    • CloudTrail can stream logs to CloudWatch Logs

    • EMR can use Spot Instances (for example through instance fleets) to control cost
    • Athena: the data must reside in S3
    • Redshift Spectrum for serverless queries on S3
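Because Athena reads from and writes to S3, a query request needs an S3 output location. The sketch below builds the parameters for the boto3 `start_query_execution` call. The function name is made up, and the database, table, and bucket in the usage comment are placeholders.

```python
from typing import Dict


def build_athena_query(sql: str, database: str, output_location: str) -> Dict:
    """Build parameters for Athena's StartQueryExecution."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to S3; expected an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Usage (requires boto3 and credentials):
#   import boto3
#   params = build_athena_query(
#       "SELECT * FROM logs LIMIT 10", "my_database", "s3://my-bucket/results/")
#   boto3.client("athena").start_query_execution(**params)
```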

     

     

     

     

  • Original article: https://www.cnblogs.com/Answer1215/p/15323895.html