zoukankan      html  css  js  c++  java
  • [SAA] 32. Data Engineering

    AWS Batch Overview

    • Run batch jobs as Docker images
    • Dynamic provisioning of the instances (EC2 & Spot Instances) - in VPC
    • Optimal quantity and type based on volume and requirements
    • No need to manage clusters, fully serverless
    • You just pay for the underlying EC2 instance
    • Example: batch process of images, running thousands of concurrent jobs
    • Schedule Batch Jobs using CloudWatch Events
    • Orchestrate Batch Jobs using AWS Step Functions

     

    Lambda vs Batch

    Lambda

    • Time limit: 15 mins
    • Limted runtime
    • Limited temporary disk space
    • Serverless

    Batch

    • No time limit
    • Any runtinme as long as it's package as a Docker image
    • Rely on EBS / instance store for disk space
    • Relies on EC2 (can be managed by AWS)

    Compute Environments

    Managed Compute Environment

    • AWS Batch managed the capacity and instance types within the environment
    • You can choose On-Demand or Spot Instance
    • You can set a maximum price for Spot instance
    • Launched within your own VPC
      • If you launch within your own private subnet, make sure it has access to the ECS service
      • Either using a NAT Gateway / instance or using VPC Endpoint for ECS

    Unmanaged Compute Environment

    • You control and manage instance configuration, provisioning and scaling

     

    Kinesis

    CloudWatch cannot send to Kinesis Data Firehose or Kinesis Data Streams

    Near real-time: Kinesis Data Firehose

    Kinesis agent can directly configured to send data to Kinesis Data Firehose

    Firehose can connect to S3

    Kinesis Data Firehose is near real-time

    Using Lambda to send to ElasticSearch

    Athena

    • Quicksight for visiulization dashboard
    • CloudTrail can stream logs to CloudWatch

    • EMR can choose to use Spot Fleet to control the cost
    • Athena: data must stay in S3
    • Redshift Spectrum for serverless queries on S3

     

     

     

     

  • 相关阅读:
    各种排序算法的时间复杂度和空间复杂度
    fork/join框架
    全文检索之solr学习
    【设计模式最终总结】概述、分类、原则
    ASP.NET MVC5+EF6+EasyUI 后台管理系统(75)-微信公众平台开发-用户管理
    下拉列表自己封装的
    下拉列表
    一个原生的JavaScript拖动方法
    JavaScript的jsonp
    angular2 的依赖注入
  • 原文地址:https://www.cnblogs.com/Answer1215/p/15323895.html
Copyright © 2011-2022 走看看