  • [SAA] 32. Data Engineering

    AWS Batch Overview

    • Run batch jobs as Docker images
    • Dynamic provisioning of the instances (EC2 & Spot Instances) - in VPC
    • Optimal quantity and type based on volume and requirements
    • No need to manage clusters, fully serverless
    • You pay only for the underlying EC2 instances
    • Example: batch process of images, running thousands of concurrent jobs
    • Schedule Batch jobs using CloudWatch Events (now Amazon EventBridge)
    • Orchestrate Batch Jobs using AWS Step Functions
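
    A minimal sketch of the "thousands of concurrent jobs" example and the Step Functions bullet, assuming Python with boto3. The job name, queue and job definition (`resize-images`, `image-queue`, `image-resize:1`) are hypothetical. The helpers only build request payloads, so they run without AWS credentials; the commented lines show where the real call would go.

```python
import json


def array_job_request(job_name, job_queue, job_definition, size):
    """Build keyword arguments for the AWS Batch SubmitJob API.

    An array job fans out into `size` child jobs that share one job
    definition; each child can read AWS_BATCH_JOB_ARRAY_INDEX to pick
    its own input.
    """
    if not 2 <= size <= 10000:
        raise ValueError("array size must be between 2 and 10000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "arrayProperties": {"size": size},
    }


def batch_state_machine(job_name, job_queue, job_definition):
    """Return an Amazon States Language definition with one Batch task.

    The ".sync" suffix makes Step Functions wait for the job to finish.
    """
    return {
        "StartAt": "RunBatchJob",
        "States": {
            "RunBatchJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": job_name,
                    "JobQueue": job_queue,
                    "JobDefinition": job_definition,
                },
                "End": True,
            }
        },
    }


# Hypothetical names, for illustration only.
request = array_job_request("resize-images", "image-queue", "image-resize:1", 1000)
definition = batch_state_machine("resize-images", "image-queue", "image-resize:1")
print(json.dumps(definition, indent=2))

# With credentials configured, the request could be sent like this:
# import boto3
# boto3.client("batch").submit_job(**request)
```

    Building the payload separately from the API call keeps the example testable offline; in a real pipeline the two steps would usually sit together.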

     

    Lambda vs Batch

    Lambda

    • Time limit: 15 mins
    • Limited runtimes
    • Limited temporary disk space
    • Serverless

    Batch

    • No time limit
    • Any runtime, as long as it's packaged as a Docker image
    • Relies on EBS / instance store for disk space
    • Relies on EC2 (can be managed by AWS)
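
    The comparison above can be condensed into a small decision helper. This is a deliberately simplified sketch that uses only the two criteria the notes give a concrete handle on (the 15-minute limit and the runtime restriction); the function name and parameters are hypothetical, and a real decision would also weigh cost, disk needs and startup latency.

```python
LAMBDA_TIMEOUT_MINUTES = 15  # upper bound on a single Lambda invocation


def pick_compute(expected_minutes, needs_custom_runtime=False):
    """Return "lambda" or "batch" using only the criteria listed above."""
    if expected_minutes > LAMBDA_TIMEOUT_MINUTES:
        return "batch"  # Lambda would time out before the job finishes
    if needs_custom_runtime:
        return "batch"  # package the runtime as a Docker image instead
    return "lambda"


print(pick_compute(5))
print(pick_compute(90))
print(pick_compute(5, needs_custom_runtime=True))
```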

    Compute Environments

    Managed Compute Environment

    • AWS Batch manages the capacity and instance types within the environment
    • You can choose On-Demand or Spot Instances
    • You can set a maximum price for Spot Instances
    • Launched within your own VPC
      • If you launch within your own private subnet, make sure it has access to the ECS service
      • Use either a NAT gateway / NAT instance or a VPC endpoint for ECS
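
    A managed Spot environment along these lines might be described as follows. This is a sketch assuming Python with boto3; the subnet, security group and role identifiers are placeholders, and the 60% bid and the vCPU bounds are arbitrary illustrative values. `bidPercentage` caps the Spot price as a percentage of the On-Demand price.

```python
def managed_spot_environment(name, subnets, security_groups,
                             instance_role, service_role, bid_percentage=60):
    """Build keyword arguments for the Batch CreateComputeEnvironment API."""
    if not 1 <= bid_percentage <= 100:
        raise ValueError("bid_percentage must be between 1 and 100")
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",  # Batch manages capacity and instance types
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            "bidPercentage": bid_percentage,  # max price, % of On-Demand
            "minvCpus": 0,
            "maxvCpus": 64,
            "instanceTypes": ["optimal"],  # let Batch pick instance sizes
            "subnets": subnets,  # subnets in your own VPC
            "securityGroupIds": security_groups,
            "instanceRole": instance_role,
        },
        "serviceRole": service_role,
    }


# Placeholder identifiers, for illustration only.
request = managed_spot_environment(
    name="spot-env",
    subnets=["subnet-0123456789abcdef0"],
    security_groups=["sg-0123456789abcdef0"],
    instance_role="ecsInstanceRole",
    service_role="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)
print(request["computeResources"]["type"], request["computeResources"]["bidPercentage"])

# With credentials configured:
# import boto3
# boto3.client("batch").create_compute_environment(**request)
```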

    Unmanaged Compute Environment

    • You control and manage instance configuration, provisioning and scaling

     

    Kinesis

    CloudWatch Logs does not deliver to Kinesis Data Firehose or Kinesis Data Streams on its own; streaming log events to either one requires a subscription filter

    Near real-time: Kinesis Data Firehose

    The Kinesis agent can be configured to send data directly to Kinesis Data Firehose
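
    As an illustration, the agent reads a JSON configuration file (by default `/etc/aws-kinesis/agent.json`) in which each flow maps a file pattern to a delivery stream. The sketch below generates such a file with Python; the log path, stream name and region are hypothetical.

```python
import json


def agent_config(file_pattern, delivery_stream, region=None):
    """Build a Kinesis agent configuration with a single Firehose flow."""
    config = {
        "flows": [
            # "deliveryStream" targets Firehose; a Data Streams flow would
            # use "kinesisStream" instead.
            {"filePattern": file_pattern, "deliveryStream": delivery_stream}
        ]
    }
    if region:
        config["firehose.endpoint"] = f"firehose.{region}.amazonaws.com"
    return config


config = agent_config("/var/log/app/*.log", "app-logs", region="us-east-1")
print(json.dumps(config, indent=2))
```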

    Firehose can deliver to S3

    Firehose buffers incoming records before delivery, which is why it is near real-time rather than real-time

    Use Lambda to send data to Elasticsearch (Amazon Elasticsearch Service, now Amazon OpenSearch Service)
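
    A sketch of what such a Lambda function might do, assuming it is triggered by a Kinesis data stream and that every record is a JSON document (both are assumptions, the notes do not say). It decodes the records and builds a body for the Elasticsearch `_bulk` API; the index name `app-logs` is hypothetical and the signed HTTP request to the cluster is left out.

```python
import base64
import json


def to_bulk_body(event, index_name):
    """Turn a Kinesis-triggered Lambda event into an Elasticsearch _bulk body."""
    lines = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        document = json.loads(payload)
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(document))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def handler(event, context):
    body = to_bulk_body(event, "app-logs")
    # Posting `body` to the cluster's /_bulk endpoint is omitted here.
    return {"documents": body.count("\n") // 2}


# Local check with a hand-built event.
sample_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(b'{"level": "info"}').decode()}}
    ]
}
print(to_bulk_body(sample_event, "app-logs"))
```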

    Athena

    • QuickSight for visualization dashboards
    • CloudTrail can stream logs to CloudWatch

    • EMR can use Spot Instances (for example through instance fleets) to control cost
    • Athena: the data stays in S3 and is queried in place
    • Redshift Spectrum for serverless queries on S3
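
    For the Athena bullet, a query over data in S3 might be started as sketched below, assuming Python with boto3. The database, table and results bucket are hypothetical; Athena writes query results to the S3 location given in `OutputLocation`.

```python
def athena_query_request(sql, database, output_location):
    """Build keyword arguments for the Athena StartQueryExecution API."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table and bucket names.
request = athena_query_request(
    sql="SELECT eventname, count(*) AS calls FROM cloudtrail_logs GROUP BY eventname",
    database="audit",
    output_location="s3://my-athena-results/",
)
print(request["QueryExecutionContext"])

# With credentials configured:
# import boto3
# boto3.client("athena").start_query_execution(**request)
```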

     

     

     

     

  • Original article: https://www.cnblogs.com/Answer1215/p/15323895.html