zoukankan      html  css  js  c++  java
  • 【原创】大数据基础之ETL vs ELT or DataWarehouse vs DataLake

    ETL

    ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts the data from different RDBMS source systems then transforms the data like applying calculations, concatenations, etc. and then load the data into the Data Warehouse system.

    In ETL data is flows from the source to the target. In ETL process transformation engine takes care of any data changes.

    ELT

    ELT is a different method of looking at the tool approach to data movement. Instead of transforming the data before it's written, ELT lets the target system to do the transformation. The data first copied to the target and then transformed in place.

    ELT usually used with no-Sql databases like Hadoop cluster, data appliance or cloud installation.

    Data Warehouse vs Data Lake

    ETL对应的是Data Warehouse,而ELT对应Data Lake,那什么是Data Lake?

    A data lake is a system or repository of data stored in its natural format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).

    Pentaho CTO James Dixon has generally been credited with coining the term “data lake”. He describes a data mart (a subset of a data warehouse) as akin to a bottle of water…”cleansed, packaged and structured for easy consumption” while a data lake is more like a body of water in its natural state. Data flows from the streams (the source systems) to the lake. Users have access to the lake to examine, take samples or dive in.

    参考:
    https://www.guru99.com/etl-vs-elt.html
    https://aws.amazon.com/cn/big-data/datalakes-and-analytics/what-is-a-data-lake/
    https://www.blue-granite.com/blog/bid/402596/top-five-differences-between-data-lakes-and-data-warehouses
    https://www.forbes.com/sites/bernardmarr/2018/08/27/what-is-a-data-lake-a-super-simple-explanation-for-anyone/#672125e776e0
    https://blog.panoply.io/etl-vs-elt-the-difference-is-in-the-how
    https://www.xplenty.com/blog/etl-vs-elt/

  • 相关阅读:
    【leetcode 968. 1028. 从先序遍历还原二叉树】解题报告[待完善...]
    【leetcode 3. 无重复字符的最长子串】解题报告
    【leetcode 76. 最小覆盖子串】解题报告
    【leetcode 239. 滑动窗口最大值】解题报告
    【leetcode 114. 二叉树展开为链表】解题报告
    【leetcode 105. 从前序与中序遍历序列构造二叉树】解题报告
    【leetcode 106. 从中序与后序遍历序列构造二叉树】解题报告
    【leetcode 968. 监控二叉树】解题报告
    【leetcode 145. 二叉树的后序遍历】解题报告
    linux springboot快捷启动脚本
  • 原文地址:https://www.cnblogs.com/barneywill/p/10718837.html
Copyright © 2011-2022 走看看