hadoop - What is the difference between Pig and Hive? - Stack Overflow

Apache Pig and Apache Hive are two projects that layer on top of Hadoop, each providing a higher-level language for using Hadoop's MapReduce library. Pig provides a scripting language, Pig Latin, for describing operations such as reading, filtering, transforming, joining, and writing data, which are exactly the operations MapReduce was originally designed for. Rather than expressing these operations in thousands of lines of Java code that call MapReduce directly, Pig lets users express them in a language not unlike a Bash or Perl script. Pig is excellent for prototyping and rapidly developing MapReduce-based jobs, as opposed to coding them in Java.
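To make the "scripting" style concrete, here is a minimal Python sketch (not Pig itself) of a step-by-step dataflow: filter, group, then aggregate. The records, the field names, and the 100-byte threshold are invented for illustration, and the comments only approximate how each step might read in Pig's language.

```python
from collections import defaultdict

# Hypothetical input: (user_id, bytes) records standing in for a file on the cluster.
records = [("alice", 120), ("bob", 80), ("alice", 300), ("carol", 150), ("bob", 200)]

# Step 1, roughly: big = FILTER logs BY bytes > 100;
big = [(user_id, nbytes) for user_id, nbytes in records if nbytes > 100]

# Step 2, roughly: grouped = GROUP big BY user_id;
grouped = defaultdict(list)
for user_id, nbytes in big:
    grouped[user_id].append(nbytes)

# Step 3, roughly: totals = FOREACH grouped GENERATE group, SUM(big.bytes);
totals = {user_id: sum(values) for user_id, values in grouped.items()}

print(totals)
```

Each intermediate result has a name and can be inspected on its own, which is much of what makes this style convenient for prototyping.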
If Pig is "scripting for Hadoop", then Hive is "SQL queries for Hadoop". Apache Hive offers an even more specific, higher-level language for querying data by running Hadoop jobs, rather than scripting, step by step, the operation of several MapReduce jobs. The language is, by design, extremely SQL-like. Hive is still intended for long-running, batch-oriented queries over massive data; it is not "real-time" in any sense. Hive is an excellent tool for analysts and business development types who are accustomed to SQL-like queries and Business Intelligence systems: it lets them use your shiny new Hadoop cluster to run ad-hoc queries or generate report data across the data the cluster stores.
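The declarative style can be sketched in the same spirit. The example below uses Python's built-in sqlite3 module purely as a stand-in for a SQL engine; it is not Hive and runs no Hadoop jobs. The table, columns, and data are hypothetical, and the query sticks to basic SQL that should read much the same in Hive's SQL-like language.

```python
import sqlite3

# In-memory SQLite database standing in for a table of log records (not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (user_id TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [("alice", 120), ("bob", 80), ("alice", 300), ("carol", 150), ("bob", 200)],
)

# One declarative query: it states the result wanted, and the engine
# decides how to filter, group, and sum.
query = """
    SELECT user_id, SUM(bytes) AS total_bytes
    FROM logs
    WHERE bytes > 100
    GROUP BY user_id
    ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

The query describes the result rather than the steps, which is why this style suits analysts who already think in SQL.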

Original article: https://www.cnblogs.com/lexus/p/2594445.html