
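    The Biggest Differences Between HiveQL (Hive SQL) and Standard SQL

      I have been using Pig all along, and now I need to work with Hive as well. I found some useful material online and have collected it here with a rough paraphrase. The paraphrase is probably not very accurate; I will write a proper summary once I am more familiar with Hive.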

      * No true date/time data types, no interval types, and many missing UDFs for manipulating dates (e.g. ADD_MONTH)

      * Strict type matching without support for automatic coercion or typed literals (e.g. CASE <bigint expr> WHEN 1 THEN ... END)

      * All queries must reference a table (no 'dual' or table-less queries)

      * No session-scoped temp tables

      * No 'IN' predicate

      * No 'FIND' string search function for producing the offset to a match

      * No find/replace string functions for plain strings (i.e. not regex)

      * XPATH UDFs cannot return a string representing an entire subtree in the DOM, which prevents composition.

      * Few mechanisms for collapsing arrays to scalar types (e.g. 'join' complement of string 'split'; aggregations other than 'size' for numeric arrays; etc.)

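      Rough paraphrase:

      1.HiveQL has no true date/time types and no interval types, and it lacks many functions for manipulating dates and times (e.g. ADD_MONTH).

      2.HiveQL matches types very strictly and does not coerce types automatically (for example, CASE big_int_number WHEN 1 THEN ... END is not supported). My understanding is that Hive will not reconcile the bigint expression with the int literal 1 for you.

      3.Every HiveQL query must read from a table; there is no 'dual' table and no table-less query.

      4.HiveQL has no session-scoped temporary tables.

      5.HiveQL has no IN predicate.

      6.HiveQL has no FIND function that returns the offset of a match, and no find/replace functions for plain (non-regex) strings.

      7.The XPATH UDFs in HiveQL cannot return a string representing an entire DOM subtree, which makes it impossible to compose them.

      8.There are few ways to collapse an array into a scalar value (e.g. a 'join' that reverses the string 'split', or aggregations other than 'size' over numeric arrays).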
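
      A missing date helper such as ADD_MONTH is the kind of gap that can be filled outside the query language, for example with a small script run through Hive's TRANSFORM clause, which passes rows to the script as tab-separated lines. The following is only a minimal sketch under assumptions of my own: the names add_months and transform are hypothetical, the date is assumed to be the first column in YYYY-MM-DD form, and the end-of-month clamping rule is one possible choice rather than a copy of any database's behaviour.

```python
import calendar
from datetime import date


def add_months(day: date, months: int) -> date:
    """Shift `day` by `months` calendar months (hypothetical helper).

    If the target month is shorter, the day is clamped to its last day,
    e.g. 31 January plus one month becomes the last day of February.
    """
    # Work with a zero-based month index so divmod handles year rollover,
    # including negative offsets.
    index = day.year * 12 + (day.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def transform(lines, months=1):
    """Append a shifted date column to each tab-separated input line.

    Assumes the first field of every line is an ISO date (YYYY-MM-DD).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        shifted = add_months(date.fromisoformat(fields[0]), months)
        yield "\t".join(fields + [shifted.isoformat()])
```

      To use it as a streaming script, one would loop over transform(sys.stdin) and print each row, then call it from a query along the lines of SELECT TRANSFORM(dt, id) USING 'python add_months.py' AS (dt, id, dt_next) FROM some_table, where the script, column and table names are placeholders.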

      ===========================================================================================================================================================

      1.No windowing functions, e.g. SUM(sales) OVER (PARTITION BY date). It's difficult to do many things common to warehousing, like a running sum, without writing custom mappers/reducers or a UDF.

      2.No regular UNION, INTERSECT, or MINUS operators.

      3.Null values are treated differently from empty strings, and are exported differently, i.e. empty strings are exported as '' and nulls are exported as nulls. I know this isn't unique to Hive, but it is still annoying when exporting data from Hive into another system.

      4.No hierarchical/self referencing querying. I know most distributed computing solutions can't do this, but it can be very handy.

      5.No Update or Delete statements.

      6.Haven't been able to find any kind of cost-based explain plans. Running explain plans generally just shows the path of accessing data. Useful to some degree but it would be great if it was more advanced in that it could help the user understand which steps are causing the biggest slowdowns.
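
      The "custom mappers/reducers" route mentioned in item 1 can be sketched as a streaming reducer that computes a running sum per key. This is an illustration under assumptions, not a recommended implementation: the names parse and running_sums are hypothetical, the input is assumed to be tab-separated (key, value) lines, and the rows are assumed to arrive already grouped and ordered by key (for example via DISTRIBUTE BY and SORT BY on the Hive side).

```python
from itertools import groupby


def parse(lines):
    """Turn tab-separated 'key<TAB>value' lines into (key, float) pairs."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        yield key, float(value)


def running_sums(rows):
    """Yield (key, value, running_total) for rows already grouped by key.

    The total restarts whenever the key changes, which is roughly what
    SUM(value) OVER (PARTITION BY key ORDER BY ...) would produce.
    """
    # groupby only merges adjacent rows, so the input must be sorted by key.
    for key, group in groupby(rows, key=lambda row: row[0]):
        total = 0.0
        for _, value in group:
            total += value
            yield key, value, total
```

      A script built on this would read sys.stdin through parse, feed the pairs to running_sums, and print each result as a tab-separated line. Keeping the grouping inside the script means a single reducer can handle several keys, as long as each key's rows are contiguous.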

      =======================================================================================================================================================================

      1. For the row format's line terminator (LINES TERMINATED BY), only '\n' is supported.

      2. Hive does not support running a query that selects from tables in more than one database.

      3. Hive does not support sub-queries such as those connected by IN/EXISTS in the WHERE clause.

      4. Hive does not support the truncation of data from a table.

      ===========================================================================================================================================================

  • Original post: https://www.cnblogs.com/catWang/p/4367347.html