<Impala><Overview><UDF>

Overview
- Apache Impala (incubating) is the open source, native analytic database for Apache Hadoop.
Features
- Do BI-style Queries on Hadoop:
  
  Low latency and high concurrency for BI/analytic queries on Hadoop (not delivered by batch frameworks such as Apache Hive).
  
  Scales linearly, even in multitenant environments.
- Unify Your Infrastructure: use the same file and data formats, metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication.
- Implement Quickly: supports SQL
- Count on Enterprise-class Security
- Retain Freedom from Lock-in: open-source
- Expand the Hadoop User-verse
Architecture
- Impala circumvents MapReduce to avoid latency and accesses the data directly through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs.
- Some advantages:
  
  Thanks to local processing on data nodes, network bottlenecks are avoided.
  
  A single, open, and unified metadata store can be utilized.
  
  Costly data format conversion is unnecessary and thus no overhead is incurred.
  
  All data is immediately query-able, with no delays for ETL.
  
  All hardware is utilized for Impala queries as well as for MapReduce.
  
  Only a single machine pool is needed to scale.
Documentation

... skip

Impala User-Defined Functions (UDFs)
- UDFs let you code your own application logic for processing column values during an Impala query.
UDF Concepts
- You can code scalar functions that produce results one row at a time.
- Or you can code more complex aggregate functions that do analysis across sets of rows.
UDFs and UDAFs
- The most general kind of UDF takes a single input value and produces a single output value. When used in a query, it is called once for each row in the result set. For example:
  
  select customer_name, is_frequent_customer(customer_id) from customers;
  select obfuscate(sensitive_column) from sensitive_data;
- A user-defined aggregate function (UDAF) accepts a group of values and returns a single value. You can use UDAFs to summarize and condense sets of rows, in the same style as the built-in COUNT(), MAX(), SUM(), and AVG() functions. When called in a query that uses the GROUP BY clause, the function is called once for each combination of GROUP BY values. For example:
  
  -- Evaluates multiple rows but returns a single value.
  select closest_restaurant(latitude, longitude) from places;
  -- Evaluates batches of rows and returns a separate value for each batch.
  select most_profitable_location(store_id, sales, expenses, tax_rate, depreciation) from franchise_data group by year;
- Currently, Impala does not support other categories of UDF, such as user-defined table functions (UDTFs) or window functions.
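
To make the difference in calling pattern concrete, here is a minimal C++ sketch that does not use Impala's headers at all. The `Row` type, the `IsLargeSale` rule, and the `Max*` functions are hypothetical names invented for illustration. The Init/Update/Merge/Finalize split loosely mirrors the phases Impala uses for native aggregate functions, but the signatures below are simplified stand-ins, not the real API.

```cpp
#include <map>
#include <vector>

// Hypothetical row type standing in for one row of a table.
struct Row {
    int year;
    long store_id;
    double sales;
};

// Scalar shape: one input value in, one output value out.
// The 1000.0 threshold is an invented rule for illustration only.
bool IsLargeSale(double sales) {
    return sales >= 1000.0;
}

// Calls the scalar function once per row, the way a query would.
std::vector<bool> EvalScalar(const std::vector<Row>& rows) {
    std::vector<bool> out;
    out.reserve(rows.size());
    for (const Row& r : rows) out.push_back(IsLargeSale(r.sales));
    return out;
}

// Aggregate shape: an intermediate state plus four phases.
struct MaxState {
    bool has_value = false;
    double max = 0.0;
};

void MaxInit(MaxState* s) { *s = MaxState{}; }

void MaxUpdate(double input, MaxState* s) {
    if (!s->has_value || input > s->max) {
        s->max = input;
        s->has_value = true;
    }
}

// Combines two partial states, e.g. ones built on different nodes.
void MaxMerge(const MaxState& src, MaxState* dst) {
    if (src.has_value) MaxUpdate(src.max, dst);
}

// Simplification: a real aggregate would return NULL for an empty
// group; this sketch returns 0.0 instead.
double MaxFinalize(const MaxState& s) { return s.has_value ? s.max : 0.0; }

// Produces one result per distinct GROUP BY key (here: year).
std::map<int, double> EvalGrouped(const std::vector<Row>& rows) {
    std::map<int, MaxState> states;
    for (const Row& r : rows) {
        auto it = states.find(r.year);
        if (it == states.end()) {
            it = states.emplace(r.year, MaxState{}).first;
            MaxInit(&it->second);
        }
        MaxUpdate(r.sales, &it->second);
    }
    std::map<int, double> out;
    for (const auto& kv : states) out[kv.first] = MaxFinalize(kv.second);
    return out;
}
```

`EvalScalar` returns as many values as there are rows, while `EvalGrouped` returns one value per distinct `year`, which is the same contrast as between the two kinds of query shown above.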
Native Impala UDFs
- Impala supports UDFs written in C++, in addition to supporting existing Hive UDFs written in Java.
- Where practical, use C++ UDFs: the compiled native code can yield higher performance, with execution often 10x faster for a C++ UDF than for the equivalent Java UDF.
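
As a rough idea of what a native scalar UDF looks like, the sketch below imitates the general shape of an Impala C++ UDF: a context pointer, one const reference per SQL argument, and a return value that can also represent SQL NULL. The `sketch` namespace and the `FunctionContext` and `IntVal` types defined in it are stand-ins written for this example so that it compiles on its own. They are assumptions, not the definitions from Impala's UDF development headers, so check the real header before relying on any detail.

```cpp
#include <cstdint>

// Stand-in types that imitate the shape of Impala's UDF value types.
// They are NOT the real definitions; they only let this sketch compile
// without the Impala UDF development package.
namespace sketch {

// Placeholder for the per-call context object passed to every UDF.
struct FunctionContext {};

// A 32-bit integer that can also be SQL NULL.
struct IntVal {
    bool is_null;
    int32_t val;

    IntVal(int32_t v = 0) : is_null(false), val(v) {}

    static IntVal null() {
        IntVal v;
        v.is_null = true;
        return v;
    }
};

// Scalar UDF: returns NULL if either argument is NULL, otherwise the sum.
IntVal AddUdf(FunctionContext* /*context*/, const IntVal& a, const IntVal& b) {
    if (a.is_null || b.is_null) return IntVal::null();
    return IntVal(a.val + b.val);
}

}  // namespace sketch
```

Handling NULL explicitly matters because the function is called for every row, including rows where the column value is missing.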
Using Hive UDFs with Impala
- Impala can run Java-based user-defined functions (UDFs), originally written for Hive, with no changes, subject to the following conditions:
  
  The parameters and return value must all use scalar data types supported by Impala; complex or nested types are not supported.
  
  Currently, Hive UDFs that accept or return the TIMESTAMP type are not supported.
  
  Hive UDAFs and UDTFs are not supported.
  
  Typically, a Java UDF will execute several times slower in Impala than the equivalent native UDF written in C++.
- What to do next?
  
  Write your UDF.
  
  Upload the JAR to an HDFS path that Impala can read.
  
  For each Java-based UDF that you want to call through Impala, issue a CREATE FUNCTION statement, with a LOCATION clause containing the full HDFS path of the JAR file, and a SYMBOL clause with the fully qualified name of the class, using dots as separators and without the .class extension. For example:
  
  create function my_neg(bigint) returns bigint location '/user/hive/udfs/hive.jar' symbol = 'org.apache.hadoop.hive.ql.udf.UDFOPNegative';
  
  Call the function from your queries, passing arguments of the correct type to match the function signature.
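
The SYMBOL value is easy to get wrong when it is copied from a JAR listing, where classes appear as slash-separated paths ending in `.class`. The C++ sketch below shows the conversion described in the steps above and assembles the same statement as the example. `ClassEntryToSymbol` and `BuildCreateFunction` are hypothetical helpers written for this note, not part of Impala or Hive, and the statement builder handles only a single argument type and does no quoting or escaping.

```cpp
#include <string>

// Hypothetical helper: turns a JAR entry such as
// "org/apache/hadoop/hive/ql/udf/UDFOPNegative.class" into the dotted
// class name expected by the SYMBOL clause.
std::string ClassEntryToSymbol(std::string entry) {
    const std::string ext = ".class";
    // Drop a trailing ".class" extension if present.
    if (entry.size() >= ext.size() &&
        entry.compare(entry.size() - ext.size(), ext.size(), ext) == 0) {
        entry.erase(entry.size() - ext.size());
    }
    // Use dots, not slashes, as package separators.
    for (char& c : entry) {
        if (c == '/') c = '.';
    }
    return entry;
}

// Hypothetical helper: assembles a CREATE FUNCTION statement for a
// one-argument Java UDF. Inputs are inserted verbatim (no escaping).
std::string BuildCreateFunction(const std::string& name,
                                const std::string& arg_type,
                                const std::string& return_type,
                                const std::string& jar_hdfs_path,
                                const std::string& symbol) {
    return "create function " + name + "(" + arg_type + ") returns " +
           return_type + " location '" + jar_hdfs_path + "' symbol = '" +
           symbol + "';";
}
```

Called with `my_neg`, `bigint`, `bigint`, `/user/hive/udfs/hive.jar`, and the converted class name, `BuildCreateFunction` yields the same text as the statement in the steps above.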
FYI

Original article: https://www.cnblogs.com/wttttt/p/7236469.html
