  • Hadoop-Impala Study Notes: Administration

    Configuration parameter management

    To be added...

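    Resource allocation management (Admission Control)

    Impala has the concept of resource pools, which let certain queries run in a designated pool. In DSS systems where batch jobs do not run during the day and ad hoc queries do not run at night, however, this mechanism is rarely used (Oracle resource management and cgroups are similar in nature). If you are interested, see "Admission Control and Query Queuing" in the Impala Guide.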

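    Security management (much like a typical RDBMS, except that authentication and authorization are handled externally, which makes it more complicated)

    Impala authentication is based on the Kerberos framework (see "Enabling Kerberos Authentication for Impala"). Impala authorization is based on the open source Sentry project (see "Enabling Sentry Authorization for Impala") and was added in Impala 1.1.0. Auditing is supported starting with 1.1.1.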

    Kerberos installation: https://www.jianshu.com/p/fc2d2dbd510b

    Introduction to Kerberos: https://www.cnblogs.com/ulysses-you/p/8107862.html

    Integrating Kerberos with CDH: https://blog.csdn.net/qxf1374268/article/details/79321951

    How to enable Kerberos authentication on a CDH 5.12 cluster: https://blog.csdn.net/cy309173854/article/details/79288491

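    Optimization

    Enabling short-circuit reads

    This feature lets Impala read local data directly from the file system without having to communicate with the DataNodes, which improves performance. It requires libhadoop.so (the Hadoop native library). The library is not included in tarball installations; it is included in the .rpm, .deb, and parcel packages.

    The feature can be enabled by editing hdfs-site.xml or through Cloudera Manager.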

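    Enabling block location tracking

    This feature lets Impala make better use of the underlying disks. If Impala is not managed by Cloudera Manager, you need to enable block location tracking yourself. It is also configured in hdfs-site.xml.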
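
    Both settings above live in hdfs-site.xml, so a quick way to confirm them is to parse the file and look for the relevant properties. The sketch below is only an illustration: the property names (dfs.client.read.shortcircuit, dfs.domain.socket.path, dfs.datanode.hdfs-blocks-metadata.enabled) are not given in these notes and are taken from the Impala and HDFS documentation as commonly cited, so check them against your release. The class name HdfsSiteCheck and the default file path are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: checks an hdfs-site.xml for the two Impala-related features.
final class HdfsSiteCheck {
    // Assumed property names (from the Impala/HDFS docs, not from these notes).
    static final String SHORT_CIRCUIT = "dfs.client.read.shortcircuit";
    static final String SOCKET_PATH = "dfs.domain.socket.path";
    static final String BLOCK_METADATA = "dfs.datanode.hdfs-blocks-metadata.enabled";

    // Collects every <property><name>/<value> pair from a Hadoop-style XML config.
    static Map<String, String> parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                result.put(names.item(0).getTextContent().trim(),
                           values.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    // Short-circuit reads need the flag set to true and a domain socket path configured.
    static boolean shortCircuitEnabled(Map<String, String> conf) {
        return "true".equalsIgnoreCase(conf.get(SHORT_CIRCUIT))
            && conf.containsKey(SOCKET_PATH);
    }

    // Block location tracking is a single boolean flag.
    static boolean blockLocationTrackingEnabled(Map<String, String> conf) {
        return "true".equalsIgnoreCase(conf.get(BLOCK_METADATA));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default location; pass the real path as the first argument.
        String path = args.length > 0 ? args[0] : "/etc/hadoop/conf/hdfs-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            Map<String, String> conf = parse(in);
            System.out.println("short-circuit reads: " + shortCircuitEnabled(conf));
            System.out.println("block location tracking: " + blockLocationTrackingEnabled(conf));
        }
    }
}
```

    The check only reads the file; changing the values is still done by hand or through Cloudera Manager, and the affected services need a restart afterwards.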

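    JDBC access

    JDBC 2.0 and later drivers reach Impala on port 21050 by default. The port can be changed with the impalad startup option --hs2_port.

    In Impala 2.0 and later, you can connect through the Cloudera JDBC Connector or the Hive 0.13 JDBC driver (Hive JDBC drivers from 0.12 and earlier cannot connect to Impala 2.0).

    Connection string: jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;...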

    jdbc:hive2://myhost.example.com:21050/;auth=noSasl

    jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM -- Impala with Kerberos authentication
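
    As a rough illustration of how the hive2-style connection strings above are used from Java, the following sketch builds both URL forms and opens a connection with the standard java.sql API. The class name ImpalaJdbcUrl and its helper methods are hypothetical; the host, port, and realm are the example values from the connection strings above. Running main assumes a Hive JDBC driver jar on the classpath and a reachable impalad, neither of which is provided here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical helper for assembling the hive2-style URLs shown above.
final class ImpalaJdbcUrl {
    // URL for a cluster without authentication (the auth=noSasl form).
    static String noSasl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    // URL for a Kerberos-secured cluster; the principal follows impala/<host>@<realm>.
    static String kerberos(String host, int port, String realm) {
        return "jdbc:hive2://" + host + ":" + port + "/;principal=impala/" + host + "@" + realm;
    }

    public static void main(String[] args) throws Exception {
        String url = noSasl("myhost.example.com", 21050);
        // Plain JDBC from here on: connect, run a trivial query, print the result.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

    Keeping URL construction separate from the connection code makes it easy to switch between the unauthenticated and Kerberos forms without touching the query logic.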

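    With the current version of the driver, DML statements against Kudu tables do not report certain errors, such as a uniqueness constraint violation. If you need those errors surfaced, use the Kudu Java API instead of JDBC.

    The Impala JDBC driver is not published to a public Maven repository. You have to download it yourself from https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-43.html and install it into your local Maven repository. https://github.com/onefoursix/Cloudera-Impala-JDBC-Example contains an example. Using it is the same as using any other JDBC driver; there is nothing special about it.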

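    HDFS file formats supported by Impala

    Among the compression codecs, Snappy strikes a balance between compression ratio and decompression speed and is the recommended choice. Gzip gives the best compression ratio. If the data stays in memory almost all the time, there is no need to consider compression, because it saves no I/O.

    By default, Impala creates tables in text file format.

    Parquet is a column-oriented binary file format, suited to queries that access only a few columns. To create a Parquet table, add the STORED AS PARQUET clause to CREATE TABLE, as follows: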

    [impala-host:21000] > create table parquet_table_name (x INT, y STRING) STORED AS PARQUET;

    The column definitions can also be inferred directly from an existing Parquet file:

    CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET
    '/user/etl/destination/datafile1.dat'
    STORED AS PARQUET
    LOCATION '/user/etl/destination';

    Ports used by Impala

    | Component | Service | Port | Access Requirement | Comment |
    |-----------|---------|------|--------------------|---------|
    | Impala Daemon | Impala Daemon Frontend Port | 21000 | External | Used to transmit commands and receive results by impala-shell and some ODBC drivers. |
    | Impala Daemon | Impala Daemon Frontend Port | 21050 | External | Used to transmit commands and receive results by applications, such as Business Intelligence tools, using JDBC, the Beeswax query editor in Hue, and some ODBC drivers. |
    | Impala Daemon | Impala Daemon Backend Port | 22000 | Internal | Internal use only. Impala daemons use this port for Thrift based communication with each other. |
    | Impala Daemon | StateStoreSubscriber Service Port | 23000 | Internal | Internal use only. Impala daemons listen on this port for updates from the statestore daemon. |
    | Catalog Daemon | StateStoreSubscriber Service Port | 23020 | Internal | Internal use only. The catalog daemon listens on this port for updates from the statestore daemon. |
    | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | External | Impala web interface for administrators to monitor and troubleshoot. |
    | Impala StateStore Daemon | StateStore HTTP Server Port | 25010 | External | StateStore web interface for administrators to monitor and troubleshoot. |
    | Impala Catalog Daemon | Catalog HTTP Server Port | 25020 | External | Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and higher. |
    | Impala StateStore Daemon | StateStore Service Port | 24000 | Internal | Internal use only. The statestore daemon listens on this port for registration/unregistration requests. |
    | Impala Catalog Daemon | Catalog Service Port | 26000 | Internal | Internal use only. The catalog service uses this port to communicate with the Impala daemons. New in Impala 1.2 and higher. |
    | Impala Daemon | KRPC Port | 27000 | Internal | Internal use only. Impala daemons use this port for KRPC based communication with each other. |
    | Impala Daemon | Llama Callback Port | 28000 | Internal | Internal use only. Impala daemons use this port to communicate with Llama. New in Impala 1.3 and higher. |
    | Impala Llama ApplicationMaster | Llama Thrift Admin Port | 15002 | Internal | Internal use only. New in Impala 1.3 and higher. |
    | Impala Llama ApplicationMaster | Llama Thrift Port | 15000 | Internal | Internal use only. New in Impala 1.3 and higher. |
    | Impala Llama ApplicationMaster | Llama HTTP Port | 15001 | External | Llama service web interface for administrators to monitor and troubleshoot. New in Impala 1.3 and higher. |
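
    When a client cannot connect, a first step is often to check whether the externally accessible ports listed above are reachable from the client machine. The following is a minimal sketch using only java.net. The class name ImpalaPortCheck, the 2-second timeout, and the default host are assumptions made for illustration, and the port list simply copies the Impala daemon, statestore, and catalog entries marked External above. Note that 25010 and 25020 belong to the statestore and catalog daemons, which may run on a different host than impalad.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: probes TCP ports with a plain socket connect.
final class ImpalaPortCheck {
    // Ports marked "External" for impalad, statestored, and catalogd in the list above.
    static final int[] EXTERNAL_PORTS = {21000, 21050, 25000, 25010, 25020};

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolved host all count as unreachable here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the daemon host as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        for (int port : EXTERNAL_PORTS) {
            System.out.println(host + ":" + port + " reachable=" + isReachable(host, port, 2000));
        }
    }
}
```

    A successful connect only shows that something is listening and that no firewall blocks the port; it does not confirm that the listener is actually an Impala service.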

  • Original article: https://www.cnblogs.com/zhjh256/p/10664130.html