zoukankan      html  css  js  c++  java
  • Pentaho的Mondrian对Hive的支持

    需求描述

    考虑直接在Hive或者Impala等Big Data方案,能够支持MDX查询,现调研一下Mondrian对hive的支持情况。

    环境准备

    hive环境,采用hive-0.10-cdh4.2.1 客户端程序使用的类库:mondrian-3.6.0、olap4j-1.2.0-SNAPSHOT

    数据准备

    来源于网上一个数据源,准备四张表 Customer - 客户信息维表 Product - 产品维表 ProductType - 产品类表维表 Sale - 销售记录表 为了方便测试数据与MDX正确性,将数据导入到MySQL中一份,用来与Hive查询结果进行对比。

    在MySQL创建对应表及数据

    具体SQL语句:

    create database hive_test;
    
    use hive_test;
    
    /**用户信息表*/
    create table Customer (
           cusId int not null,
           gender char(1) null,
           constraint PK_CUSTOMER primary key(cusId)
    );
    
    /**产品表*/
    create table Product (
           proId int not null,
           proTypeId int null,
           proName varchar(32) null,
           constraint PK_PRODUCT primary key(proId)
    );
    
    /**产品类别表*/
    create table ProductType (
           proTypeId int not null,
           proTypeName varchar(32) null,
           constraint PK_PRODUCTTYPE primary key (proTypeId)
    );
    
    /**销售记录表/
    create table Sale (
           saleId int not null,
           proId int null,
           cusId int null,
           unitPrice float null,
           number int null,
           constraint PK_SALE primary key(saleId)
    );
    
    insert into Customer(cusId,gender) values(1,'F');
    insert into Customer(cusId,gender) values(2,'M');
    insert into Customer(cusId,gender) values(3,'M');
    insert into Customer(cusId,gender) values(4,'F');
    
    
    insert into ProductType(proTypeId,proTypeName) values(1,'electrical');
    insert into ProductType(proTypeId,proTypeName) values(2,'digital');
    insert into ProductType(proTypeId,proTypeName) values(3,'furniture');
    
    insert into Product(proId,proTypeId,proName) values(1,1,'washing machine');
    insert into Product(proId,proTypeId,proName) values(2,1,'television');
    insert into Product(proId,proTypeId,proName) values(3,2,'mp3');
    insert into Product(proId,proTypeId,proName) values(4,2,'mp4');
    insert into Product(proId,proTypeId,proName) values(5,2,'camera');
    insert into Product(proId,proTypeId,proName) values(6,3,'chair');
    insert into Product(proId,proTypeId,proName) values(7,3,'desk');
    insert into sale(saleId,proId,cusId,unitPrice,number) values(1,1,1,340.34,2);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(2,1,2,140.34,1);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(3,2,3,240.34,3);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(4,3,4,540.34,4);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(5,4,1,80.34,5);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(6,5,2,90.34,26);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(7,6,3,140.34,7);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(8,7,4,640.34,28);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(9,6,1,140.34,29);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(10,7,2,740.34,29);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(11,5,3,30.34,28);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(12,4,4,1240.34,72);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(13,3,1,314.34,27);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(14,3,2,45.34,27);
    View Code

    hive中测试数据准备

    在虚拟机准备好hive测试环境,采用hive-0.10-cdh4.2.1版本 具体语句:

    create database mondrian;
    use mondrian;
    create table Sale (saleId INT, proId INT, cusId INT, unitPrice FLOAT, number INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    create table Product (proId INT, proTypeId INT, proName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    create table ProductType (proTypeId INT, proTypeName STRING)   ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    create table Customer (cusId INT, gender STRING)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    
    # Customer文件
    1,F
    2,M
    3,M
    4,F
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Customer" OVERWRITE into table Customer;
    
    # ProductType文件
    1,electrical
    2,digital
    3,furniture
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/ProductType" into table ProductType;
    
    # Product数据文件
    1,1,washing machine
    2,1,television
    3,2,mp3
    4,2,mp4
    5,2,camera
    6,3,chair
    7,3,desk
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Product" into table Product;
    
    # Sale数据文件
    1,1,1,340.34,2
    2,1,2,140.34,1
    3,2,3,240.34,3
    4,3,4,540.34,4
    5,4,1,80.34,5
    6,5,2,90.34,26
    7,6,3,140.34,7
    8,7,4,640.34,28
    9,6,1,140.34,29
    10,7,2,740.34,29
    11,5,3,30.34,28
    12,4,4,1240.34,72
    13,3,1,314.34,27
    14,3,2,45.34,27
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Sale" into table Sale;
    View Code

    元数据定义

    Cube、Measure等元数据定义见:

    <Schema name="hello">
    <Cube name="Sales">
    <!--  事实表(fact table)  -->
    <Table name="Sale"/>
    <!--  客户维  -->
    <Dimension name="cusGender" foreignKey="cusId">
    <Hierarchy hasAll="true" allMemberName="allGender" primaryKey="cusId">
    <Table name="Customer"/>
    <Level name="gender" column="gender"/>
    </Hierarchy>
    </Dimension>
    <!--  产品类别维  -->
    <Dimension name="proType" foreignKey="proId">
    <Hierarchy hasAll="true" allMemberName="allPro" primaryKey="proId" primaryKeyTable="Product">
    <join leftKey="proTypeId" rightKey="proTypeId">
    <Table name="Product"/>
    <Table name="ProductType"/>
    </join>
    <Level name="proTypeId" column="proTypeId" nameColumn="proTypeName" uniqueMembers="true" table="ProductType"/>
    <Level name="proId" column="proId" nameColumn="proName" uniqueMembers="true" table="Product"/>
    </Hierarchy>
    </Dimension>
    <Measure name="numb" column="number" aggregator="sum" datatype="Numeric"/>
    <Measure name="totalSale" aggregator="sum" formatString="$ #,##0.00">
    <!--  unitPrice*number所得值的列  -->
    <MeasureExpression>
    <SQL dialect="generic">unitPrice*number</SQL>
    </MeasureExpression>
    </Measure>
    <CalculatedMember name="averPri" dimension="Measures">
    <Formula>[Measures].[totalSale] / [Measures].[numb]</Formula>
    <CalculatedMemberProperty name="FORMAT_STRING" value="$ #,##0.00"/>
    </CalculatedMember>
    </Cube>
    </Schema>

    测试MDX

    1. 查询所有类别产品销售总件数、平均价格和总销售额

    "select " + "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," + "{([proType].[allPro],[cusGender].[allGender])} " + "on rows " + "from [Sales]" 

    对Hive支持情况

    建立Connection连接方式

    建立Connection连接方式有两种:

    使用mondrian自带的DriverManager获取Connection实例

    mondrian中自带的API

    # 这里的Connection、DriverManager、Query、Result等都是mondrian提供的API接口
            Connection connection = DriverManager.getConnection(
                    "Provider=mondrian;" +
                    "Jdbc=jdbc:hive2://node02:10000/mondrian;" +
                    "JdbcUser=;JdbcPassword=;" +
                    "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
                    "JdbcDrivers=org.apache.hive.jdbc.HiveDriver", null);
    
            Query query = connection.parseQuery(
                    "select 
    " +
                            "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns,
    " +
                            "{([proType].[allPro],[cusGender].[allGender])} 
    " +
                            "on rows
    " +
                            "from [Sales]
    ");
    
            @SuppressWarnings("deprecation")
            Result result = connection.execute(query);
            PrintWriter pw = new PrintWriter(System.out);
            result.print(pw);
            pw.flush();

      对应的连接MySQL,只需要将getConnection中的connectString换成如下即可:

    Connection connection =  DriverManager.getConnection(
                    "Provider=mondrian;" +
                            "Jdbc=jdbc:mysql://localhost:3306/hive_test; JdbcUser=root;" +
                            "JdbcPassword=123;" +
                            "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
                            "JdbcDrivers=com.mysql.jdbc.Driver", null);

      测试的时候连接MySQL时,没什么问题,在使用相同的API连Hive的时候,有点问题。down了一下源码发现它的过程是这样的:先去连接池中取一个Connection实例,没有的话通过Factory创建一个Connection放入池里。而在Mondrian创建Factory的时候指定了两个属性:autoCommit和readOnly,RDBMS的Driver都没什么问题,Hive的JDBC提供的HiveConnection中对这两个属性的set方法实现得很诡异,都是直接抛异常了:

      public void setReadOnly(boolean readOnly) throws SQLException {
        // TODO Auto-generated method stub
        throw new SQLException("Method not supported");
      }
      public void setAutoCommit(boolean autoCommit) throws SQLException {
        if (autoCommit) {
          throw new SQLException("enabling autocommit is not supported");
        }
      }

    将这两行抛出异常的地方注释掉,rebuild一下jar包,MDX就可以顺利执行完了。

    使用JDK原生的DriverManager获取Connection实例

      可以使用JDK原生的DriverManager获取Connection然后再使用Olap4j的封装成OLapConnection然后再去执行MDX 具体连接示例如下:

     Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
     Connection nativeConn = DriverManager.getConnection("jdbc:mondrian:Jdbc=jdbc:hive2://node02:10000/mondrian; JdbcUser=;" +
             "JdbcPassword=;" +
             "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
             "JdbcDrivers=org.apache.hive.jdbc.HiveDriver");
    
     OlapConnection olapConn = nativeConn.unwrap(OlapConnection.class);
    
     if (olapConn == null) {
         throw new IllegalStateException("Connection is null");
     }
     OlapStatement statement = olapConn.createStatement();
     CellSet cellSet = statement.executeOlapQuery("select " +
             "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," +
             "{([proType].[allPro],[cusGender].[allGender])} " +
             "on rows " +
             "from [Sales]") ;
     //formatter.
     RectangularCellSetFormatter formatter =
             new RectangularCellSetFormatter(false);
    
     // Print out.
     PrintWriter writer = new PrintWriter(System.out);
     formatter.format(cellSet, writer);
     writer.flush();
     statement.close();
     olapConn.close();
     nativeConn.close();

    指定Database

      hive也有类似RDBMS一样有database的概念,在Hive提供的普通Java API中虽然在连接字符串中指定了database,但是它默认的并非你指定的database而是上一次当前客户端或线程使用的database(注:并非default),所以一般使用Hive 客户端必须先执行一下use database。而在OlapConnection和Mondrian提供的Connection都不支持"use database"操作。暂时的解决办法,每次去进行MDX查询的时候先通过普通的Java Api执行一下use database,指定到当前需要查询的数据库中。

  • 相关阅读:
    Atitit 华为基本法 attilax读后感
    Atitit 华为管理者内训书系 以奋斗者为本 华为公司人力资源管理纲要 attilax读后感
    Atitit 项目版本管理gitflow 与 Forking的对比与使用
    Atitit 管理的模式扁平化管理 金字塔 直线型管理 垂直管理 水平管理 矩阵式管理 网状式样管理 多头管理 双头管理
    Atitit 乌合之众读后感attilax总结 与读后感结构规范总结
    深入理解 JavaScript 异步系列(4)—— Generator
    深入理解 JavaScript 异步系列(3)—— ES6 中的 Promise
    深入理解 JavaScript 异步系列(2)—— jquery的解决方案
    深入理解 JavaScript 异步系列(1)——基础
    使用 github + jekyll 搭建个人博客
  • 原文地址:https://www.cnblogs.com/nexiyi/p/mondrian_hive_test.html
Copyright © 2011-2022 走看看