zoukankan      html  css  js  c++  java
  • Pentaho的Mondrian对Hive的支持

    需求描述

    考虑直接在Hive或者Impala等Big Data方案,能够支持MDX查询,现调研一下Mondrian对hive的支持情况。

    环境准备

    hive环境,采用hive-0.10-cdh4.2.1 客户端程序使用的类库:mondrian-3.6.0、olap4j-1.2.0-SNAPSHOT

    数据准备

    来源于网上一个数据源,准备四张表 Customer - 客户信息维表 Product - 产品维表 ProductType - 产品类表维表 Sale - 销售记录表 为了方便测试数据与MDX正确性,将数据导入到MySQL中一份,用来与Hive查询结果进行对比。

    在MySQL创建对应表及数据

    具体SQL语句:

    create database hive_test;
    
    use hive_test;
    
    /**用户信息表*/
    create table Customer (
           cusId int not null,
           gender char(1) null,
           constraint PK_CUSTOMER primary key(cusId)
    );
    
    /**产品表*/
    create table Product (
           proId int not null,
           proTypeId int null,
           proName varchar(32) null,
           constraint PK_PRODUCT primary key(proId)
    );
    
    /**产品类别表*/
    create table ProductType (
           proTypeId int not null,
           proTypeName varchar(32) null,
           constraint PK_PRODUCTTYPE primary key (proTypeId)
    );
    
    /**销售记录表/
    create table Sale (
           saleId int not null,
           proId int null,
           cusId int null,
           unitPrice float null,
           number int null,
           constraint PK_SALE primary key(saleId)
    );
    
    insert into Customer(cusId,gender) values(1,'F');
    insert into Customer(cusId,gender) values(2,'M');
    insert into Customer(cusId,gender) values(3,'M');
    insert into Customer(cusId,gender) values(4,'F');
    
    
    insert into ProductType(proTypeId,proTypeName) values(1,'electrical');
    insert into ProductType(proTypeId,proTypeName) values(2,'digital');
    insert into ProductType(proTypeId,proTypeName) values(3,'furniture');
    
    insert into Product(proId,proTypeId,proName) values(1,1,'washing machine');
    insert into Product(proId,proTypeId,proName) values(2,1,'television');
    insert into Product(proId,proTypeId,proName) values(3,2,'mp3');
    insert into Product(proId,proTypeId,proName) values(4,2,'mp4');
    insert into Product(proId,proTypeId,proName) values(5,2,'camera');
    insert into Product(proId,proTypeId,proName) values(6,3,'chair');
    insert into Product(proId,proTypeId,proName) values(7,3,'desk');
    insert into sale(saleId,proId,cusId,unitPrice,number) values(1,1,1,340.34,2);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(2,1,2,140.34,1);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(3,2,3,240.34,3);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(4,3,4,540.34,4);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(5,4,1,80.34,5);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(6,5,2,90.34,26);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(7,6,3,140.34,7);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(8,7,4,640.34,28);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(9,6,1,140.34,29);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(10,7,2,740.34,29);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(11,5,3,30.34,28);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(12,4,4,1240.34,72);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(13,3,1,314.34,27);
    insert into sale(saleId,proId,cusId,unitPrice,number) values(14,3,2,45.34,27);
    View Code

    hive中测试数据准备

    在虚拟机准备好hive测试环境,采用hive-0.10-cdh4.2.1版本 具体语句:

    create database mondrian;
    use mondrian;
    create table Sale (saleId INT, proId INT, cusId INT, unitPrice FLOAT, number INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    create table Product (proId INT, proTypeId INT, proName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    create table ProductType (proTypeId INT, proTypeName STRING)   ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    create table Customer (cusId INT, gender STRING)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
    
    # Customer文件
    1,F
    2,M
    3,M
    4,F
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Customer" OVERWRITE into table Customer;
    
    # ProductType文件
    1,electrical
    2,digital
    3,furniture
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/ProductType" into table ProductType;
    
    # Product数据文件
    1,1,washing machine
    2,1,television
    3,2,mp3
    4,2,mp4
    5,2,camera
    6,3,chair
    7,3,desk
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Product" into table Product;
    
    # Sale数据文件
    1,1,1,340.34,2
    2,1,2,140.34,1
    3,2,3,240.34,3
    4,3,4,540.34,4
    5,4,1,80.34,5
    6,5,2,90.34,26
    7,6,3,140.34,7
    8,7,4,640.34,28
    9,6,1,140.34,29
    10,7,2,740.34,29
    11,5,3,30.34,28
    12,4,4,1240.34,72
    13,3,1,314.34,27
    14,3,2,45.34,27
    load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Sale" into table Sale;
    View Code

    元数据定义

    Cube、Measure等元数据定义见:

    <Schema name="hello">
    <Cube name="Sales">
    <!--  事实表(fact table)  -->
    <Table name="Sale"/>
    <!--  客户维  -->
    <Dimension name="cusGender" foreignKey="cusId">
    <Hierarchy hasAll="true" allMemberName="allGender" primaryKey="cusId">
    <Table name="Customer"/>
    <Level name="gender" column="gender"/>
    </Hierarchy>
    </Dimension>
    <!--  产品类别维  -->
    <Dimension name="proType" foreignKey="proId">
    <Hierarchy hasAll="true" allMemberName="allPro" primaryKey="proId" primaryKeyTable="Product">
    <join leftKey="proTypeId" rightKey="proTypeId">
    <Table name="Product"/>
    <Table name="ProductType"/>
    </join>
    <Level name="proTypeId" column="proTypeId" nameColumn="proTypeName" uniqueMembers="true" table="ProductType"/>
    <Level name="proId" column="proId" nameColumn="proName" uniqueMembers="true" table="Product"/>
    </Hierarchy>
    </Dimension>
    <Measure name="numb" column="number" aggregator="sum" datatype="Numeric"/>
    <Measure name="totalSale" aggregator="sum" formatString="$ #,##0.00">
    <!--  unitPrice*number所得值的列  -->
    <MeasureExpression>
    <SQL dialect="generic">unitPrice*number</SQL>
    </MeasureExpression>
    </Measure>
    <CalculatedMember name="averPri" dimension="Measures">
    <Formula>[Measures].[totalSale] / [Measures].[numb]</Formula>
    <CalculatedMemberProperty name="FORMAT_STRING" value="$ #,##0.00"/>
    </CalculatedMember>
    </Cube>
    </Schema>

    测试MDX

    1. 查询所有类别产品销售总件数、平均价格和总销售额

    "select " + "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," + "{([proType].[allPro],[cusGender].[allGender])} " + "on rows " + "from [Sales]" 

    对Hive支持情况

    建立Connection连接方式

    建立Connection连接方式有两种:

    使用mondrian自带的DriverManager获取Connection实例

    mondrian中自带的API

    # 这里的Connection、DriverManager、Query、Result等都是mondrian提供的API接口
            Connection connection = DriverManager.getConnection(
                    "Provider=mondrian;" +
                    "Jdbc=jdbc:hive2://node02:10000/mondrian;" +
                    "JdbcUser=;JdbcPassword=;" +
                    "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
                    "JdbcDrivers=org.apache.hive.jdbc.HiveDriver", null);
    
            Query query = connection.parseQuery(
                    "select 
    " +
                            "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns,
    " +
                            "{([proType].[allPro],[cusGender].[allGender])} 
    " +
                            "on rows
    " +
                            "from [Sales]
    ");
    
            @SuppressWarnings("deprecation")
            Result result = connection.execute(query);
            PrintWriter pw = new PrintWriter(System.out);
            result.print(pw);
            pw.flush();

      对应的连接MySQL,只需要将getConnection中的connectString换成如下即可:

    Connection connection =  DriverManager.getConnection(
                    "Provider=mondrian;" +
                            "Jdbc=jdbc:mysql://localhost:3306/hive_test; JdbcUser=root;" +
                            "JdbcPassword=123;" +
                            "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
                            "JdbcDrivers=com.mysql.jdbc.Driver", null);

      测试的时候连接MySQL时,没什么问题,在使用相同的API连Hive的时候,有点问题。down了一下源码发现它的过程是这样的:先去连接池中取一个Connection实例,没有的话通过Factory创建一个Connection放入池里。而在Mondrian创建Factory的时候指定了两个属性:autoCommit和readOnly,RDBMS的Driver都没什么问题,Hive的JDBC提供的HiveConnection中对这两个属性的set方法实现得很诡异,都是直接抛异常了:

      public void setReadOnly(boolean readOnly) throws SQLException {
        // TODO Auto-generated method stub
        throw new SQLException("Method not supported");
      }
      public void setAutoCommit(boolean autoCommit) throws SQLException {
        if (autoCommit) {
          throw new SQLException("enabling autocommit is not supported");
        }
      }

    将这两行抛出异常的地方注释掉,rebuild一下jar包,MDX就可以顺利执行完了。

    使用JDK原生的DriverManager获取Connection实例

      可以使用JDK原生的DriverManager获取Connection然后再使用Olap4j的封装成OLapConnection然后再去执行MDX 具体连接示例如下:

     Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
     Connection nativeConn = DriverManager.getConnection("jdbc:mondrian:Jdbc=jdbc:hive2://node02:10000/mondrian; JdbcUser=;" +
             "JdbcPassword=;" +
             "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" +
             "JdbcDrivers=org.apache.hive.jdbc.HiveDriver");
    
     OlapConnection olapConn = nativeConn.unwrap(OlapConnection.class);
    
     if (olapConn == null) {
         throw new IllegalStateException("Connection is null");
     }
     OlapStatement statement = olapConn.createStatement();
     CellSet cellSet = statement.executeOlapQuery("select " +
             "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," +
             "{([proType].[allPro],[cusGender].[allGender])} " +
             "on rows " +
             "from [Sales]") ;
     //formatter.
     RectangularCellSetFormatter formatter =
             new RectangularCellSetFormatter(false);
    
     // Print out.
     PrintWriter writer = new PrintWriter(System.out);
     formatter.format(cellSet, writer);
     writer.flush();
     statement.close();
     olapConn.close();
     nativeConn.close();

    指定Database

      hive也有类似RDBMS一样有database的概念,在Hive提供的普通Java API中虽然在连接字符串中指定了database,但是它默认的并非你指定的database而是上一次当前客户端或线程使用的database(注:并非default),所以一般使用Hive 客户端必须先执行一下use database。而在OlapConnection和Mondrian提供的Connection都不支持"use database"操作。暂时的解决办法,每次去进行MDX查询的时候先通过普通的Java Api执行一下use database,指定到当前需要查询的数据库中。

  • 相关阅读:
    (暂时弃坑)(半成品)ACM数论之旅18---反演定理 第二回 Mobius反演(莫比乌斯反演)((づ ̄3 ̄)づ天才第一步,雀。。。。)
    ACM数论之旅17---反演定理 第一回 二项式反演(神说要有光 于是就有了光(´・ω・`))
    ACM数论之旅16---母函数(又名生成函数)(痛并快乐着(╭ ̄3 ̄)╭)
    (暂时弃坑)ACM数论之旅15---置换群与Polya定理(我把标题看成poi了,poipoipoi(*≧▽≦)ツ)
    ACM数论之旅14---抽屉原理,鸽巢原理,球盒原理(叫法不一又有什么关系呢╮(╯▽╰)╭)
    ACM数论之旅13---容斥原理(一切都是命运石之门的选择(=゚ω゚)ノ)
    ACM数论之旅12---康托展开((*゚▽゚*)装甲展开,主推进器启动,倒计时3,2,1......)
    还有哪些东西要写눈_눈
    ACM数论之旅11---浅谈指数与对数(长篇)(今天休息,不学太难的数论> 3<)
    ACM数论之旅10---大组合数-卢卡斯定理(在下卢卡斯,你是我的Master吗?(。-`ω´-) )
  • 原文地址:https://www.cnblogs.com/nexiyi/p/mondrian_hive_test.html
Copyright © 2011-2022 走看看