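  • DW (6): Accessing Azure Blob Storage with PolyBase

    Contents:

    • Hadoop connectivity configuration syntax
    • Configuring Hadoop connectivity
    • Pushdown configuration
    • Create external tables for Azure blob storage

    Hadoop connectivity configuration syntax: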


    Global configuration settings for PolyBase connectivity to Hadoop and Azure blob storage. Syntax:

    • sp_configure: with no arguments, lists all of the configuration options
    • Syntax for setting Hadoop connectivity:
    • sp_configure [ @configname = ] 'hadoop connectivity',  
                   [ @configvalue = ] { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 }  
      [;]  
        
      RECONFIGURE  
      [;]

       

      1. Option 0: Disable Hadoop connectivity
      2. Option 1: Hortonworks HDP 1.3 on Windows Server
      3. Option 1: Azure blob storage (WASB[S])
      4. Option 2: Hortonworks HDP 1.3 on Linux
      5. Option 3: Cloudera CDH 4.3 on Linux
      6. Option 4: Hortonworks HDP 2.0 on Windows Server
      7. Option 4: Azure blob storage (WASB[S])
      8. Option 5: Hortonworks HDP 2.0 on Linux
      9. Option 6: Cloudera 5.1, 5.2, 5.3, 5.4, and 5.5 on Linux
      10. Option 7: Hortonworks 2.1, 2.2, and 2.3 on Linux
      11. Option 7: Hortonworks 2.1, 2.2, and 2.3 on Windows Server
      12. Option 7: Azure blob storage (WASB[S])      
    • Example code:
      1. List all available configuration options: EXEC sp_configure;
      2. Show the settings for a single option: EXEC sp_configure @configname = 'hadoop connectivity';
      3. Set an option, for example:
        EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
        GO

        RECONFIGURE
        GO
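
    One way to confirm the setting, sketched here on the assumption that you have permission to read the server catalog views, is to compare the configured value with the value the running instance is using. The two may differ until the instance has been restarted.

    ```sql
    -- value is the configured value; value_in_use is what the running instance uses.
    -- is_dynamic = 0 means RECONFIGURE alone does not apply the change.
    SELECT name, value, value_in_use, is_dynamic
    FROM sys.configurations
    WHERE name = 'hadoop connectivity';
    ```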
        

         

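     Configuring Hadoop connectivity


    • The example in this section connects to an HDInsight 3.2 cluster, which corresponds to option value 7.
    • Set the PolyBase connectivity option as follows: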
    • sp_configure @configname = 'hadoop connectivity', @configvalue = 7;  
      GO  
        
      RECONFIGURE  
      GO
      

       

    • From the command line, run services.msc and restart the following services:
      1. SQL Server PolyBase Data Movement Service
      2. SQL Server PolyBase Engine
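
    After the restart, a quick check along the following lines can confirm that PolyBase is present and that its nodes are visible. This is a minimal sketch, assuming a SQL Server 2016 or later instance with the PolyBase feature installed.

    ```sql
    -- Returns 1 when the PolyBase feature is installed on this instance.
    SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

    -- Lists the PolyBase head and compute nodes known to the engine.
    SELECT * FROM sys.dm_exec_compute_nodes;
    ```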

     Pushdown configuration:


    • To improve query performance, enable pushdown computation to a Hadoop cluster:
    1. Find the file yarn-site.xml in the installation path of SQL Server. Typically, the path is: C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf
    2. On the Hadoop machine, find the analogous file in the Hadoop configuration directory. In the file, find and copy the value of the configuration key yarn.application.classpath.
    3. On the SQL Server machine, in the yarn-site.xml file, find the yarn.application.classpath property. Paste the value from the Hadoop machine into the value element.
    4. For example, the pasted value may look like this: $HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*
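
    Pushdown applies to Hadoop data sources, not to Azure blob storage. On the T-SQL side it also depends on the external data source naming the cluster's resource manager. The sketch below is illustrative only: the data source name MyHadoopCluster, the table [dbo].[hospital_hdfs], the host placeholders, and the ports (8020 and 8050 are common defaults on Hortonworks) are assumptions, not values from this article.

    ```sql
    -- Hypothetical Hadoop data source; replace the placeholders with your own hosts.
    -- RESOURCE_MANAGER_LOCATION is what allows PolyBase to consider pushdown.
    CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
        TYPE = HADOOP,
        LOCATION = 'hdfs://<namenode_host>:8020',
        RESOURCE_MANAGER_LOCATION = '<resource_manager_host>:8050'
    );

    -- Assumes an external table [dbo].[hospital_hdfs] was created on MyHadoopCluster.
    -- The hint asks the optimizer to push the computation down to Hadoop.
    SELECT COUNT(*) AS row_count
    FROM [dbo].[hospital_hdfs]
    OPTION (FORCE EXTERNALPUSHDOWN);
    ```

    The opposite hint, OPTION (DISABLE EXTERNALPUSHDOWN), makes it possible to compare timings with and without pushdown.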

       

     Create external tables for Azure blob storage


    • The Elastic Database query feature relies on the following DDL statements. Typically, these statements are run once, or rarely, when the schema of your application changes:
      1. CREATE MASTER KEY (https://msdn.microsoft.com/library/ms174382.aspx)
      2. CREATE CREDENTIAL (https://msdn.microsoft.com/library/ms189522.aspx)
      3. CREATE DATABASE SCOPED CREDENTIAL (https://msdn.microsoft.com/library/mt270260.aspx)
      4. CREATE/DROP EXTERNAL DATA SOURCE (https://msdn.microsoft.com/library/dn935022.aspx)
      5. CREATE/DROP EXTERNAL TABLE (https://msdn.microsoft.com/library/dn935021.aspx)
    • You can use the following syntax to drop the master key and credentials
      1. DROP CREDENTIAL <credential_name>;
      2. DROP DATABASE SCOPED CREDENTIAL <credential_name>;
      3. DROP MASTER KEY; 
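    • Create a simple text file with sample data.
    • Upload the text file to Azure blob storage.
    • Connect remotely to ploybase1, open SSMS (note: for this version, SSMS must be installed separately), and run the following script: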
    • CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password01!'; 
      
      CREATE DATABASE SCOPED CREDENTIAL JNAzureCredit   
      WITH IDENTITY = 'zhushy', SECRET = 'XXXX';  /* storage account access key */
      
      
      CREATE EXTERNAL DATA SOURCE JNAzureStorage with (  
              TYPE = HADOOP,   
              LOCATION ='wasb://<blob_container_name>@<azure_storage_account_name>.blob.core.chinacloudapi.cn',  
              CREDENTIAL = JNAzureCredit  
      );
      
      
      CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (  
              FORMAT_TYPE = DELIMITEDTEXT,   
              FORMAT_OPTIONS (FIELD_TERMINATOR =' ', USE_TYPE_DEFAULT = TRUE));
      
      CREATE EXTERNAL TABLE [dbo].[hospital] (  
              [ID] varchar(128) NULL,
              [name] varchar(128) null
      )  
      WITH (LOCATION='/zhu/data1.txt',   
              DATA_SOURCE = JNAzureStorage,  
              FILE_FORMAT = TextFileFormat  
      );  

       

    • To verify the data, run the following SQL: SELECT * FROM [dbo].[hospital]; if the rows from the text file are returned, the setup works.
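
    Beyond the SELECT above, the catalog views can show what was created, and the objects can later be removed in reverse dependency order. The sketch below reuses the object names from the script in this section. The drop order reflects the dependencies between the objects (table, then file format and data source, then credential, then master key) and assumes nothing else in the database depends on them.

    ```sql
    -- Inspect the external objects created by the script.
    SELECT name, location, type_desc FROM sys.external_data_sources;
    SELECT name, format_type, field_terminator FROM sys.external_file_formats;
    SELECT name, location, data_source_id, file_format_id FROM sys.external_tables;

    -- Clean up when the objects are no longer needed, dependents first.
    DROP EXTERNAL TABLE [dbo].[hospital];
    DROP EXTERNAL FILE FORMAT TextFileFormat;
    DROP EXTERNAL DATA SOURCE JNAzureStorage;
    DROP DATABASE SCOPED CREDENTIAL JNAzureCredit;
    DROP MASTER KEY;
    ```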

     

  • Original article: https://www.cnblogs.com/tgzhu/p/5913508.html