  • Big Data Platform Setup: Installing Hue on CDH 5.11.1 and Integrating Other Components

    1. Introduction

    Hue is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop; Cloudera later contributed it to the Hadoop community under the Apache Foundation. It is built on Django, a Python web framework.

    With Hue, we can interact with a Hadoop cluster through a visual web interface to analyze and process data: for example, browsing and manipulating data on HDFS, running MapReduce jobs, and viewing data in HBase.

    2. Installation

    (1) Download

    http://archive.cloudera.com/cdh5/cdh/5/

    Download the latest Hue for CDH 5.11.1 (version 3.9.0) from this page, upload it to the server, and extract it into the app directory.
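    The fetch-and-unpack step might look like the sketch below; the exact tarball name and target directory are assumptions, so check the archive listing for the current file name.

```shell
# Assumed file name and paths; verify against the listing at
# archive.cloudera.com/cdh5/cdh/5/ before running.
mkdir -p ~/app
cd ~/app
wget http://archive.cloudera.com/cdh5/cdh/5/hue-3.9.0-cdh5.11.1.tar.gz
tar -zxf hue-3.9.0-cdh5.11.1.tar.gz
cd hue-3.9.0-cdh5.11.1
```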

    (2) Prerequisites

    MySQL must already be installed.

    You also need to install the following packages:

    sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel python-simplejson sqlite-devel gmp-devel -y

    (3) Build

    From the root of the Hue source tree, run:

    make apps

    3. Configuration

    (1) Basic Hue configuration

    Open the desktop/conf/hue.ini file:

    [desktop]
    
      # Set this to a random string, the longer the better.
      # This is used for secure hashing in the session store.
      secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
    
      # Webserver listens on this address and port
      http_host=hadoop001
      http_port=8888
    
      # Time zone name
      time_zone=Asia/Shanghai
    
      # Enable or disable Django debug mode.
      django_debug_mode=false
    
      # Enable or disable backtrace for server error
      http_500_debug_mode=false
    
      # Enable or disable memory profiling.
      ## memory_profiler=false
    
      # Server email for internal error messages
      ## django_server_email='hue@localhost.localdomain'
    
      # Email backend
      ## django_email_backend=django.core.mail.backends.smtp.EmailBackend
    
      # Webserver runs as this user
      server_user=hue
      server_group=hue
    
      # This should be the Hue admin and proxy user
      ## default_user=hue
    
      # This should be the hadoop cluster admin
      #default_hdfs_superuser=hadoop
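    The secret_key above is only an example value and should be replaced with your own random string; the comment in the file says longer is better. One possible way to generate a 50-character value on the server (a sketch; any sufficiently long random string works):

```shell
# Generate a random 50-character secret for hue.ini's secret_key.
# /dev/urandom piped through tr is one common approach.
SECRET=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 50)
echo "secret_key=$SECRET"
```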
    

     

    (2) Integrating Hue with Hadoop

    First, configure a proxy user for Hue in Hadoop by adding the following to Hadoop's core-site.xml:

    <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
    </property>

    Adding these two properties is enough. Note that the "hue" in the property names must match the user the Hue web server runs as (the server_user setting in hue.ini).

    Then restart the Hadoop cluster:

    sbin/stop-dfs.sh

    sbin/stop-yarn.sh

    sbin/start-dfs.sh

    sbin/start-yarn.sh

     

    [hadoop]
    
      # Configuration for HDFS NameNode
      # ------------------------------------------------------------------------
      [[hdfs_clusters]]
        # HA support by using HttpFs
    
        [[[default]]]
          # Enter the filesystem uri
          fs_defaultfs=hdfs://hadoop001:8020
    
          # NameNode logical name.
          ## logical_name=
    
          # Use WebHdfs/HttpFs as the communication mechanism.
          # Domain should be the NameNode or HttpFs host.
          # Default port is 14000 for HttpFs.
          webhdfs_url=http://hadoop001:50070/webhdfs/v1
    
          # Change this if your HDFS cluster is Kerberos-secured
          ## security_enabled=false
    
          # Default umask for file and directory creation, specified in an octal value.
          ## umask=022
    
          # Directory of the Hadoop configuration
          hadoop_conf_dir=/home/hadoop/app/hadoop/etc/hadoop
    
      # Configuration for YARN (MR2)
      # ------------------------------------------------------------------------
      [[yarn_clusters]]
    
        [[[default]]]
          # Enter the host on which you are running the ResourceManager
          resourcemanager_host=hadoop002
    
          # The port where the ResourceManager IPC listens on
          resourcemanager_port=8032
    
          # Whether to submit jobs to this cluster
          submit_to=True
    
          # Resource Manager logical name (required for HA)
          ## logical_name=
    
          # Change this if your YARN cluster is Kerberos-secured
          ## security_enabled=false
    
          # URL of the ResourceManager API
          resourcemanager_api_url=http://hadoop002:8088
    
          # URL of the ProxyServer API
          proxy_api_url=http://hadoop002:8088
    
          # URL of the HistoryServer API
          history_server_api_url=http://hadoop002:19888
    
          # In secure mode (HTTPS), if SSL certificates from Resource Manager's
          # Rest Server have to be verified against certificate authority
          ## ssl_cert_ca_verify=False
    
        # HA support by specifying multiple clusters
        # e.g.
    
        # [[[ha]]]
          # Resource Manager logical name (required for HA)
          ## logical_name=my-rm-name
    
      # Configuration for MapReduce (MR1)
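    One note on webhdfs_url above: it points straight at the NameNode's WebHDFS endpoint (port 50070). As the comments in the file hint, an HttpFs server can be used instead, which is what enables HA support; in that case the URL would look like this (host is an assumption, 14000 is the HttpFs default port):

```
webhdfs_url=http://hadoop001:14000/webhdfs/v1
```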
    

     

    (3) Integrating Hue with Hive

    [beeswax]
    
      # Host where HiveServer2 is running.
      # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
      hive_server_host=hadoop001
    
      # Port where HiveServer2 Thrift server runs on.
      hive_server_port=10000
    
      # Hive configuration directory, where hive-site.xml is located
      hive_conf_dir=/home/hadoop/app/hive/conf
    
      # Timeout in seconds for thrift calls to Hive service
      server_conn_timeout=120
    

    (4) Integrating Hue with HBase

    Note that the HBase Browser talks to the HBase Thrift server (v1), which must be running on the host configured below; it can be started on that host with hbase-daemon.sh start thrift.

    ###########################################################################
    # Settings to configure HBase Browser
    ###########################################################################
    
    [hbase]
      # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
      # Use full hostname with security.
      # If using Kerberos we assume GSSAPI SASL, not PLAIN.
      hbase_clusters=(Cluster|hadoop004:9090)
    
      # HBase configuration directory, where hbase-site.xml is located.
      hbase_conf_dir=/home/hadoop/app/hbase/conf
    
      # Hard limit of rows or columns per row fetched before truncating.
      ## truncate_limit = 500
    
      # 'buffered' is the default of the HBase Thrift Server and supports security.
      # 'framed' can be used to chunk up responses,
      # which is useful when used in conjunction with the nonblocking server in Thrift.
      ## thrift_transport=buffered
    

     

    (5) Integrating Hue with Oozie

    ###########################################################################
    # Settings to configure liboozie
    ###########################################################################
    
    [liboozie]
      # The URL where the Oozie service runs on. This is required in order for
      # users to submit jobs. Empty value disables the config check.
      oozie_url=http://hadoop004:11000/oozie
    
      # Requires FQDN in oozie_url if enabled
      ## security_enabled=false
    
      # Location on HDFS where the workflows/coordinator are deployed when submitted.
      #remote_deployement_dir=/user/hue/oozie/deployments
    

    (6) Configuring Hue to use MySQL

    First, configure the MySQL connection in hue.ini:

     [[database]]
        # Database engine is typically one of:
        # postgresql_psycopg2, mysql, sqlite3 or oracle.
        #
        # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
        # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
        # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
        # Note for MariaDB use the 'mysql' engine.
        engine=mysql
        host=hadoop004
        port=3306
        user=root
        password=123456
        name=hue
        # conn_max_age option to make database connection persistent value in seconds
        # https://docs.djangoproject.com/en/1.9/ref/databases/#persistent-connections
        ## conn_max_age=0
        # Execute this script to produce the database password. This will be used when 'password' is not set.
        ## password_script=/path/script
        ## name=desktop/desktop.db
        ## options={}
        # Database schema, to be used only when public schema is revoked in postgres
        ## schema=public

    Then go into MySQL and create the database named in the configuration above, hue:

    create database hue default character set utf8 default collate utf8_general_ci;

    Next, go to Hue's build/env/bin directory and run:

    ./hue syncdb
    ./hue migrate
     
    Once this finishes, the tables in the database are created with the MyISAM engine by default and need to be converted to InnoDB; otherwise the Hue home page shows the following warning:

    PREFERRED_STORAGE_ENGINE   We recommend MySQL InnoDB engine over MyISAM which does not support transactions.

    Run the following on the MySQL host (hadoop004) to convert them in batch:

    mysql -u root -p123456 -e \
    "SELECT CONCAT('ALTER TABLE ',table_schema,'.',table_name,' engine=InnoDB;') \
    FROM information_schema.tables \
    WHERE engine = 'MyISAM' AND table_schema = 'hue';" \
    | grep "ALTER TABLE hue" > /tmp/set_engine_innodb.ddl
    
     
    mysql -u root -p123456 < /tmp/set_engine_innodb.ddl
    
    

    (Note: these are two separate commands, split by the blank line. The first generates the ALTER TABLE statements into a file, and the second executes them; run them one at a time.)

      

     

    (7) Starting Hue

    First start Hive's metastore and hiveserver2 services:

    nohup hive --service metastore &
    nohup hive --service hiveserver2 &
     
    Then start Hue:
    nohup /home/hadoop/app/hue/build/env/bin/supervisor &

    (8) Accessing Hue

    http://hadoop004:8888

    The first account created on the login page becomes the Hue superuser.

     

    4. Problems You May Encounter

    Failed to contact an active Resource Manager: YARN RM returned a failed response: { "RemoteException" : { "message" : "User: hue is not allowed to impersonate admin", "exception" : "AuthorizationException", "javaClassName" : "org.apache.hadoop.security.authorize.AuthorizationException" } } (error 403)

    This error occurs when the proxy user configured in Hadoop's core-site.xml does not match the user configured in Hue.

    For example, suppose core-site.xml is configured like this:

    <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
    </property>

    Here the proxy user is hue.

    But hue.ini still has the corresponding settings commented out:

    # Webserver runs as this user
    #server_user=hue
    #server_group=hue

    Uncommenting server_user and server_group and setting both to hue fixes the problem.
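    That is, the [desktop] section of hue.ini should read:

```
[desktop]
  # Webserver runs as this user
  server_user=hue
  server_group=hue
```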

     

  • Original: https://www.cnblogs.com/nicekk/p/9028606.html