  • hadoop-spark-hive-hbase configuration notes

    1. zookeeper

    • Configuration
    • cp app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo_sample.cfg app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo.cfg
    •  
    • vim app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo.cfg
    • dataDir=/home/cdh5/tmp/zookeeper
    • clientPort=2183
    • server.1=ocdata09:2888:3888
    • mkdir -p /home/cdh5/tmp/zookeeper
    • vim /home/cdh5/tmp/zookeeper/myid
    • echo "1" > /home/cdh5/tmp/zookeeper/myid
    • Initialization:

    Alternatively: ./runRemoteCmd.sh '~/och200/zookeeper/bin/zkServer-initialize.sh --myid=1' zoo

    • Distribute the configuration
    • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/zoo.cfg app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/ zoo
    • Start
    • ./runRemoteCmd.sh "app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/zkServer.sh start" zoo
    • Verify
    • ./runRemoteCmd.sh 'echo ruok | nc localhost 2183' zoo
    • Stop
    • ./runRemoteCmd.sh "app/ochadoop-och3.0.0-SNAPSHOT/zookeeper-3.4.5-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/zkServer.sh stop" zoo
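The myid written above must match the server.N entry in zoo.cfg (server.1 here). In a multi-node ensemble every server gets its own id under its dataDir; a throwaway local sketch of that layout (the paths and the 3-node count are assumptions for illustration, not this cluster's setup):

```shell
# Sketch: each ZooKeeper server stores its own id in $dataDir/myid,
# matching a server.N entry in zoo.cfg. Paths here are illustrative.
DATA_DIR=/tmp/zk-myid-demo
for id in 1 2 3; do
  mkdir -p "$DATA_DIR/server$id"
  echo "$id" > "$DATA_DIR/server$id/myid"
done
cat "$DATA_DIR/server2/myid"   # prints "2"
```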

    2. HDFS

    • Configure hadoop
    • hdfs-site.xml
    • <property>
    •     <name>dfs.nameservices</name>
    •     <value>cdh5cluster</value>
    •     <description>
    •         Comma-separated list of nameservices.
    •     </description>
    • </property>
    • <property>
    •     <name>dfs.datanode.address</name>
    •     <value>0.0.0.0:50011</value>
    •     <description>
    •       The datanode server address and port for data transfer.
    •       If the port is 0 then the server will start on a free port.
    •     </description>
    • </property>
    • <property>
    •     <name>dfs.datanode.http.address</name>
    •     <value>0.0.0.0:50076</value>
    •     <description>
    •       The datanode http server address and port.
    •       If the port is 0 then the server will start on a free port.
    •     </description>
    • </property>
    • <property>
    •     <name>dfs.datanode.ipc.address</name>
    •     <value>0.0.0.0:50021</value>
    •     <description>
    •       The datanode ipc server address and port.
    •       If the port is 0 then the server will start on a free port.
    •     </description>
    • </property>
    •  
    • <property>
    •   (Unique identifiers for the NameNodes in the nameservice; multiple names may be configured, comma-separated. This lets DataNodes know all NameNodes in the cluster. Currently at most two NameNodes per nameservice are supported.)
    •     <name>dfs.ha.namenodes.cdh5cluster</name>
    •     <value>nn1,nn2</value>
    •     <description></description>
    • </property>
    •  
    • <property>
    •     <name>dfs.namenode.name.dir</name>
    •     <value>file:///data1/cdh5/dfs/name</value>
    •     <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
    •     <final>true</final>
    • </property>
    •  
    • <property>
    •       <name>dfs.datanode.data.dir</name>
    • <value>file:///data1/cdh5/dfs/data,file:///data2/cdh5/dfs/data,file:///data3/cdh5/dfs/data</value>
    •       <final>true</final>
    • </property>
    •  
    • <property>
    •       <name>dfs.replication</name>
    •       <value>3</value>
    • </property>
    •  
    • <property>
    •       <name>dfs.permissions</name>
    •       <value>true</value>
    • </property>
    •  
    • <property>
    •     <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    •     <value>true</value>
    •     <description>
    •       Boolean which enables backend datanode-side support for the experimental DistributedFileSystem#getFileVBlockStorageLocations API.
    •     </description>
    • </property>
    •  
    • <property>
    •     <name>dfs.permissions.enabled</name>
    •     <value>false</value>
    •     <description>
    •       If "true", enable permission checking in HDFS.
    •       If "false", permission checking is turned off,
    •       but all other behavior is unchanged.
    •       Switching from one parameter value to the other does not change the mode,
    •       owner or group of files or directories.
    •     </description>
    • </property>
    •  
    • <property>
    • (The RPC address each NameNode listens on)
    •     <name>dfs.namenode.rpc-address.cdh5cluster.nn1</name>
    •     <value>ocdata09:8030</value>
    •     <description>RPC address of NameNode nn1</description>
    • </property>
    •  
    • <property>
    •     <name>dfs.namenode.rpc-address.cdh5cluster.nn2</name>
    •     <value>ocdata08:8030</value>
    •     <description>RPC address of NameNode nn2</description>
    • </property>
    •  
    • <property>
    •     <name>dfs.namenode.http-address.cdh5cluster.nn1</name>
    •     <value>ocdata09:50082</value>
    •     <description>HTTP address of NameNode nn1</description>
    • </property>
    •  
    • <property>
    •     <name>dfs.namenode.http-address.cdh5cluster.nn2</name>
    •     <value>ocdata08:50082</value>
    •     <description>HTTP address of NameNode nn2</description>
    • </property>
    •  
    • <property>
    • (The URI through which the NameNodes read and write the JournalNodes' edit log. The format is "qjournal://host1:port1;host2:port2;host3:port3/journalId", where host1, host2, host3 are JournalNode addresses; there must be an odd number of them, at least 3. journalId is a unique identifier for the cluster; multiple federated namespaces all use the same journalId.)
    •     <name>dfs.namenode.shared.edits.dir</name>
    • <value>qjournal://ocdata05:8488;ocdata06:8488;ocdata07:8488/cdh5cluster</value>
    •     <description>Three JournalNodes store the shared metadata; these are their hosts and ports</description>
    • </property>

     

    • <property>
    •     <name>dfs.journalnode.edits.dir</name>
    •     <value>/home/cdh5/journaldata/jn</value>
    •     <description>Local storage path for JournalNode data</description>
    • </property>
    •  
    • <property>
    •     <name>dfs.journalnode.rpc-address</name>
    •     <value>0.0.0.0:8488</value>
    • </property>
    •  
    • <property>
    •     <name>dfs.journalnode.http-address</name>
    •     <value>0.0.0.0:8483</value>
    • </property>
    •  
    • <property>
    •     <name>dfs.client.failover.proxy.provider.cdh5cluster</name>
    •     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    •     <description>Class the client uses to determine which NameNode is active</description>
    • </property>
    •  
    • <property>
    •     <name>dfs.ha.fencing.methods</name>
    •     <value>shell(/bin/true)</value>
    • </property>
    •  
    • <property>
    •     <name>dfs.ha.fencing.ssh.connect-timeout</name>
    •     <value>10000</value>
    • </property>
    •  
    • <property>
    •     <name>dfs.ha.automatic-failover.enabled</name>
    •     <value>true</value>
    •     <description>
    •       Whether automatic failover is enabled. See the HDFS High
    •       Availability documentation for details on automatic HA
    •       configuration.
    •     </description>
    • </property>
    •  
    • <property>
    •     <name>ha.zookeeper.quorum</name>
    •     <value>ocdata09:2183</value>
    •     <description>A single ZooKeeper node</description>
    • </property>
    •  
    • <property>
    •     <name>dfs.datanode.max.xcievers</name>
    •     <value>4096</value>
    • </property>
    •  
    • <property>
    •     <name>dfs.datanode.max.transfer.threads</name>
    •     <value>4096</value>
    •     <description>
    •           Specifies the maximum number of threads to use for transferring data
    •           in and out of the DN.
    •     </description>
    • </property>
    •  
    • <property>
    •     <name>dfs.blocksize</name>
    •     <value>64m</value>
    •     <description>
    •         The default block size for new files, in bytes.
    •         You can use the following suffix (case insensitive):
    •         k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.),
    •         Or provide complete size in bytes (such as 134217728 for 128 MB).
    •     </description>
    • </property>
    •  
    • <property>
    •     <name>dfs.namenode.handler.count</name>
    •     <value>20</value>
    •     <description>The number of server threads for the namenode.</description>
    • </property>
    • core-site.xml
    • <property>
    • (The URI of the default file system (protocol, hostname, port). Every machine in the cluster needs to know the NameNode address: DataNodes register with it so their data can be served, and client programs use it to contact the NameNode and obtain file block lists.)
    •     <name>fs.defaultFS</name>
    •     <value>hdfs://cdh5cluster</value>
    • </property>
    •  
    • <property>
    • (Base directory for Hadoop's local files; many other paths default under it. If the NameNode and DataNode storage directories are not configured in hdfs-site.xml, they are placed here by default.)
    •     <name>hadoop.tmp.dir</name>
    •     <value>/home/cdh5/tmp/hadoop/hadoop-${user.name}</value>
    • </property>
    •  
    • <property>
    • (Load native hadoop libraries if present)
    •     <name>io.native.lib.available</name>
    •     <value>true</value>
    •     <description>Should native hadoop libraries, if present, be used.</description>
    • </property>
    • (Comma-separated list of compression/decompression codec classes, loaded via the Java ServiceLoader; null if unset)
    • <property>
    •     <name>io.compression.codecs</name>
    •     <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
    • </property>
    • slaves
    • ocdata05
    • ocdata06
    • ocdata07
    • ocdata08
    • ocdata09
    • masters
    • ocdata05
    • ocdata06
    • hadoop-env.sh
    • export JAVA_HOME=/home/cdh5/app/jdk1.7.0_21
    • Distribute
    • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc all
    • Initialize HDFS:

    Run on the primary node

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hdfs zkfc -formatZK

    ./runRemoteCmd.sh 'app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh start journalnode' jn

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hdfs namenode -format -initializeSharedEdits

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh start namenode

    Run on the standby node

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hdfs namenode -bootstrapStandby

    To finish

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh stop namenode

    ./runRemoteCmd.sh 'app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/hadoop-daemon.sh stop journalnode' jn

    • Start HDFS
    • app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/start-dfs.sh
    • Verify:
    • http://10.1.253.99:50082/dfshealth.html (active)
    • http://10.1.253.98:50082/dfshealth.html (standby)
    • http://10.1.253.97:8483/journalstatus.jsp
    • http://10.1.253.96:8483/journalstatus.jsp
    • http://10.1.253.95:8483/journalstatus.jsp
    • Stop HDFS
    • app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/stop-dfs.sh
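The dfs.blocksize above uses the `64m` suffix form its description mentions; a small shell sketch of how such suffixes expand to bytes (the `to_bytes` helper is made up for illustration, not part of Hadoop):

```shell
# to_bytes: expand an HDFS-style size suffix (k/m/g, case-insensitive)
# into bytes, the way dfs.blocksize interprets values like 64m.
to_bytes() {
  local v=$1
  local n=${v%[kKmMgG]}
  case $v in
    *[kK]) echo $((n * 1024)) ;;
    *[mM]) echo $((n * 1024 * 1024)) ;;
    *[gG]) echo $((n * 1024 * 1024 * 1024)) ;;
    *)     echo "$n" ;;
  esac
}
to_bytes 64m    # prints 67108864
```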

    3. Yarn

    Configure YARN

    • mapred-site.xml
    • cp app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml.template app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml
    •  
    • vim app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml
    •  
    • <property>
    •     <name>mapreduce.framework.name</name>
    •     <value>yarn</value>
    • </property>
    •  
    • <property>
    •     <name>mapreduce.shuffle.port</name>
    •     <value>8350</value>
    • </property>
    •  
    • <property>
    •     <name>mapreduce.jobhistory.address</name>
    •     <value>0.0.0.0:10121</value>
    • </property>
    •  
    • <property>
    •     <name>mapreduce.jobhistory.webapp.address</name>
    •     <value>0.0.0.0:19868</value>
    • </property>
    •  
    • <property>
    •     <name>mapreduce.jobtracker.http.address</name>
    •     <value>0.0.0.0:50330</value>
    • </property>
    •  
    • <property>
    •     <name>mapreduce.tasktracker.http.address</name>
    •     <value>0.0.0.0:50360</value>
    • </property>
    • vim app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop/yarn-site.xml
    •  
    • <!-- Resource Manager Configs -->
    • <property>
    •     <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    •     <value>2000</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.ha.enabled</name>
    •     <value>true</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    •     <value>true</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    •     <value>true</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.cluster-id</name>
    •     <value>yarn-rm-cluster</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.ha.rm-ids</name>
    •     <value>rm1,rm2</value>
    • </property>
    • <property>
    •     <description>Id of the current ResourceManager. Must be set explicitly on each ResourceManager to the appropriate value.</description>
    •     <name>yarn.resourcemanager.ha.id</name>
    •     <value>rm1</value>
    •     <!-- set to rm1 on host rm1, and to rm2 on host rm2 -->
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.recovery.enabled</name>
    •     <value>true</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.store.class</name>
    •     <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.zk.state-store.address</name>
    •     <value>ocdata09:2183</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.zk-address</name>
    •     <value>ocdata09:2183</value>
    • </property>
    • <property>
    •     <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    •     <value>5000</value>
    • </property>
    • <!-- RM1 configs -->
    • <property>
    •     <name>yarn.resourcemanager.address.rm1</name>
    •     <value>ocdata08:23140</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.scheduler.address.rm1</name>
    •     <value>ocdata08:23130</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.webapp.address.rm1</name>
    •     <value>ocdata08:23188</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    •     <value>ocdata08:23125</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.admin.address.rm1</name>
    •     <value>ocdata08:23141</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    •     <value>ocdata08:23142</value>
    • </property>
    • <!-- RM2 configs -->
    • <property>
    •     <name>yarn.resourcemanager.address.rm2</name>
    •     <value>ocdata09:23140</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.scheduler.address.rm2</name>
    •     <value>ocdata09:23130</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.webapp.address.rm2</name>
    •     <value>ocdata09:23188</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    •     <value>ocdata09:23125</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.admin.address.rm2</name>
    •     <value>ocdata09:23141</value>
    • </property>
    • <property>
    •     <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    •     <value>ocdata09:23142</value>
    • </property>
    • <!-- Node Manager Configs -->
    • <property>
    •     <description>Address where the localizer IPC is.</description>
    •     <name>yarn.nodemanager.localizer.address</name>
    •     <value>0.0.0.0:23344</value>
    • </property>
    • <property>
    •     <description>NM Webapp address.</description>
    •     <name>yarn.nodemanager.webapp.address</name>
    •     <value>0.0.0.0:23999</value>
    • </property>
    • <property>
    •     <name>yarn.nodemanager.aux-services</name>
    •     <value>mapreduce_shuffle</value>
    • </property>
    • <property>
    •     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    •     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    • </property>
    • <property>
    •     <name>yarn.nodemanager.local-dirs</name>
    •     <value>/tmp/pseudo-dist/yarn/local</value>
    • </property>
    • <property>
    •     <name>yarn.nodemanager.log-dirs</name>
    •     <value>/tmp/pseudo-dist/yarn/log</value>
    • </property>
    • Distribute
    • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc all
    • Starting and stopping YARN: YARN needs no initialization. Log in to the primary node and run
    • app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/start-yarn.sh

    With cdh5, YARN HA requires starting the standby ResourceManager manually:

        ./runRemoteCmd.sh "cd app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin; ./yarn-daemon.sh start resourcemanager" rm2

    Verify

    http://10.1.253.98:23188/cluster (node list visible; active)

    http://10.1.253.99:23188/cluster (no node list; standby)

     

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/bin/hadoop jar app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 40 -fileSize 20MB

    Stop YARN

    app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin/stop-yarn.sh

    Stop the standby ResourceManager manually:

    ./runRemoteCmd.sh "cd app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/sbin; ./yarn-daemon.sh stop resourcemanager" rm2
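yarn.resourcemanager.ha.id must differ per ResourceManager (rm1 on the first host, rm2 on the second), yet deploy.sh copies identical configs everywhere, so the value has to be patched on the rm2 host afterwards. A hypothetical fix-up, demonstrated against a throwaway local copy of the file (the path and sed expression are illustrative):

```shell
# Simulate the deployed yarn-site.xml and flip the HA id to rm2,
# as would be done on the second ResourceManager host.
F=/tmp/yarn-site-demo.xml
cat > "$F" <<'EOF'
<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
</property>
EOF
sed -i 's|<value>rm1</value>|<value>rm2</value>|' "$F"
grep '<value>' "$F"    # prints the line with <value>rm2</value>
```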

    4. hive

    • Configuration
    • cp hive-env.sh.template hive-env.sh
    • vim hive-env.sh
    •  
    • export HADOOP_HOME=/home/cdh5/app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT
    •  
    • cp hive-default.xml.template hive-site.xml
    • vim hive-site.xml

    Delete all other properties and keep only:

    <property>

    (Metastore database connection, typically MySQL)

        <name>javax.jdo.option.ConnectionURL</name>

    <value>jdbc:mysql://10.1.252.69:3306/cdh5?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>

        </property>

     

    <property>

    (JDBC driver for the metastore database)

        <name>javax.jdo.option.ConnectionDriverName</name>

        <value>com.mysql.jdbc.Driver</value>

        <description>Driver class name for a JDBC metastore</description>

    </property>

     

    <property>

    (Username for the metastore database)

        <name>javax.jdo.option.ConnectionUserName</name>

        <value>cdh5</value>

        <description>username to use against metastore database</description>

    </property>

     

    <property>

    (Password for the metastore database)

        <name>javax.jdo.option.ConnectionPassword</name>

        <value>cdh5</value>

        <description>password to use against metastore database</description>

    </property>

    • Metastore database setup
    • CREATE USER cdh5 IDENTIFIED BY 'cdh5';
    • CREATE DATABASE cdh5;
    • alter database cdh5 character set latin1;
    • grant all privileges on *.* to cdh5@"%" identified by "cdh5";
    • flush privileges;
    • Upload the JDBC jar
    • scp mysql-connector-java-5.1.26.jar cdh5@10.1.253.99:/home/cdh5/app/ochadoop-och3.0.0-SNAPSHOT/hive-0.12.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/lib/
    • Distribute
    • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hive-0.12.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT app/ochadoop-och3.0.0-SNAPSHOT/ hive
    • Start
    • nohup ./hiveserver2 &
    • Verify
    • jdbc:
    • jdbc:hive2://10.1.253.99:10000/default
    • org.apache.hive.jdbc.HiveDriver
    • lib: all jars under Hadoop and Hive
    •  
    • !connect jdbc:hive2://10.1.253.99:10000/default
    • Enter username: dmp
    • Enter password: dmp
    •  
    • show tables;
    • +--------------+
    • |   tab_name   |
    • +--------------+
    • | shaoaq_test  |
    • +--------------+
    •  
    • select * from shaoaq_test;
    • +-----+
    • | id  |
    • +-----+
    • +-----+
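The JDBC URL above assumes HiveServer2 is listening on its default thrift port, 10000. If that port needs changing, the standard hive-site.xml property below controls it (shown with the default; any other value is just an example):

```
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface</description>
</property>
```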

    5. hbase

    • Configuration
    • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/regionservers
    •  
    • ocdata05
    • ocdata06
    • ocdata07
    •  
    • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/backup-masters
    •  
    • ocdata08
    •  
    • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/hbase-site.xml
    •  
    • <property>
    •     <name>hbase.rootdir</name>
    •     <value>hdfs://cdh5cluster/hbase</value>
    • </property>
    • <property> 
    •     <name>hbase.cluster.distributed</name> 
    •     <value>true</value> 
    • </property>
    • <property>
    •     <name>hbase.zookeeper.quorum</name>
    •     <value>ocdata09</value>
    • </property>
    • <property>
    •     <name>hbase.zookeeper.property.clientPort</name>
    •     <value>2183</value>
    • </property>
    • <property>
    •     <name>hbase.regionserver.port</name>
    •     <value>60328</value>
    • </property>
    • <property>
    •     <name>hbase.regionserver.info.port</name>
    •     <value>62131</value>
    • </property>
    •  
    • vim app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf/hbase-env.sh
    •  
    • export JAVA_HOME=/home/cdh5/app/jdk1.7.0_51
    • export HBASE_CLASSPATH=/home/cdh5/app/ochadoop-och3.0.0-SNAPSHOT/hadoop-2.2.0-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/etc/hadoop
    • export HBASE_HOME=/home/cdh5/app/hbase
    • export HADOOP_HOME=/home/cdh5/app/hadoop
    • export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
    • export HBASE_LIBRARY_PATH=${HBASE_HOME}/lib/native
    • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HBASE_HOME}/lib/native
    • export HBASE_MANAGES_ZK=false
    • Distribute the configuration
    • ./deploy.sh app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/conf app/ochadoop-och3.0.0-SNAPSHOT/hbase-0.96.1.1-cdh5.0.0-beta-2-och3.0.0-SNAPSHOT/ all
    • Start
    • ./start-hbase.sh
    • Verify
    • ./hbase shell
    • create 'hb_test', 'cf'
    • put 'hb_test','row1','cf:a','123'
    • get 'hb_test','row1'
    • COLUMN                            CELL
    • cf:a                             timestamp=1395204538429, value=123
    • 1 row(s) in 0.0490 seconds
    • quit
    • Stop
    • ./stop-hbase.sh

    6. spark

    Spark can currently be used straight from the unpacked archive; in yarn-client mode nothing needs to be distributed, only a few client-side settings changed.

    • Notes on the spark-1.1.0 on yarn settings:
    • vim spark-env.sh
    •  
    • MASTER: deploy mode, yarn-client/yarn-cluster/local
    • HADOOP_CONF_DIR: (required) hadoop configuration directory
    • SCALA_HOME: scala installation path
    • SPARK_EXECUTOR_INSTANCES: total number of yarn workers spark requests
    • SPARK_EXECUTOR_CORES: vcores requested per worker
    • SPARK_EXECUTOR_MEMORY: memory requested per worker
    • SPARK_DRIVER_MEMORY: memory requested for the spark ApplicationMaster
    • SPARK_YARN_APP_NAME: spark job name shown in yarn
    • SPARK_YARN_QUEUE: yarn queue for spark jobs
    • SPARK_SUBMIT_LIBRARY_PATH: library directories needed at run time, e.g. hadoop's native directory
    • SPARK_CLASSPATH: classpath for spark jobs
    • SPARK_JAVA_OPTS: JVM options, e.g. GC type, GC log, heap dump output
    • SPARK_HISTORY_OPTS: spark history-server options; usually the webUI port, number of retained applications, and the event log directory
    •  
    • vim spark-defaults.conf
    •  
    • spark.local.dir: local scratch directory for spark jobs
    • spark.yarn.executor.memoryOverhead: off-heap memory per worker in MB; set it in yarn mode to avoid running out of memory
    • spark.eventLog.enabled: whether to log Spark events, used to rebuild the webUI after an application finishes
    • spark.eventLog.dir: where event log information is stored; either an hdfs:// HDFS path or a file:// local path, and it must be created in advance
    • spark.eventLog.compress: whether to compress the event log (requires spark.eventLog.enabled=true); snappy by default
    • Starting and stopping the thrift-server:

    Before using the spark-sql/thrift-server components, copy hive-site.xml into $SPARK_HOME/conf so Spark picks up Hive's metastore and settings such as the server port; you may need to remove some redundant or unsupported properties from it, so check carefully.

        $SPARK_HOME/sbin/start-thriftserver.sh

        $SPARK_HOME/sbin/stop-thriftserver.sh
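A minimal hive-site.xml for $SPARK_HOME/conf might keep just the metastore connection properties from the Hive section above (values copied from that section; trim anything Spark complains about):

```
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.1.252.69:3306/cdh5?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>cdh5</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>cdh5</value>
  </property>
</configuration>
```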

    • Starting and stopping the history-server:
    • $SPARK_HOME/sbin/start-history-server.sh
    • $SPARK_HOME/sbin/stop-history-server.sh
    • Notes:
    1. If lzo compression is enabled in hadoop, copy hadoop-lzo-*.jar into SPARK_HOME/lib/;
    2. In spark-1.1.0, the thrift version pulled in by spark-examples-*.jar conflicts with spark-assembly-*.jar; the examples jar must be deleted;
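Note 2 above amounts to deleting the examples jar. A sketch of that cleanup against a throwaway directory (the real target would be $SPARK_HOME/lib; file names are illustrative):

```shell
# Simulate SPARK_HOME/lib and remove the spark-examples jar whose
# bundled thrift version conflicts with spark-assembly (note 2 above).
LIB=/tmp/spark-lib-demo
mkdir -p "$LIB"
touch "$LIB/spark-assembly-1.1.0-hadoop2.2.0.jar" \
      "$LIB/spark-examples-1.1.0-hadoop2.2.0.jar"
rm -f "$LIB"/spark-examples-*.jar
ls "$LIB"    # only the spark-assembly jar remains
```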
    • Sample configuration:

    spark-env.sh

        MASTER="yarn-client"

        SPARK_HOME=/home/ochadoop/app/spark

        SCALA_HOME=/home/ochadoop/app/scala

        JAVA_HOME=/home/ochadoop/app/jdk

        HADOOP_HOME=/home/ochadoop/app/hadoop

        HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

     

        SPARK_EXECUTOR_INSTANCES=50

        SPARK_EXECUTOR_CORES=2

        SPARK_EXECUTOR_MEMORY=4G

        SPARK_DRIVER_MEMORY=3G

        SPARK_YARN_APP_NAME="Spark-1.1.0"

        #export SPARK_YARN_QUEUE="default"

     

        SPARK_SUBMIT_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native

        SPARK_JAVA_OPTS="-verbose:gc -XX:-UseGCOverheadLimit -XX:+UseCompressedOops -XX:-PrintGCDetails -XX:+PrintGCTimeStamps $SPARK_JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ochadoop/app/spark/`date +%m%d%H%M%S`.hprof"

        export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=1000 -Dspark.history.fs.logDirectory=hdfs://testcluster/eventLog"

    spark-defaults.conf

        spark.serializer                    org.apache.spark.serializer.KryoSerializer

        spark.local.dir                     /data2/ochadoop/data/pseudo-dist/spark/local,/data3/ochadoop/data/pseudo-dist/spark/local

        spark.io.compression.codec          snappy

        spark.speculation                   false

        spark.yarn.executor.memoryOverhead  512

        #spark.storage.memoryFraction       0.4

        spark.eventLog.enabled              true

        spark.eventLog.dir                  hdfs://testcluster/eventLog

        spark.eventLog.compress             true
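With the sample values above, each executor's YARN container request is roughly the executor heap plus the off-heap overhead. A quick check of the totals (pure arithmetic; the values are taken from the samples above):

```shell
# Per-executor container memory: SPARK_EXECUTOR_MEMORY (4G) plus
# spark.yarn.executor.memoryOverhead (512 MB), in MB.
EXECUTOR_MB=$((4 * 1024))
OVERHEAD_MB=512
PER_EXECUTOR_MB=$((EXECUTOR_MB + OVERHEAD_MB))
echo "$PER_EXECUTOR_MB"            # 4608 MB per container
# Across SPARK_EXECUTOR_INSTANCES=50 executors:
echo $((PER_EXECUTOR_MB * 50))     # 230400 MB requested in total
```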

    The following commands are run as root, or prefixed with sudo.
    Install via yum:
    yum install mysql         # install the mysql client
    yum install mysql-server  # install the mysql server
    Check whether MySQL is installed:
    chkconfig --list | grep mysql
    Start the mysql service:
    service mysqld start   (or: /etc/init.d/mysqld start)
    Check that the mysql service is running:
    /etc/init.d/mysqld status
    Enable MySQL at boot:
    chkconfig mysqld on
    Verify the boot-time setting:
    chkconfig --list | grep mysql
    Runlevels 2, 3, 4 and 5 should show "on".
    Set the root administrator password:
    mysqladmin -uroot password root
    Log in:
    mysql -uroot -proot

  • Original post: https://www.cnblogs.com/yangsy0915/p/4867423.html