  • Steps to Install Hadoop on CentOS/RHEL 6 (reference)

    http://tecadmin.net/steps-to-install-hadoop-on-centosrhel-6/#

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

    This article walks you through installing and configuring a single-node Hadoop cluster, step by step.

    Step 1. Install Java

    Before installing Hadoop, make sure Java is installed on your system. If it is not, use the following article to install it.

    Steps to install JAVA on CentOS and RHEL 5/6

    Step 2. Create User Account

    Create a system user account to use for hadoop installation.

    # useradd hadoop
    # passwd hadoop
    
    Changing password for user hadoop.
    New password:
    Retype new password:
    passwd: all authentication tokens updated successfully.
    
    Step 3. Configuring Key Based Login

    The hadoop user must be able to SSH to itself without a password. The following commands enable key-based login for the hadoop user.

    # su - hadoop
    $ ssh-keygen -t rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ chmod 0600 ~/.ssh/authorized_keys
    $ exit
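
    To confirm that key-based login works before moving on, you can try an SSH round trip as the hadoop user (a quick sanity check, not part of the original article; the first connection may ask you to accept the host key):

    # su - hadoop
    $ ssh localhost 'echo key-based login works'
    $ exit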
    
    Step 4. Download and Extract Hadoop Source

    Download the latest available Hadoop version from its official site, then follow the steps below.

    # mkdir /opt/hadoop
    # cd /opt/hadoop/
    # wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    # tar -xzf hadoop-1.2.1.tar.gz
    # mv hadoop-1.2.1 hadoop
    # chown -R hadoop /opt/hadoop
    # cd /opt/hadoop/hadoop/
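
    If you want to verify the integrity of the archive before extracting it (worth doing when downloading from a third-party mirror), Apache publishes checksum files alongside each release. A minimal sketch, run from /opt/hadoop where the tarball was downloaded; the .mds file name is an assumption based on the usual Apache release layout:

    # cd /opt/hadoop
    # wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz.mds
    # md5sum hadoop-1.2.1.tar.gz
    # grep -i md5 hadoop-1.2.1.tar.gz.mds    # compare the two digests by hand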
    
    Step 5: Configure Hadoop

    First, edit the Hadoop configuration files and make the following changes.
    5.1 Edit core-site.xml

    # vim conf/core-site.xml
    
    # Add the following inside the configuration tag
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000/</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
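
    Note: fs.default.name is the Hadoop 1.x name of this setting (later releases call it fs.defaultFS); it tells every Hadoop client and daemon that HDFS is served by the NameNode on localhost, port 9000.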
    

    5.2 Edit hdfs-site.xml

    # vim conf/hdfs-site.xml
    
    # Add the following inside the configuration tag
    <property>
    	<name>dfs.data.dir</name>
    	<value>/opt/hadoop/hadoop/dfs/name/data</value>
    	<final>true</final>
    </property>
    <property>
    	<name>dfs.name.dir</name>
    	<value>/opt/hadoop/hadoop/dfs/name</value>
    	<final>true</final>
    </property>
    <property>
    	<name>dfs.replication</name>
    	<value>2</value>
    </property>
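
    Note: with only one DataNode on a single-node cluster, a replication factor of 2 can never be satisfied, and HDFS will report every block as under-replicated; setting dfs.replication to 1 is the usual choice for a strictly single-node setup.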
    

    5.3 Edit mapred-site.xml

    # vim conf/mapred-site.xml
    
    # Add the following inside the configuration tag
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
    

    5.4 Edit hadoop-env.sh

    # vim conf/hadoop-env.sh
    
    export JAVA_HOME=/opt/jdk1.7.0_17
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
    

    Set the JAVA_HOME path to match your system's Java installation.
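
    If you are not sure where Java is installed, you can derive JAVA_HOME from the java binary on your PATH. A small sketch, assuming java resolves through a symlink (as with the alternatives system on CentOS/RHEL):

    $ JAVA_BIN=$(readlink -f "$(which java)")    # e.g. /opt/jdk1.7.0_17/bin/java
    $ export JAVA_HOME=${JAVA_BIN%/bin/java}     # strip the trailing /bin/java
    $ echo $JAVA_HOME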

    Next, format the NameNode:

    # su - hadoop
    $ cd /opt/hadoop/hadoop
    $ bin/hadoop namenode -format
    
    13/06/02 22:53:48 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = srv1.tecadmin.net/192.168.1.90
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May  6 06:59:37 UTC 2013
    STARTUP_MSG:   java = 1.7.0_17
    ************************************************************/
    13/06/02 22:53:48 INFO util.GSet: Computing capacity for map BlocksMap
    13/06/02 22:53:48 INFO util.GSet: VM type       = 32-bit
    13/06/02 22:53:48 INFO util.GSet: 2.0% max memory = 1013645312
    13/06/02 22:53:48 INFO util.GSet: capacity      = 2^22 = 4194304 entries
    13/06/02 22:53:48 INFO util.GSet: recommended=4194304, actual=4194304
    13/06/02 22:53:49 INFO namenode.FSNamesystem: fsOwner=hadoop
    13/06/02 22:53:49 INFO namenode.FSNamesystem: supergroup=supergroup
    13/06/02 22:53:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
    13/06/02 22:53:49 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    13/06/02 22:53:49 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    13/06/02 22:53:49 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
    13/06/02 22:53:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
    13/06/02 22:53:49 INFO common.Storage: Image file of size 112 saved in 0 seconds.
    13/06/02 22:53:49 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
    13/06/02 22:53:49 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
    13/06/02 22:53:49 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
    13/06/02 22:53:49 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at srv1.tecadmin.net/192.168.1.90
    ************************************************************/
    
    Step 6: Start Hadoop Services

    Use the following command to start all hadoop services.

    $ bin/start-all.sh
    

    [sample output]

    starting namenode, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-namenode-ns1.tecadmin.net.out
    localhost: starting datanode, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-datanode-ns1.tecadmin.net.out
    localhost: starting secondarynamenode, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-ns1.tecadmin.net.out
    starting jobtracker, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-ns1.tecadmin.net.out
    localhost: starting tasktracker, logging to /opt/hadoop/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-ns1.tecadmin.net.out
    
    
    Step 7: Test and Access Hadoop Services

    Use the 'jps' command to check whether all services started correctly.

    $ jps
    or
    $ $JAVA_HOME/bin/jps
    
    26049 SecondaryNameNode
    25929 DataNode
    26399 Jps
    26129 JobTracker
    26249 TaskTracker
    25807 NameNode
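
    If any daemon is missing from this list, check its log file under the Hadoop logs directory, which usually explains the failure (the exact file name depends on your user name and hostname):

    $ tail -n 50 /opt/hadoop/hadoop/logs/hadoop-hadoop-namenode-*.log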
    

    Web Access URLs for Services

      http://srv1.tecadmin.net:50030/   for the Jobtracker
      http://srv1.tecadmin.net:50070/   for the Namenode
      http://srv1.tecadmin.net:50060/   for the Tasktracker
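
    Beyond the web UIs, a quick HDFS smoke test confirms that writes and reads work end to end (a minimal sketch using the stock hadoop fs and fsck commands):

    $ cd /opt/hadoop/hadoop
    $ bin/hadoop fs -mkdir /test
    $ bin/hadoop fs -put conf/core-site.xml /test/
    $ bin/hadoop fs -ls /test
    $ bin/hadoop fsck / -files -blocks    # also flags any under-replicated blocks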
    

    Hadoop JobTracker:
    [Screenshot: Hadoop JobTracker web UI]

    Hadoop Namenode:
    [Screenshot: Hadoop NameNode web UI]

    Hadoop TaskTracker:
    [Screenshot: Hadoop TaskTracker web UI]

    Step 8: Stop Hadoop Services

    If you no longer need Hadoop, stop all Hadoop services using the following command.

    $ bin/stop-all.sh
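
    You can confirm everything is down by running jps again; once the daemons have stopped, only the Jps process itself should remain:

    $ jps    # only "Jps" should be listed after shutdown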