  • Installing Hadoop 1.2.1 on CentOS 6

    1-> Download hadoop-1.2.1.tar.gz

    tar -zxvf hadoop-1.2.1.tar.gz    # extract; this walkthrough assumes the extracted files end up in /root/soft

    2-> Create a hadoop account

    groupadd hadoop

    useradd -g hadoop -d /home/hadoop -m hadoop

    chown -R hadoop:hadoop /home/hadoop

    mv /root/soft/hadoop-1.2.1 /home/hadoop/hadoop-1.2.1

    3-> Set up passwordless SSH for the hadoop account

    su - hadoop

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa 
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

    cd ~/.ssh && chmod 600 authorized_keys
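
    A quick way to confirm the passwordless login works (the very first connection will still ask you to accept the host key):

    ssh localhost 'echo ok'    # should print "ok" without a password prompt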

    4-> Install the JDK

    Download jdk-8u40-linux-i586.rpm

    Install it with: rpm -ivh jdk-8u40-linux-i586.rpm

    Once installed, it lives in /usr/java/jdk1.8.0_40

    Set the environment variables by adding the following to /etc/profile:

    export JAVA_HOME=/usr/java/jdk1.8.0_40
    export PATH=$JAVA_HOME/bin:$PATH

    Then, under the Hadoop root directory, edit conf/hadoop-env.sh and add JAVA_HOME:

    export JAVA_HOME=/usr/java/jdk1.8.0_40
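
    To make the new variables take effect in the current shell and verify the JDK is picked up:

    source /etc/profile
    java -version      # should report java version "1.8.0_40"
    echo $JAVA_HOME    # should print /usr/java/jdk1.8.0_40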

    5-> Set the Hadoop environment variables

    Add to /etc/profile:

    export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
    export PATH=$HADOOP_HOME/bin:$PATH
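
    Again, re-source the profile and check that the hadoop command is now on the PATH:

    source /etc/profile
    hadoop version     # should report Hadoop 1.2.1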

    Then go to conf/ under the Hadoop root and edit three configuration files:

    conf/core-site.xml:

    <configuration>
         <property>
             <name>fs.default.name</name>
             <value>hdfs://localhost:9000</value>
         </property>
    </configuration>
    


    conf/hdfs-site.xml:

    <configuration>
         <property>
             <name>dfs.replication</name>
             <value>1</value>
         </property>
    </configuration>
    


    conf/mapred-site.xml:

    <configuration>
         <property>
             <name>mapred.job.tracker</name>
             <value>localhost:9001</value>
         </property>
    </configuration>

    6-> Start Hadoop

    su - hadoop
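
    One step worth calling out before the very first start: HDFS must be formatted once, as the hadoop user, or the NameNode will not come up (run this once only; re-running it wipes the filesystem):

    hadoop namenode -format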

    start-all.sh

    or run start-dfs.sh followed by start-mapred.sh.
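
    Once the scripts finish, jps (shipped with the JDK) gives a quick view of which daemons are actually running; in this single-node setup all five should appear:

    jps
    # expected (PIDs will differ):
    #   NameNode  DataNode  SecondaryNameNode  JobTracker  TaskTracker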

    Once startup completes, list the directories and files in the HDFS filesystem; you can think of HDFS as something like an FTP server:

    hadoop fs -lsr /
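
    To make the FTP-server analogy concrete, the everyday transfer operations look like this (the file names here are just placeholders):

    hadoop fs -put /etc/hosts /demo-hosts     # upload a local file to HDFS
    hadoop fs -get /demo-hosts ./hosts.copy   # download it back
    hadoop fs -cat /demo-hosts                # print its contents
    hadoop fs -rm /demo-hosts                 # delete it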

    Now run the test program hadoop-examples-1.2.1.jar from the Hadoop root directory.

    Running hadoop jar hadoop-examples-1.2.1.jar with no arguments shows that the jar actually bundles many example programs:

    [root@localhost hadoop-1.2.1]# hadoop jar hadoop-examples-1.2.1.jar

    An example program must be given as the first argument.
    Valid program names are:
      aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      dbcount: An example job that count the pageview counts from a database.
      grep: A map/reduce program that counts the matches of a regex in the input.
      join: A job that effects a join over sorted, equally partitioned datasets
      multifilewc: A job that counts words from several files.
      pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
      pi: A map/reduce program that estimates Pi using monte-carlo method.
      randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
      randomwriter: A map/reduce program that writes 10GB of random data per node.
      secondarysort: An example defining a secondary sort to the reduce.
      sleep: A job that sleeps at each map and reduce task.
      sort: A map/reduce program that sorts the data written by the random writer.
      sudoku: A sudoku solver.
      teragen: Generate data for the terasort
      terasort: Run the terasort
      teravalidate: Checking results of terasort
      wordcount: A map/reduce program that counts the words in the input files.

    Let's test wordcount. It is a MapReduce program that takes the text files under some HDFS directory and counts every word that appears in them, along with how many times each word occurs.

    First, create a directory on the HDFS filesystem. The / used below is the root of the filesystem we defined earlier in:

    conf/core-site.xml:

    <configuration>
         <property>
             <name>fs.default.name</name>
             <value>hdfs://localhost:9000</value>
         </property>
    </configuration>

    Strictly speaking, the full form is hadoop fs -mkdir hdfs://localhost:9000/test, but since fs.default.name supplies that prefix, the shorthand below is enough:

    hadoop fs -mkdir /test

     hadoop fs -mkdir /test/input

    Likewise, in the commands below /test/input could be written in full as hdfs://localhost:9000/test/input.

    [hadoop@localhost hadoop-1.2.1]$ hadoop fs -put /home/hadoop/hadoop-1.2.1/conf/*.xml /test/input
    [hadoop@localhost hadoop-1.2.1]$ hadoop fs -lsr /
    drwxr-xr-x   - hadoop supergroup          0 2015-03-31 14:36 /test
    drwxr-xr-x   - hadoop supergroup          0 2015-03-31 14:38 /test/input
    -rw-r--r--   1 hadoop supergroup       7457 2015-03-31 14:38 /test/input/capacity-scheduler.xml
    -rw-r--r--   1 hadoop supergroup        294 2015-03-31 14:38 /test/input/core-site.xml
    -rw-r--r--   1 hadoop supergroup        327 2015-03-31 14:38 /test/input/fair-scheduler.xml
    -rw-r--r--   1 hadoop supergroup       4644 2015-03-31 14:38 /test/input/hadoop-policy.xml
    -rw-r--r--   1 hadoop supergroup        274 2015-03-31 14:38 /test/input/hdfs-site.xml
    -rw-r--r--   1 hadoop supergroup       2033 2015-03-31 14:38 /test/input/mapred-queue-acls.xml
    -rw-r--r--   1 hadoop supergroup        285 2015-03-31 14:38 /test/input/mapred-site.xml
    drwxr-xr-x   - hadoop supergroup          0 2015-03-31 13:21 /tmp
    drwxr-xr-x   - hadoop supergroup          0 2015-03-31 13:21 /tmp/hadoop-hadoop
    drwxr-xr-x   - hadoop supergroup          0 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred
    drwx------   - hadoop supergroup          0 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred/system
    -rw-------   1 hadoop supergroup          4 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred/system/jobtracker.info

    Run the job:

    hadoop jar hadoop-examples-1.2.1.jar wordcount /test/input /test/output
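
    When the job finishes, the counts land in /test/output; with this example the result file is typically named part-r-00000:

    hadoop fs -ls /test/output
    hadoop fs -cat /test/output/part-r-00000 | head -20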

    You can inspect the HDFS filesystem through the NameNode web UI at http://192.168.2.88:50070/.

    There, /test/input shows the 7 uploaded files:

    Name                    Type  Size     Replication  Block Size  Modification Time  Permission  Owner   Group
    capacity-scheduler.xml  file  7.28 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
    core-site.xml           file  0.29 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
    fair-scheduler.xml      file  0.32 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
    hadoop-policy.xml       file  4.54 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
    hdfs-site.xml           file  0.27 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
    mapred-queue-acls.xml   file  1.99 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
    mapred-site.xml         file  0.28 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup

    The MapReduce side can be watched through the JobTracker web UI at http://192.168.2.88:50030/:

    User: hadoop
    Job Name: word count
    Job File: hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201503311441_0001/job.xml
    Submit Host: localhost.localdomain
    Submit Host Address: 127.0.0.1
    Job-ACLs: All users are allowed
    Job Setup: Successful
    Status: Succeeded
    Started at: Tue Mar 31 14:45:01 PDT 2015
    Finished at: Tue Mar 31 14:45:40 PDT 2015
    Finished in: 38sec
    Job Cleanup: Successful


    Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
    map     100.00%     7          0        0        7         0       0 / 0
    reduce  100.00%     1          0        0        1         0       0 / 0

    This result makes the meaning of map and reduce concrete. The input directory holds 7 files, so the job launched 7 tasks, i.e. 7 maps.

    Each map task independently counts the words, and the number of occurrences of each word, in its own file.

    Finally there is a single reduce pass, which can be understood as merging the 7 partial results into one: it sorts them so that all counts for the same word come together and can be summed. That merging step is the reduce phase; the shell sketch below gives a rough analogy.
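
    A loose single-machine analogy in plain shell, with input.txt standing in for any text file: tr plays the map (emit one word per line), sort is the shuffle (identical words become adjacent), and uniq -c is the reduce (sum the count for each word):

    cat input.txt | tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn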

    The map tasks here are executed by TaskTrackers, and the file blocks a TaskTracker processes live on DataNodes, which is why a TaskTracker and a DataNode normally run on the same machine.

    The JobTracker can be understood as the coordinator of the TaskTrackers: it decides how many tasks are needed and schedules all of them onto the TaskTrackers.


    For a multi-node cluster installation, see:

    http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html
