
    System Version

    anliven@Ubuntu1604:~$ uname -a
    Linux Ubuntu1604 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    anliven@Ubuntu1604:~$ 
    anliven@Ubuntu1604:~$ cat /proc/version
    Linux version 4.8.0-36-generic (buildd@lgw01-18) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017
    anliven@Ubuntu1604:~$ 
    anliven@Ubuntu1604:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID:	Ubuntu
    Description:	Ubuntu 16.04.2 LTS
    Release:	16.04
    Codename:	xenial
    anliven@Ubuntu1604:~$ 
    

    Create the hadoop User

    anliven@Ubuntu1604:~$ sudo useradd -m hadoop -s /bin/bash
    anliven@Ubuntu1604:~$ sudo passwd hadoop
    Enter new UNIX password: 
    Retype new UNIX password: 
    passwd: password updated successfully
    anliven@Ubuntu1604:~$ 
    anliven@Ubuntu1604:~$ sudo adduser hadoop sudo
    Adding user `hadoop' to group `sudo' ...
    Adding user hadoop to group sudo
    Done.
    anliven@Ubuntu1604:~$ 
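
The interactive steps above can also be scripted. A minimal sketch (must run as root; the function name and placeholder password are hypothetical, not part of the original transcript):

```shell
# Non-interactive equivalent of the useradd/passwd/adduser steps above.
# Requires root; the password is passed as an argument here only for brevity.
create_hadoop_user() {
    useradd -m -s /bin/bash "$1"   # create user with home directory and bash shell
    echo "$1:$2" | chpasswd        # set the password without prompting
    adduser "$1" sudo              # grant sudo membership, as in the transcript
}
# Usage (as root): create_hadoop_user hadoop 'SomePassw0rd'
```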
    

    Update apt and Install vim

    hadoop@Ubuntu1604:~$ sudo apt-get update
    Hit:1 http://mirrors.aliyun.com/ubuntu xenial InRelease
    Hit:2 http://mirrors.aliyun.com/ubuntu xenial-updates InRelease
    Hit:3 http://mirrors.aliyun.com/ubuntu xenial-backports InRelease
    Hit:4 http://mirrors.aliyun.com/ubuntu xenial-security InRelease
    Reading package lists... Done
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ sudo apt-get install vim
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done       
    vim is already the newest version (2:7.4.1689-3ubuntu1.2).
    0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.
    hadoop@Ubuntu1604:~$ 
    

    Configure Passwordless SSH Login

    hadoop@Ubuntu1604:~$ sudo apt-get install openssh-server
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done       
    openssh-server is already the newest version (1:7.2p2-4ubuntu2.1).
    0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ cd ~
    hadoop@Ubuntu1604:~$ mkdir .ssh
    hadoop@Ubuntu1604:~$ cd .ssh
    hadoop@Ubuntu1604:~/.ssh$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:DzjVWgTQB5I1JGRBmWi6gVHJ03V4WnJZEdojtbou0DM hadoop@Ubuntu1604
    The key's randomart image is:
    +---[RSA 2048]----+
    | o.o =X@B=*o     |
    |. + +.*+*B..     |
    | o +   *+.*      |
    |. o   .o = .     |
    |   o .o S        |
    |  . . E. +       |
    |     . o. .      |
    |      ..         |
    |       ..        |
    +----[SHA256]-----+
    hadoop@Ubuntu1604:~/.ssh$ 
    hadoop@Ubuntu1604:~/.ssh$ cat id_rsa.pub >> authorized_keys
    hadoop@Ubuntu1604:~/.ssh$ ls -l
    total 12
    -rw-rw-r-- 1 hadoop hadoop  399 Apr 27 07:33 authorized_keys
    -rw------- 1 hadoop hadoop 1679 Apr 27 07:32 id_rsa
    -rw-r--r-- 1 hadoop hadoop  399 Apr 27 07:32 id_rsa.pub
    hadoop@Ubuntu1604:~/.ssh$ 
    hadoop@Ubuntu1604:~/.ssh$ cd 
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ ssh localhost
    The authenticity of host 'localhost (127.0.0.1)' can't be established.
    ECDSA key fingerprint is SHA256:fZ7fAvnnFk0/Imkn0YPdc2Gzxnfr0IJGSRb1swbm7oU.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
    Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
    
    44 packages can be updated.
    0 updates are security updates.
    
    *** System restart required ***
    Last login: Thu Apr 27 07:25:26 2017 from 192.168.16.1
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ exit
    logout
    Connection to localhost closed.
    hadoop@Ubuntu1604:~$ 
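
The manual steps above can be condensed into one idempotent sketch. One caveat the transcript glosses over: sshd refuses keys when `~/.ssh` or `authorized_keys` are group- or world-writable, so the `chmod` calls below are worth adding even though they are not shown in the transcript.

```shell
# Sketch of the key setup above, with the permissions sshd requires.
set -e
SSH_DIR="$HOME/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"   # sshd rejects a group/world-writable ~/.ssh
# -N "" gives an empty passphrase; -f skips the interactive filename prompt
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa"
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

After this, `ssh localhost` should log in without a password prompt, as the transcript shows.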
    

    Install Java

    hadoop@Ubuntu1604:~$ dpkg -l |grep jdk
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ sudo apt-get install openjdk-8-jre openjdk-8-jdk
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done       
    The following additional packages will be installed:
    ......
    ......
    ......
    done.
    Processing triggers for libc-bin (2.23-0ubuntu7) ...
    Processing triggers for ca-certificates (20160104ubuntu1) ...
    Updating certificates in /etc/ssl/certs...
    0 added, 0 removed; done.
    Running hooks in /etc/ca-certificates/update.d...
    done.
    done.
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ dpkg -l |grep jdk
    ii  openjdk-8-jdk:amd64                        8u121-b13-0ubuntu1.16.04.2                    amd64        OpenJDK Development Kit (JDK)
    ii  openjdk-8-jdk-headless:amd64               8u121-b13-0ubuntu1.16.04.2                    amd64        OpenJDK Development Kit (JDK) (headless)
    ii  openjdk-8-jre:amd64                        8u121-b13-0ubuntu1.16.04.2                    amd64        OpenJDK Java runtime, using Hotspot JIT
    ii  openjdk-8-jre-headless:amd64               8u121-b13-0ubuntu1.16.04.2                    amd64        OpenJDK Java runtime, using Hotspot JIT (headless)
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ dpkg -L openjdk-8-jdk | grep '/bin$'
    /usr/lib/jvm/java-8-openjdk-amd64/bin
    hadoop@Ubuntu1604:~$  
    hadoop@Ubuntu1604:~$ vim ~/.bashrc
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ head ~/.bashrc |grep java
    export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ source ~/.bashrc
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ echo $JAVA_HOME
    /usr/lib/jvm/java-8-openjdk-amd64
    hadoop@Ubuntu1604:~$ 
    hadoop@Ubuntu1604:~$ java -version
    openjdk version "1.8.0_121"
    OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
    OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
    hadoop@Ubuntu1604:~$ 
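
The transcript edits `~/.bashrc` but only shows the `grep` result afterwards; the lines appended were presumably something like the following. The `PATH` line is an optional extra not shown in the transcript, and the path matches the `dpkg -L` output above.

```shell
# Appended to ~/.bashrc (the export is confirmed by the grep in the transcript;
# the PATH line is an assumed convenience addition).
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export PATH="$JAVA_HOME/bin:$PATH"
```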
    

    Install Hadoop

    hadoop@Ubuntu1604:~$ sudo tar -zxf ~/hadoop-2.8.0.tar.gz -C /usr/local
    [sudo] password for hadoop: 
    hadoop@Ubuntu1604:~$ cd /usr/local
    hadoop@Ubuntu1604:/usr/local$ sudo mv ./hadoop-2.8.0/ ./hadoop
    hadoop@Ubuntu1604:/usr/local$ sudo chown -R hadoop ./hadoop
    hadoop@Ubuntu1604:/usr/local$ ls -l |grep hadoop
    drwxr-xr-x 9 hadoop dialout 4096 Mar 17 13:31 hadoop
    hadoop@Ubuntu1604:/usr/local$ cd ./hadoop
    hadoop@Ubuntu1604:/usr/local/hadoop$ ls -l
    total 148
    drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 bin
    drwxr-xr-x 3 hadoop dialout  4096 Mar 17 13:31 etc
    drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 include
    drwxr-xr-x 3 hadoop dialout  4096 Mar 17 13:31 lib
    drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 libexec
    -rw-r--r-- 1 hadoop dialout 99253 Mar 17 13:31 LICENSE.txt
    -rw-r--r-- 1 hadoop dialout 15915 Mar 17 13:31 NOTICE.txt
    -rw-r--r-- 1 hadoop dialout  1366 Mar 17 13:31 README.txt
    drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 sbin
    drwxr-xr-x 4 hadoop dialout  4096 Mar 17 13:31 share
    hadoop@Ubuntu1604:/usr/local/hadoop$ ./bin/hadoop version
    Hadoop 2.8.0
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
    Compiled by jdu on 2017-03-17T04:12Z
    Compiled with protoc 2.5.0
    From source with checksum 60125541c2b3e266cbf3becc5bda666
    This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.0.jar
    hadoop@Ubuntu1604:/usr/local/hadoop$ 
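
Optionally (this step is not in the transcript, which keeps using `./bin/hadoop`), adding Hadoop's `bin` and `sbin` directories to `PATH` lets you invoke `hadoop` from anywhere. A sketch for `~/.bashrc`:

```shell
# Assumed convenience addition to ~/.bashrc; the install prefix
# /usr/local/hadoop matches the tar/mv/chown steps above.
export HADOOP_HOME="/usr/local/hadoop"
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
```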
    

    Run the grep Example in Standalone Mode

    By default Hadoop runs in non-distributed (standalone/local) mode, which needs no further configuration. Everything runs in a single Java process, which makes debugging straightforward.

    hadoop@Ubuntu1604:~$ cd /usr/local/hadoop/
    hadoop@Ubuntu1604:/usr/local/hadoop$ mkdir ./input
    hadoop@Ubuntu1604:/usr/local/hadoop$ cp ./etc/hadoop/*.xml ./input/
    hadoop@Ubuntu1604:/usr/local/hadoop$ ls -l input/
    total 56
    drwxrwxr-x  2 hadoop hadoop  4096 Apr 27 22:23 ./
    drwxr-xr-x 10 hadoop dialout 4096 Apr 27 22:23 ../
    -rw-r--r--  1 hadoop hadoop  4942 Apr 27 22:23 capacity-scheduler.xml
    -rw-r--r--  1 hadoop hadoop   774 Apr 27 22:23 core-site.xml
    -rw-r--r--  1 hadoop hadoop  9683 Apr 27 22:23 hadoop-policy.xml
    -rw-r--r--  1 hadoop hadoop   775 Apr 27 22:23 hdfs-site.xml
    -rw-r--r--  1 hadoop hadoop   620 Apr 27 22:23 httpfs-site.xml
    -rw-r--r--  1 hadoop hadoop  3518 Apr 27 22:23 kms-acls.xml
    -rw-r--r--  1 hadoop hadoop  5546 Apr 27 22:23 kms-site.xml
    -rw-r--r--  1 hadoop hadoop   690 Apr 27 22:23 yarn-site.xml
    hadoop@Ubuntu1604:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar grep ./input ./output 'dfs[a-z.]+'
    17/04/27 22:29:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    17/04/27 22:29:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    17/04/27 22:29:45 INFO input.FileInputFormat: Total input files to process : 8
    17/04/27 22:29:45 INFO mapreduce.JobSubmitter: number of splits:8
    ......
    ......
    ......
    17/04/27 22:29:49 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=1273712
    		FILE: Number of bytes written=2504878
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    	Map-Reduce Framework
    		Map input records=1
    		Map output records=1
    		Map output bytes=17
    		Map output materialized bytes=25
    		Input split bytes=121
    		Combine input records=0
    		Combine output records=0
    		Reduce input groups=1
    		Reduce shuffle bytes=25
    		Reduce input records=1
    		Reduce output records=1
    		Spilled Records=2
    		Shuffled Maps =1
    		Failed Shuffles=0
    		Merged Map outputs=1
    		GC time elapsed (ms)=0
    		Total committed heap usage (bytes)=1054867456
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=123
    	File Output Format Counters 
    		Bytes Written=23
    hadoop@Ubuntu1604:/usr/local/hadoop$ 
    hadoop@Ubuntu1604:/usr/local/hadoop$ ls -l ./output/
    total 4
    -rw-r--r-- 1 hadoop hadoop 11 Apr 27 22:29 part-r-00000
    -rw-r--r-- 1 hadoop hadoop  0 Apr 27 22:29 _SUCCESS
    hadoop@Ubuntu1604:/usr/local/hadoop$ 
    hadoop@Ubuntu1604:/usr/local/hadoop$ cat ./output/*
    1	dfsadmin
    hadoop@Ubuntu1604:/usr/local/hadoop$ 
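
To see what the job actually computes: the `grep` example simply counts regular-expression matches across the input files, which plain `grep` can reproduce without Hadoop. A throwaway demonstration (the sample file contents and temp directory are hypothetical, not the job's real input):

```shell
# Plain-grep equivalent of the MapReduce grep job: count occurrences of dfs[a-z.]+
TMP=$(mktemp -d)
printf '<name>dfs.replication</name>\n<name>dfsadmin</name>\n' > "$TMP/sample.xml"
COUNTS=$(grep -ohE 'dfs[a-z.]+' "$TMP"/*.xml | sort | uniq -c)  # tally per distinct match
echo "$COUNTS"
rm -rf "$TMP"
```

Each distinct match appears with its count, mirroring the `1 dfsadmin` line in the job output above.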
    

    Hadoop does not overwrite result files by default, so delete the output directory before running the job again.
    hadoop@Ubuntu1604:/usr/local/hadoop$ rm -rf ./output

    Hadoop Bundled Examples

    hadoop@Ubuntu1604:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar
    An example program must be given as the first argument.
    Valid program names are:
      aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
      dbcount: An example job that count the pageview counts from a database.
      distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
      grep: A map/reduce program that counts the matches of a regex in the input.
      join: A job that effects a join over sorted, equally partitioned datasets
      multifilewc: A job that counts words from several files.
      pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
      pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
      randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
      randomwriter: A map/reduce program that writes 10GB of random data per node.
      secondarysort: An example defining a secondary sort to the reduce.
      sort: A map/reduce program that sorts the data written by the random writer.
      sudoku: A sudoku solver.
      teragen: Generate data for the terasort
      terasort: Run the terasort
      teravalidate: Checking results of terasort
      wordcount: A map/reduce program that counts the words in the input files.
      wordmean: A map/reduce program that counts the average length of the words in the input files.
      wordmedian: A map/reduce program that counts the median length of the words in the input files.
      wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
    hadoop@Ubuntu1604:/usr/local/hadoop$ 
    
  • Original post: https://www.cnblogs.com/anliven/p/6777777.html