  • Hadoop Course

    2.1 Initial Setup

    The initial environment has already been set up on the platform here; you should still understand how it was configured.

    1. Change the hostname (using the master node as an example)

    [ec2-user@ip-172-31-32-47 ~]$ sudo vi /etc/hostname 
    #Delete everything in the file and put the new hostname, master, on the first line
    #Reboot the instance so the change takes effect
    [ec2-user@ip-172-31-32-47 ~]$ sudo reboot
    
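
    Alternatively, the hostname can be set in one step without editing the file; a minimal sketch, assuming hostnamectl is available (it is on systemd distributions such as Amazon Linux 2):

    #Set the hostname via systemd; log out and back in to refresh the shell prompt
    sudo hostnamectl set-hostname master
    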

    2. Update the hosts mappings (using the master node as an example)

    #Check the IP address of each node
    [ec2-user@master ~]$ ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
            inet 172.31.32.47  netmask 255.255.240.0  broadcast 172.31.47.255
            inet6 fe80::8b2:80ff:fe01:e5c2  prefixlen 64  scopeid 0x20<link>
            ether 0a:b2:80:01:e5:c2  txqueuelen 1000  (Ethernet)
            RX packets 3461  bytes 687720 (671.6 KiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 3262  bytes 544011 (531.2 KiB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
            inet 127.0.0.1  netmask 255.0.0.0
            inet6 ::1  prefixlen 128  scopeid 0x10<host>
            loop  txqueuelen 1000  (Local Loopback)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    [ec2-user@slave1 ~]$ ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
            inet 172.31.36.81  netmask 255.255.240.0  broadcast 172.31.47.255
            inet6 fe80::87d:36ff:fe72:bc0c  prefixlen 64  scopeid 0x20<link>
            ether 0a:7d:36:72:bc:0c  txqueuelen 1000  (Ethernet)
            RX packets 2195  bytes 543199 (530.4 KiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 2178  bytes 361053 (352.5 KiB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
            inet 127.0.0.1  netmask 255.0.0.0
            inet6 ::1  prefixlen 128  scopeid 0x10<host>
            loop  txqueuelen 1000  (Local Loopback)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    [ec2-user@slave2 ~]$ ifconfig
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
            inet 172.31.46.142  netmask 255.255.240.0  broadcast 172.31.47.255
            inet6 fe80::850:68ff:fe8c:6c5e  prefixlen 64  scopeid 0x20<link>
            ether 0a:50:68:8c:6c:5e  txqueuelen 1000  (Ethernet)
            RX packets 2284  bytes 547630 (534.7 KiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 2241  bytes 375782 (366.9 KiB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
            inet 127.0.0.1  netmask 255.0.0.0
            inet6 ::1  prefixlen 128  scopeid 0x10<host>
            loop  txqueuelen 1000  (Local Loopback)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    #Write the mappings into the hosts file, one "IP hostname" pair per line
    [ec2-user@master ~]$ sudo vi /etc/hosts
    #Verify the result. Note: the hosts file must be updated on every node
    [ec2-user@master ~]$ cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost6 localhost6.localdomain6
    172.31.32.47 master
    172.31.36.81 slave1
    172.31.46.142 slave2
    
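
    Rather than editing interactively on each node, the three entries can also be appended non-interactively; a sketch using the same IPs as above (run it on every node):

    #Append the cluster mappings to /etc/hosts in one shot
    sudo tee -a /etc/hosts <<'EOF'
    172.31.32.47 master
    172.31.36.81 slave1
    172.31.46.142 slave2
    EOF
    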

    2.2 Installing the Java Environment

    First, why install a JDK? The JDK is the software development kit for the Java language, provided for programmers, and it is the core of all Java development: it bundles the Java runtime environment (the JVM plus the Java system class library) together with the Java development tools. Hadoop and the rest of this stack run on the JVM, so the JDK comes first.

    1. Unpack JDK 1.8

    #Extract the JDK archive to the target path
    [ec2-user@master ~]$ sudo tar -zxvf hadoop/jdk-8u144-linux-x64.tar.gz -C /usr/local/src/
    #Confirm the extracted JDK directory is present in the target directory
    [ec2-user@master ~]$ ls /usr/local/src/
    jdk1.8.0_144
    

    2. Rename the directory to jdk

    [ec2-user@master ~]$ cd /usr/local/src/
    [ec2-user@master src]$ ls
    jdk1.8.0_144
    [ec2-user@master src]$ sudo mv jdk1.8.0_144/ jdk
    [ec2-user@master src]$ ls
    jdk
    

    3. Add environment variables (all nodes; master shown as the example)

    [ec2-user@master src]$ sudo vi /etc/profile
    #Append the following at the end of the file
    export JAVA_HOME=/usr/local/src/jdk
    export PATH=$PATH:$JAVA_HOME/bin
    #Reload the environment variables
    [ec2-user@master src]$ source /etc/profile
    

    4. Check the JDK version to verify the installation

    [ec2-user@master src]$ java -version
    java version "1.8.0_144"
    Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
    

    5. Fix ownership (all nodes; master shown as the example)

    Our labs run as a regular user, but /usr/local/src/ can only be modified with root privileges. If we don't change its ownership, distributing files into it later will fail with a permission error.

    [ec2-user@master ~]$ ll /usr/local/
    total 0
    drwxr-xr-x 2 root root  6 Apr  9  2019 bin
    drwxr-xr-x 2 root root  6 Apr  9  2019 etc
    drwxr-xr-x 2 root root  6 Apr  9  2019 games
    drwxr-xr-x 2 root root  6 Apr  9  2019 include
    drwxr-xr-x 2 root root  6 Apr  9  2019 lib
    drwxr-xr-x 2 root root  6 Apr  9  2019 lib64
    drwxr-xr-x 2 root root  6 Apr  9  2019 libexec
    drwxr-xr-x 2 root root  6 Apr  9  2019 sbin
    drwxr-xr-x 5 root root 49 Mar  4 20:51 share
    drwxr-xr-x 4 root root 31 Mar 19 06:54 src
    #Set the owner and group of /usr/local/src/ and all of its subdirectories to ec2-user
    [ec2-user@master ~]$ sudo chown -R ec2-user:ec2-user /usr/local/src/
    #Check the owner and group of /usr/local/src/ again
    [ec2-user@master ~]$ ll /usr/local/
    total 0
    drwxr-xr-x 2 root     root      6 Apr  9  2019 bin
    drwxr-xr-x 2 root     root      6 Apr  9  2019 etc
    drwxr-xr-x 2 root     root      6 Apr  9  2019 games
    drwxr-xr-x 2 root     root      6 Apr  9  2019 include
    drwxr-xr-x 2 root     root      6 Apr  9  2019 lib
    drwxr-xr-x 2 root     root      6 Apr  9  2019 lib64
    drwxr-xr-x 2 root     root      6 Apr  9  2019 libexec
    drwxr-xr-x 2 root     root      6 Apr  9  2019 sbin
    drwxr-xr-x 5 root     root     49 Mar  4 20:51 share
    drwxr-xr-x 4 ec2-user ec2-user 31 Mar 19 06:54 src
    

    6. Distribute to the other nodes

    [ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave1:/usr/local/src/
    [ec2-user@master ~]$ scp -r /usr/local/src/jdk/ slave2:/usr/local/src/
    
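
    Copying straight to the slaves like this assumes passwordless SSH between the nodes, which this platform has already configured. If you ever have to set that up yourself, a typical sketch looks like the following (run on master):

    #Generate a key pair (accept the defaults), then install the public key on each node
    ssh-keygen -t rsa
    ssh-copy-id ec2-user@slave1
    ssh-copy-id ec2-user@slave2
    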

    2.3 Installing the Hadoop Cluster

    1. Unpack

    [ec2-user@master src]$ tar -zxvf /home/ec2-user/hadoop/hadoop-2.9.1.tar.gz -C /usr/local/src/
    [ec2-user@master src]$ ls
    hadoop-2.9.1  jdk
    

    2. Rename to hadoop

    [ec2-user@master src]$ pwd
    /usr/local/src
    [ec2-user@master src]$ mv hadoop-2.9.1/ hadoop
    [ec2-user@master src]$ ls
    hadoop  jdk
    

    3. Add environment variables (all nodes; master shown as the example)

    [ec2-user@master ~]$ sudo vi /etc/profile
    #Append the following at the end of the file
    export HADOOP_HOME=/usr/local/src/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_CLASSPATH=/usr/local/src/hadoop/lib/*
    #Reload the environment variables
    [ec2-user@master ~]$ source /etc/profile
    

    4. Edit the core-site.xml configuration file

    [ec2-user@master ~]$ cd /usr/local/src/hadoop/etc/hadoop/
    [ec2-user@master hadoop]$ vi core-site.xml 
    

    Add the following between the <configuration></configuration> tags:

    	<!-- URI of the default file system; the NameNode runs on master, port 9000 -->
    	<property>
    		<name>fs.defaultFS</name>
    		<value>hdfs://master:9000</value>
    	</property>
    
    	<!-- Base directory for Hadoop's temporary files and HDFS data -->
    	<property>
    		<name>hadoop.tmp.dir</name>
    		<value>/usr/local/src/hadoop/tmp</value>
    	</property>
    

    5. Edit the hdfs-site.xml configuration file

    [ec2-user@master hadoop]$ pwd
    /usr/local/src/hadoop/etc/hadoop
    [ec2-user@master hadoop]$ vi hdfs-site.xml 
    

    Add the following between the <configuration></configuration> tags:

    <!-- Number of replicas kept for each HDFS block -->
    <property>
    	<name>dfs.replication</name>
    	<value>3</value>
    </property>
    
    <!-- Host and port of the secondary NameNode -->
    <property>
    	<name>dfs.namenode.secondary.http-address</name>
    	<value>slave1:50090</value>
    </property>
    
    <property>
    	<name>dfs.namenode.name.dir</name>
    	<value>/usr/local/src/hadoop/tmp/dfs/name</value>
    </property>
    
    <property>
    	<name>dfs.datanode.data.dir</name>
    	<value>/usr/local/src/hadoop/tmp/dfs/data</value>
    </property>
    
    <property>
    	<name>dfs.webhdfs.enabled</name>
    	<value>true</value>
    </property>
    

    6. Edit the yarn-site.xml configuration file

    [ec2-user@master hadoop]$ pwd
    /usr/local/src/hadoop/etc/hadoop
    [ec2-user@master hadoop]$ vi yarn-site.xml 
    

    Add the following between the <configuration></configuration> tags:

    	<property>
    		<name>yarn.nodemanager.aux-services</name>
    		<value>mapreduce_shuffle</value>
    	</property>
    
    	<property>
    		<name>yarn.resourcemanager.hostname</name>
    		<value>master</value>
    	</property>
    
    	<property>
    		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
    	</property>
    

    7. Edit the mapred-site.xml configuration file

    [ec2-user@master hadoop]$ pwd
    /usr/local/src/hadoop/etc/hadoop
    [ec2-user@master hadoop]$ cp mapred-site.xml.template mapred-site.xml
    [ec2-user@master hadoop]$ vi mapred-site.xml
    

    Add the following between the <configuration></configuration> tags:

    	<property>
    		<name>mapreduce.framework.name</name>
    		<value>yarn</value>
    	</property>
    

    8. Edit the hadoop-env.sh configuration file

    [ec2-user@master hadoop]$ pwd
    /usr/local/src/hadoop/etc/hadoop
    [ec2-user@master hadoop]$ vi hadoop-env.sh 
    

    Set the JDK path:

    export JAVA_HOME=/usr/local/src/jdk
    

    Note: adjust this to match your own path.

    9. Edit the slaves configuration file

    [ec2-user@master hadoop]$ pwd
    /usr/local/src/hadoop/etc/hadoop
    [ec2-user@master hadoop]$ vi slaves 
    [ec2-user@master hadoop]$ cat slaves 
    slave1
    slave2
    

    10. Distribute to the other nodes

    [ec2-user@master hadoop]$ cd /usr/local/src/
    [ec2-user@master src]$ scp -r hadoop/ slave1:/usr/local/src/
    [ec2-user@master src]$ scp -r hadoop/ slave2:/usr/local/src/
    

    11. Format the NameNode (on the NameNode host)

    [ec2-user@master src]$ hdfs namenode -format
    


    12. Start the Hadoop cluster

    #Start the Hadoop cluster from the NameNode host
    [ec2-user@master src]$ start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [master]
    The authenticity of host 'master (172.31.32.47)' can't be established.
    ECDSA key fingerprint is SHA256:Tueyo4xR8lsxmdA11GlXAO3w44n6T75dYHe9flk8Y70.
    ECDSA key fingerprint is MD5:22:9b:6d:f2:f3:11:a2:6d:4d:dd:ec:25:56:3b:2d:b2.
    Are you sure you want to continue connecting (yes/no)? yes
    master: Warning: Permanently added 'master,172.31.32.47' (ECDSA) to the list of known hosts.
    master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-namenode-master.out
    slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave2.out
    slave1: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-datanode-slave1.out
    Starting secondary namenodes [slave1]
    slave1: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-ec2-user-secondarynamenode-slave1.out
    starting yarn daemons
    starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-resourcemanager-master.out
    slave1: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave1.out
    slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-ec2-user-nodemanager-slave2.out
    #Check the running Java processes with jps
    [ec2-user@master src]$ jps
    31522 Jps
    31256 ResourceManager
    30973 NameNode
    [ec2-user@master src]$ ssh slave1
    Last login: Fri Mar 19 06:15:47 2021 from 219.153.251.37
    
           __|  __|_  )
           _|  (     /   Amazon Linux 2 AMI
          ___|\___|___|
    
    https://aws.amazon.com/amazon-linux-2/
    [ec2-user@slave1 ~]$ jps
    29424 DataNode
    29635 NodeManager
    29544 SecondaryNameNode
    29789 Jps
    [ec2-user@slave1 ~]$ ssh slave2
    Last login: Fri Mar 19 06:15:57 2021 from 219.153.251.37
    
           __|  __|_  )
           _|  (     /   Amazon Linux 2 AMI
          ___|\___|___|
    
    https://aws.amazon.com/amazon-linux-2/
    [ec2-user@slave2 ~]$ jps
    29633 Jps
    29479 NodeManager
    29354 DataNode
    
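
    Instead of SSHing into each node by hand, the per-node process check can be looped from master; a small sketch (it relies on the same passwordless SSH that start-all.sh uses):

    #Run jps on every node and label the output
    for h in master slave1 slave2; do echo "== $h =="; ssh $h jps; done
    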

    13. Check the Hadoop cluster status

    [ec2-user@master ~]$ hdfs dfsadmin -report
    Configured Capacity: 17154662400 (15.98 GB)
    Present Capacity: 11389693952 (10.61 GB)
    DFS Remaining: 11389685760 (10.61 GB)
    DFS Used: 8192 (8 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Pending deletion blocks: 0
    
    -------------------------------------------------
    Live datanodes (2):
    
    Name: 172.31.36.81:50010 (slave1)
    Hostname: slave1
    Decommission Status : Normal
    Configured Capacity: 8577331200 (7.99 GB)
    DFS Used: 4096 (4 KB)
    Non DFS Used: 2882510848 (2.68 GB)
    DFS Remaining: 5694816256 (5.30 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 66.39%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Mar 19 07:45:06 UTC 2021
    Last Block Report: Fri Mar 19 07:41:00 UTC 2021
    
    
    Name: 172.31.46.142:50010 (slave2)
    Hostname: slave2
    Decommission Status : Normal
    Configured Capacity: 8577331200 (7.99 GB)
    DFS Used: 4096 (4 KB)
    Non DFS Used: 2882457600 (2.68 GB)
    DFS Remaining: 5694869504 (5.30 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 66.39%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Mar 19 07:45:06 UTC 2021
    Last Block Report: Fri Mar 19 07:41:00 UTC 2021
    
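
    The same information is also exposed through the NameNode web UI, which listens on port 50070 by default in Hadoop 2.x; a quick way to confirm it is up (assuming the default port was not changed):

    #Check that the NameNode web UI responds
    curl -s http://master:50070/ | head -n 5
    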

    2.4 Installing Hive

    1. Install MySQL

    Before installing Hive we need to install the MySQL database, which will store Hive's metadata.

    1) Download the MySQL yum repository package

    [ec2-user@master ~]$ sudo wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
    

    2) Install the MySQL repository

    [ec2-user@master ~]$ sudo yum localinstall mysql57-community-release-el7-8.noarch.rpm
    

    3) Check that the MySQL repository was installed

    [ec2-user@master ~]$ sudo yum repolist enabled | grep "mysql.*.community.*"
    mysql-connectors-community/x86_64     MySQL Connectors Community          146+39
    mysql-tools-community/x86_64          MySQL Tools Community                  123
    mysql57-community/x86_64              MySQL 5.7 Community Server             484
    

    4) Install MySQL

    [ec2-user@master ~]$ sudo yum install mysql-community-server
    

    5) Start the MySQL service and check its status

    [ec2-user@master ~]$ sudo systemctl start mysqld
    [ec2-user@master ~]$ sudo systemctl status mysqld
    ● mysqld.service - MySQL Server
       Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2021-03-19 07:56:43 UTC; 1s ago
         Docs: man:mysqld(8)
               http://dev.mysql.com/doc/refman/en/using-systemd.html
      Process: 31978 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
      Process: 31927 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
     Main PID: 31981 (mysqld)
       CGroup: /system.slice/mysqld.service
               └─31981 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
    
    Mar 19 07:56:39 master systemd[1]: Starting MySQL Server...
    Mar 19 07:56:43 master systemd[1]: Started MySQL Server.
    

    6) Look up the initial MySQL password

    [ec2-user@master ~]$ sudo grep "password" /var/log/mysqld.log
    2021-03-19T07:56:41.030922Z 1 [Note] A temporary password is generated for root@localhost: v=OKXu0laSo;
    

    7) Change the MySQL login password

    Copy the temporary password found above, paste it at the password prompt when entering MySQL, and press Enter to reach the MySQL command line.

    [ec2-user@master ~]$ sudo mysql -uroot -p
    Enter password: 
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 9
    Server version: 5.7.33
    
    Copyright (c) 2000, 2021, Oracle and/or its affiliates.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> 
    

    Change the password, setting the MySQL login password to 1234:

    mysql> set password for 'root'@'localhost'=password('1234');
    ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
    

    As the error shows, a new password that is too simple is rejected.

    So we first need to relax the password validation rules:

    mysql> set global validate_password_policy=0;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> set global validate_password_length=1;
    Query OK, 0 rows affected (0.00 sec)
    

    Set the password again:

    mysql> set password for 'root'@'localhost'=password('1234');
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    
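
    Note that these two validate_password variables are runtime globals and revert to their defaults when mysqld restarts. To make the relaxed policy survive a restart, one option is a config entry; a sketch, assuming the stock /etc/my.cnf of this yum install:

    #/etc/my.cnf -- add under the [mysqld] section, then: sudo systemctl restart mysqld
    [mysqld]
    validate_password_policy=0
    validate_password_length=1
    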

    8) Enable remote login

    Exit MySQL first, then log back in with the new password.

    [ec2-user@master ~]$ mysql -uroot -p1234
    mysql: [Warning] Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 10
    Server version: 5.7.33 MySQL Community Server (GPL)
    
    Copyright (c) 2000, 2021, Oracle and/or its affiliates.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> 
    

    Create the user:

    mysql> create user 'root'@'172.%.%.%' identified by '1234';
    Query OK, 0 rows affected (0.00 sec)
    

    Allow remote connections:

    mysql> grant all privileges on *.* to 'root'@'172.%.%.%' with grant option;
    Query OK, 0 rows affected (0.00 sec)
    

    Flush the privileges:

    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
    

    With that, MySQL is installed.

    2. Unpack Hive to the target location

    [ec2-user@master ~]$ tar -zxvf hadoop/apache-hive-1.1.0-bin.tar.gz -C /usr/local/src/
    

    3. Rename

    [ec2-user@master ~]$ cd /usr/local/src/
    [ec2-user@master src]$ ls
    apache-hive-1.1.0-bin  hadoop  jdk
    [ec2-user@master src]$ mv apache-hive-1.1.0-bin/ hive
    [ec2-user@master src]$ ls
    hadoop  hive  jdk
    

    4. Add environment variables

    [ec2-user@master src]$ sudo vi /etc/profile
    #Append the following at the end of the file
    export HIVE_HOME=/usr/local/src/hive
    export PATH=$PATH:$HIVE_HOME/bin
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/src/hive/lib/*
    #Reload the environment variables
    [ec2-user@master src]$ source /etc/profile
    

    5. Create the hive-site.xml configuration file

    [ec2-user@master src]$ cd hive/conf/
    #Create the hive-site.xml file
    [ec2-user@master conf]$ touch hive-site.xml
    [ec2-user@master conf]$ vi hive-site.xml 
    

    Add the following to the hive-site.xml file:

    <configuration>
    <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
    </property>
    
    <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>1234</value>
    </property>
    </configuration>
    

    Note: change the MySQL password to the one you actually set.
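
    Before going further, it is worth confirming that the credentials Hive will use actually work; a quick check, reusing the password and the remote-login grant from the MySQL steps above:

    #Connect to MySQL over the network exactly as Hive will
    mysql -h master -u root -p1234 -e "show databases;"
    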

    6. Edit the hive-env.sh configuration file

    [ec2-user@master conf]$ pwd
    /usr/local/src/hive/conf
    [ec2-user@master conf]$ cp hive-env.sh.template hive-env.sh
    [ec2-user@master conf]$ vi hive-env.sh
    #Add the following settings to the file
    export HADOOP_HOME=/usr/local/src/hadoop
    export HIVE_CONF_DIR=/usr/local/src/hive/conf
    

    7. Add the MySQL connector jar

    Put the MySQL JDBC driver into Hive's lib directory.

    [ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $HIVE_HOME/lib
    [ec2-user@master conf]$ ls $HIVE_HOME/lib/mysql-connector-java-5.1.44-bin.jar 
    /usr/local/src/hive/lib/mysql-connector-java-5.1.44-bin.jar
    

    8. Start the Hadoop cluster (Hive stores its data in HDFS)

    If Hadoop is already running, skip this step.

    start-all.sh
    

    9. Initialize Hive's metastore database in MySQL

    [ec2-user@master conf]$ schematool -dbType mysql -initSchema
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Metastore connection URL:	 jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
    Metastore Connection Driver :	 com.mysql.jdbc.Driver
    Metastore connection User:	 root
    Starting metastore schema initialization to 1.1.0
    Initialization script hive-schema-1.1.0.mysql.sql
    Initialization script completed
    schemaTool completed
    
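
    If initialization succeeded, the hive database in MySQL now holds the metastore schema; a quick sanity check:

    #List some of the metastore tables that schematool just created
    mysql -u root -p1234 -e "use hive; show tables;" | head
    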

    10. Start Hive and test it

    [ec2-user@master conf]$ hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-1.1.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-1.1.0.jar!/hive-log4j.properties
    hive> show databases;
    OK
    default
    Time taken: 0.587 seconds, Fetched: 1 row(s)
    

    With that, Hive is installed.
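
    As a further smoke test, one can create a throwaway table and check that it lands in the HDFS warehouse directory configured earlier; a sketch (the table name test is arbitrary):

    hive> create table test(id int);
    hive> show tables;
    #Back in the shell, the table's directory should appear under the warehouse path
    [ec2-user@master conf]$ hdfs dfs -ls /user/hive/warehouse
    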

    2.5 Installing Sqoop

    1. Unpack

    [ec2-user@master ~]$ tar -zxvf hadoop/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local/src/
    

    2. Rename to sqoop

    [ec2-user@master ~]$ cd /usr/local/src/
    [ec2-user@master src]$ ls
    hadoop  hive  jdk  sqoop-1.4.7.bin__hadoop-2.6.0
    [ec2-user@master src]$ mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop
    [ec2-user@master src]$ ls
    hadoop  hive  jdk  sqoop
    

    3. Add environment variables

    [ec2-user@master src]$ sudo vi /etc/profile
    #Add the following lines
    export SQOOP_HOME=/usr/local/src/sqoop
    export PATH=$PATH:$SQOOP_HOME/bin
    #Reload the environment variables
    [ec2-user@master src]$ source /etc/profile
    

    4. Edit the sqoop-env.sh configuration file

    [ec2-user@master src]$ cd sqoop/conf/
    [ec2-user@master conf]$ mv sqoop-env-template.sh sqoop-env.sh
    [ec2-user@master conf]$ vi sqoop-env.sh 
    

    Update the following settings in the file to match your environment:

    #Set path to where bin/hadoop is available
    export HADOOP_COMMON_HOME=/usr/local/src/hadoop
    
    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/usr/local/src/hadoop
    
    #Set the path to where bin/hive is available
    export HIVE_HOME=/usr/local/src/hive
    

    5. Put the MySQL driver into Sqoop's lib directory

    [ec2-user@master conf]$ cp /home/ec2-user/hadoop/mysql-connector-java-5.1.44-bin.jar $SQOOP_HOME/lib
    [ec2-user@master conf]$ ls $SQOOP_HOME/lib/mysql-connector-java-5.1.44-bin.jar 
    /usr/local/src/sqoop/lib/mysql-connector-java-5.1.44-bin.jar
    

    6. Verify that Sqoop is configured correctly

    [ec2-user@master conf]$ sqoop help
    Warning: /usr/local/src/sqoop/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /usr/local/src/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /usr/local/src/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /usr/local/src/sqoop/../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    21/03/19 08:53:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    usage: sqoop COMMAND [ARGS]
    
    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
    
    See 'sqoop help COMMAND' for information on a specific command.
    
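
    The warnings about HBase, HCatalog, Accumulo, and ZooKeeper are harmless here, since those components are not installed. To confirm Sqoop can actually reach MySQL, a quick test using the credentials from the MySQL setup above:

    #List the MySQL databases through Sqoop's JDBC connection
    sqoop list-databases --connect jdbc:mysql://master:3306 --username root --password 1234
    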