  • 《OD学hadoop》 study notes, Week 2 (0703)

    HDFS web UI: http://beifeng-hadoop-01:50070/dfshealth.html#tab-overview

    YARN web UI: http://beifeng-hadoop-01:8088/cluster

    JobHistory server web UI: http://beifeng-hadoop-01:19888/

     

    sbin/hadoop-daemon.sh start namenode

    sbin/hadoop-daemon.sh start datanode

    sbin/yarn-daemon.sh start resourcemanager

    sbin/yarn-daemon.sh start nodemanager

    sbin/mr-jobhistory-daemon.sh start historyserver

    sbin/hadoop-daemon.sh stop namenode

    sbin/hadoop-daemon.sh stop datanode

    sbin/yarn-daemon.sh stop resourcemanager

    sbin/yarn-daemon.sh stop nodemanager

    sbin/mr-jobhistory-daemon.sh stop historyserver
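    After starting or stopping daemons, a quick sanity check is the JDK's jps tool. As a sketch, a fully started single node should show roughly the following processes:

    jps
    # expected, roughly: NameNode, DataNode, ResourceManager,
    # NodeManager, JobHistoryServer (plus Jps itself)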

    I. Replacing the Native Libraries

    mv native/ bak_native

    tar -zxf native-**.gz -C /opt/modules/hadoop-2.5.0/lib
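    To check that the replacement native libraries are actually being picked up, Hadoop 2.x ships a checknative utility (a hedged suggestion; run it from the Hadoop home directory):

    bin/hadoop checknative -a
    # reports whether the hadoop/zlib/snappy/... native code was loaded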

    II. SecondaryNameNode

    1. The NameNode stores the metadata of the entire file system.

    2. Formatting creates a storage directory.

    3. Formatting also produces the initial file-system metadata:

    bin/hdfs namenode -format

    4. At runtime, the metadata is held in memory.

    5. Before the NameNode is started, the metadata lives in files on the local file system.

    6. Formatting generates an fsimage file;

    more precisely, an image of the file system that stores the metadata.

    7. Any operation on HDFS (an upload, a create, ...) changes the metadata.

    8. These operations are recorded as an operation log:

    the edits log (edit log files).

    9. With the log in place, when the NameNode starts again it first reads fsimage,

    then replays the edits log, so no metadata is lost.

    10. This calls for a service process that periodically merges fsimage and edits.

    11. The SecondaryNameNode reads fsimage and edits into memory,

    merges them, writes the result to a new fsimage file (the two old files are then no longer needed), and a fresh edits file is created to continue recording.

    Note: reading fsimage is fast; replaying edits is slow.

    12. What the SecondaryNameNode does:

    (1) merges fsimage and edits;

    (2) shortens the NameNode's next startup time.

    13. Configuration

    hdfs-site.xml

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>beifeng-hadoop-01:50090</value>
    </property>
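    How often the merge happens can be tuned as well. A hedged sketch using the standard Hadoop 2.x checkpoint properties (the values shown are the shipped defaults):

    <!-- trigger a checkpoint every hour -->
    <property>
        <name>dfs.namenode.checkpoint.period</name>
        <value>3600</value>
    </property>
    <!-- or sooner, once this many uncheckpointed transactions accumulate -->
    <property>
        <name>dfs.namenode.checkpoint.txns</name>
        <value>1000000</value>
    </property>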

    14. Start command:

    $ sbin/hadoop-daemon.sh start secondarynamenode

     http://beifeng-hadoop-01:50090/status.html

    fsimage: file:///opt/modules/hadoop-2.5.0/data/tmp/dfs/namesecondary/fsimage

    edits: file:///opt/modules/hadoop-2.5.0/data/tmp/dfs/namesecondary/edits
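    Both files are binary. To inspect them, hdfs provides offline viewers: oiv for fsimage and oev for edits (both appear in the hdfs usage listing below). A sketch; the input file names are illustrative, so substitute the actual fsimage_*/edits_* files from the namesecondary directory:

    bin/hdfs oiv -p XML -i fsimage_0000000000000000123 -o fsimage.xml
    bin/hdfs oev -p xml -i edits_0000000000000000001-0000000000000000123 -o edits.xml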

    III. Configuring the HDFS Storage Directories
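    Where the NameNode and the DataNodes keep their on-disk data is controlled by standard hdfs-site.xml properties; a minimal sketch, with local paths that are illustrative only:

    <!-- where the NameNode persists fsimage and edits -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///opt/modules/hadoop-2.5.0/data/tmp/dfs/name</value>
    </property>
    <!-- where each DataNode stores its block files -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///opt/modules/hadoop-2.5.0/data/tmp/dfs/data</value>
    </property>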

    IV. Configuration Files, Clients, and Servers

    1. Hadoop has two kinds of configuration files:

    the defaults, and user-defined (site) files.

    Tuning these configuration files is the main way to improve cluster performance.

    • Hadoop Common: The common utilities that support the other Hadoop modules.
    • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
    • Hadoop YARN: A framework for job scheduling and cluster resource management.
    • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

    Each module has its own configuration file.

    2. Files loaded at startup:

    (1) first the default configuration files,

    (2) then the user-defined configuration files:

    • core-site.xml
    • hdfs-site.xml
    • mapred-site.xml
    • yarn-site.xml

    3. User-defined configuration files take precedence over the defaults.

    4. HDFS reads four configuration files:

    • core-default.xml
    • hdfs-default.xml
    • core-site.xml
    • hdfs-site.xml

    5. Server side: the NameNode and DataNodes both read the configuration files when they start.

    6. Client side: client programs read the same configuration files when they run; a sketch of the precedence follows below.
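    A minimal Java sketch of this precedence (fs.defaultFS is a standard key; the override value below is illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class ConfDemo {
        public static void main(String[] args) {
            // loads core-default.xml first, then core-site.xml from the classpath
            Configuration conf = new Configuration();
            System.out.println(conf.get("fs.defaultFS")); // site value wins over the default
            // an explicit set() has the highest precedence of all
            conf.set("fs.defaultFS", "hdfs://beifeng-hadoop-01:8020");
            System.out.println(conf.get("fs.defaultFS"));
        }
    }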

    VII. Passwordless SSH Login

    The start/stop commands are executed by scripts, and the scripts connect to the remote nodes over SSH.

    Generate a key pair: ssh-keygen -t rsa

    id_rsa

    id_rsa.pub

    ssh-copy-id beifeng-hadoop-01

    known_hosts

    authorized_keys
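    Once the key is in authorized_keys on the target host, a remote command should run without any password prompt:

    ssh beifeng-hadoop-01 hostname
    # prints the remote hostname with no password prompt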

    VIII. The Ecosystem Around Hadoop 2.x

    Compute framework: MapReduce

    Container for compute frameworks: YARN

    Data storage: HDFS

    Operating system: CentOS

    Data sources: relational databases, log files    ======>  HDFS

    Sqoop: transfers table data between relational databases <==> HDFS

    http://blog.csdn.net/yfkiss/article/details/8700480

    Flume: ingests log data in real time, monitoring log files ==> HDFS

    ZooKeeper: a distributed coordination framework

    Hive: HiveQL statements are compiled into MapReduce jobs

    Pig: a dataflow scripting language

    Real-time queries: fast lookups over a table with hundreds of millions of rows; Bigtable -> HBase (a distributed database)

    Oozie: a framework for combining multiple MapReduce jobs into one logical unit of work

    CM (Cloudera Manager): packages and manages the components above

    IX. HDFS Architecture

    1. NameNode

    2. DataNode

    File blocks are replicated to keep the data safe.

    HDFS works well for large data sets, on the order of GBs or TBs.

    Scenarios HDFS is not suited for:

    processing large numbers of small files;

    multiple writers, or arbitrary modification of existing files.

    HDFS file and directory metadata is persisted in the binary fsimage file,

    together with the edits log.

    What loading fsimage does at startup:

    (1) reads from fsimage every directory and file stored in HDFS;

    (2) initializes the metadata of each directory and file;

    (3) builds the in-memory image of the whole namespace from the directory and file paths;

    (4) if an entry is a file, its block list is loaded as well (see the sketch below).
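    That per-file block metadata is visible through the public FileSystem API; a minimal Java sketch (the HDFS path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockInfo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus st = fs.getFileStatus(new Path("/user/beifeng/input.txt"));
            // one BlockLocation per block: offset, length, and the hosts holding replicas
            for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println(loc);
            }
        }
    }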

    X. HDFS Shell Commands

    [beifeng@beifeng-hadoop-01 hadoop-2.5.0]$ bin/hdfs dfs
    Usage: hadoop fs [generic options]
            [-appendToFile <localsrc> ... <dst>]
            [-cat [-ignoreCrc] <src> ...]
            [-checksum <src> ...]
            [-chgrp [-R] GROUP PATH...]
            [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
            [-chown [-R] [OWNER][:[GROUP]] PATH...]
            [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
            [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
            [-count [-q] <path> ...]
            [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
            [-createSnapshot <snapshotDir> [<snapshotName>]]
            [-deleteSnapshot <snapshotDir> <snapshotName>]
            [-df [-h] [<path> ...]]
            [-du [-s] [-h] <path> ...]
            [-expunge]
            [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
            [-getfacl [-R] <path>]
            [-getfattr [-R] {-n name | -d} [-e en] <path>]
            [-getmerge [-nl] <src> <localdst>]
            [-help [cmd ...]]
            [-ls [-d] [-h] [-R] [<path> ...]]
            [-mkdir [-p] <path> ...]
            [-moveFromLocal <localsrc> ... <dst>]
            [-moveToLocal <src> <localdst>]
            [-mv <src> ... <dst>]
            [-put [-f] [-p] <localsrc> ... <dst>]
            [-renameSnapshot <snapshotDir> <oldName> <newName>]
            [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
            [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
            [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
            [-setfattr {-n name [-v value] | -x name} <path>]
            [-setrep [-R] [-w] <rep> <path> ...]
            [-stat [format] <path> ...]
            [-tail [-f] <file>]
            [-test -[defsz] <path>]
            [-text [-ignoreCrc] <src> ...]
            [-touchz <path> ...]
            [-usage [cmd ...]]
    
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|jobtracker:port>    specify a job tracker
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    [beifeng@beifeng-hadoop-01 hadoop-2.5.0]$ bin/hdfs
    Usage: hdfs [--config confdir] COMMAND
           where COMMAND is one of:
      dfs                  run a filesystem command on the file systems supported in Hadoop.
      namenode -format     format the DFS filesystem
      secondarynamenode    run the DFS secondary namenode
      namenode             run the DFS namenode
      journalnode          run the DFS journalnode
      zkfc                 run the ZK Failover Controller daemon
      datanode             run a DFS datanode
      dfsadmin             run a DFS admin client
      haadmin              run a DFS HA admin client
      fsck                 run a DFS filesystem checking utility
      balancer             run a cluster balancing utility
      jmxget               get JMX exported values from NameNode or DataNode.
      oiv                  apply the offline fsimage viewer to an fsimage
      oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
      oev                  apply the offline edits viewer to an edits file
      fetchdt              fetch a delegation token from the NameNode
      getconf              get config values from configuration
      groups               get the groups which users belong to
      snapshotDiff         diff two snapshots of a directory or diff the
                           current directory contents with a snapshot
      lsSnapshottableDir   list all snapshottable dirs owned by the current user
                                                    Use -help to see options
      portmap              run a portmap service
      nfs3                 run an NFS version 3 gateway
      cacheadmin           configure the HDFS cache
    
    Most commands print help when invoked w/o parameters.
    [beifeng@beifeng-hadoop-01 hadoop-2.5.0]$ bin/hdfs dfsadmin
    Usage: java DFSAdmin
    Note: Administrative commands can only be run as the HDFS superuser.
               [-report]
               [-safemode enter | leave | get | wait]
               [-allowSnapshot <snapshotDir>]
               [-disallowSnapshot <snapshotDir>]
               [-saveNamespace]
               [-rollEdits]
               [-restoreFailedStorage true|false|check]
               [-refreshNodes]
               [-finalizeUpgrade]
               [-rollingUpgrade [<query|prepare|finalize>]]
               [-metasave filename]
               [-refreshServiceAcl]
               [-refreshUserToGroupsMappings]
               [-refreshSuperUserGroupsConfiguration]
               [-refreshCallQueue]
               [-refresh]
               [-printTopology]
               [-refreshNamenodes datanodehost:port]
               [-deleteBlockPool datanode-host:port blockpoolId [force]]
               [-setQuota <quota> <dirname>...<dirname>]
               [-clrQuota <dirname>...<dirname>]
               [-setSpaceQuota <quota> <dirname>...<dirname>]
               [-clrSpaceQuota <dirname>...<dirname>]
               [-setBalancerBandwidth <bandwidth in bytes per second>]
               [-fetchImage <local directory>]
               [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
               [-getDatanodeInfo <datanode_host:ipc_port>]
               [-help [cmd]]
    
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|jobtracker:port>    specify a job tracker
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
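    A few everyday dfs commands tying the usage above together (the paths are illustrative):

    bin/hdfs dfs -mkdir -p /user/beifeng/input
    bin/hdfs dfs -put etc/hadoop/core-site.xml /user/beifeng/input
    bin/hdfs dfs -ls -R /user/beifeng
    bin/hdfs dfs -cat /user/beifeng/input/core-site.xml
    bin/hdfs dfs -rm -r /user/beifeng/input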

    XI. Safe Mode

    [beifeng@beifeng-hadoop-01 hadoop-2.5.0]$ bin/hdfs dfsadmin -safemode
    Usage: java DFSAdmin [-safemode enter | leave | get | wait]

    In safe mode, the file system cannot be modified (it is effectively read-only).

    bin/hdfs dfsadmin -safemode get

    bin/hdfs dfsadmin -safemode enter

    bin/hdfs dfsadmin -safemode leave

    XII. Installing Eclipse and Maven

    XIII. YARN

    A resource management system: resource allocation and resource isolation.

    XIV. HDFS API

    Commonly used references (Maven repository addresses; Hadoop API docs):

    http://hadoop.apache.org/docs/r2.5.2/api/index.html

    1. Obtaining the HDFS file system:

    // Configuration picks up core-site.xml / hdfs-site.xml from the classpath
    Configuration conf = new Configuration();
    FileSystem fileSystem = FileSystem.get(conf);
    System.out.println(fileSystem);
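    A self-contained sketch that reads a file from HDFS (the class name and path are illustrative; assumes the cluster's core-site.xml/hdfs-site.xml are on the classpath so fs.defaultFS points at the NameNode):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsReadDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // stream an HDFS file to stdout
            FSDataInputStream in = fs.open(new Path("/user/beifeng/input.txt"));
            try {
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }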

    XV. The Ecosystem Around YARN

    1. Hortonworks: BATCH (MapReduce)

    2. Services that run on YARN: long-running services and short-lived jobs

    3. Apache Slider

    4. Solr
