When the HDFS cluster is restarted, the NameNode reports a large number of GC pauses.

    Symptoms:

    2019-03-11 12:30:52,174 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7653ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=7692ms
    2019-03-11 12:31:00,573 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7899ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=7951ms
    2019-03-11 12:31:08,952 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7878ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=7937ms
    2019-03-11 12:31:17,405 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7951ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=8037ms
    2019-03-11 12:31:26,611 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 8705ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=8835ms
    2019-03-11 12:31:35,009 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7897ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=8083ms
    2019-03-11 12:31:43,806 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 8296ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=8416ms
    2019-03-11 12:31:52,317 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 8010ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=8163ms
    2019-03-11 12:32:00,680 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7862ms
    

      

    After the GC pauses continue for a while, the following error appears:

    2019-03-11 12:27:15,820 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
    java.lang.OutOfMemoryError: Java heap space
            at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
            at java.lang.StringCoding.encode(StringCoding.java:344)
            at java.lang.String.getBytes(String.java:918)
            at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
            at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
            at java.io.File.exists(File.java:819)
            at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1282)
            at sun.misc.URLClassPath.getResource(URLClassPath.java:239)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            at org.apache.hadoop.hdfs.server.namenode.JournalSet.close(JournalSet.java:244)
            at org.apache.hadoop.hdfs.server.namenode.FSEditLog.close(FSEditLog.java:400)
            at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.close(FSEditLogAsync.java:112)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.close(FSImage.java:1408)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1079)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:666)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:728)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
    2019-03-11 12:27:15,827 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.OutOfMemoryError: Java heap space
    2019-03-11 12:27:15,830 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
    

      

    Or this error appears instead:

    2019-03-11 11:09:16,124 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
    java.lang.OutOfMemoryError: GC overhead limit exceeded
            at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:573)
            at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
            at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:199)
            at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
            at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
            at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
            at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
            at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeSection$INode.parseDelimitedFrom(FsImageProto.java:10867)
            at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:233)
            at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:250)
            at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:176)
            at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:937)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:921)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:794)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:724)
            at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:322)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1052)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:666)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:728)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:932)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
    2019-03-11 11:09:16,127 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.OutOfMemoryError: GC overhead limit exceeded
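
    Both traces fail while the fsimage is being loaded (FSImageFormatPBINode$Loader.loadINodeSection in the second one), which points to a namespace that has outgrown the default heap rather than a transient fault. To see how large the namespace actually is before resizing, the offline image viewer can summarise the fsimage without a running NameNode. A minimal sketch, assuming the usual dfs.namenode.name.dir layout (the paths below are placeholders):

    # Hypothetical paths -- point -i at the newest fsimage_<txid> under dfs.namenode.name.dir/current.
    hdfs oiv -p FileDistribution \
        -i /data/hadoop/dfs/name/current/fsimage_0000000000012345678 \
        -o /tmp/fsimage_summary.txt
    # The end of the report gives total file/directory/block counts, i.e. how much
    # metadata the NameNode has to hold in its heap.
    tail -n 20 /tmp/fsimage_summary.txt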
    

      

    Solution:

    Open the hadoop-env.sh file and find HADOOP_HEAPSIZE= and HADOOP_NAMENODE_INIT_HEAPSIZE=. Increase both parameters; how much depends on your cluster, but the default is 1000 MB, i.e. about 1 GB. I adjusted them as follows:

    export HADOOP_HEAPSIZE=32000
    export HADOOP_NAMENODE_INIT_HEAPSIZE=16000

    Remove the leading # so these two lines take effect, and make the same change on both NameNode hosts.
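
    After raising both values and restarting, it is worth confirming that the new heap was actually picked up. A quick check, assuming jps is available to the user running the NameNode:

    # jps -v lists each Java process with its JVM arguments; the Xmx value should
    # now reflect HADOOP_HEAPSIZE (the start scripts pass it through in MB).
    jps -v | grep -i namenode | grep -o 'Xmx[0-9]*[mMgG]'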
    

      

    Then restart HDFS. If that is still not enough, open hadoop-env.sh again and find HADOOP_NAMENODE_OPTS. The system default is:

    export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender}  $HADOOP_NAMENODE_OPTS"

    Adjust it as follows:
    export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender}  -Xms6000m -Xmx6000m -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:SoftRefLRUPolicyMSPerMB=0 $HADOOP_NAMENODE_OPTS" 
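
    If long pauses continue even with the larger heap and the CMS settings above, capturing GC logs makes it easier to see whether CMS is keeping up or falling back to full collections. A possible further addition to hadoop-env.sh, assuming JDK 8 style GC-logging flags (the log path is a placeholder):

    # Append GC logging to the options set above; adjust the log path to your environment.
    export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -verbose:gc -XX:+PrintGCDetails \
        -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-hdfs/namenode-gc.log"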
    

      

    Then restart HDFS once more. If the errors above still occur, keep increasing the HADOOP_HEAPSIZE and HADOOP_NAMENODE_INIT_HEAPSIZE values, as sketched below.
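
    As a rough guide for how far to keep raising them: each file, directory, and block costs on the order of 150 bytes of NameNode heap, so the requirement scales with the total object count (which the fsimage summary above reports). A back-of-the-envelope sketch; both the object count and the 150-byte figure are assumptions, not measurements:

    # Rule-of-thumb sizing: objects * ~150 bytes, plus generous headroom.
    OBJECTS=50000000        # files + directories + blocks, e.g. from the oiv report
    BYTES_PER_OBJECT=150    # commonly quoted approximation for NameNode metadata
    echo "approx heap needed: $(( OBJECTS * BYTES_PER_OBJECT / 1024 / 1024 )) MB"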
Original post: https://www.cnblogs.com/yjt1993/p/10510113.html