  • Summary of Common Big Data Cluster Problems

    The project is nearly finished and the launch went smoothly. Quite a few problems came up during development, so I'm using some spare time to write up the common ones as a set of notes. The problems are listed below:

  1. java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20180124_log/src_20180124_log

    This usually means a datanode is down or unreachable. Check the configuration and restart the affected services.
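
    A quick way to see which blocks and replicas are affected is fsck (the path below is just the file from the error message):

    hadoop fsck /user/hive/warehouse/src_20180124_log/src_20180124_log -files -blocks -locations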

  2. java.lang.OutOfMemoryError: Java heap space

    This exception clearly means the JVM is running short of heap. Increase the JVM heap size on every datanode, for example java -Xms1024m -Xmx4096m.

    As a rule of thumb the JVM's maximum heap should be about half of the machine's total memory; our nodes have 8 GB, so we set 4096m, though this may still not be the optimal value.
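
    A minimal sketch of applying this, assuming the datanode heap is set through hadoop-env.sh (the exact variable name can vary between Hadoop versions):

    # in $HADOOP_HOME/conf/hadoop-env.sh
    export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"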

  3. IO write operation errors

    0-1246359584298, infoPort=50075, ipcPort=50020): Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.100.165:5001 remote=/172.16.100.165:50930]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)

    It seems there are many reasons that it can time out; the example given in HADOOP-3831 is a slow reading client.

    This is a connection timeout. Solution: set dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.
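
    As a sketch, the corresponding entry in hadoop-site.xml (hdfs-site.xml on newer releases) would look like this:

    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>0</value>
    </property>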

  4. Fixing Hadoop OutOfMemoryError

    <property>

        <name>mapred.child.java.opts</name>

       <value>-Xmx800M -server</value>

    </property>

    Alternatively, pass it on the command line: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M

  5. Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing

    This error appears while indexing with Nutch 1.0.

    Solution: delete conf/log4j.properties so the detailed error report is printed to the console. If the underlying error turns out to be out of memory, add JVM arguments for the main class org.apache.nutch.crawl.Crawl: -Xms64m -Xmx512m.
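
    One way to pass a larger heap to the crawl, as a sketch assuming Nutch 1.0's bin/nutch script (NUTCH_HEAPSIZE is in MB; the urls directory and crawl options are placeholders):

    export NUTCH_HEAPSIZE=512
    bin/nutch crawl urls -dir crawl -depth 3 -topN 50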

  6. NameNode in safe mode

    Solution: run bin/hadoop dfsadmin -safemode leave
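
    For reference, a short sketch of checking the current state before forcing the NameNode out of safe mode:

    bin/hadoop dfsadmin -safemode get    # prints whether safe mode is ON or OFF
    bin/hadoop dfsadmin -safemode leave  # leave safe mode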

  7. java.net.NoRouteToHostException: No route to host

    Solution: stop the firewall, e.g. sudo /etc/init.d/iptables stop
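
    To keep the firewall from coming back after a reboot, a sketch assuming a RHEL/CentOS-style init system (which the iptables init script above suggests):

    sudo /etc/init.d/iptables stop    # stop the firewall now
    sudo chkconfig iptables off       # keep it disabled across reboots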

  8. After changing the NameNode, SELECT queries in Hive still point at the old NameNode address

    Solution: replace every old NameNode address stored in the metastore with the new NameNode address.
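
    One way to do the replacement, as a sketch assuming a Hive release that ships the metatool service (the hostnames and port are placeholders; otherwise the location URIs have to be edited directly in the metastore tables):

    hive --service metatool -updateLocation hdfs://newnn:8020 hdfs://oldnn:8020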

  9. ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ""PARTITIONS"" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"

    Cause: org.jpox.fixedDatastore is set to true in hive-default.xml; it should be set to false.
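
    A sketch of the relevant entry in hive-default.xml / hive-site.xml (newer Hive releases use the equivalent datanucleus.fixedDatastore and datanucleus.autoCreateSchema names):

    <property>
        <name>org.jpox.fixedDatastore</name>
        <value>false</value>
    </property>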

  10. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010

    INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
    INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
    INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
    INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010
    INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
    INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
    INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
    WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
    WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001 bad datanode[2] nodes == null
    WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input" - Aborting...
    put: Bad connect ack with firstBadLink 192.168.1.16:50010

    Solution:

    1) /etc/init.d/iptables stop  (stop the firewall)

    2) Set SELINUX=disabled in /etc/selinux/config  (disable SELinux)

  11. A MapReduce job that normally runs fine suddenly throws:

    java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

    java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

    The cause is that the Linux machines have too many open files. ulimit -n shows that the default open-file limit is 1024; raise it for the hadoop user in /etc/security/limits.conf (ideally on every datanode) as sketched below, then rerun the job.
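
    A sketch of the limits.conf entries, assuming the Hadoop daemons run as the hadoop user (log in again and restart the datanodes for the new limit to take effect):

    # /etc/security/limits.conf
    hadoop soft nofile 65535
    hadoop hard nofile 65535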

  12. bin/hadoop jps throws the following exception:

    Exception in thread "main" java.lang.NullPointerException
        at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
        at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
        at sun.tools.jps.Jps.main(Jps.java:45)

    Solution: the system /tmp directory has been deleted; recreate /tmp (same fix as issue 13, sketched below).

  13. bin/hive fails with: unable to create log directory /tmp/

    Solution: again, the system /tmp directory has been deleted. Recreate it as sketched below.
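
    A minimal sketch of recreating /tmp with the usual permissions (the sticky bit matters so users cannot delete each other's files):

    mkdir /tmp
    chmod 1777 /tmp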

  14. MySQL errors

    [root@localhost mysql]# ./bin/mysqladmin -u root password '123456'

    ./bin/mysqladmin: connect to server at 'localhost' failed

    error: 'Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)'

    Check that mysqld is running and that the socket: '/tmp/mysql.sock' exists!

    This is the classic "cannot connect through mysql.sock" problem: the client tries to reach the server through /tmp/mysql.sock (a standard PHP build, for example, expects exactly that path), but some MySQL installations put the socket at /var/lib/mysql.sock or somewhere else. It can be fixed in /etc/my.cnf, which contains something like:

    [mysqld]

    socket=/var/lib/mysql.sock

    Changing that path fixes the error, but it can break other clients (for example, the mysql command-line program may no longer connect), so also add:

    [mysql]

    socket=/tmp/mysql.sock

    Alternatively, change the socket path in my.ini / my.cnf so clients use the other mysql.sock, or simply create a symlink:

    ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock

    That symlink alone was enough to fix it in our case.

  15. NameNode failover does not work

    less $BEH_HOME/logs/hadoop/hadoop-hadoop-zkfc-hadoop001.log

    The log file contains the error: Unable to create SSH session

    com.jcraft.jsch.JSchException: java.io.FileNotFoundException: ~/.ssh/id_rsa (No such file or directory)

    Solution: edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and set the following property:

    <property>

        <name>dfs.ha.fencing.ssh.private-key-files</name>

        <value>/home/hadoop/.ssh/id_rsa</value>

        <final>true</final>

        <description/>

    </property>
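
    If the key itself is missing, a sketch of creating it for the hadoop user (the path must match the value configured above, and the public key still has to be authorized on the peer NameNode; the peer hostname is a placeholder):

    ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
    ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@<peer-namenode-host>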

     

  • Original source: https://www.cnblogs.com/10158wsj/p/8283098.html