  • A summary of common big data cluster problems

    The project is nearing the end and the launch went smoothly. We ran into quite a few problems during development, so I'm using some spare time to summarize the common ones as a set of notes. The problems are as follows:

    1. java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20180124_log/src_20180124_log

    This usually means a node went down or could not be reached. Check the configuration and restart the affected service.
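
    For diagnosis, something along these lines usually helps (a sketch; the script locations and the file path below depend on your Hadoop layout):

    # check which blocks/replicas of the file are missing or corrupt
    bin/hadoop fsck /user/hive/warehouse/src_20180124_log -files -blocks -locations
    # see which datanodes the namenode currently considers live or dead
    bin/hadoop dfsadmin -report
    # restart the datanode daemon on the affected node
    bin/hadoop-daemon.sh stop datanode
    bin/hadoop-daemon.sh start datanode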

      2. java.lang.OutOfMemoryError: Java heap space

    This exception clearly means the JVM does not have enough memory; increase the JVM heap size on all of the datanodes, e.g. java -Xms1024m -Xmx4096m.

    As a rule of thumb, the maximum JVM heap should be about half of the total memory. Our machines have 8 GB, so we set 4096m; this may still not be the optimal value.
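
    One way to apply this (a sketch assuming the classic conf/hadoop-env.sh mechanism; adjust to your Hadoop version), followed by a datanode restart:

    # conf/hadoop-env.sh on each datanode: give the DataNode JVM the larger heap
    export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"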

      3. I/O write errors

    0-1246359584298, infoPort=50075, ipcPort=50020): Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
    java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.100.165:5001 remote=/172.16.100.165:50930]
            at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
            at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
            at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
            at java.lang.Thread.run(Thread.java:619)

    There seem to be many reasons it can time out; the example given in HADOOP-3831 is a slow-reading client.

    This is a connection timeout. Solution: set dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.
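
    For example (hadoop-site.xml on the old releases this note targets, hdfs-site.xml on newer ones), followed by a datanode restart:

    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <!-- 0 disables the datanode write timeout -->
        <value>0</value>
    </property>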

      4. Fixing the Hadoop OutOfMemoryError

    <property>

        <name>mapred.child.java.opts</name>

        <value>-Xmx800M -server</value>

    </property>

    Alternatively: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M

      5. Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.

    When using Nutch 1.0, the following error occurs:

    Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.

    Solution: delete conf/log4j.properties so that the detailed error report is printed. If it turns out to be an out-of-memory error, add the JVM options -Xms64m -Xmx512m when running the main class org.apache.nutch.crawl.Crawl.
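
    A sketch of the second fix, assuming the crawl is launched from the JVM directly; the classpath and crawl arguments below are only placeholders:

    # give the Crawl class a larger heap; paths and arguments are illustrative
    java -Xms64m -Xmx512m -cp "nutch-1.0.job:lib/*:conf" \
        org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 50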

      6. Namenode in safe mode

    Solution: run bin/hadoop dfsadmin -safemode leave
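
    It is worth checking the safe-mode state first, since the namenode normally leaves safe mode by itself once enough block reports arrive; for example:

    # show the current safe-mode state, then force the namenode out of it
    bin/hadoop dfsadmin -safemode get
    bin/hadoop dfsadmin -safemode leave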

      7. java.net.NoRouteToHostException: No route to host

    Solution: sudo /etc/init.d/iptables stop
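
    To keep the firewall from coming back after a reboot (assuming RHEL/CentOS-style init scripts):

    # stop iptables now and disable it across reboots
    sudo /etc/init.d/iptables stop
    sudo chkconfig iptables off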

      8. After changing the namenode, SELECT queries in Hive still point to the old namenode address

    Solution: replace every occurrence of the old namenode address stored in the metastore with the current namenode address.
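
    A sketch, assuming a MySQL-backed metastore; back up the metastore database first. The database name 'hivemeta' and the old/new namenode URIs below are placeholders:

    # HDFS locations are stored mainly in the DBS and SDS tables of the metastore
    mysql hivemeta -e "UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://oldnn:9000', 'hdfs://newnn:9000');"
    mysql hivemeta -e "UPDATE SDS SET LOCATION = REPLACE(LOCATION, 'hdfs://oldnn:9000', 'hdfs://newnn:9000');"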

      9.  ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ""PARTITIONS"" in Catalog "" Schema "". JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "org.jpox.autoCreateTables"

    Cause: org.jpox.fixedDatastore is set to true in hive-default.xml; it should be set to false.
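
    That is, in hive-default.xml (or, better, an override in hive-site.xml):

    <property>
        <name>org.jpox.fixedDatastore</name>
        <!-- false lets JPOX change the schema; the error message also names org.jpox.autoCreateTables -->
        <value>false</value>
    </property>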

      10. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010

    INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
    INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
    INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
    INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.11:50010
    INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
    INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.16:50010
    INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
    WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
    WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001 bad datanode[2] nodes == null
    WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input" - Aborting...
    put: Bad connect ack with firstBadLink 192.168.1.16:50010

    Solution:

    1) '/etc/init.d/iptables stop'  --> stop the firewall

    2) set SELINUX=disabled in '/etc/selinux/config'  --> disable SELinux
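
    Both changes need to be made on every datanode; to switch SELinux off immediately without waiting for a reboot, something like:

    # stop the firewall, turn SELinux off now, and keep it off after reboot
    /etc/init.d/iptables stop
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config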

      11. A MapReduce job that normally runs fine throws the following error:

    java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
    java.io.IOException: Could not get block locations. Aborting…
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
            at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

    Cause: the Linux machine has too many open files. Solution: ulimit -n shows that the default open-file limit on Linux is 1024. Edit /etc/security/limits.conf and raise the limit for the hadoop user (add a "hadoop soft nofile 65535" entry), preferably on all datanodes, then rerun the job.
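
    For example (assuming the jobs run as the hadoop user; log in again afterwards so the new limits take effect):

    # /etc/security/limits.conf on every datanode
    hadoop  soft  nofile  65535
    hadoop  hard  nofile  65535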

      12. Running bin/hadoop jps reports the following exception:

    Exception in thread "main" java.lang.NullPointerException
            at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
            at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
            at sun.tools.jps.Jps.main(Jps.java:45)

    Solution: the /tmp directory under the system root was deleted. Recreate /tmp and the problem goes away.
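
    A sketch of the fix (the sticky bit matters because many services share /tmp):

    # recreate /tmp with world-writable permissions and the sticky bit
    mkdir -p /tmp
    chmod 1777 /tmp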

      13. bin/hive reports "unable to create log directory /tmp/"

    Solution: same as above, the system /tmp directory was deleted; recreate it.

      14. MySQL error

    [root@localhost mysql]# ./bin/mysqladmin -u root password '123456'

    ./bin/mysqladmin: connect to server at 'localhost' failed

    error: 'Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)'

    Check that mysqld is running and that the socket: '/tmp/mysql.sock' exists!

    This is the classic problem of not being able to connect to MySQL through mysql.sock. The message says the client cannot reach the server through '/tmp/mysql.sock'. PHP's standard configuration expects '/tmp/mysql.sock', but some MySQL installation methods put the socket at /var/lib/mysql.sock or somewhere else. It can be fixed by editing /etc/my.cnf; open the file and you will see a configuration like:

    [mysqld]

    socket=/var/lib/mysql.sock

    Changing that line fixes the error, but it can cause other problems, for example the mysql client program no longer being able to connect; to handle that, also add:

    [mysql]

    socket=/tmp/mysql.sock

    Alternatively, change the socket setting in my.ini (my.cnf) so the client uses the other mysql.sock, or simply create a symlink:

    ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock

    That did the trick.
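
    After editing my.cnf, restart the server so the new socket path takes effect and verify the connection (the service name varies by distribution):

    /etc/init.d/mysqld restart
    mysqladmin -u root -p status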

      15. The NameNode cannot fail over

    less $BEH_HOME/logs/hadoop/hadoop-hadoop-zkfc-hadoop001.log

    The log file contains the error: Unable to create SSH session

    com.jcraft.jsch.JSchException: java.io.FileNotFoundException: ~/.ssh/id_rsa (No such file or directory)

    Solution: vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    Modify the following parameter:

    <property>

        <name>dfs.ha.fencing.ssh.private-key-files</name>

        <value>/home/hadoop/.ssh/id_rsa</value>

        <final>true</final>

        <description/>

    </property>
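
    If the key really does not exist, generate one for the hadoop user and authorize it on the other namenode so that sshfence can log in (a sketch; 'hadoop002' is a placeholder host name):

    ssh-keygen -t rsa -f /home/hadoop/.ssh/id_rsa -N ""
    ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop002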

     

  • Original article: https://www.cnblogs.com/10158wsj/p/8283098.html