zoukankan      html  css  js  c++  java
  • 【故障•监听】TNS-12518、TNS-00517和 Linux Error:32:Broken pipe

    【故障|监听TNS-12518TNS-00517Linux Error32Broken pipe

    1.1  BLOG文档结构图

    wps8CE6.tmp 

     

    1.2  前言部分

    1.2.1  导读和注意事项

    各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~

    ① Linux Error: 32: Broken pipe的可能原因(重点)

    ② TNS-12518: TNS:listener could not hand off client connection的一般解决过程

    SQL*Plus登录报错:ORA-12537: TNS:connection closed

    Tips

    本文在itpubhttp://blog.itpub.net/26736162)、博客园(http://www.cnblogs.com/lhrbest)和微信公众号(xiaomaimiaolhr有同步更新

    文章中用到的所有代码相关软件相关资料及本文的pdf版本都请前往小麦苗的云盘下载小麦苗的云盘地址见:http://blog.itpub.net/26736162/viewspace-1624453/

    若网页文章代码格式有错乱,下载pdf格式的文档来阅读

    本篇BLOG,代码输出部分一般放在一行一列的表格中。

    本文如有错误或不完善的地方请大家多多指正,ITPUB留言或QQ皆可,您的批评指正是我写作的最大动力。

     

     

    1.3  故障分析及解决过程

     

    1.3.1  故障环境介绍

     

    项目

    source db

    db 类型

    单机

    db version

    11.2.0.3.0

    db 存储

    ASM

    OS版本及kernel版本

    RHEL 6.5

     

    1.3.2  故障发生现象及报错信息

    登陆报错:

    [oracle@orcltest ~]$ sqlplus 'sys/"l@h /0"'@LHRDB as sysdba

     

    SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 16 12:06:36 2017

     

    Copyright (c) 1982, 2011, Oracle.  All rights reserved.

     

    ERROR:

    ORA-12537: TNS:connection closed

     

     

     

    查监听日志(/u01/app/11.2.0/grid/network/admin/listener.ora),报错如下:

    16-MAR-2017 12:06:36 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=lhrdb)(CID=(PROGRAM=sqlplus)(HOST=orcltest)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.59.129)(PORT=12333)) * establish * lhrdb * 12518

    TNS-12518: TNS:listener could not hand off client connection

    TNS-12547: TNS:lost contact

      TNS-12560: TNS:protocol adapter error

       TNS-00517: Lost contact

        Linux Error: 32: Broken pipe

     

     

    1.3.3  故障分析

    潇湘隐者http://www.cnblogs.com/kerrycode/p/4164838.html

    我的解决参考:http://www.itpub.net/thread-1870217-1-1.html

     

    根据MOS:Troubleshooting Guide for TNS-12518 TNS listener could not hand off client connection (文档 ID 550859.1)中,

    Error: 32: Broken pipe

     

     

    Error stack in listener log:

     

    TNS-12518: TNS:listener could not hand off client connection

    TNS-12547: TNS:lost contact

       TNS-12560: TNS:protocol adapter error

         TNS-00517: Lost contact

           IBM/AIX RISC System/6000 Error: 32: Broken pipe

     

     

    Cause:

    The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.

     

    Action:

    1.  One of reason would be processes parameter being low, and can be verified by the v$resource_limit view.

    2. In Shared Server mode, check the 'lsnrctl services' output and see if the dispatcher has refused any connections, if so, then consider increasing the number of dispatchers.

    3. Check the alert log for any possible errors.

    4. Memory resource is also another cause for this issue. Check the swap, memory usage of the OS.

    5. If RAC/SCAN or listener is running in separate home, check the following note:

     

    Note: 1069517.1  ORA-12537 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User

     

     

    根据第5条,因为我的grid11.2.0.1,而Oracle11.2.0.3的,查询MOS:ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User (文档 ID 1069517.1),里边明确指出是由于$RDBMS_HOME/bin/oracle文件的权限问题导致。

    1.3.4  故障解决

    $ORACLE_HOME/bin/oracle文件的权限问题,应该为-rwsr-s--x,属主应该是oracle:asmadmin

    解决办法:

    方法:

    1、修改oracle的权限为6751

    [root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    -rwxr-s--x. 1 oracle asmadmin 210823844 May 18  2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest ~]#  chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    -rwsr-s--x. 1 oracle asmadmin 210823844 May 18  2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest bin]#

     

     

    2、重新用setasmgidwrap设置oracle

    [root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    -rwxr-s--x. 1 oracle asmadmin 210823844 May 18  2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest ~]# /u01/app/11.2.0/grid/bin/setasmgidwrap -o /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest bin]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    -rwsr-s--x. 1 oracle asmadmin 210823844 May 18  2014 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest bin]#

    [root@orcltest bin]# stat /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

      File: `/u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle'

      Size: 210823844       Blocks: 411776     IO Block: 4096   regular file

    Device: 802h/2050d      Inode: 1717737     Links: 1

    Access: (6751/-rwsr-s--x)  Uid: (  501/  oracle)   Gid: (  504/asmadmin)

    Access: 2017-03-16 11:05:44.809363974 +0800

    Modify: 2014-05-18 17:09:50.508549983 +0800

    Change: 2017-03-16 12:33:15.733816820 +0800

    [root@orcltest bin]#

    [root@orcltest bin]#

     

     

     

    重新连接:

    [oracle@orcltest ~]$ sqlplus 'sys/"l@h /0"'@LHRDB as sysdba

     

    SQL*Plus: Release 11.2.0.3.0 Production on Thu Mar 16 13:32:48 2017

     

    Copyright (c) 1982, 2011, Oracle.  All rights reserved.

     

     

    Connected to:

    Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

    With the Partitioning, Automatic Storage Management, OLAP, Data Mining

    and Real Application Testing options

     

    SYS@LHRDB>

     

     

    1.4  故障处理总结

    如果可执行文件$ORACLE_HOME/bin/oracle的属主或权限设定出了问题,那么可能会造成很多问题。例如:无法登陆到数据库、ora-600错误、“TNS-12518: TNS:listener could not hand off client connection”“Linux Error: 32: Broken pipe”“ORA-12537: TNS:connection closed”、访问ASM磁盘出错等。解决办法很简单,可以在grid用户下运行setasmgidwrap命令重新配置$ORACLE_HOME/bin/oracle可执行文件的权限和属主或者直接将oracle文件的权限修改为6751$ORACLE_HOME/bin/oracle可执行文件正确属主应该是oracle:asmadmin,并且权限必须有s共享才可以,如下所示:

    [root@orcltest ~]$ which setasmgidwrap

    /u01/app/11.2.0/grid/bin/setasmgidwrap

    [root@orcltest ~]$ setasmgidwrap -o /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest ~]$ ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    -rwsr-s--x 1 oracle asmadmin 232399083 Apr 21  2015 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest ~]# ll /u01/app/11.2.0/grid/bin/oracle

    -rwsr-s--x. 1 grid oinstall 203972117 Jan  5  2015 /u01/app/11.2.0/grid/bin/oracle

    [root@orcltest ~]# chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest ~]# ll /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle       

    -rwsr-s--x 1 oracle asmadmin 232399083 Apr 21  2015 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    [root@orcltest bin]# cd /u01/app/oracle/product/11.2.0/dbhome_1/bin/

    [root@orcltest bin]# which stat

    /usr/bin/stat

    [root@orcltest bin]# stat oracle

      File: `oracle'

      Size: 210823844       Blocks: 411776     IO Block: 4096   regular file

    Device: 802h/2050d      Inode: 1717737     Links: 1

    Access: (6751/-rwsr-s--x)  Uid: (  501/  oracle)   Gid: (  504/asmadmin)

    Access: 2017-03-16 12:33:44.809363974 +0800

    Modify: 2014-05-18 17:09:50.508549983 +0800

    Change: 2017-03-16 11:05:15.733816820 +0800

     

     

    问:在UNIX/LINUX环境中,oracle数据库启动后存在许多后台进程和前台进程,虽然相关进程产生一些trace文件也是常有的事情,但是真正是什么决定了oracle相关进程的属性呢?

    答:通常来说,oracle的后台进程的调用是依赖于$ORACLE_HOME/bin/oracle这个二进制文件,但它从远端连入而分配的服务器进程(server process)相关属主的属性则是继承自listener进程,而listener进程的属主属性同样是进程自其启动的用户(分oracle用户和grid用户)$ORACLE_HOME/bin/oracle的属主属性。

     

     

    其他原因:

    启动ORACLE监听的时候 报错 Linux Error: 32: Broken pipe

    原因:原来/home/oracle/product/9.2.0.1.0/network/log/listener.log 文件超过2G,监听就会断掉。

    解决办法:清空日志文件

                  [oracle@localhost oracle]$ cd $ORACLE_HOME/network/log

                  [oracle@chicago log]$ cat listener.log

    至此问题解决

     

    有个命令可以对这个写日志进行关闭

    LSNRCTL> set log_status off

    要继续写日志,只需要

    LSNRCTL> set log_status on

    也可以在 listener.ora 文件里面添加一项

    LOGGING_LISTENER = OFF

    添加了这个之后无论你怎么重启监听,都不会写日志了。

    1.5  用到的SQL集合

    setasmgidwrap -o /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    chmod 6751 /u01/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    stat oracle

     

    1.6  参考文章

    1.6.1  MOS




    1.6.1.1  Troubleshooting Guide for TNS-12518 TNS listener could not hand off client connection (文档 ID 550859.1)


    In this Document

      Purpose
      Troubleshooting Steps
       
      Section I: Steps to approach ORA-12518/TNS-12518 Error:
      Section II: Commonly Known Errors:
      Error: 32: Broken pipe
      Error: 11: Resource temporarily unavailable
      Error: Connection Pooling limit reached
      Error: 24: Too many open files
      Section III: Errors Specific to Windows
      Error: 2: No such file or directory 
      Error: 233: Unknown error
      Error: 54: Unknown error
      Error: 10022: Unknown error
      References


    APPLIES TO:

    Oracle Net Services - Version 10.1.0.5 to 12.1.0.2 [Release 10.1 to 12.1]
    Information in this document applies to any platform.

    PURPOSE

    This article discusses  how to troubleshoot the ORA-12518/TNS-12518 listener errors.

    TROUBLESHOOTING STEPS

     

    Section I: Steps to approach ORA-12518/TNS-12518 Error:


    ORA-12518/TNS-12518 indicates a problem while listener hands off the client connection to the server process or dispatcher process.TNS-12518 is logged in the listener log. The client might receive ORA-12518 or some other disconnection errors like ORA-12537. Once TNS-12518 is noted in the listener log, follow the below steps to approach and resolve this error. 

    Let us have a small discussion about how actually database connections are made:

    In Dedicated mode, database client contacts listener and supplies the SERVICE NAME of the database. Then listener spawns a dedicated server process and hands off the client connection to this dedicated server process. TNS-12518 indicates a problem while handing off the client connection to the server process.

    In Shared Server mode, database client contacts listener and supplies the SERVICE NAME of the database. Then listener hands off the client connection to one of the dispatcher configured for that service. TNS-12518 indicates a problem while handing off the client connection to the dispatcher server process.


    Though this error is logged in the listener log, the listener is just the messenger, ORA-12518/TNS-12518 is mostly related to RDBMS and OS resources. 



    Step 1. Is listener version compatible to the database

    Check if you are using compatible listener version for your database version. 

    If the database is 8i then use 8i or 9i listeners only. 10g listeners are not compatible to work with 8i databases. 
    For 9i databases, 9i or 10g listeners can be used. 
    For 10g databases, only 10g listeners can be used. 

    The general rule is that use the higher version of the listener when there is a version mismatch between database and the listener. 




    Step 2. Gather more information from listener log

    The first place you would look for the TNS-12518 error is the listener log. Usually the listener log would be located under $ORACLE_HOME/network/log directory. You can use 'lsnrctl status' command output to look for the location of the listener log file.

    $lsnrctl status 
    - - - 
    - - - 
    Listener Parameter File /ora10g/home_ora10g/network/admin/listener.ora 
    Listener Log File /ora10g/home_ora10g/network/log/listener.log 
    - - - 
    - - -

    listener log gives the complete error stack and the database service name to which the client tried to connect to.

    19-SEP-2007 13:55:34 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))(SERVICE_NAME=test.oracle.com)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.3)(PORT=36030)) * establish * test.oracle.com * 12518 
    TNS-12518: TNS:listener could not hand off client connection 
     TNS-12547: TNS:lost contact 
      TNS-12560: TNS:protocol adapter error 
       TNS-00517: Lost contact 
        Linux Error: 32: Broken pipe


    In the above example, listener log shows the complete error stack, the bottom error being 32 is the OS error. It also shows that the jdbc client from IP 10.10.10.3 has tried to connect to the database service 'test.oracle.com' and failed with the error 12518.

    Look for the lowest error in the stack. That is the error we have to concentrate on and try to resolve it. In the above example, the lower error is 'Linux Error: 32: Broken pipe'.




    Step 3
    . Are service handlers in blocked state

    Check if the handlers are in blocked state. Check the output of the 'lsnrctl services'. Examine the status information under the database service name. From the listener log you would know which database service was affected by the 12518 error, now with the output of the 'lsnrctl services' under that service name gather more information.

    Service "test.oracle.com" has 1 instance(s). 
      Instance "db10g", status READY, has 2 handler(s) for this service... 
        Handler(s): 
          "DEDICATED" established:9 refused:0 state:ready 
             LOCAL SERVER 
          "D000" established:10 refused:0 current:0 max:972 state:ready 
             DISPATCHER  
             (ADDRESS=(PROTOCOL=tcp)(HOST=oid.mohan.com)(PORT=33487))


    The highlighted state should be in 'ready' state for the connection to be successful. If the state is in 'blocked' then the connection are not possible. The state of a handler could be in blocked state in the following scenario: 
    i. The database parameter processes reached its value. 
    ii. The database is in the process of startup or shutting down. 

    In shared server mode, the number of dispatchers should be set according to the load that you expect. 'lsnrctl services' output shows the maximum number(max:997) of connections that the dispatcher would accept and the number connections refused (refused:0) by this dispatcher. If any connections refused by the dispatcher, then consider increasing the number of dispatchers. 

    If you are using PFILE edit init.ora and increase the dispatchers parameter. If you are using SPFILE you can dynamically increase the dispatchers parameter by the'alter system set' command. 




    Step 4
    Is a local BEQ connection successful

    Check if local BEQ connection to the database works fine. It also verifies if the database is up and in good condition to accept the connection. If the database is down or in a hung state then a connection request to the database by the listener will not be possible. 

    Connect to the database server via telnet or ssh and check if a local bequeath SQL*Plus connection works. In other words, issue:
    sqlplus username/password [Enter]

    This connection bypasses the listener and directly connects to the database via the BEQ (bequeath) protocol. If this fails, then the TNS-12518 listener error is simply a result of the database issue. 

    One such error is:
    ORA-12560: TNS:protocol adapter error

    A possible cause for this error on Microsoft Windows servers, is that the Windows Database Service has not yet been created (common when creating a "standby" instance).
    Resolution for this would be to create the Windows Service first by using the "oradim" command (see the Database Admin guide for details on oradim and service creation).





    Step 5
    Has number of processes reached its limit?

    If local BEQ is successful, check the below query

    SQL> select * from v$resource_limit; 

    RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU 
    -------------------- ------------------- --------------- ---------- ---------- 
    processes 249 250 250 250 
    sessions 54 82 280 280 
    - - - - - -
    - - - - - -


    Verify if the processes or sessions reached its limit value. If these database parameters reached its limit value, then consider increasing it accordingly. 

    In the above example, the processes parameter has been set to 250. It's MAX_UTILIZATION has reached the limit value of 250, so the processes parameter should be increased further to accomodate the number of incoming connections. 

    Edit the init.ora and set the processes parameter to a higher value. By default, if you just increase the processes it is enough, the sessions value would automatically be increased.

    Check the alert.log for a corresponding error such as "ksvcreate" process failed, etc.  Check timestamp against listener.log timestamps for causal relationship.




    Step 6Are OS kernel parameters configured for optimum?


    Database is operated by a single user, normally it would be 'oracle' user. At the Operating System level, there is a limit for the number of process spawned by a user. And also there is a limit for the total number of process running on the entire OS. 

    The Oracle Database and the newly spawned processes would be owned by the 'oracle' user. And so make sure that these values are set accordingly.On Unix these values are configured through the configurable OS kernel parameters and is specific to Operating Systems. You will have check your corresponding OS documentation for your OS. 

    For example for HP-UX the configurable kernel parameters are, 

    maxuprc  Maximum number of processes for each non-root user 
    nproc      Maximum number of processes on the system 






    Step 7Does alert log have any errors?

    Look in alert log and look for any errors related to memory or process during the time the error TNS-12518 occurred in the listener log. If the alert log has any memory related errors, there there is a potential memory resource issue at the OS level. 

    OS memory issues can be addressed by the below: 
    i.   Make sure that the OS has been configured with the enough Swap memory. In case of Windows it is called as Virtual Memory. 
    ii.  Reduce the size of SGA, thus the newly spawned server process will have some more system memory available. 
    iii. Reduce the PGA size, so that the newly spawned server process would occupy less memory. 
    iv. If you are in DEDICATED mode, try switching to MTS mode. 

    To address memory issue for 32-bit Windows: Refer Note 371983.1

    If there are any memory or process related error in the alert log during the time the TNS-12518 is logged in the listener log, then those errors in the alert log should be focused on and should be solved at first. Because, the errors in the alert log is the base error for the TNS-12518 in the listener log. 

    However, the errors in the alert logs are not being discussed in this article, they are out of the scope of this article.

    Step 8. If using a statically defined SID_DESC in the listener.ora file for your sid, ensure that it is configured properly. 

    A common mistake is to include a (PROGRAM=EXTPROC) parameter:

      (SID_LIST= 
        (SID_DESC = 
          (GLOBAL_DBNAME = ORCL.oracle.com) 
          (SID_NAME = ORCL) 
          (PROGRAM=extproc) 
          (ORACLE_HOME = D:oracleproduct10.2.0db_1)))

    This misconfiguration occurs when the PLSExtproc SID_DESC is copied, pasted and edited in the listener.ora file.  The inclusion of the PROGRAM line will cause an ORA-12518 to be returned to the client.  Here's the corrected SID_DESC for our example SID:

      (SID_LIST= 
         (SID_DESC = 
         (GLOBAL_DBNAME = ORCL.oracle.com) 
         (SID_NAME = ORCL) 
         (ORACLE_HOME = D:oracleproduct10.2.0db_1)))

    Section II: Commonly Known Errors:


    This section lists some of the known and reported errors. It is also recommended that you refer the Section I above for a generic troubleshooting approach to the error TNS-12518.

    Below shows example error stack that can be found in the listener log. The last line in the error stack shows the actual operating system name. Depending on the OS, only the operating system name would be different. For example, if you encounter error 32: Broken pipe, according to the OS, the last line in the error stack would be different only by the OS name, as shown below.

    Solaris Error: 32: Broken pipe 
    HPUX Error: 32: Broken pipe 
    Linux Error: 32: Broken pipe


    _______________________________________________________________________________________________________________________________________ 

    Error: 32: Broken pipe


    Error stack in listener log:
     

    TNS-12518: TNS:listener could not hand off client connection 
     TNS-12547: TNS:lost contact 
       TNS-12560: TNS:protocol adapter error 
         TNS-00517: Lost contact 
           IBM/AIX RISC System/6000 Error: 32: Broken pipe 


    Cause: 
    The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.

    Action: 
    1.  One of reason would be processes parameter being low, and can be verified by the v$resource_limit view. 
    2. In Shared Server mode, check the 'lsnrctl services' output and see if the dispatcher has refused any connections, if so, then consider increasing the number of dispatchers. 
    3. Check the alert log for any possible errors. 
    4. Memory resource is also another cause for this issue. Check the swap, memory usage of the OS.
    5. If RAC/SCAN or listener is running in separate home, check the following note:

    Note: 1069517.1  ORA-12537 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User 
    ______________________________________________________________________________________________________________________________________

    Error: 11: Resource temporarily unavailable


    Error stack in listener log: 
    TNS-12518: TNS:listener could not hand off client connection 
     TNS-12549: TNS:operating system resource quota exceeded 
      TNS-12560: TNS:protocol adapter error 
       TNS-00519: Operating system resource quota exceeded 
        IBM/AIX RISC System/6000 Error: 11: Resource temporarily unavailable 


    Cause: 
    As the error indicates operating system resource has exceeded. 

    Action: 
    1. Increase the appropriate OS kernel parameters for 'maximum number of processes allowed per user'
    For example for HP-UX the parameters are maxuprc and nproc
    ______________________________________________________________________________________________________________________________________

    Error: 12: Not enough space

     Error stack in listener log: 
     TNS-12518: TNS:listener could not hand off client connection 
       TNS-12549: TNS:operating system resource quota exceeded 
         TNS-12560: TNS:protocol adapter error 
           TNS-00519: Operating system resource quota exceeded 
             IBM/AIX RISC System/6000 Error: 12: Not enough space 


    Cause: 
    This is a memory related issue. The error indicates that there is not enough memory available to spawn and hand off the client connections. 
    Typical problems are: 
    - Out of system memory / swap 
    - Out of process slots in the process table 
    - Streams resources depleted 
    - Out of File Handles 
    - sga memory usage 


    Action: 
    1. Check in the alert log for any possible memory related error. 
    2. Increase swap/Virtual memory if possible the available memory. 
    3. SGA and PGA can be reduced to address the memory consumption. 
    4. MTS mode can be used to reduce the amount of process and memory consumption. 

    ________________________________________________________________________________________________________________________________________

    Error: Connection Pooling limit reached

     Error stack in listener log: 
     TNS-12518 TNS:listener could not hand off client connection 
      TNS-12564 TNS:connection refused 
       TNS-12602 TNS: Connection Pooling limit reached 


    Action: 
    1. Try increasing initial number of dispatchers. 
    _______________________________________________________________________________________________________________________________________ 

    Error: 24: Too many open files

    Error stack in listener log:
        TNS-12518: TNS:listener could not hand off client connection
          TNS-12560: TNS:protocol adapter error
           TNS-00530: Protocol adapter error
            Linux Error: 24: Too many open files


    or    


        TNS-12518: TNS:listener could not hand off client connection
       
       TNS-12560: TNS:protocol adapter error
           TNS-00530: Protocol adapter error
            Solaris Error: 24: Too many open files

    Error Description:
     
    Out of file descriptors

    Cause:
    Can be caused by the following unpublished defect:

    BUG 13078786 - LISTENER GOES DOWN SUDDENLY W/ LINUX ERROR: 32: BROKEN PIPE

    Action:

    See Note 1527483.1  11.2 : ORA-12518 Listener Hangs and Reports "Too Many Open Files" 

    Use prescribed workaround OR apply one-off patch to your environment if available.
    _______________________________________________________________________________________________________________________________________

    Error: 1: Not Owner


    Error Description:
    The listener doesn't own or have permissions to write a necessary file

    Cause:

    The listener doesn't have adequate permission on socket files

    Error stack in listener log:

    TNS-12518: TNS:listener could not hand off client connection 
       TNS-12560: TNS:protocol adapter error 
         TNS-00530: Protocol adapter error 
            HPUX Error: 1: Not owner

    Action:  Clear /var/tmp/.oracle/ directory

    IMPORTANT NOTE:  In RAC environment, please use caution when removing existing socket files.  See Note 2099377.1  How to remove Network socket files in a RAC Environment for Cluster/Resource startup issues


    ========================================================================================================================= 

    Section III: Errors Specific to Windows

    It is also recommended that you refer the Section I above for a generic troubleshooting approach to the error TNS-12518.

    This section briefly describes about the errors that are encountered on Windows Operating System. TNS-12518 most commonly occurs on 32-bit OS due to its memory constraint, however TNS-12518 can occur on 64-bit OS as well. See Note 873752.1 for more information on Windows memory addressing and the 3GB switch.

    _______________________________________________________________________________________________________________________________________  

    Error: 2: No such file or directory 

    Error stack in listener log:  
        TNS-12518: TNS:listener could not hand off client connection   
          TNS-12560: TNS:proto adapter error    
            TNS-00530: Proto adapter error     
            32-bit Windows Error: 2: No such file or directory 

    Error Description: ERROR_FILE_NOT_FOUND 2 The system cannot find the file specified. 

    Cause: This indicates the database service is not actually available 

    Action: 1.Verify if the intended database really up and accepting local BEQ connections. 


    _______________________________________________________________________________________________________________________________________  

    Error: 233: Unknown error

    Error stack in listener log: 
     TNS-12518: TNS:listener could not hand off client connection 
      TNS-12560: TNS:protocol adapter error 
       TNS-00530: Protocol adapter error 
        32-bit Windows Error: 233: Unknown error 


    Error Description: 
    ERROR_PIPE_NOT_CONNECTED 
    233 
    No process is on the other end of the pipe. 

    Cause: 
    The communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process. 


    Action: 
    Refer Note 371983.1

    _______________________________________________________________________________________________________________________________________

    Error: 54: Unknown error

    Error stack in listener log: 
     TNS-12518: TNS:listener could not hand off client connection 
      TNS-12571: TNS:packet writer failure 
       TNS-12560: TNS:protocol adapter error 
        TNS-00530: Protocol adapter error 
          32-bit Windows Error: 54: Unknown error 


    Error Description: 
    ERROR_NETWORK_BUSY 
    54 
    0x36 
    The network is busy. 


    Cause: 
    This indicates a bottleneck at the network layer(TCP/IP). 


    Action: 
    1.Try increasing dispatchers and shared servers.

    Error: 10022: Unknown error

    Error stack in listener log:   
    12518 TNS-12518: TNS:listener could not hand off client connection 
        
         TNS-12560: TNS:protocol adapter error TNS-00534: Failed to grant connection ownership to child           
             64-bit Windows Error: 10022: Unknown error 

    Error Description:
    Error: 10022: Invalid Argument

    Cause:
    An invalid argument was supplied.


    Action: 
    Check the Control Panel Services Properties and confirm both listener and instance are started with same "Log on As" account.

    Note that in 12c, the account would be the Oracle Home User. 


        

    Note that a JDBC Thin connection from a Windows client might also yield the 12518:

    Macromedia}{Oracle JDBC Driver][Oracle]Connection refused, (DESCRIPTION=(TMP=)(VSNNUM=186646784)(ERR=12518)(ERROR_STACK= (ERROR=(CODE=12518)(EMFI=4))(ERROR=(CODE=12560)(EMFI=4))(ERROR=(CODE=530)(EMFI=4))(ERROR=(BUF='64-bit Windows Error:203: Unknown error'))))

     






    REFERENCES

    NOTE:873752.1 - Windows Memory Configuration: 32-bit and 64-bit
    NOTE:1069517.1 - ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User
    NOTE:371983.1 - 10gR2 Dedicated Connections Intermittently Fail with TNS-12518

    NOTE:1527483.1 - 11.2 : ORA-12518 Listener Hangs and Reports "Too Many Open Files" 




    1.6.1.2  ORA-12537 / ORA-12547 or TNS-12518 if Listener (including SCAN Listener) and Database are Owned by Different OS User (文档 ID 1069517.1)


    In this Document

      Symptoms
      Cause
      Solution
      References


    APPLIES TO:

    Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
    Oracle Database Configuration Assistant - Version 11.1.0.7 and later
    Oracle Net Services - Version 11.2.0.2 and later
    Information in this document applies to any platform.

    SYMPTOMS

    In environment where listener home (including SCAN listener which resides in GRID Infrastructure/ASM home) and database home are owned by different OS user, ORA-12537 could happen when connecting through listener, when creating database through DBCA, or when installing database software and creating a database in runInstaller. Job Role Separation is a typical example as SCAN and local grid home listener is owned differently than database.

    • Error detail
    12537, 00000, "TNS:connection closed"
    // *Cause: "End of file" condition has been reached; partner has disconnected.
    // *Action: None needed; this is an information message.

    Before the error is reported, connection could hang for a while, and a core file may also be generated.
    • Screen output:
    SQL> conn system/xxx@OSES
    ERROR:
    ORA-12537: TNS:connection closed

    OR

    PRCR-1079 : Failed to start resource ora.db11.db
    CRS-5011: Check of resource db11 failed: details at ..
    CRS-2674: Start of ora.db11.db on racnode1 failed
    ORA-12546: TNS:permission denied
    • listener_scann.log or listener.log
    TNS-12518: TNS:listener could not hand off client connection
    TNS-12546: TNS:permission denied
     TNS-12560: TNS:protocol adapter error
      TNS-00516: Permission denied
       Linux Error: 13: Permission denied

    OR

    TNS-12518: TNS:listener could not hand off client connection
    TNS-12547: TNS:lost contact
     TNS-12560: TNS:protocol adapter error
      TNS-00517: Lost contact
       Linux Error: 32: Broken pipe
    • runInstaller error if installing
    INFO: Starting Output Reader Threads for process /ocw/grid/bin/kfod
    INFO: Parsing KFOD-00300: OCI error [-1] [OCI error] [ORA-12547: TNS:lost contact
    INFO: Parsing ] [12547]
    INFO: Parsing
    INFO: The process /ocw/grid/bin/kfod exited with code 1
    ..
    SEVERE: [FATAL] [INS-30502] No ASM disk group found.
       CAUSE: There were no disk groups managed by the ASM instance +ASM1.


    CAUSE


    1. In environments where the listener is not run in the same ORACLE_HOME where the database resides, the listener owner (including SCAN listener) may not be able to access the oracle binary in the database home.  This is common in RAC or whenever a GRID_HOME and a database ORACLE_HOME are installed.

    As listener owner:

    $ ls -l $RDBMS_HOME/bin/oracle
    ls: /home/oracle/app/oracle/product/11.2/db/bin/oracle: Permission denied



    2. Oracle binary in database home has wrong permission:

    ls -l $RDBMS_HOME/bin/oracle
    -rwxr-x--x 1 oracle asmadmin 184286251 Aug  9 16:25 /home/oracle/app/oracle/product/11.2/db/bin/oracle


    The permission "-rwxr-x--x" is wrong as it's missing suid bit, oracle binary should have permission of 6751: 

    -rwsr-s--x 1 oracle asmadmin 184286251 Aug  9 16:25 /home/oracle/app/oracle/product/11.2/db/bin/oracle


    Note: If Job Role Separation is in place, the group will be , otherwise it will be


    3. File System for database home does not support setuid/suid or has nosetuid/nosuid set:

    mount| grep <mount_point_of_oracle_home>
    /home/oracle on /dev/dsk/diskoracle read/write/nosuid..


    4. RDBMS_HOME/lib has wrong ownership/permission:

    As listener owner:

    $ ls -l $RDBMS_HOME/lib
    ls: /home/oracle/app/oracle/product/11.2/db/lib: Permission denied


    5. Another cause may be permissions on the RDBMS Directory structure or Home directory which needs to be accessed by the CRS user.
    Check that the RDBMS $ORACLE_HOME is set to 755.
    This can be seen from an OS trace such as strace or truss when using it to trace the CRS user running the "oracle" executable which fails with the "Permission denied" error.

    Also:

    a) Log in as the "GRID" user on each node, and issue the following (on each directory under the RDBMS Home) :-


    [grid@orcl002:+ASM2 ~]$ ls -al /home/oracle/app/oracle
    ls: cannot open directory /u01/app/oracle: Permission denied

    ***NOTE: the Oracle directory has 700 for the permissions, which should be changed to 755:


    Current (incorrect) -->

    [grid@orcl002:+ASM2 ~]$ ll /home/oracle/app
    drwx------. 8 oracle oinstall  4096 Oct 12 08:38 oracle


    Should be (correct) -->

    [grid@orcl002:+ASM2 ~]$ ll /home/oracle/app
    drwxr-xr-x. 8 oracle oinstall  4096 Oct 12 08:38 oracle



    b) Likewise the /product directory has 700 perms, so change to 755 -->

    [grid@orcl002:+ASM2 ~]$ ls -al /home/oracle/app/oracle/product
    ls: cannot open directory /u01/app/oracle/product: Permission denied


    Current (incorrect) -->

    [grid@orcl002:+ASM2 ~]$ ls -al /home/oracle/app/oracle
    drwx------.  3 oracle oinstall 4096 Oct  12 08:58 product


    Should be (correct) -->

    [grid@orcl002:+ASM2 ~]$ ls -al /u01/app/oracle
    drwxr-xr-x.  3 oracle oinstall 4096 Oct  12 08:58 product




    c) Now a connection should work:

    $ sqlplus system/@'(description=(address=(protocol=tcp)(host=orcl002)(port=1521))(connect_data=(server=dedicated)(service_name=ORCL)))'

    SQL>



    SOLUTION

    Solution is to make sure file system for database home has setuid/suid set, database binary($RDBMS_HOME/bin/oracle) has correct ownership and permission, and listener owner is able to access database oracle binary (as listener owner, "ls -l $RDBMS_HOME/bin/oracle" will tell)

    If its Job Role Separation environment, please refer to the following document:

    Oracle? Grid Infrastructure 
    Installation Guide
    11g Release 2 (11.2)

    Oracle ASM Job Role Separation Option with SYSASM

    REFERENCES

    NOTE:422173.1 - Local SQL*Plus Connection and DBCA Fails With: ORA-12547: TNS:Lost Contact 
    NOTE:975457.1 - How to Troubleshoot Connectivity Issue with 11gR2 SCAN Name
    NOTE:970619.1 - ORA-12545 or ORA-12537 While Connecting to RAC through SCAN name




    1.6.1.3  TNS Listener Crash with Core Dump (文档 ID 549932.1)


    In this Document

      Symptoms
      Cause
      Solution
      References


    APPLIES TO:

    Oracle Net Services - Version 10.2.0.1 to 10.2.0.4 [Release 10.2]
    Generic UNIX
    ***Checked for relevance on 6-JUL-2016***

    SYMPTOMS

    • There may be heavy load on the CPU shooting up to 100%.
    • The number of sessions in the database is well below the upper or maximum limit defined in the parameter  file.
    • The listener crashes suddenly during  this heavy CPU load generating the core.
    • (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.

    Listener Log: 
    ------------- 

    ..... 
    ..... 
    19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:pegasos eusysinuniface.exe)(HOST=TERVI-NB179)(USER=kjokioja))) * (ADDRESS= (PROTOCOL=tcp)(HOST=10.12.152.5)(PORT=1670)) * establish * ORAC * 12518 
    TNS-12518: TNS:listener could not hand off client connection 
    TNS-12571: TNS:packet writer failure 
    TNS-12560: TNS:protocol adapter error 
    TNS-00530: Protocol adapter error 
    Linux Error: 104: Connection reset by peer 
    19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:pegaos eusysinuniface.exe)(HOST=TERVI-0184A)(USER=paitasal))) * (ADDRESS=(PROTO COL=tcp)(HOST=10.12.176.136)(PORT=1574)) * establish * ORAC * 12518 
    TNS-12518: TNS:listener could not hand off client connection 
    TNS-12547: TNS:lost contact 
    TNS-12560: TNS:protocol adapter error 
    TNS-00517: Lost contact 
    Linux Error: 32: Broken pipe 
    ......... 
    ........                                                                                                            

    The Operating system log (/var/log/messages) may show the following :

    tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4 
    tnslsnr[6469]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9420 error 4 
    tnslsnr[7375]: segfault at 0000000000000018 rip 0000003eab668bb3 rsp 0000007fbfff9c70 error 4

    Gdb on the core file generated shows the following stack :

    #gdb  /home/oracle/db_1/bi/tnslsnr core.7375   Core was generated  by `/opt/oracle/ora102/bin/tnslsnr LISTENER -inherit'.  
        Program terminated with  signal 11, Segmentation fault.  
        Reading symbols from /opt/oracle/ora102/lib/libclntsh.so.10.1...(no debugging symbols found)...done. 
        ........ 
        ......... 
            (gdb) bt 
        #0  0x00000032b7468bb3 in _int_free () from  /lib64/tls/libc.so.6 
        #1  0x00000032b74691f6 in free () from /lib64/tls/libc.so.6 
        #2  0x000000000040f01f in nsglhe () 
        #3  0x00000000004114e9 in nsglma () 
        #4  0x00000000004061cb in main () 
        (gdb) where 
        #0  0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6 
        #1  0x00000032b74691f6 in free () from /lib64/tls/libc.so.6 
        #2  0x000000000040f01f  in nsglhe () 
        #3  0x00000000004114e9 in nsglma () 
        #4  0x00000000004061cb in main () 
                                     

    The core indicates that the program terminated with signal 11, Segmentation fault . 
    SIGSEGV is reported for improper memory handling .The default action for a program upon receiving 
    SIGSEGV is abnormal termination. This action will end the process.

    The virtual memory stats (vmstat) output may show huge paging/swapping activity .

    For Example: 
    ------------- 

    $vmstat 5 7 The si and so column under the swap section suggests the amount of memory swapped in from disk (/s)and amount of memory swapped to disk (/s) respectively. 

        procs                      memory    swap          io     system         cpu  
      r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id  
      2  0  0  27124   3940   1160  74528   0 1304  956  5360  272   191  51  21  98  
      3  0  0  27080   3756   1180  72580   0 260  2552   388  218   429  90  10  89  
     14  2  1  26808   5096   1188  69868  84 1108 2016  9064  490   567  59  22  96  
      4  0  0  25548   5912   1192  73032   0   0   436     0  478   736  50  50   0  
      5  0  0  25548   3940   1192  73548   0   0  1560     0  301   385  93   7   0  
      1  1  1  25548   3336   1192  71800   8 176  2848   432  258   147  91   9  88  
      1  0  0  25544   4124   1200  70480 116  60   836    60  171   200  97   3  92 


    Note: You may also use the top command to check the system memory usage.

    CAUSE

    Extensive paging/swapping activity is a clear  indication that the system is running out of the physical memory.

    SOLUTION

    Solution : 

    1. Increase the physical memory of the system. 

                                 OR 

    2. Apply the Patch 6139856 for unpublished Bug 6139856 if available for your platform.

                                 OR

    3. Configure Hugepages on the OS. Ref : Note 361323.1

    A Note on HugePages:

    Hugepages is a mechanism that allows the Linux kernel to utilise the multiple page size capabilities of modern hardware architectures.  Its important to know that utilising a huge amount of physical memory with the default page size consumes the TLB and adds processing overhead. The Linux kernel is able to set aside a portion of physical memory to be able be addressed using a larger page size. Since the page size is higher, there will be less overhead managing the pages with the TLB. In the Linux 2.6 series of kernels, hugepages is enabled using the CONFIG_HUGETLB_PAGE feature when the kernel is built. Systems with large amount of memory can be configured to utilise the memory more efficiently by setting aside a portion dedicated for hugepages. The actual size of the page is dependent on the system architecture. A typical x86 system will have a Huge Page Size of 2048 kBytes. The huge page size may be found by looking at the /proc/meminfo : 

    # cat /proc/meminfo |grep Hugepagesize Hugepagesize: 2048 kB 

    For more on configuring Hugepages please refer Note 361323.1

    REFERENCES

    BUG:6752308 - LISTENER DIED BY SEGFAULT AFTER TNS ERROR
    NOTE:361323.1 - HugePages on Linux: What It Is... and What It Is Not...




    1.6.2  博客

    潇湘隐者http://www.cnblogs.com/kerrycode/p/4164838.html

    最近一周,有一台ORACLE数据库服务器的监听服务在凌晨2点过几分的时间点突然崩溃,以前从没有出现过此类情况,但是最近一周出现了两次这种情况,检查时发现了如下一些信息:

    $ lsnrctl services
     
    LSNRCTL for Linux: Version 10.2.0.4.0 - Production on 12-DEC-2014 08:22:34
     
    Copyright (c) 1991, 2007, Oracle.  All rights reserved.
     
    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1)))
    TNS-12541: TNS:no listener
     TNS-12560: TNS:protocol adapter error
      TNS-00511: No listener
       Linux Error: 111: Connection refused
    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ceglnx01)(PORT=1521)))
    TNS-12541: TNS:no listener
     TNS-12560: TNS:protocol adapter error
      TNS-00511: No listener
       Linux Error: 111: Connection refused

     

    检查监听服务的日志信息(/u01/app/oracle/product/10.2.0/db_1/network/log/listener.log),发现如下信息

                                   第一次错误信息截图

    clip_image001

    08-DEC-2014 02:07:41 * (CONNECT_DATA=(SERVICE_NAME=epps)(CID=(PROGRAM=D:TasksFGIS2MESCEKgmt_auto.exe)(HOST=CEGWEB1)(USER=cegadmin))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.21.34)(PORT=4419)) * establish * epps * 12518
    TNS-12518: TNS:listener could not hand off client connection
     TNS-12547: TNS:lost contact
      TNS-12560: TNS:protocol adapter error
       TNS-00517: Lost contact
        Linux Error: 32: Broken pipe

    第二次错误信息截图

    clip_image002


    在metalink上查到ID 550859.1,出现Linux Error: 32: Broken pipe这个错误的原因大概有下面一些因素导致:

    Cause:

    The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.

    Action:

        1. One of reason would be processes parameter being low, and can be verified by the v$resource_limit view.

        2. In Shared Server mode, check the 'lsnrctl services' output and see if the dispatcher has refused any connections, if so, then consider increasing the number of dispatchers.

        3. Check the alert log for any possible errors.

        4. Memory resource is also another cause for this issue. Check the swap, memory usage of the OS.


    1:错误原因之一:processes参数的值偏低

    关于v$resource_limit,如下所示

    Column

    Datatype

    Description

     

    RESOURCE_NAME

    VARCHAR2(30)

    资源名

    Name of the resource

    CURRENT_UTILIZATION

    NUMBER

    资源的当前使用量

    Number of (resources, locks, or processes) currently being used

    MAX_UTILIZATION

    NUMBER

    实例启动以来,资源使用达到的最大值

    Maximum consumption of this resource since the last instance start-up

    INITIAL_ALLOCATION

    VARCHAR2(10)

    初始化值,一般等于limit_value

    Initial allocation. This will be equal to the value specified for the resource in the initialization parameter file (UNLIMITED for infinite allocation).

    LIMIT_VALUE

    VARCHAR2(10)

    系统设置的资源值

    Unlimited for resources and locks. This can be greater than the initial allocation value (UNLIMITED for infinite limit).

    如下所示,我们可以发现自从实例启动以来,processes的最大值为152,而LIMIT_VALUE的值为170,所以可以排除这个原因导致上述错误。

    SQL> select * from v$resource_limit;
     
    RESOURCE_NAME         CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE    
    --------------------- ------------------- --------------- ------------------ ------------
    processes                             113             152        170                170
    sessions                              115             154        192                192
    enqueue_locks                          57             117       2480               2480
    enqueue_resources                      40              86       1064          UNLIMITED
    ges_procs                               0               0          0                  0
    ges_ress                                0               0          0          UNLIMITED
    ges_locks                               0               0          0          UNLIMITED
    ges_cache_ress                          0               0          0          UNLIMITED
    ges_reg_msgs                            0               0          0          UNLIMITED
    ges_big_msgs                            0               0          0          UNLIMITED
    ges_rsv_msgs                            0               0          0                  0
     
    RESOURCE_NAME         CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
    --------------------- ------------------- --------------- ------------------ ------------
    gcs_resources                           0               0          0                  0
    gcs_shadows                             0               0          0                  0
    dml_locks                               0              76        844          UNLIMITED
    temporary_table_locks                   0               3  UNLIMITED          UNLIMITED
    transactions                            2              12        211          UNLIMITED
    branches                                0               1        211          UNLIMITED
    cmtcallbk                               0               1        211          UNLIMITED
    sort_segment_locks                     39             131  UNLIMITED          UNLIMITED
    max_rollback_segments                  11              11        211              65535
    max_shared_servers                      1               1  UNLIMITED          UNLIMITED
    parallel_max_servers                    0               0          0               3600
     
    22 rows selected.
     
    SQL> 

    clip_image003

    2: 如下所示,shared_servers的值大于1,服务器是共享模式

    clip_image004

    客户端连接数据库都是专用服务器模式dedicated server connection,如下所示

    clip_image005

    lsnrctl services 查看时并没有发现dispatcher has refused any connections,所以也可以排除dispatchers数量偏少的原因。这个如下截图所示

    clip_image006

    3: 关于内存资源不足会导致该错误,如下所示,服务器RAM 16G, 物理内存基本使用完,但是Swap基本上没有使用多少,有点嫌疑,但是还是没有明显的证据。

    clip_image007

    查看系统日志messages,如下所示,第一次在02:07有一条错误记录,查看了一些资料后,判定应该是由内存访问越界造成的。这个倒是一个支持内存不足导致监听服务错误的有力证据。

    [root@ceglnx01 log]# more /var/log/messages

    Dec 7 04:02:13 ceglnx01 syslogd 1.4.1: restart.

    Dec 8 02:07:41 ceglnx01 kernel: tnslsnr[8040]: segfault at 18 ip 0000003aee8705d5 sp 00007fff010a07f0 error 4 in libc-2.5.so[3aee80

    0000+14e000]

    Dec 8 07:53:22 ceglnx01 avahi-daemon[3706]: Invalid query packet.

    Dec 8 07:54:02 ceglnx01 last message repeated 7 times

    Dec 8 08:12:18 ceglnx01 avahi-daemon[3706]: Invalid query packet.

    Dec 8 08:20:16 ceglnx01 last message repeated 9 times

    clip_image008

    第二次错误信息,也是发生在凌晨2:07,具体错误信息如下所示:

    clip_image009

     

    Dec 9 02:10:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
     
    Dec 9 02:10:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:10:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
     
    Dec 9 02:10:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
     
    Dec 9 02:10:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
     
    Dec 9 02:10:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
     
    Dec 9 02:10:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? cpumask_next_and+0x20/0x32
     
    Dec 9 02:10:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:10:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:10:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:10:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:10:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:10:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:10:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
     
    Dec 9 02:10:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:10:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
     
    Dec 9 02:10:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
     
    Dec 9 02:10:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
     
    Dec 9 02:10:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
     
    Dec 9 02:10:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:10:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:10:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:10:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:10:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:10:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:10:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:10:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:12:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
     
    Dec 9 02:12:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:12:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
     
    Dec 9 02:12:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
     
    Dec 9 02:12:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
     
    Dec 9 02:12:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
     
    Dec 9 02:12:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? cpumask_next_and+0x20/0x32
     
    Dec 9 02:12:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:12:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:12:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:12:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:12:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:12:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:12:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
     
    Dec 9 02:12:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:12:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
     
    Dec 9 02:12:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
     
    Dec 9 02:12:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
     
    Dec 9 02:12:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
     
    Dec 9 02:12:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:12:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:12:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:12:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:12:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:12:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:12:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:12:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:14:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
     
    Dec 9 02:14:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:14:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
     
    Dec 9 02:14:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
     
    Dec 9 02:14:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
     
    Dec 9 02:14:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
     
    Dec 9 02:14:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? cpumask_next_and+0x20/0x32
     
    Dec 9 02:14:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:14:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:14:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:14:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:14:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:14:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:14:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
     
    Dec 9 02:14:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:14:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
     
    Dec 9 02:14:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
     
    Dec 9 02:14:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
     
    Dec 9 02:14:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
     
    Dec 9 02:14:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:14:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:14:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:14:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:14:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:14:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:14:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:14:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:16:51 ceglnx01 kernel: INFO: task gdb:17053 blocked for more than 120 seconds.
     
    Dec 9 02:16:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:16:51 ceglnx01 kernel: gdb D ffff880427d52840 0 17053 17019 0x00000080
     
    Dec 9 02:16:51 ceglnx01 kernel: ffff8800bbb1dd50 0000000000000086 ffff8800bbb1dd20 0000000000000202
     
    Dec 9 02:16:51 ceglnx01 kernel: ffff880003f78800 ffff88008d93c6c0 ffff880003f78bd0 ffffffff81231b4a
     
    Dec 9 02:16:51 ceglnx01 kernel: 0000000000004800 0000000000000100 ffff8800bbb1de00 7fffffffffffffff
     
    Dec 9 02:16:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? cpumask_next_and+0x20/0x32
     
    Dec 9 02:16:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:16:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     
    Dec 9 02:16:51 ceglnx01 kernel: [] wait_for_completion+0x1d/0x1f
     
    Dec 9 02:16:51 ceglnx01 kernel: [] do_fork+0x2d0/0x339
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? do_page_fault+0x28a/0x299
     
    Dec 9 02:16:51 ceglnx01 kernel: [] sys_vfork+0x25/0x27
     
    Dec 9 02:16:51 ceglnx01 kernel: [] stub_vfork+0x13/0x20
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? system_call_fastpath+0x16/0x1b
     
    Dec 9 02:16:51 ceglnx01 kernel: INFO: task gdb:17069 blocked for more than 120 seconds.
     
    Dec 9 02:16:51 ceglnx01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     
    Dec 9 02:16:51 ceglnx01 kernel: gdb D 0000000000000000 0 17069 17022 0x00000080
     
    Dec 9 02:16:51 ceglnx01 kernel: ffff88008d93fd50 0000000000000082 0000000000000000 0000000000000202
     
    Dec 9 02:16:51 ceglnx01 kernel: ffff88008d93c6c0 ffffffff81aa84c0 ffff88008d93ca90 000000054227a93b
     
    Dec 9 02:16:51 ceglnx01 kernel: 0000000000004c00 0000000000000000 0000000000000000 7fffffffffffffff
     
    Dec 9 02:16:51 ceglnx01 kernel: Call Trace:
     
    Dec 9 02:16:51 ceglnx01 kernel: [] schedule_timeout+0x36/0xe7
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? __enqueue_entity+0x79/0x7b
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? enqueue_entity+0x263/0x270
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? need_resched+0x23/0x2d
     
    Dec 9 02:16:51 ceglnx01 kernel: [] wait_for_common+0xb7/0x12c
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? default_wake_function+0x0/0x19
     
    Dec 9 02:16:51 ceglnx01 kernel: [] ? task_rq_unlock+0x11/0x13
     

    然后在metalink文档 ID 549932.1中发现如下一些信息

    APPLIES TO:

    Oracle Net Services - Version 10.2.0.1 to 11.1.0.7 [Release 10.2 to 11.1]

    Generic UNIX

    ***Checked for relevance on 22-MAR-2013***

    SYMPTOMS

    · There may be heavy load on the CPU shooting up to 100%.

    · The number of sessions in the database is well below the upper or maximum limit defined in the parameter file.

    · The listener crashes suddenly during this heavy CPU load generating the core.

    · (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.

    Listener Log:

    -------------

    .....

    .....

    19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:pegasos eusysinuniface.exe)(HOST=TERVI-NB179)(USER=kjokioja))) * (ADDRESS= (PROTOCOL=tcp)(HOST=10.12.152.5)(PORT=1670)) * establish * ORAC * 12518

    TNS-12518: TNS:listener could not hand off client connection

    TNS-12571: TNS:packet writer failure

    TNS-12560: TNS:protocol adapter error

    TNS-00530: Protocol adapter error

    Linux Error: 104: Connection reset by peer

    19-NOV-2007 13:40:49 * (CONNECT_DATA=(SID=ORAC)(CID=(PROGRAM=C:pegaos eusysinuniface.exe)(HOST=TERVI-0184A)(USER=paitasal))) * (ADDRESS=(PROTO COL=tcp)(HOST=10.12.176.136)(PORT=1574)) * establish * ORAC * 12518

    TNS-12518: TNS:listener could not hand off client connection

    TNS-12547: TNS:lost contact

    TNS-12560: TNS:protocol adapter error

    TNS-00517: Lost contact

    Linux Error: 32: Broken pipe

    .........

    ........

    The Operating system log (/var/log/messages) may show the following :

    tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4

    tnslsnr[6469]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9420 error 4

    tnslsnr[7375]: segfault at 0000000000000018 rip 0000003eab668bb3 rsp 0000007fbfff9c70 error 4

    Gdb on the core file generated shows the following stack :

    #gdb /home/oracle/db_1/bi/tnslsnr core.7375 Core was generated by `/opt/oracle/ora102/bin/tnslsnr LISTENER -inherit'.

    Program terminated with signal 11, Segmentation fault.

    Reading symbols from /opt/oracle/ora102/lib/libclntsh.so.10.1...(no debugging symbols found)...done.

    ........

    .........

    (gdb) bt

    #0 0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6

    #1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6

    #2 0x000000000040f01f in nsglhe ()

    #3 0x00000000004114e9 in nsglma ()

    #4 0x00000000004061cb in main ()

    (gdb) where

    #0 0x00000032b7468bb3 in _int_free () from /lib64/tls/libc.so.6

    #1 0x00000032b74691f6 in free () from /lib64/tls/libc.so.6

    #2 0x000000000040f01f in nsglhe ()

    #3 0x00000000004114e9 in nsglma ()

    #4 0x00000000004061cb in main ()

    The core indicates that the program terminated with signal 11, Segmentation fault .

    SIGSEGV is reported for improper memory handling .The default action for a program upon receiving

    SIGSEGV is abnormal termination. This action will end the process.

    The virtual memory stats (vmstat) output may show huge paging/swapping activity .

    For Example:

    -------------

    $vmstat 5 7 The si and so column under the swap section suggests the amount of memory swapped in from disk (/s)and amount of memory swapped to disk (/s) respectively.

    procs                      memory    swap          io     system         cpu  
    r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id  
    2  0  0  27124   3940   1160  74528   0 1304  956  5360  272   191  51  21  98  
    3  0  0  27080   3756   1180  72580   0 260  2552   388  218   429  90  10  89  
    14  2  1  26808   5096   1188  69868  84 1108 2016  9064  490   567  59  22  96  
    4  0  0  25548   5912   1192  73032   0   0   436     0  478   736  50  50   0  
    5  0  0  25548   3940   1192  73548   0   0  1560     0  301   385  93   7   0  
    1  1  1  25548   3336   1192  71800   8 176  2848   432  258   147  91   9  88  
    1  0  0  25544   4124   1200  70480 116  60   836    60  171   200  97   3  92 

    Note: You may also use the top command to check the system memory usage.

    CAUSE

    Extensive paging/swapping activity is a clear indication that the system is running out of the physical memory.

    SOLUTION

    Solution :

    1. Increase the physical memory of the system.

    OR

    2. Apply the Patch 6139856 for unpublished Bug 6139856 if available for your platform.

    OR

    3. Configure Hugepages on the OS. Ref : Note 361323.1

    A Note on HugePages:

    Hugepages is a mechanism that allows the Linux kernel to utilise the multiple page size capabilities of modern hardware architectures. Its important to know that utilising a huge amount of physical memory with the default page size consumes the TLB and adds processing overhead. The Linux kernel is able to set aside a portion of physical memory to be able be addressed using a larger page size. Since the page size is higher, there will be less overhead managing the pages with the TLB. In the Linux 2.6 series of kernels, hugepages is enabled using the CONFIG_HUGETLB_PAGE feature when the kernel is built. Systems with large amount of memory can be configured to utilise the memory more efficiently by setting aside a portion dedicated for hugepages. The actual size of the page is dependent on the system architecture. A typical x86 system will have a Huge Page Size of 2048 kBytes. The huge page size may be found by looking at the /proc/meminfo :

    # cat /proc/meminfo |grep Hugepagesize Hugepagesize: 2048 kB

    For more on configuring Hugepages please refer Note 361323.1

    REFERENCES

    BUG:6752308 - LISTENER DIED BY SEGFAULT AFTER TNS ERROR

    NOTE:361323.1 - HugePages on Linux: What It Is... and What It Is Not...

    如上官方文档所示,有三个解决方案:

    1: 增加系统物理内存

    2: 打补丁Patch 6139856

    3: 启用HugePage设置

     

    鉴于当时操作系统有100多天没有重启过了,于是在2014-12-12 23:00重启了一下Linux服务器,到目前为止已经运行了3天,暂时没有出现这个错误。因为有可能一些内存泄露也会导致内存资源不足情况,例如,TNSListener Leaking Memory Using Dedicated Server (文档 ID 785742.1)。所以暂时没有实施上面方案2、3、想运行一段时间,验证一下自己的想法,如果还是出现这个错误,则尝试方案2、3、

    [root@ceglnx01 ~]# who -b

    system boot 2014-12-12 23:11

    从另外一方面分析,在凌晨2点过几分出现这个错误,是因为,有两个比较大的作业在此时运行。消耗的服务器资源比较大,所以也从侧面验证了内存资源不足。

    另外,在网上看到listener.log日志文件大于2G时,也可能出现这个问题。所以有时候也需要排查“监听日志文件过大可能也是导致监听器自己服务宕掉”。

     

     





    http://blog.csdn.net/killvoon/article/details/49420485


    oracle 10.2.0.4

    一生产系统监听异常停止了,listener.log中报出如下错误:


    TNS-12518: TNS:listener could not hand off client connection
     TNS-12547: TNS:lost contact
      TNS-12560: TNS:protocol adapter error
       TNS-00517: Lost contact
        Linux Error: 32: Broken pipe


    并且操作系统日志/var/log/messages中抛出类似如下错误:


    tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4 

    且有一些cpu和内存负载过高,自己kill进程的信息

    在metalink上有这篇文档:549932.1

    版本:

    Oracle Net Services - Version 10.2.0.1 to 11.1.0.7 [Release 10.2 to 11.1]
    Generic UNIX
    ***Checked for relevance on 22-MAR-2013***

    问题现象:

     

    • There may be heavy load on the CPU shooting up to 100%.
    • The number of sessions in the database is well below the upper or maximum limit defined in the parameter  file.
    • The listener crashes suddenly during  this heavy CPU load generating the core.
    • (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.

    导致原因:


    Extensive paging/swapping activity is a clear  indication that the system is running out of the physical memory.

    解决方法:


    1. Increase the physical memory of the system. 
                                 OR 
    2. Apply the Patch 6139856 for unpublished Bug 6139856 if available for your platform.

                                 OR

    3. Configure Hugepages on the OS. Ref : Note 361323.1

    --------------------------------------------------------------------------------------------------------------

    算是oracle bug问题了,当操作系统物理内存不足,swap/page 耗尽,将会导致listener异常崩溃。

    而且从操作系统日志中,可以看到linux自己kill 进程的信息(由于事后总结,且信息在内网内,权限有限,贴不出日志内容)。

    所以我的理解就是,当操作系统物理内存居高不下,操作系统会自己杀掉一些他认为的空闲进程之类,而不巧,杀掉的恰好是oracle的监听进程,

    从而导致监听异常崩溃。

    之所以说恰好是监听进程,是因为在/var/log/messages中,看到之前也有杀掉oracle进程的信息,但当时监听并未停掉,所以怀疑当时杀掉的并不是oracle监听进程,可能是其他非本地进程。






    我的解决参考:http://www.itpub.net/thread-1870217-1-1.html

    论坛各位大拿,帮帮忙吧。
    问题描述:
    服务器断电重启后客户端无法登陆。数据库本地 sqlplus / as sysdba可以登录。
    C:Usersduchanglei>sqlplus 
    username/password@jkxt21

    SQL*Plus: Release 11.2.0.1.0 Production on Tue Jun 3 16:42:05 2014
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    ERROR:
    ORA-12537: TNS:connection closed


    环境:数据库
    [oracle@jkxtrac1 bin]$ sqlplus / as sysdba
    SQL*Plus: Release 11.2.0.4.0 Production on Tue Jun 3 16:46:12 2014
    Copyright (c) 1982, 2013, Oracle.  All rights reserved.

    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
    With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
    Data Mining and Real Application Testing options
    环境:操作系统
    [root@jkxtrac1 log]# lsb_release -a
    LSB Version:    :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
    Distributor ID: RedHatEnterpriseServer
    Description:    Red Hat Enterprise Linux Server release 5.6 (Tikanga)
    Release:        5.6
    Codename:       Tikanga

    监听日志:


    type='UNKNOWN' level='16' host_id='jkxtrac1'
    host_addr='10.199.102.21'>
    TNS-12518: TNS:listener could not hand off client connection
    TNS-12547: TNS:lost contact
      TNS-12560: TNS:protocol adapter error
       TNS-00517: Lost contact
        Linux Error: 32: Broken pipe


    alert日志
    没有相关信息,没有报任何错误。
    操作系统日志:
    [root@jkxtrac1 log]# tail -n50 messages
    Jun  3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
    Jun  3 14:27:42 jkxtrac1 avahi-daemon[8081]: Withdrawing address record for 10.199.102.23 on eth0.
    Jun  3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
    Jun  3 14:27:42 jkxtrac1 avahi-daemon[8081]: Withdrawing address record for 10.199.102.23 on eth0.
    Jun  3 14:27:42 jkxtrac1 avahi-daemon[8081]: Registering new address record for 10.199.102.23 on eth0.
    Jun  3 14:28:56 jkxtrac1 gconfd (root-10454): 正在启动(版本 2.14.0),pid 10454 用户“root”
    Jun  3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readonly:/etc/gconf/gconf.xml.mandatory”解析为位于 0 的只读配置源
    Jun  3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readwrite:/root/.gconf”解析为位于 1 的可写入配置源
    Jun  3 14:28:56 jkxtrac1 gconfd (root-10454): 地址“xml:readonly:/etc/gconf/gconf.xml.defaults”解析为位于 2 的只读配置源
    Jun  3 14:28:58 jkxtrac1 gconfd (root-10454): 地址“xml:readwrite:/root/.gconf”解析为位于 0 的可写入配置源
    Jun  3 14:28:58 jkxtrac1 hcid[7308]: Default passkey agent (:1.6, /org/bluez/applet) registered
    Jun  3 14:28:58 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
    Jun  3 14:28:58 jkxtrac1 last message repeated 2 times
    Jun  3 14:28:59 jkxtrac1 nm-system-settings: Loaded plugin ifcfg-rh: (c) 2007 - 2008 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ... 
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ... 
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh:     read connection 'System eth1'
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-usb0 ... 
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh:     read connection 'System usb0'
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ... 
    Jun  3 14:28:59 jkxtrac1 nm-system-settings:    ifcfg-rh:     read connection 'System eth0'
    Jun  3 14:29:01 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
    Jun  3 14:39:53 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 1fe 2002.
    Jun  3 14:53:04 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 34f 2002.
    Jun  3 15:00:18 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
    Jun  3 15:32:43 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
    Jun  3 15:49:19 jkxtrac1 scim-bridge: The lockfile is destroied
    Jun  3 15:49:19 jkxtrac1 scim-bridge: Cleanup, done. Exitting...
    Jun  3 16:03:49 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
    Jun  3 16:14:00 jkxtrac1 kernel: qla2xxx 0000:8b:00.1: scsi(4:1:3): Abort command issued -- 1 cd7 2002.
    Jun  3 16:28:59 jkxtrac1 scim-bridge: Failed to open the panel socket
    Jun  3 16:29:00 jkxtrac1 scim-bridge: Failed to open the panel socket
    Jun  3 16:29:00 jkxtrac1 scim-bridge: Panel client has not yet been prepared
    Jun  3 16:29:00 jkxtrac1 last message repeated 3 times
    Jun  3 16:29:00 jkxtrac1 scim-bridge: The lockfile is destroied
    Jun  3 16:29:00 jkxtrac1 scim-bridge: Cleanup, done. Exitting...
    Jun  3 16:29:00 jkxtrac1 Cleanup, done. Exitting... 
    Jun  3 16:29:27 jkxtrac1 gconfd (root-10454): GConf 服务器不在使用,正在关闭。
    Jun  3 16:29:27 jkxtrac1 gconfd (root-10454): 退出
    Jun  3 16:30:43 jkxtrac1 kernel: scim-launcher[10521] trap stack segment rip:2b38d4ea70f1 rsp:7fffacbf26f0 error:0
    Jun  3 16:32:13 jkxtrac1 gconfd (root-1975): 正在启动(版本 2.14.0),pid 1975 用户“root”
    Jun  3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readonly:/etc/gconf/gconf.xml.mandatory”解析为位于 0 的只读配置源
    Jun  3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readwrite:/root/.gconf”解析为位于 1 的可写入配置源
    Jun  3 16:32:13 jkxtrac1 gconfd (root-1975): 地址“xml:readonly:/etc/gconf/gconf.xml.defaults”解析为位于 2 的只读配置源
    Jun  3 16:32:14 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
    Jun  3 16:32:14 jkxtrac1 last message repeated 2 times
    Jun  3 16:32:14 jkxtrac1 gconfd (root-1975): 地址“xml:readwrite:/root/.gconf”解析为位于 0 的可写入配置源
    Jun  3 16:32:14 jkxtrac1 hcid[7308]: Default passkey agent (:1.18, /org/bluez/applet) registered
    Jun  3 16:32:17 jkxtrac1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
    Jun  3 16:34:54 jkxtrac1 CLSD: The clock on host jkxtrac1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

    网络上说资源问题,我的资源很多啊:

    top - 16:53:12 up  2:28,  2 users,  load average: 0.21, 0.29, 0.27
    Tasks: 665 total,   2 running, 662 sleeping,   0 stopped,   1 zombie
    Cpu(s):  0.7%us,  0.1%sy,  0.0%ni, 99.0%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  32889308k total,  5147440k used, 27741868k free,   181984k buffers
    Swap: 34933332k total,        0k used, 34933332k free,  3341088k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                 
    2154 root      15   0  328m  19m  10m R 20.7  0.1   0:35.01 gnome-terminal                                                          
      471 root      10  -5     0    0    0 S  0.9  0.0   0:36.27 kacpid                                                                  
    5947 root      15   0 13160 1568  828 R  0.7  0.0   0:00.58 top                                                                     
    8764 grid      15   0  245m  31m  12m S  0.7  0.1   0:15.21 gipcd.bin                                                               
    9342 grid      15   0 1323m  57m  40m S  0.7  0.2   0:48.72 oracle                                                                  
    7476 root      15   0  3824  600  496 S  0.5  0.0   0:06.27 acpid                                                                   
    8723 grid      15   0  306m  37m  16m S  0.5  0.1   0:29.02 oraagent.bin                                                            
    8763 root      16   0  292m  35m  15m S  0.5  0.1   0:35.45 orarootagent.bi                                                         
    9615 root      15   0  272m  25m  14m S  0.5  0.1   0:42.74 orarootagent.bi                                                         
    7036 root      10  -5     0    0    0 S  0.2  0.0   0:07.41 kondemand/9                                                             
    7042 root      10  -5     0    0    0 S  0.2  0.0   0:03.46 kondemand/15                                                            
    7050 root      10  -5     0    0    0 S  0.2  0.0   0:00.06 kondemand/23                                                            
    8750 grid      15   0  174m  25m  11m S  0.2  0.1   0:04.80 gpnpd.bin                                                               
    8798 root      RT   0  249m  92m  56m S  0.2  0.3   0:04.35 cssdmonitor                                                             
    9019 root      17   0  241m  22m  10m S  0.2  0.1   0:02.35 octssd.bin                                                              
    9346 grid      15   0 1314m  30m  21m S  0.2  0.1   0:02.10 oracle                                                                  
    30771 oracle    15   0 12.8g  51m  36m S  0.2  0.2   0:09.80 oracle                                                                  
    30955 oracle    15   0 12.8g  42m  39m S  0.2  0.1   0:01.83 oracle  

    SQL> select * from v$resource_limit;
    RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION                       LIMIT_VALUE
    ------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
    processes                                       51              54       1000                                     1000
    sessions                                        59              64       1536                                     1536
    enqueue_locks                                   39              52      19523                                    19523
    enqueue_resources                               36              36       6976                                UNLIMITED
    ges_procs                                       47              49       1001                                     1001
    ges_ress                                      9448            9448      32571                                UNLIMITED
    ges_locks                                     4771            5092      49133                                UNLIMITED
    ges_cache_ress                                1649            1649          0                                UNLIMITED
    ges_reg_msgs                                   112             134       2730                                UNLIMITED
    ges_big_msgs                                    35              78       2730                                UNLIMITED
    ges_rsv_msgs                                     0               0       1000                                     1000
    RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION                       LIMIT_VALUE
    ------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
    gcs_resources                                36726           36726  UNLIMITED                                UNLIMITED
    gcs_shadows                                  36561           36563  UNLIMITED                                UNLIMITED
    smartio_overhead_memory                          0               0          0                                UNLIMITED
    smartio_buffer_memory                            0               0          0                                UNLIMITED
    smartio_metadata_memory                          0               0          0                                UNLIMITED
    smartio_sessions                                 0               0          0                                UNLIMITED
    dml_locks                                        0               0       6756                                UNLIMITED
    temporary_table_locks                            0               0  UNLIMITED                                UNLIMITED
    transactions                                     0               0       1689                                UNLIMITED
    branches                                         0               0       1689                                UNLIMITED
    cmtcallbk                                        0               2       1689                                UNLIMITED
    RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION                       LIMIT_VALUE
    ------------------------------ ------------------- --------------- ---------------------------------------- ----------------------------------------
    max_rollback_segments                           11              11       1689                                    65535
    sort_segment_locks                               0               1  UNLIMITED                                UNLIMITED
    k2q_locks                                        0               0       3072                                UNLIMITED
    max_shared_servers                               1               1  UNLIMITED                                UNLIMITED
    parallel_max_servers                             1               1        970                                     3600
    27 rows selected.

    重启数据库服务器2次,也没弄好、。

    问题解决,结贴。
    方法:
    1、修改oracle.exe的权限为6751;
    [oracle@jkxtrac1 bin]$ ls -l ./oracle
    -rwxr-s--x 1 oracle asmadmin 239627031 Jan 21 18:59 ./oracle
    [oracle@jkxtrac1 bin]$ ls -l oracle
    -rwxr-s--x 1 oracle asmadmin 239627031 Jan 21 18:59 oracle
    [oracle@jkxtrac1 bin]$ chmod 6751 oracle
    [oracle@jkxtrac1 bin]$ ls -l oracle
    -rwsr-x--x 1 oracle asmadmin 239627031 Jan 21 18:59 oracle

    2、重新用setasmgidwrap设置oracle.exe
    [root@jkxtrac1 ~]# cd /data/app/11.2.0/grid_1/bin/
    [root@jkxtrac1 bin]# ./setasmgidwrap o=/home/oracle/app/oracle/product/11.2.0/dbhome_1/bin/oracle

    感谢各位朋友的帮助;;





    About Me

    ...............................................................................................................................

    本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用

    本文在itpubhttp://blog.itpub.net/26736162)、博客园http://www.cnblogs.com/lhrbest和个人微信公众号(xiaomaimiaolhr)上有同步更新

    本文itpub地址:http://blog.itpub.net/26736162/viewspace-2135468/

    本文博客园地址:http://www.cnblogs.com/lhrbest/p/6559092.html

    本文pdf小麦苗云盘地址:http://blog.itpub.net/26736162/viewspace-1624453/

    ● QQ群:230161599     微信群:私聊

    联系我请加QQ好友(642808185),注明添加缘由

    2017-03-16 08:00 ~ 2017-03-16 19:00 在泰兴公寓完成

    文章内容来源于小麦苗的学习笔记,部分整理自网络,若有侵权或不当之处还请谅解

    版权所有,欢迎分享本文,转载请保留出处

    ...............................................................................................................................

    拿起手机使用微信客户端扫描下边的左边图片来关注小麦苗的微信公众号:xiaomaimiaolhr,扫描右边的二维码加入小麦苗的QQ群,学习最实用的数据库技术。

       DBA笔试面试讲解

  • 相关阅读:
    DGA域名可以是色情网站域名
    使用cloudflare加速你的网站隐藏你的网站IP
    167. Two Sum II
    leetcode 563. Binary Tree Tilt
    python 多线程
    leetcode 404. Sum of Left Leaves
    leetcode 100. Same Tree
    leetcode 383. Ransom Note
    leetcode 122. Best Time to Buy and Sell Stock II
    天津Uber优步司机奖励政策(12月28日到12月29日)
  • 原文地址:https://www.cnblogs.com/lhrbest/p/6559092.html
Copyright © 2011-2022 走看看