zoukankan      html  css  js  c++  java
  • 疑难杂症--SQL SERVER 18056的错误

    朋友遇到一个很棘手的问题,查看服务器日志,报以下错误:

    01/21/2014 11:47:43,spid296,未知,错误: 18056,严重性: 20,状态: 2901/21/2014 11:47:43,spid495,未知,
    The client was unable to reuse a session with SPID 495<c/> 
    which had been reset for connection pooling. The failure ID is 29. 
    This error may have been caused by an earlier operation failing. 
    Check the error logs for failed operations immediately before this error message.

    百度一下, 找到相关文章:http://blog.csdn.net/yangzhawen/article/details/8209167

    一方面让开发从IIS角度去解决,另一方面从SQL SERVER入手,继续查看错误日志,发现以下错误:

    01/21/2014 11:46:10,spid8s,未知,
    SQL Server has encountered 3 occurrence(s) of I/O requests taking longer 
    than 15 seconds to complete on file [H:templog.ldf] in database [tempdb]
      
    The OS file handle is 0x0000000000001254.  
    The offset of the latest long I/O is: 0x000000184a6a00
    
    01/21/2014 11:46:10,spid8s,未知,
    SQL Server has encountered 3 occurrence(s) of I/O 
    requests taking longer than 15 seconds to complete 
    on file [H:HisDataTXX.mdf] in database [xxx] 
    
    The OS file handle is 0x0000000000001268.  
    The offset of the latest long I/O is: 0x00000349a6e000

    使用以下代码查看当前耗CPU和IO比较多的执行计划

    --最耗费CPU的前个查询以及它们的执行计划
    SETTRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    WITHTEMP AS(
    SELECTCAST((qs.total_worker_time) / 1000000.0 AS DECIMAL(28,2)) AS TotalSecondsForCPUTime
    ,CAST(qs.total_worker_time* 100.0 / qs.total_elapsed_time  AS DECIMAL(28,2)) AS CPUPersent
    ,CAST((qs.total_elapsed_time- qs.total_worker_time)* 100.0 / qs.total_elapsed_time AS DECIMAL(28, 2)) AS WaitingPersent
    ,qs.execution_countExecutionCount
    ,CAST((qs.total_worker_time)/ 1000000.0 / qs.execution_count AS DECIMAL(28, 2)) AS AvgSecondsForCPUTime
    ,SUBSTRING (qt.text,(qs.statement_start_offset/2) + 1,     
        ((CASEWHEN qs.statement_end_offset = -1
          THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
          ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2) + 1) AS IndividualQuery
    ,qt.text AS ParentQuery
    ,DB_NAME(qt.dbid) AS DatabaseName
    ,qp.query_planQueryPlan
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    WHEREqs.total_elapsed_time > 0 )
     
    SELECTTOP(20)* FROM TEMP
    ORDERBY TEMP.TotalSecondsForCPUTime DESC
     
    --最占IO的前个查询以及它们的执行计划
    SETTRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    WITHTEMP AS(
    SELECT (qs.total_logical_reads + qs.total_logical_writes) AS TotalIO
    ,(qs.total_logical_reads+ qs.total_logical_writes) / qs.execution_count AS AvgIO
    ,qs.execution_count AS ExecutionCount
    ,SUBSTRING (qt.text,(qs.statement_start_offset/2) + 1,     
    ((CASEWHEN qs.statement_end_offset = -1
    THENLEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
    ELSE qs.statement_end_offset
    END- qs.statement_start_offset)/2) + 1) AS IndividualQuery
    ,qt.text AS ParentQuery
    , DB_NAME(qt.dbid) AS DatabaseName
    ,qp.query_plan AS QueryPlan
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as qt
    CROSSAPPLY sys.dm_exec_query_plan(qs.plan_handle) qp)
     
    SELECTTOP(20)* FROM TEMP
    ORDERBY TEMP.TotalIO DESC
    View Code

    发现一部分耗时长和IO高的执行语句,提起这些语句待分析优化

    在使用以下代码来查看当前请求和阻塞:

    SELECT
        SPID                = er.session_id 
        ,STATUS             = ses.STATUS 
        ,[LOGIN]            = ses.login_name 
        ,HOST               = ses.host_name 
        ,BlkBy              = er.blocking_session_id 
        ,DBName             = DB_NAME(er.database_id) 
        ,CommandType        = er.command 
        ,SQLStatement       = st.text 
        ,ObjectName         = OBJECT_NAME(st.objectid) 
        ,ElapsedMS          = er.total_elapsed_time 
        ,CPUTime            = er.cpu_time 
        ,IOReads            = er.logical_reads + er.reads 
        ,IOWrites           = er.writes 
        ,LastWaitType       = er.last_wait_type 
        ,StartTime          = er.start_time 
        ,Protocol           = con.net_transport 
        ,ConnectionWrites   = con.num_writes 
        ,ConnectionReads    = con.num_reads 
        ,ClientAddress      = con.client_net_address 
        ,Authentication     = con.auth_scheme 
    FROM sys.dm_exec_requests er 
    OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) st 
    LEFT JOIN sys.dm_exec_sessions ses 
    ON ses.session_id = er.session_id 
    LEFT JOIN sys.dm_exec_connections con 
    ON con.session_id = ses.session_id 
    WHERE er.session_id > 50 
    ORDER BY er.blocking_session_id DESC,er.session_id
    View Code


    发现以下问题:

    MSDN上有如下介绍:

    ConnectionWrites:此连接中已发生的读包次数。可为空值。
    ConnectionReads:此连接中已发生的写数据包次数。可为空值。


    在普通的OLTP数据上,ConnectionWrites和ConnectionReads 基本在几十到上百,而在这台服务器上达到了352W之多,经确认,192.168.8.16 上是新产品服务器,服务器出现问题与该产品上线时间基本吻合,基本可断定问题根源就是该新产品。

    剩下问题表示分析为什么需要如此庞大的网络包,寻找解决之道。

  • 相关阅读:
    (转)tomcat 修改默认访问项目名称和项目发布路径
    ftp 传输数据:命令链路连接方法是一样的,而数据链路的建立方法就完全不同
    在Linux下,如何分析一个程序达到性能瓶颈的原因
    arp绑定IP
    xrange 和range的区别
    oa tomcat 代码处理跨域问题
    python inspect.stack() 的简单使用
    supervisor 结合 Dockerfile ENTRYPOINT
    (转)tcp/ip协议的简单理解 -- ip报文和tcp报文的格式
    linux 修改内核参数 如何生效?
  • 原文地址:https://www.cnblogs.com/TeyGao/p/3528745.html
Copyright © 2011-2022 走看看