zoukankan      html  css  js  c++  java
  • 数据库系统异常排查之DMV(转)

       来源: http://www.cnblogs.com/fygh/archive/2012/03/12.html

    数据库系统异常是DBA经常要面临的情景,一名有一定从业经验的DBA,都会有自己一套故障排查的方法和步骤,此文为为大家介绍一下通过系统

    性能视图(SQLServer05以上版本)来排查系统异常的基本方法,希望能对大家有所帮助。

    这里分两部分来介绍:

    一.  从数据库连接情况来判断异常:

    1. 首先我们来看一下目前数据库系统所有请求情况:

    --request info
    select s.session_id, s.status,db_name(r.database_id) as database_name,
    s.login_name,s.login_time, s.host_name,
    c.client_net_address,c.client_tcp_port,s.program_name,
    r.cpu_time, r.reads, r.writes,c.num_reads,c.num_writes,
    s.client_interface_name,
    s.last_request_start_time, s.last_request_end_time,
    c.connect_time, c.net_transport, c.net_packet_size,
    r.start_time, r.status, r.command,
    r.blocking_session_id, r.wait_type,
    r.wait_time, r.last_wait_type, r.wait_resource, r.open_transaction_count,
    r.percent_complete,r.granted_query_memory
    from Sys.dm_exec_requests r with(nolock)
    right outer join Sys.dm_exec_sessions s with(nolock)
    on r.session_id = s.session_id
    right outer join Sys.dm_exec_connections c with(nolock)
    on s.session_id = c.session_id
    where s.session_id >50
    order by s.session_id

    某台生产机运行情况:

        这个查询将目前数据库中的所有请求都显示出来了,其中比较重要的有Status、Login_name、Host_Name,Client_Net_Address、Program_name

    等,但是信息比较多,我们很难查看有什么异常,但是可以通过一图中红色圈的数字:441 初步判断连接数是否超过了平时的标准(很多时候系统异常是连接

    数过多造成的,而连接数过多又是因为其他原因影响的)。

    2. 哪个用户连接数最多:

    --request info by user
    select login_name,COUNT(0) user_count
    from Sys.dm_exec_requests r with(nolock)
    right outer join Sys.dm_exec_sessions s with(nolock)
    on r.session_id = s.session_id
    right outer join Sys.dm_exec_connections c with(nolock)
    on s.session_id = c.session_id
    where s.session_id >50
    group by login_name
    order by 2 desc

    运行结果:

    从图中我们可以很方便的看出用户连接数情况,如果我们的不同的功能是使用不同的的数据库账号的话,就能初步判断是哪部分功能可能出现了异常。

    3. 哪台机器发起到数据库的连接数最多:

    --request info by hostname
    select s.host_name,c.client_net_address,COUNT(0) host_count
    from Sys.dm_exec_requests r with(nolock)
    right outer join Sys.dm_exec_sessions s with(nolock)
    on r.session_id = s.session_id
    right outer join Sys.dm_exec_connections c with(nolock)
    on s.session_id = c.session_id
    where s.session_id >50
    group by host_name,client_net_address
    order by 3 desc

    运行结果:

       这个查询能够一下就帮我们找出来哪些机器发起了对数据库的链接,它们的链接数量是否有异常;这个其实对调查某些问题非常有用,我有一次就遇

    到一个case:

    用户反映,过一两个星期,系统就会出现一次异常,出问题时数据库连接数量很高,大量的访问被数据库拒绝,过半个小时左右,系统又自动恢复了,但是

    在数据库里面查看,并没有发现有异常的进程和错误的信息,问题一时很棘手,很难定位,系统不稳定领导不满,DBA顶着压力一时不知道如何是好;后面

    转换方向,通过调查问题发生时,为什么会产生这么多连接,这些连接是那些机器发过来的,这些连接发过来正常吗,是数据库不砍业务的重负,还是业务

    在某个时间段内会出现暴涨等一系列原因,最终找出是一台Web因为开发人员代码写的有问题,内存出现内存泄露,导致大量的连接不能释放,出问题是,

    发出的数据库连接数比平时高3-4倍,最终影响到了数据库,问题压根和数据库没关系(从这个事实看出,DBA真是的炮灰角色,不是自己的问题,也得顶

    着压力调查出原因呀);如果在类似问题发生时,我们能通过这个查询及早知道问题是出在某台Web机器上,那就不用费尽心力来调查数据库了。

    4. 这些连接在访问哪个库:

    --request info by databases
    select db_name(r.database_id) as database_name,COUNT(0) host_count
    from Sys.dm_exec_requests r with(nolock)
    right outer join Sys.dm_exec_sessions s with(nolock)
    on r.session_id = s.session_id
    right outer join Sys.dm_exec_connections c with(nolock)
    on s.session_id = c.session_id
    where s.session_id >50
    group by r.database_id
    order by 2 desc

    结果(为NULL的估计是没办法定位库):

    5. 进程状态:

    --request info by status
    select s.status,COUNT(0) host_count
    from Sys.dm_exec_requests r with(nolock)
    right outer join Sys.dm_exec_sessions s with(nolock)
    on r.session_id = s.session_id
    right outer join Sys.dm_exec_connections c with(nolock)
    on s.session_id = c.session_id
    where s.session_id >50
    group by s.status
    order by 2 desc

    结果(running数比较多,表面数据库压力比较大):

    二. 从阻塞情况来判断异常(这部分内容不再一一贴图,直接上脚本):

    1. 查看数据库阻塞情况:

    ----------------------------------------Blocked Info----------------------------------
    --记录当前阻塞信息
    select t1.resource_type as [lock type] ,db_name(resource_database_id) as [database]
    ,t1.resource_associated_entity_id as [blk object]
    ,t1.request_mode as [lock req] -- lock requested
    ,t1.request_session_id as [waiter sid] -- spid of waiter
    ,t2.wait_duration_ms as [wait time]
    ,(select text from sys.dm_exec_requests as r with(nolock) --- get sql for waiter
    cross apply sys.dm_exec_sql_text(r.sql_handle)
    where r.session_id = t1.request_session_id) as waiter_batch
    ,(select substring(qt.text,r.statement_start_offset/2,
    (case when r.statement_end_offset = -1 then len(convert(nvarchar(max), qt.text)) * 2
    else r.statement_end_offset end - r.statement_start_offset)/2+1)
    from sys.dm_exec_requests as r with(nolock)
    cross apply sys.dm_exec_sql_text(r.sql_handle) as qt
    where r.session_id = t1.request_session_id) as waiter_stmt --- statement executing now
    ,t2.blocking_session_id as [blocker sid] --- spid of blocker
    ,(select text from sys.sysprocesses as p with(nolock) --- get sql for blocker
    cross apply sys.dm_exec_sql_text(p.sql_handle)
    where p.spid = t2.blocking_session_id) as blocker_stmt,getdate() time
    from sys.dm_tran_locks as t1 with(nolock) , sys.dm_os_waiting_tasks as t2 with(nolock)
    where t1.lock_owner_address = t2.resource_address

    2. 查看阻塞其他进程的进程(阻塞源头):

    --阻塞其他session的session 
    select t2.blocking_session_id,COUNT(0) counts
    from sys.dm_tran_locks as t1 with(nolock) , sys.dm_os_waiting_tasks as t2 with(nolock)
    where t1.lock_owner_address = t2.resource_address
    group by blocking_session_id
    order by 2

    3. 被阻塞时间最长的进程:

    --被阻塞时间最长的session
    select top 10 t1.resource_type as [lock type] ,db_name(resource_database_id) as [database]
    ,t1.resource_associated_entity_id as [blk object]
    ,t1.request_mode as [lock req] -- lock requested
    ,t1.request_session_id as [waiter sid] -- spid of waiter
    ,t2.wait_duration_ms as [wait time]
    ,(select text from sys.dm_exec_requests as r with(nolock) --- get sql for waiter
    cross apply sys.dm_exec_sql_text(r.sql_handle)
    where r.session_id = t1.request_session_id) as waiter_batch
    ,(select substring(qt.text,r.statement_start_offset/2,
    (case when r.statement_end_offset = -1 then len(convert(nvarchar(max), qt.text)) * 2
    else r.statement_end_offset end - r.statement_start_offset)/2+1)
    from sys.dm_exec_requests as r with(nolock)
    cross apply sys.dm_exec_sql_text(r.sql_handle) as qt
    where r.session_id = t1.request_session_id) as waiter_stmt --- statement executing now
    ,t2.blocking_session_id as [blocker sid] --- spid of blocker
    ,(select text from sys.sysprocesses as p with(nolock) --- get sql for blocker
    cross apply sys.dm_exec_sql_text(p.sql_handle)
    where p.spid = t2.blocking_session_id) as blocker_stmt,getdate() time
    from sys.dm_tran_locks as t1 with(nolock) , sys.dm_os_waiting_tasks as t2 with(nolock)
    where t1.lock_owner_address = t2.resource_address
    order by t2.wait_duration_ms desc
        此文大致总结了通过DMV调查数据库异常的基本方法和步骤,如果大家在调查问题时能够灵活运用,相信对数据库异常情况的定位和解决能够更快更有效。
  • 相关阅读:
    leetcode 18 4Sum
    leetcode 71 Simplify Path
    leetcode 10 Regular Expression Matching
    leetcode 30 Substring with Concatenation of All Words
    leetcode 355 Design Twitte
    leetcode LRU Cache
    leetcode 3Sum
    leetcode Letter Combinations of a Phone Number
    leetcode Remove Nth Node From End of List
    leetcode Valid Parentheses
  • 原文地址:https://www.cnblogs.com/tdskee/p/5610551.html
Copyright © 2011-2022 走看看