db_alert ORA-00600: internal error code, arguments: [ksliwat: bad wait time], [18446744073709471616], [], [], [], [], [], [], [], [], [], []
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20190223052024], requested by (instance=1, osid=63791 (SCM0)), summary=[incident=524425].
Unix process pid: 63791, image: oracle@pelpsrdb01 (SCM0)
ORA-00600: internal error code, arguments: [ksliwat: bad wait time], [18446744073709471616], [], [], [], [], [], [], [], [], [], []
2019-02-23 05:20:23.965 :kjsc_main(): error SCM0
OPIRIP: Uncaught error 447. Error stack:
ORA-00447: fatal error in background process
ORA-00600: internal error code, arguments: [ksliwat: bad wait time], [18446744073709471616], [], [], [], [], [], [], [], [], [], []
ksedst()+119 call kgdsdst()
dbkedDefDump()+1200 call ksedst()
ksedmp()+259 call dbkedDefDump()
dbgexPhaseII()+2130 call ksedmp()
dbgexProcessError()+2531 call dbgexPhaseII()
dbgePostErrorKGE()+1767 call dbgexProcessError()
dbkePostKGE_kgsf()+90 call dbgePostErrorKGE()
kgeadse()+477 call dbkePostKGE_kgsf()
kgerinv_internal()+49 call kgeadse()
kgerinv()+40 call kgerinv_internal()
kgeasnmierr()+150 call kgerinv()
ksliwat()+15035 call kgeasnmierr()
kslwaitctx()+197 call ksliwat()
kjsc_main()+1431 call kslwaitctx()
ksvrdp_int()+2010 call kjsc_main()
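The stack above lists the most recent frame first, so the call path is read from the bottom up. As a minimal sketch of that reading, the snippet below extracts the caller names from the lower frames and prints them outermost first. The regular expression and the helper name `call_path` are illustrative assumptions based only on the line format shown in this trace, not an Oracle tool.

```python
import re

# Frame format as it appears in this trace: "<caller>()+<offset> call <callee>()"
# (assumption: every frame line follows this shape)
FRAME = re.compile(r"^(\w+)\(\)\+(\d+)\s+call\s+(\w+)\(\)$")


def call_path(stack_text: str) -> list[str]:
    """Return caller names ordered from outermost caller to innermost frame."""
    callers = []
    for line in stack_text.strip().splitlines():
        match = FRAME.match(line.strip())
        if match:
            callers.append(match.group(1))
    # The trace prints the innermost frame first, so reverse it.
    return callers[::-1]


sample = """\
kgeasnmierr()+150 call kgerinv()
ksliwat()+15035 call kgeasnmierr()
kslwaitctx()+197 call ksliwat()
kjsc_main()+1431 call kslwaitctx()
ksvrdp_int()+2010 call kjsc_main()
"""

print(" -> ".join(call_path(sample)))
# ksvrdp_int -> kjsc_main -> kslwaitctx -> ksliwat -> kgeasnmierr
```

Read this way, the slave process entry point (ksvrdp_int) runs kjsc_main, which enters a wait through kslwaitctx and ksliwat, and ksliwat is the function that raises the internal error.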
II. Problem Analysis
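1. Information summary
1) Database version: 12.2.0.1
2) Error: ORA-600 [ksliwat: bad wait time] [18446744073709471616]
3) Related process: kjsc_main(): error SCM0, followed by ORA-00447: fatal error in background process
4) Function names: ksliwat, kslwaitctx, kjsc_main, ksvrdp_int (from the call stack above)
5) Goal: analyze the impact of the ORA-600 error

2. Information lookup
Function descriptions

Function name | Description
ksliwat | kernel service lock management inner wait function; setup a wait that times out
kslwaitctx | wait context; wait until timeout
kjsc_main | kernel lock management RAC global stats
ksvrdp_int | kernel service (VOS) slave management run generic detached slave process

In short: the global kernel lock state management code triggered a timed wait, and the error was raised afterwards.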
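The second ORA-600 argument, 18446744073709471616, sits just below 2^64. One possible reading, which is an assumption and not something the trace states, is that it is a negative 64-bit value printed as unsigned. The sketch below does the two's-complement conversion; the helper name `as_signed_64` is hypothetical.

```python
# Reinterpret the ORA-600 argument as a signed 64-bit integer.
# Assumption: the argument is a 64-bit quantity printed as unsigned;
# the trace itself does not confirm this.


def as_signed_64(value: int) -> int:
    """Return the two's-complement signed reading of an unsigned 64-bit value."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


arg = 18446744073709471616  # second argument from the ORA-600 message
print(as_signed_64(arg))  # -80000
```

Under that assumption the wait time passed to ksliwat would be -80000, which would fit the "bad wait time" text of the error. The unit of the value is not given in the trace.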
Name | Expanded Name | Short Description | Long Description | External Properties
SCM0 | DLM Statistics Collection and Management Slave | Collects and manages statistics related to global enqueue service (GES) and global cache service (GCS) | The DLM Statistics Collection and Management slave (SCM0) is responsible for collecting and managing the statistics related to global enqueue service (GES) and global cache service (GCS). This slave exists only if DLM statistics collection is enabled. | Database instances
SYS@ora122>select a.ksppinm,b.ksppstvl,a.ksppdesc from x$ksppi a,x$ksppcv b where (a.indx=b.indx) and a.ksppinm like '%_dlm_stats_collect%';
--------- -------------------- ------------------------------
_dlm_stats_collect 1 DLM statistics collection(0 = disable (default), 1 = enable)
_dlm_stats_collect_mode 0 DLM statistics collection mode
_dlm_stats_collect_slot_interval 60 DLM statistics collection slot interval (in seconds)
_dlm_stats_collect_du_limit 3000 DLM statistics collection disk updates per slot
Excerpt from MOS document "12.2 RAC DB Background process SCM0 consuming excessive CPU" (Doc ID 2373451.1):

The DLM Statistics Collection and Management slave (SCM0) is responsible for collecting and managing the statistics related to global enqueue service (GES) and global cache service (GCS). This slave exists only if DLM statistics collection is enabled.

The value is set to 1. Please go ahead and run the following command to change the value of _dlm to 0:

alter system set "_dlm_stats_collect" = 0 scope = spfile sid = '*';

This does require a reboot for changes to take effect. If a reboot is not an option, as a workaround you may kill the SCM0 process at OS level, it will respawn a new process soon.

kill -9 <os pid of SCM0>

Note: Disabling dlm_stats_collect (ie setting to 0) has no negative effect in 12.2. This is because the stats are not yet used in 12.2 (the features that would use these stats, service based affinity and cache warmup, are also disabled in 12.2 by default). Versions 18 and 19 may have them enabled, so re-evaluate at that time.
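Distributed Lock Manager (DLM): put simply, in a RAC environment every data modification requires the node to first request permission from the DLM to modify the block. The DLM coordinates modifications made to block resources by multiple nodes.
GES controls all library cache locks and dictionary cache locks in the database. These resources are local in a single-instance database but become global resources in a RAC cluster. Global locks are also used to protect data structures and to manage transactions. Generally speaking, transaction locks and table locks behave the same in a RAC environment as in a single-instance environment.
GCS is the mechanism Oracle uses to implement Cache Fusion. The blocks and locks managed by GCS and GES are called resources. Access to these resources must be coordinated across the multiple instances in the cluster. This coordination happens at both the instance level and the database level. Instance-level resource coordination is called local resource coordination; database-level coordination is called global resource coordination.
Local resource coordination works much like resource coordination in single-instance Oracle. It covers block-level access, space management, dictionary cache and library cache management, row-level locks, and SCN generation. Global resource coordination is specific to RAC and uses additional memory components in the SGA, additional algorithms, and additional background processes.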