http://www.laoxiong.net/how_debugger_dump_oracle_state.html
如果Oracle数据库hang住了,对Oracle做system dump,或做hang analyze,是研究和解决问题的有效办法,至少在提交SR时能够有更多的有用信息。如果能够连接数据库,并能够进行操作,那么用oradebug是简单快捷的办法。
但有的时候,数据库由于hang住,sqlplus不能连接时(在10g可以尝试用sqlplus -prelim连接数据库),可以使用操作系统上的调试工具来dump oracle系统状态。在记一次Oracle数据库无响应(hang住)故障的处理一文中,就曾使用dbx做systemstate dump,并发现问题所在,并最终解决了问题。下面是当时用dbx做dump的过程:
1 # dbx -a 446910 2 Waiting to attach to process 446910 … 3 Successfully attached to oracle. 4 Type ‘help’ for help. 5 reading symbolic information … 6 stopped in iosl.select at 0×9000000000c94d8 ($t2) 7 0×9000000000c94d8 (select+0xfffffffffff06318) e8410028 ld r2,0×28(r1) 8 (dbx) print ksudss(10) 9 Segmentation fault in slrac at 0×100083aa0 ($t2) 10 0×100083aa0 (slrac+0xe4) 88030000 lbz r0,0×0(r3) 11 (dbx) detach
从上面可以看到,使用dbx做dump的过程为:
- 找到有异常的进程号,比如CPU非常高,HANG住的进程等。如果做系统范围的systemstate dump,可以是其他的进程。
- dbx -a < 进程号>
- print ksudss(10) --这里是直接调用ORACLE程序中的ksudss函数,dump level为10,就等同于在sqlplus 中用oradebug dump systemstate 10
- detach
- quit
在LINUX下可以使用gdb,下面是一个例子:
1 [oracle@xty ~]$ ps -ef | grep LOCAL 2 oracle 3765 3764 1 05:55 ? 00:00:00 oraclexty (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) 3 oracle 3767 3668 0 05:55 pts/2 00:00:00 grep LOCAL 4 [oracle@xty ~]$ gdb $ORACLE_HOME/bin/oracle 3765 5 GNU gdb Red Hat Linux (6.1post-1.20040607.62rh) 6 Copyright 2004 Free Software Foundation, Inc. 7 GDB is free software, covered by the GNU General Public License, and you are 8 welcome to change it and/or distribute copies of it under certain conditions. 9 Type “show copying” to see the conditions. 10 There is absolutely no warranty for GDB. Type “show warranty” for details. 11 This GDB was configured as “i386-redhat-linux-gnu”…(no debugging symbols found)…Using host libthread_db library “/lib/tls/libthread_db.so.1″. 12 Attaching to program: /u01/app/oracle/product/10.1.0/db_1/bin/oracle, process 3765 13 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libskgxp10.so…(no debugging symbols found)…done. 14 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libskgxp10.so 15 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libhasgen10.so…done. 16 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libhasgen10.so 17 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libskgxn2.so…done. 18 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libskgxn2.so 19 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocr10.so…done. 20 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocr10.so 21 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocrb10.so…done. 22 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocrb10.so 23 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocrutl10.so…done. 24 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocrutl10.so 25 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libjox10.so…done. 26 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libjox10.so 27 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libclsra10.so…done. 28 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libclsra10.so 29 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libdbcfg10.so…done. 30 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libdbcfg10.so 31 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libnnz10.so…done. 32 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libnnz10.so 33 Reading symbols from /usr/lib/libaio.so.1…done. 34 Loaded symbols for /usr/lib/libaio.so.1 35 Reading symbols from /lib/libdl.so.2…done. 36 Loaded symbols for /lib/libdl.so.2 37 Reading symbols from /lib/tls/libm.so.6…done. 38 Loaded symbols for /lib/tls/libm.so.6 39 Reading symbols from /lib/tls/libpthread.so.0…done. 40 [Thread debugging using libthread_db enabled] 41 [New Thread -1219938624 (LWP 3765)] 42 Loaded symbols for /lib/tls/libpthread.so.0 43 Reading symbols from /lib/libnsl.so.1…done. 44 Loaded symbols for /lib/libnsl.so.1 45 Reading symbols from /lib/tls/libc.so.6…done. 46 Loaded symbols for /lib/tls/libc.so.6 47 Reading symbols from /lib/ld-linux.so.2…done. 48 Loaded symbols for /lib/ld-linux.so.2 49 Reading symbols from /lib/libnss_files.so.2…done. 50 Loaded symbols for /lib/libnss_files.so.2 51 0×006967a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 52 (gdb) print ksudss(10) 53 [Switching to Thread -1219938624 (LWP 3765)] 54 $1 = 213658428 55 (gdb) detach 56 Detaching from program: /u01/app/oracle/product/10.1.0/db_1/bin/oracle, process 3765 57 (gdb) quit
然后我们可以找到有dump结果的trace文件:
[oracle@xty ~]$ cd $ORACLE_BASE/admin/xty/udump [oracle@xty udump]$ ls -lrt | grep 3765 -rw-r----- 1 oracle oinstall 599705 Nov 21 05:56 xty_ora_3765.trc
根据debugger工具attach进程的不同,trace文件一般在user_dump_dest或background_dump_dest目录下。
在LINUX下用gdb,在AIX下用dbx,那么在HP-UX下呢,可以用HP的wdb(可以到HP WDB查看HP WDB的详细信息和下载最新的版本。在solaris上,也会有dbx或gdb(各个平台有多种不同的debugger,其他还有adb,mdb等等)。有兴趣的朋友可以用用。
除了上面提到的systemstate dump,还能不能够做其他的dump?答案是肯定的,以下是一些dump相关的函数:
print ksdhng(3,1,0) 相当于oradebug hanganalyze 3 print ksudps(10) 相当于oradebug dump processstate 10 print curdmp() 相当于oradebug call curdmp(也就是oradebug dump cursordump) print ksdtrc(4) 相当于oradebug dump events 4(这里参数表示level,1--session,2--process,4--system)
以上列出的,不一定对处理HANG有意义,只是这里觉得有些意思。^_^.其他还有意思的包括:
print ksdsel(10046,12) --相当于为attach的进程设置10046事件level 12 print skdxipc() --相当于oradebug ipc print skdxprst() --相当于oradebug procstat
注意:不要在正常运行的生产系统上运行和测试。
当然,如果能用oradebug,那么就用oradebug,毕竟方便很多,也更安全。这里只是对使用debugger做dump一些扩充,供有兴趣研究的朋友参考。
补充:在HP-UX上使用gdb进行system dump,可能会更复杂,可以参考Oracle Metalink Doc 273324.1