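1. Database Startup
Let us begin by analyzing the database startup process. Starting an Oracle database involves three main steps:
(1) start the database to the nomount state;
(2) start the database to the mount state;
(3) start the database to the open state.
The sections below walk through each step and explain what it means.
1.1. Starting the Database to the nomount State
In the first step of startup, Oracle looks for a parameter file (pfile/spfile) and then, based on its settings, creates the instance, allocates memory, and starts the background processes. As this shows, a parameter file alone is enough to start an instance; no control file or data file is involved at this stage. If a problem occurs at this step while creating a database, the cause is usually the system configuration (kernel parameters and so on), and you should check whether enough system resources have been allocated.
Here is what starting to the nomount state looks like: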
[oracle@czjie ~]$ cd $ORACLE_HOME/dbs
[oracle@czjie dbs]$ ls
hc_ORCL.dat init.ora lkORCL spfileORCL.ora spfileORCL.ora.bak0
initdw.ora initORCL.ora orapwORCL spfileORCL.ora.bak
[oracle@czjie dbs]$ sqlplus "/ as sysdba"
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 22:19:24 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 121637896 bytes
Database Buffers 92274688 bytes
Redo Buffers 2924544 bytes
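Note: based on the contents of the parameter file, Oracle has created the instance, allocated the corresponding memory areas, and started the corresponding background processes.
If you look at the alert log (alert_<sid>.log) at this point, you can see this phase of startup: the parameter file is read and its settings are applied to start the instance. Every non-default parameter defined in the parameter file is recorded in the alert log: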
Fri Nov 11 22:20:49 2011
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
processes = 150
__shared_pool_size = 104857600
__large_pool_size = 4194304
__java_pool_size = 12582912
__streams_pool_size = 0
nls_length_semantics = BYTE
resource_manager_plan =
sga_target = 218103808
control_files = /opt/ora10g/oradata/ORCL/control01.ctl, /opt/ora10g/oradata/ORCL/control02.ctl, /opt/ora10g/oradata/ORCL/control03.ctl
db_block_size = 8192
__db_cache_size = 92274688
compatible = 10.2.0.1.0
db_file_multiblock_read_count= 16
db_recovery_file_dest = /opt/ora10g/flash_recovery_area
db_recovery_file_dest_size= 2147483648
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 900
remote_login_passwordfile= EXCLUSIVE
db_domain = 192.168.1.106
dispatchers = (PROTOCOL=TCP) (SERVICE=ORCLXDB)
job_queue_processes = 10
background_dump_dest = /opt/ora10g/admin/ORCL/bdump
user_dump_dest = /opt/ora10g/admin/ORCL/udump
core_dump_dest = /opt/ora10g/admin/ORCL/cdump
audit_file_dest = /opt/ora10g/admin/ORCL/adump
db_name = ORCL
open_cursors = 300
pga_aggregate_target = 71303168
aq_tm_processes = 0
The background processes are then started one after another:
PSP0 started with pid=3, OS id=2789
PMON started with pid=2, OS id=2787
MMAN started with pid=4, OS id=2791
DBW0 started with pid=5, OS id=2793
LGWR started with pid=6, OS id=2795
SMON started with pid=8, OS id=2799
CKPT started with pid=7, OS id=2797
RECO started with pid=9, OS id=2801
MMON started with pid=11, OS id=2805
CJQ0 started with pid=10, OS id=2803
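The startup section of the alert log has a regular shape, so it can be summarized programmatically. The following is a minimal Python sketch, assuming the log uses the `name = value` and `XXXX started with pid=N, OS id=M` layout shown above; the function name `parse_startup_log` is hypothetical and is not part of any Oracle tool.

```python
import re

# A parameter line looks like "processes = 150"; the value may be empty.
PARAM_RE = re.compile(r"^\s*([A-Za-z_][\w$]*)\s*=\s*(.*)$")
# A process line looks like "PMON started with pid=2, OS id=2787".
PROC_RE = re.compile(r"^(\w+) started with pid=(\d+), OS id=(\d+)$")


def parse_startup_log(text):
    """Return (non_default_params, background_processes) from alert log text."""
    params, procs = {}, {}
    in_params = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("System parameters with non-default values"):
            in_params = True
            continue
        proc = PROC_RE.match(line)
        if proc:
            in_params = False  # the parameter list ends where process lines begin
            procs[proc.group(1)] = int(proc.group(3))  # process name -> OS pid
            continue
        if in_params:
            param = PARAM_RE.match(line)
            if param:
                params[param.group(1)] = param.group(2).strip()
    return params, procs
```

Feeding it the excerpt above would give one dictionary of non-default parameters and one mapping each background process name to its operating system process id.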
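Note the order in which Oracle chooses a parameter file here.
In Oracle 10g, Oracle first looks for spfile<sid>.ora as the startup parameter file; if that file does not exist, it uses spfile.ora; if neither exists, it falls back to init<sid>.ora; if none of the three files exists, Oracle cannot create and start the instance.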
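The search order just described can be sketched in a few lines of Python. This is only an illustration of the lookup logic, assuming the files live in a single directory such as $ORACLE_HOME/dbs; `pick_parameter_file` is a hypothetical helper, not an Oracle API.

```python
from pathlib import Path


def pick_parameter_file(dbs_dir, sid):
    """Return the first parameter file found in the documented order, or None."""
    candidates = [f"spfile{sid}.ora", "spfile.ora", f"init{sid}.ora"]
    for name in candidates:
        path = Path(dbs_dir) / name
        if path.is_file():
            return path
    return None  # corresponds to the case where the instance cannot start
```

For example, `pick_parameter_file("/opt/ora10g/product/10.2.0/db_1/dbs", "ORCL")` returns whichever of the three files Oracle would be expected to pick up first.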
You can run show parameter spfile in SQL*Plus to check whether the database is using an spfile; if the value is not null, an spfile is in use:
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /opt/ora10g/product/10.2.0/db_1/dbs/spfileORCL.ora
Note:
ORACLE_HOME= /opt/ora10g/product/10.2.0/db_1
ORACLE_SID=ORCL
At this point the running background processes can also be seen from the operating system:
[oracle@czjie ~]$ ps -ef|grep ora_
oracle 2787 1 0 22:20 ? 00:00:01 ora_pmon_ORCL
oracle 2789 1 0 22:20 ? 00:00:00 ora_psp0_ORCL
oracle 2791 1 0 22:20 ? 00:00:00 ora_mman_ORCL
oracle 2793 1 0 22:20 ? 00:00:00 ora_dbw0_ORCL
oracle 2795 1 0 22:20 ? 00:00:00 ora_lgwr_ORCL
oracle 2797 1 0 22:20 ? 00:00:00 ora_ckpt_ORCL
oracle 2799 1 0 22:20 ? 00:00:00 ora_smon_ORCL
oracle 2801 1 0 22:20 ? 00:00:00 ora_reco_ORCL
oracle 2803 1 0 22:20 ? 00:00:00 ora_cjq0_ORCL
oracle 2805 1 0 22:20 ? 00:00:00 ora_mmon_ORCL
oracle 2807 1 0 22:20 ? 00:00:00 ora_mmnl_ORCL
oracle 2809 1 0 22:20 ? 00:00:00 ora_d000_ORCL
oracle 2811 1 0 22:20 ? 00:00:00 ora_s000_ORCL
oracle 3005 2818 0 22:48 pts/2 00:00:00 grep ora_
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
Now rename spfile<sid>.ora to spfile.ora; Oracle will then use spfile.ora to start the instance:
[oracle@czjie dbs]$ ls
hc_ORCL.dat init.ora lkORCL spfileORCL.ora spfileORCL.ora.bak0
initdw.ora initORCL.ora orapwORCL spfileORCL.ora.bak
[oracle@czjie dbs]$ mv spfileORCL.ora spfile.ora
[oracle@czjie dbs]$ ls
hc_ORCL.dat init.ora lkORCL spfile.ora spfileORCL.ora.bak0
initdw.ora initORCL.ora orapwORCL spfileORCL.ora.bak
SQL> startup nomount
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 121637896 bytes
Database Buffers 92274688 bytes
Redo Buffers 2924544 bytes
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /opt/ora10g/product/10.2.0/db_1/dbs/spfile.ora
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
SQL>
Next rename spfile.ora as well; Oracle will now use init<sid>.ora to start the instance:
[oracle@czjie dbs]$ mv spfile.ora spfile.ora.bak
[oracle@czjie dbs]$ ls
hc_ORCL.dat init.ora lkORCL spfile.ora.bak spfileORCL.ora.bak0
initdw.ora initORCL.ora orapwORCL spfileORCL.ora.bak
SQL> startup nomount
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 71306248 bytes
Database Buffers 142606336 bytes
Redo Buffers 2924544 bytes
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
If none of the three files is present, Oracle cannot start:
[oracle@czjie dbs]$ mv initORCL.ora initORCL.ora.bak
......
SQL> startup
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/opt/ora10g/product/10.2.0/db_1/dbs/initORCL.ora'
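An error is reported here saying the parameter file cannot be found; init<sid>.ora is the last parameter file Oracle looks for.
Throughout startup, the parameter file locations are hard-coded in the Oracle executable and are searched in the order described above; the search path and behavior cannot be changed. However, if a parameter file is not in the expected location, on Linux/UNIX systems it can be relocated by means of a symbolic link.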
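As a rough illustration of the symbolic-link approach, the Python sketch below creates init<sid>.ora in the dbs directory as a link to a file kept elsewhere. The helper name `link_parameter_file` and the idea of linking the pfile (rather than an spfile) are assumptions made for the example; it applies to Linux/UNIX only.

```python
import os
from pathlib import Path


def link_parameter_file(real_file, dbs_dir, sid):
    """Create dbs_dir/init<sid>.ora as a symlink to real_file; return the link."""
    link = Path(dbs_dir) / f"init{sid}.ora"
    if link.exists() or link.is_symlink():
        # Refuse to clobber a parameter file that is already in place.
        raise FileExistsError(f"{link} already exists; not overwriting it")
    os.symlink(Path(real_file).resolve(), link)
    return link
```

Oracle still opens the hard-coded name; the link simply makes that name resolve to the real location.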
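In the parameter file, the minimum parameter normally required is db_name; once it is set, the instance can be started. Here is a simple test.
Pick an arbitrary instance name (this test was run on Linux and applies to Linux/UNIX; on Windows the instance must first be created with the oradim utility), then try to start to the nomount state: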
[oracle@czjie ~]$ export ORACLE_SID=czjie
[oracle@czjie ~]$ sqlplus "/ as sysdba"
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 23:04:03 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup nomount
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/opt/ora10g/product/10.2.0/db_1/dbs/initczjie.ora'
When the parameter file lookup fails, a message says so. Create the simplest possible parameter file and the instance can be started:
[oracle@czjie ~]$ cd $ORACLE_HOME/dbs/
[oracle@czjie dbs]$ vi initczjie.ora
# add the following line
db_name=czjie
[oracle@czjie dbs]$ ls
hc_ORCL.dat init.ora orapwORCL spfileORCL.ora.bak0
initczjie.ora initORCL.ora.bak spfile.ora.bak
initdw.ora lkORCL spfileORCL.ora.bak
SQL> startup nomount
ORACLE instance started.
Total System Global Area 113246208 bytes
Fixed Size 1266080 bytes
Variable Size 58723936 bytes
Database Buffers 50331648 bytes
Redo Buffers 2924544 bytes
By default, if it is not set, the background_dump_dest directory (where the alert log alert_<sid>.log is stored) is $ORACLE_HOME/rdbms/log:
SQL> show parameter background_dump
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest string /opt/ora10g/product/10.2.0/db_
1/rdbms/log
While we are here, look at the other default paths:
SQL> show parameter dump_dest
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
background_dump_dest string /opt/ora10g/product/10.2.0/db_1/rdbms/log
core_dump_dest string /opt/ora10g/product/10.2.0/db_1/dbs
user_dump_dest string /opt/ora10g/product/10.2.0/db_1/rdbms/log
SQL> show parameter control_files
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_files string /opt/ora10g/product/10.2.0/db_1/dbs/cntrlczjie.dbf
The log of this minimal instance startup is included for reference:
[oracle@czjie ~]$ cd $ORACLE_HOME/dbs/
[oracle@czjie dbs]$ vi initczjie.ora
[oracle@czjie dbs]$ ls
hc_ORCL.dat init.ora orapwORCL spfileORCL.ora.bak0
initczjie.ora initORCL.ora.bak spfile.ora.bak
initdw.ora lkORCL spfileORCL.ora.bak
[oracle@czjie dbs]$ cd ~
[oracle@czjie ~]$ clear
[oracle@czjie ~]$ cat $ORACLE_HOME/rdbms/log/alert_czjie.log
Fri Nov 11 23:08:24 2011
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_1 parameter default value as /opt/ora10g/product/10.2.0/db_1/dbs/arch
Autotune of undo retention is turned off.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
db_name = czjie
PSP0 started with pid=3, OS id=3167
PMON started with pid=2, OS id=3165
MMAN started with pid=4, OS id=3169
DBW0 started with pid=5, OS id=3171
LGWR started with pid=6, OS id=3173
CKPT started with pid=7, OS id=3175
SMON started with pid=8, OS id=3177
RECO started with pid=9, OS id=3179
MMON started with pid=10, OS id=3181
MMNL started with pid=11, OS id=3183
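In this way the Oracle instance was started with the minimum parameter requirement.
RMAN (Recovery Manager) offers an even more special case: Oracle allows an instance to be started without any parameter file, and db_name then defaults to DUMMY: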
[oracle@czjie ~]$ rman target /
Recovery Manager: Release 10.2.0.4.0 - Production on Fri Nov 11 23:17:30 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
connected to target database (not started)
RMAN> startup nomount;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/opt/ora10g/product/10.2.0/db_1/dbs/initORCL.ora'
starting Oracle instance without parameter file for retrival of spfile
Oracle instance started
Total System Global Area 159383552 bytes
Fixed Size 1266344 bytes
Variable Size 54529368 bytes
Database Buffers 100663296 bytes
Redo Buffers 2924544 bytes
RMAN> host;
[oracle@czjie ~]$ sqlplus "/ as sysdba"
SQL*Plus: Release 10.2.0.4.0 - Production on Fri Nov 11 23:18:07 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show parameter db_name
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_name string DUMMY
The alert log then records the following:
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
sga_target = 159383552
compatible = 10.2.0.4.0
_dummy_instance = TRUE
remote_login_passwordfile= EXCLUSIVE
db_name = DUMMY
PMON started with pid=2, OS id=3214
PSP0 started with pid=3, OS id=3216
MMAN started with pid=4, OS id=3218
DBW0 started with pid=5, OS id=3220
LGWR started with pid=6, OS id=3222
CKPT started with pid=7, OS id=3224
RECO started with pid=9, OS id=3228
MMNL started with pid=11, OS id=3232
SMON started with pid=8, OS id=3226
MMON started with pid=10, OS id=3230
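Once the instance has been created, Oracle can proceed step by step to mount and open the database.
1.2. Starting the Database to the mount State
After reaching the nomount state, Oracle can obtain the control file locations from the parameter file. The entry in the parameter file looks like the following. (By default Oracle creates three control files with identical contents; this mirroring is a safety measure. In production the three control files should normally be placed on different physical disks so that a single media failure cannot damage all of them.)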
control_files = /opt/ora10g/oradata/ORCL/control01.ctl, /opt/ora10g/oradata/ORCL/control02.ctl, /opt/ora10g/oradata/ORCL/control03.ctl
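In the nomount state you can query the v$parameter view to get control file information, which comes from the startup parameter file; once the database is mounted you can query v$controlfile, and at that point the information comes from the control file itself: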
SQL> startup nomount
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 121637896 bytes
Database Buffers 92274688 bytes
Redo Buffers 2924544 bytes
SQL> select * from v$controlfile;
no rows selected
SQL> show parameter control_file
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time integer 7
control_files string /opt/ora10g/oradata/ORCL/contr
ol01.ctl, /opt/ora10g/oradata/
ORCL/control02.ctl, /opt/ora10
g/oradata/ORCL/control03.ctl
SQL> alter database mount;
Database altered.
SQL> select * from v$controlfile;
STATUS
-------
NAME
--------------------------------------------------------------------------------
IS_ BLOCK_SIZE FILE_SIZE_BLKS
--- ---------- --------------
/opt/ora10g/oradata/ORCL/control01.ctl
NO 16384 430
/opt/ora10g/oradata/ORCL/control02.ctl
NO 16384 430
STATUS
-------
NAME
--------------------------------------------------------------------------------
IS_ BLOCK_SIZE FILE_SIZE_BLKS
--- ---------- --------------
/opt/ora10g/oradata/ORCL/control03.ctl
NO 16384 430
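While mounting the database, Oracle must locate and lock the control files. If a control file named in control_files cannot be found (in the test below, control01.ctl is renamed), the following error is reported: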
[oracle@czjie ~]$ cd /opt/ora10g/oradata/ORCL/
[oracle@czjie ORCL]$ ls
control01.ctl redo01.log sysaux01.dbf undotbs01.dbf
control02.ctl redo02.log system01.dbf users01.dbf
control03.ctl redo03.log temp01.dbf
[oracle@czjie ORCL]$ mv control01.ctl control01.ctl.bak
[oracle@czjie ORCL]$ ls
control01.ctl.bak redo01.log sysaux01.dbf undotbs01.dbf
control02.ctl redo02.log system01.dbf users01.dbf
control03.ctl redo03.log temp01.dbf
……
SQL> startup
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 121637896 bytes
Database Buffers 92274688 bytes
Redo Buffers 2924544 bytes
ORA-00205: error in identifying control file, check alert log for more info
The alert_<sid>.log file usually records more detail at this point:
Sat Nov 12 00:04:11 2011
ORA-00202: control file: '/opt/ora10g/oradata/ORCL/control01.ctl'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
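Because Oracle's three (default) control files are identical, if only one or two of them are lost you can copy an intact control file to the missing name and then start the database; if all control files are lost, the control file must be restored or re-created before the database can be opened.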
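The copy-a-surviving-file repair can be sketched as follows. This is a hedged Python illustration, assuming the instance is shut down and that the surviving copy is intact; `restore_missing_copies` is a hypothetical helper, and the list of paths is whatever control_files names in your parameter file.

```python
import shutil
from pathlib import Path


def restore_missing_copies(control_files):
    """Copy the first surviving control file over every missing one.

    Returns the list of paths that were re-created. Raises if no copy survives,
    in which case the control file has to be restored or re-created instead.
    """
    paths = [Path(p) for p in control_files]
    survivors = [p for p in paths if p.is_file()]
    if not survivors:
        raise FileNotFoundError("all control files are missing")
    restored = []
    for p in paths:
        if not p.is_file():
            shutil.copy2(survivors[0], p)  # keep size and timestamps of the source
            restored.append(p)
    return restored
```

After the copies are back in place, the database can be started again as described above.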
During a normal mount, the alert log records only the following:
Sat Nov 12 00:10:34 2011
Successful mount of redo thread 1, with mount id 1294793846
Sat Nov 12 00:10:34 2011
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
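In this step the database computes the mount id and records it in the control file, then starts the heartbeat, which updates the control file every 3 seconds. The following command can be used to dump the control file twice, 3 seconds apart: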
SQL> alter session set events 'immediate trace name CONTROLF level 10';
Session altered.
Comparing the two dump files with diff on Linux shows that the only thing that changes in the control file in the mount state is this heartbeat:
[oracle@czjie ORCL]$ diff orcl_ora_25542.trc orcl_ora_25706.trc
…
64c63
<heartbeat:588983634 mount id:1408096182
…
>heartbeat:588983636 mount id: 1408096182
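The heartbeat indicates that the database has been mounted by a particular instance; this attribute is used mainly in OPS/RAC environments, but the heartbeat exists in single-instance environments as well.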
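Instead of reading the diff by eye, the heartbeat lines of two dumps can be compared in code. The Python sketch below assumes the `heartbeat:<n> mount id:<m>` layout shown in the diff output; the function names are hypothetical.

```python
import re

HEARTBEAT_RE = re.compile(r"heartbeat:\s*(\d+)\s+mount id:\s*(\d+)")


def read_heartbeat(dump_text):
    """Return (heartbeat, mount_id) from control file dump text, or None."""
    m = HEARTBEAT_RE.search(dump_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def heartbeat_advanced(first_dump, second_dump):
    """True if both dumps share a mount id and the heartbeat moved forward."""
    a, b = read_heartbeat(first_dump), read_heartbeat(second_dump)
    if a is None or b is None:
        raise ValueError("no heartbeat line found in one of the dumps")
    return a[1] == b[1] and b[0] > a[0]
```

Passing the full text of the two trace files to `heartbeat_advanced` confirms that the same mount is still alive and the counter is moving.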
The current heartbeat value can be queried from an internal table (you must be logged in as SYS):
SQL> select cphbt from x$kcccp;
Starting with Oracle 9i, Oracle records waits related to this activity through the control file heartbeat wait event:
SQL> select event#,name from v$event_name where name like '%heart%';
EVENT# NAME
---------- ----------------------------------------------------------------
282 ASM mount : wait for heartbeat
423 control file heartbeat
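Knowing each step of startup lets you locate a problem quickly when one occurs, diagnose it accurately, and resolve it fast.
To start to the mount state, another important file the database needs is the password file, located in $ORACLE_HOME/dbs with the default name orapw<sid>.
The password file stores the user names and passwords of SYSDBA/SYSOPER users: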
[oracle@czjie dbs]$ strings orapwORCL
]\[Z
ORACLE Remote Password file
INTERNAL
239C08867B270ADC
3E378BE1E39BBBA6
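Before the database is started, built-in database users cannot be authenticated by the database itself. The password file lets Oracle authenticate such users so that they can log in before the database is started and then start it.
If the password file is lost, an error occurs at the mount stage: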
ORA-01990: error opening password file '/opt/ora10g/product/10.2.0/dbs/orapw'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
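For the password file, Oracle looks for orapw<sid> by default; if it does not exist, Oracle goes on to look for orapw; if neither file exists, the database reports an error.
A lost password file can be re-created with the orapwd utility, so the usual backup strategy need not include the password file: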
[oracle@czjie dbs]$ orapwd
Usage: orapwd file=<fname> password=<password> entries=<users> force=<y/n> nosysdba=<y/n>
where
file - name of password file (mand),
password - password for SYS (mand),
entries - maximum number of distinct DBA,
force - whether to overwrite existing file (opt),
nosysdba - whether to shut out the SYSDBA logon (opt for Database Vault only).
There are no spaces around the equal-to (=) character.
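The initialization parameter remote_login_passwordfile is related to the password file; for reasons of space it is not covered further here.
On Linux/UNIX platforms, $ORACLE_HOME/dbs usually contains one more file, named lk<SID>, where lk stands for lock. It is created when the database starts and is used for operating-system-level locking of the database: the lock is acquired when the database starts and released when it shuts down.
Sometimes, after a system anomaly, the database may already be down while the lock has not been released, or background processes may not have stopped cleanly; either can prevent the next startup, with errors similar to the following: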
Sun Apr 30 06:08:58 2011
ALTER DATABASE MOUNT
Sun Apr 30 06:08:58 2011
scumnt: failed to lock /export/product/oracle/app/dbs/lkBILL exclusive
Sun Apr 30 06:08:58 2011
ORA-09968: scumnt: unable to lock file
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 20169
The file normally contains a single line warning you not to delete it; it is used only for locking:
bash-2.03$ more lkBILL
DO NOT DELETE THIS FILE!
Such problems are rare; normally the database can be restarted once it has been shut down cleanly.
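1.3. Starting the Database to the open State
Because the control file records the locations of the data files and log files, checkpoint information, and other key information, in the open stage Oracle can use it to find those files and then perform checkpoint and integrity checks.
If there are no problems the database is opened; if there are inconsistencies or missing files, recovery is required.
More specifically, the checks Oracle performs while opening the database include the following two.
The first check compares the checkpoint count (Checkpoint cnt) in each data file header with the checkpoint count recorded in the control file. This confirms that the data file belongs to the same version and was not restored from a backup (the checkpoint count is never frozen and keeps being modified).
A simple test illustrates the role of Checkpoint cnt.
First locate the name and path of the current session's trace file: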
SQL> select u_dump.value || '/' || db_name.value || '_ora_' || v$process.spid || nvl2(v$process.traceid, '_' || v$process.traceid, null )|| '.trc' "Trace File" from v$parameter u_dump cross join v$parameter db_name cross join v$process join v$session on v$process.addr = v$session.paddr where u_dump.name = 'user_dump_dest' and db_name.name = 'db_name' and v$session.audsid=sys_context('userenv','sessionid');
Trace File
--------------------------------------------------------------------------------
/opt/ora10g/admin/ORCL/udump/ORCL_ora_2754.trc
Now dump the control file under different conditions with the following command. First, dump the control file in its normal state:
SQL> alter session set events 'immediate trace name controlf level 12';
Session altered.
Put the SYSTEM tablespace into hot backup mode (hot backup mode freezes the checkpoint of the tablespace's data files):
SQL> alter tablespace system begin backup;
Tablespace altered.
Dump the control file again:
SQL> alter session set events 'immediate trace name controlf level 12';
Session altered.
Run a manual checkpoint and dump the control file:
SQL> alter system checkpoint;
System altered.
SQL> alter session set events 'immediate trace name controlf level 12';
Session altered.
End hot backup mode for the tablespace and dump the control file once more:
SQL> alter tablespace system end backup;
Tablespace altered.
SQL> alter session set events 'immediate trace name controlf level 12';
Session altered.
Now take a brief look at the trace file (only the record for the SYSTEM tablespace is examined).
(1) Control file dump under normal conditions.
***************************************************************************
DATA FILE RECORDS
***************************************************************************
(size = 428, compat size = 428, section max = 100, section in-use = 4,
last-recid= 53, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
(name #7) /opt/ora10g/oradata/ORCL/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
tablespace 0, index=1 krfil=1 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:106 scn: 0x0000.000d845f 11/14/2011 15:24:50
Stop scn: 0xffff.ffffffff 11/14/2011 14:31:00
Creation Checkpointed at scn: 0x0000.00000009 06/30/2005 19:10:11
……
Note the checkpoint counter and SCN recorded here.
(2) After Begin Backup.
Checkpoint cnt has increased by 1: running Begin Backup on a tablespace triggers a tablespace checkpoint:
***************************************************************************
DATA FILE RECORDS
***************************************************************************
(size = 428, compat size = 428, section max = 100, section in-use = 4,
last-recid= 53, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
(name #7) /opt/ora10g/oradata/ORCL/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
tablespace 0, index=1 krfil=1 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:107 scn: 0x0000.000d8489 11/14/2011 15:26:11
Stop scn: 0xffff.ffffffff 11/14/2011 14:31:00
Creation Checkpointed at scn: 0x0000.00000009 06/30/2005 19:10:11
……
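The checkpoint counter has increased accordingly.
(3) After a manual checkpoint.
With the tablespace in hot backup mode, after a manual checkpoint Checkpoint cnt increases but the SCN no longer changes. This is because the data file checkpoint is frozen while the tablespace is in hot backup mode (in this mode the database also generates additional redo, which is covered in detail in a later article).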
***************************************************************************
DATA FILE RECORDS
***************************************************************************
(size = 428, compat size = 428, section max = 100, section in-use = 4,
last-recid= 53, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
(name #7) /opt/ora10g/oradata/ORCL/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
tablespace 0, index=1 krfil=1 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:108 scn: 0x0000.000d8489 11/14/2011 15:26:11
Stop scn: 0xffff.ffffffff 11/14/2011 14:31:00
Creation Checkpointed at scn: 0x0000.00000009 06/30/2005 19:10:11
……
(4) After End Backup.
The freeze on the data file header is lifted and the SCN starts changing again:
***************************************************************************
DATA FILE RECORDS
***************************************************************************
(size = 428, compat size = 428, section max = 100, section in-use = 4,
last-recid= 53, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
(name #7) /opt/ora10g/oradata/ORCL/system01.dbf
creation size=0 block size=8192 status=0xe head=7 tail=7 dup=1
tablespace 0, index=1 krfil=1 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:109 scn: 0x0000.000d849e 11/14/2011 15:26:49
Stop scn: 0xffff.ffffffff 11/14/2011 14:31:00
Creation Checkpointed at scn: 0x0000.00000009 06/30/2005 19:10:11
……
That is the checkpoint counter and how it changes in the different modes.
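The four dumps can be compared mechanically by pulling out the Checkpoint cnt line from each. The Python sketch below assumes the `Checkpoint cnt:<n> scn: 0x<wrap>.<base>` layout shown in the trace and treats the SCN as wrap * 2**32 + base; `parse_checkpoint` is a hypothetical helper.

```python
import re

CKPT_RE = re.compile(
    r"Checkpoint cnt:\s*(\d+)\s+scn:\s*0x([0-9a-fA-F]{4})\.([0-9a-fA-F]{8})"
)


def parse_checkpoint(line):
    """Return (checkpoint_cnt, scn) where scn = wrap * 2**32 + base."""
    m = CKPT_RE.search(line)
    if m is None:
        raise ValueError("not a 'Checkpoint cnt' line")
    cnt = int(m.group(1))
    scn = (int(m.group(2), 16) << 32) | int(m.group(3), 16)
    return cnt, scn


# The SYSTEM data file lines from the four dumps above, in order.
dumps = [
    "Checkpoint cnt:106 scn: 0x0000.000d845f 11/14/2011 15:24:50",  # normal
    "Checkpoint cnt:107 scn: 0x0000.000d8489 11/14/2011 15:26:11",  # begin backup
    "Checkpoint cnt:108 scn: 0x0000.000d8489 11/14/2011 15:26:11",  # manual checkpoint
    "Checkpoint cnt:109 scn: 0x0000.000d849e 11/14/2011 15:26:49",  # end backup
]
history = [parse_checkpoint(d) for d in dumps]
```

In `history` the counter rises by one at every step, while the SCN is identical for the second and third entries, which is the frozen checkpoint during hot backup.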
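If the checkpoint count check passes, the database performs the second check, which compares the start SCN in each data file header with the stop SCN recorded for that file in the control file. If the stop SCN recorded in the control file equals the start SCN in the data file header, that file needs no recovery.
Once every data file has been checked, the database is opened, the data files are locked, and the stop SCN of every data file is set to infinity.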
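The two checks can be expressed as a small decision function. This Python sketch is a deliberately simplified model of the description above, not Oracle's actual implementation; the class `FileState`, its field names, and the verdict strings are all assumptions made for illustration.

```python
from dataclasses import dataclass

# "Infinity" as it appears in the dump: 0xffff.ffffffff
INFINITE_SCN = 0xFFFF_FFFFFFFF


@dataclass
class FileState:
    header_ckpt_cnt: int   # checkpoint count in the data file header
    header_start_scn: int  # start (checkpoint) SCN in the data file header
    ctl_ckpt_cnt: int      # checkpoint count recorded in the control file
    ctl_stop_scn: int      # stop SCN recorded in the control file


def open_check(f):
    """Return a verdict for one data file under the simplified two-step model."""
    # Check 1: the file must carry the same checkpoint count as the control file.
    if f.header_ckpt_cnt != f.ctl_ckpt_cnt:
        return "checkpoint count mismatch: file is not from the same version"
    # Check 2: stop SCN in the control file must equal the header's start SCN.
    if f.ctl_stop_scn != f.header_start_scn:
        return "recovery needed"
    return "ok"
```

Under this model, a file whose stop SCN is still infinity (for instance because the database was not shut down cleanly) fails the second check, which matches the statement that the stop SCN is set to infinity while the database is open.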
Consider the following test, in which one of the database's files goes missing:
[oracle@czjie ORCL]$ pwd
/opt/ora10g/oradata/ORCL
[oracle@czjie ORCL]$ ls
control01.ctl control03.ctl redo02.log sysaux01.dbf temp01.dbf users01.dbf
control02.ctl redo01.log redo03.log system01.dbf undotbs01.dbf
[oracle@czjie ORCL]$ mv users01.dbf users01.dbf.bak
[oracle@czjie ORCL]$
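When the database is started, Oracle checks for this file's existence only in the open stage; if the file does not exist, the database reports an error and stops starting up: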
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 121637896 bytes
Database Buffers 92274688 bytes
Redo Buffers 2924544 bytes
SQL> alter database mount;
Database altered.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/opt/ora10g/oradata/ORCL/users01.dbf'
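Note: only in the open stage does Oracle try to open and lock the data files; if one is missing or damaged, an error is reported. The DBA must then step in and perform whatever recovery the situation calls for.
Now look at the error recorded in alert_<sid>.log during the open: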
Mon Nov 14 15:50:10 2011
alter database open
Mon Nov 14 15:50:10 2011
Errors in file /opt/ora10g/admin/ORCL/bdump/orcl_dbw0_3625.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/opt/ora10g/oradata/ORCL/users01.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
ORA-1157 signalled during: alter database open...
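When something goes wrong with the database, the message shown at the prompt may be incomplete, whereas the alert log records the full error sequence and error numbers. So when the database fails, check alert_<sid>.log first for detailed information about the failure.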
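A quick way to get an overview of a long alert log is to list the distinct ORA- error numbers it contains. The following Python sketch assumes errors appear as `ORA-` followed by digits, with or without zero padding, as in the excerpt above; `ora_errors` is a hypothetical helper name.

```python
import re

ORA_RE = re.compile(r"\bORA-(\d{1,5})\b")


def ora_errors(alert_text):
    """Return distinct ORA error numbers in order of first appearance."""
    seen = []
    for m in ORA_RE.finditer(alert_text):
        code = int(m.group(1))  # ORA-01157 and ORA-1157 are the same error
        if code not in seen:
            seen.append(code)
    return seen
```

Each number returned can then be looked up in context in the log to read the full error stack.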
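After verification and recovery are complete the database is in a consistent state, but it still has further work to do, such as bringing the undo segments online; after that the database can accept connections, and SMON can begin transaction rollback and similar tasks.
In the startup log we often see a line like this: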
Database Characterset is ZHS16GBK
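At every startup Oracle checks whether the character set recorded in the control file matches the character set in the database. If they match, it logs the line above; if not, it updates the control file's character set record to match the database, with a log entry like the following: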
Updating character set in controlfile to ZHS16CGB231280
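Tip: before Oracle 8i the character set could be changed by updating the props$ table; from Oracle 8i onward, never change the character set that way.
Looked at closely, every message in the startup log is worth studying.
1.4. In-Depth Analysis
Now let us go a step deeper and study what Oracle actually has to do while opening the database.
1.4.1. Obtaining a Trace File of the Database Open
All of the database's information is stored in the data files, yet before the database is opened Oracle cannot read that data. So how does Oracle get through this initialization from data files into memory?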
[oracle@czjie ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Mon Nov 21 20:41:16 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 121637896 bytes
Database Buffers 92274688 bytes
Redo Buffers 2924544 bytes
Database mounted.
SQL> alter session set sql_trace=true;
Session altered.
SQL> alter database open;
Database altered.
SQL_TRACE is used here to obtain a trace file that records the background operations Oracle performs between mount and open.
1.4.2. bootstrap$ and the Database Initialization Process
SQL> select u_dump.value || '/' || db_name.value || '_ora_' || v$process.spid || nvl2(v$process.traceid, '_' || v$process.traceid, null )|| '.trc' "Trace File" from v$parameter u_dump cross join v$parameter db_name cross join v$process join v$session on v$process.addr = v$session.paddr where u_dump.name = 'user_dump_dest' and db_name.name = 'db_name' and v$session.audsid=sys_context('userenv','sessionid');
Trace File
--------------------------------------------------------------------------------
/opt/ora10g/admin/ORCL/udump/ORCL_ora_2932.trc
[oracle@czjie ~]$ tkprof /opt/ora10g/admin/ORCL/udump/orcl_ora_2932.trc /opt/ora10g/orcl_ora_2932.txt
TKPROF: Release 10.2.0.4.0 - Production on Mon Nov 21 21:15:43 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
After formatting the trace file with tkprof, look at its contents. Start with the beginning of the trace file, which is the creation of the first object:
create table bootstrap$ ( line# number not null, obj#
number not null, sql_text varchar2(4000) not null) storage (initial
50K objno 56 extents (file 1 block 377))
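In this step Oracle actually creates the bootstrap$ structure in memory and then reads data from file 1 block 377 of the data file into memory to complete the first initialization.
Tip: the file 1 block 377 clause is internal syntax and is not available to users.
You can query the database to see which object is stored at file 1 block 377: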
SQL> select segment_name,file_id,block_id from dba_extents where block_id=377;
SEGMENT_NAME
--------------------------------------------------------------------------------
FILE_ID BLOCK_ID
---------- ----------
BOOTSTRAP$
1 377
I_DIR$INSTANCE_JOB_NAME
3 377
What is stored starting at file 1 block 377 is indeed the bootstrap$ object.
Continuing down the trace file, the next thing Oracle executes is:
select line#, sql_text
from
bootstrap$ where obj# != :1
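Having created bootstrap$ and loaded its contents from the data file, Oracle recursively reads from this table and loads data. So what does bootstrap$ record?
In the database, bootstrap$ is a system table that really exists: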
SQL> desc bootstrap$
Name Null? Type
----------------------------------------- -------- ----------------------------
LINE# NOT NULL NUMBER
OBJ# NOT NULL NUMBER
SQL_TEXT NOT NULL VARCHAR2(4000)
Look at what the table actually contains:
SQL> select * from bootstrap$ where line#<5;
LINE# OBJ#
---------- ----------
SQL_TEXT
--------------------------------------------------------------------------------
-1 -1
8.0.0.0.0
0 0
CREATE ROLLBACK SEGMENT SYSTEM STORAGE ( INITIAL 112K NEXT 1024K MINEXTENTS 1 M
AXEXTENTS 32765 OBJNO 0 EXTENTS (FILE 1 BLOCK 9))
2 2
CREATE CLUSTER C_OBJ#("OBJ#" NUMBER) PCTFREE 5 PCTUSED 40 INITRANS 2 MAXTRANS 25
LINE# OBJ#
---------- ----------
SQL_TEXT
--------------------------------------------------------------------------------
5 STORAGE ( INITIAL 136K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCRE
ASE 0 OBJNO 2 EXTENTS (FILE 1 BLOCK 25)) SIZE 800
3 3
CREATE INDEX I_OBJ# ON CLUSTER C_OBJ# PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE
( INITIAL 64K NEXT 1024K MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJN
O 3 EXTENTS (FILE 1 BLOCK 49))
4 4
LINE# OBJ#
---------- ----------
SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE TAB$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"TS#" NUMBER NOT NULL
,"FILE#" NUMBER NOT NULL,"BLOCK#" NUMBER NOT NULL,"BOBJ#" NUMBER,"TAB#" NUMBER,"
COLS" NUMBER NOT NULL,"CLUCOLS" NUMBER,"PCTFREE$" NUMBER NOT NULL,"PCTUSED$" NUM
BER NOT NULL,"INITRANS" NUMBER NOT NULL,"MAXTRANS" NUMBER NOT NULL,"FLAGS" NUMBE
R NOT NULL,"AUDIT$" VARCHAR2(38) NOT NULL,"ROWCNT" NUMBER,"BLKCNT" NUMBER,"EMPCN
T" NUMBER,"AVGSPC" NUMBER,"CHNCNT" NUMBER,"AVGRLN" NUMBER,"AVGSPC_FLB" NUMBER,"F
LBCNT" NUMBER,"ANALYZETIME" DATE,"SAMPLESIZE" NUMBER,"DEGREE" NUMBER,"INSTANCES"
NUMBER,"INTCOLS" NUMBER NOT NULL,"KERNELCOLS" NUMBER NOT NULL,"PROPERTY" NUMBER
NOT NULL,"TRIGFLAG" NUMBER,"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3" NUMBER,"SPA
LINE# OBJ#
---------- ----------
SQL_TEXT
--------------------------------------------------------------------------------
RE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) STORAGE ( OBJNO 4 TA
BNO 1) CLUSTER C_OBJ#(OBJ#)
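Only five rows of the table were queried above; you can explore the remaining rows yourself. These statements show that bootstrap$ simply records the creation statements for some of the database's basic system objects. Oracle boots from bootstrap$, goes on to create the related key objects, and thereby starts the database.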
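Each of these creation statements names the object number and, for objects with their own segment, the file and block where the segment starts. The Python sketch below extracts those three numbers; it assumes the SQL_TEXT value has been joined back into one string (SQL*Plus wraps it across lines in the output above), and `segment_location` is a hypothetical helper.

```python
import re

LOC_RE = re.compile(
    r"OBJNO\s+(\d+)\s+EXTENTS\s*\(\s*FILE\s+(\d+)\s+BLOCK\s+(\d+)\s*\)",
    re.IGNORECASE,
)


def segment_location(sql_text):
    """Return (objno, file_no, block_no) from a bootstrap$ statement, or None.

    None is returned when the statement carries no EXTENTS clause, as with the
    clustered TAB$ table shown above.
    """
    m = LOC_RE.search(sql_text)
    return tuple(int(g) for g in m.groups()) if m else None
```

Applied to the statement that creates bootstrap$ itself, it yields object number 56 at file 1, block 377, the same location found through dba_extents.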
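1.4.3. The Importance of bootstrap$
The discussion above shows how important the bootstrap$ table is: if it is damaged, the database cannot start. You may run into a case like the following, in which bootstrap$ has been maliciously modified; once the database is shut down it cannot be started again.
The following test is for illustration only. Do not imitate it, and do not try anything like it without a good backup: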
SQL> col sql_text for a15;
SQL> select * from bootstrap$ where rownum<2;
LINE# OBJ# SQL_TEXT
---------- ---------- ---------------
-1 -1 8.0.0.0.0
SQL> update bootstrap$ set sql_text='9.0.0.0.0' where line#=-1;
1 row updated.
SQL> commit;
Commit complete.
SQL> select * from bootstrap$ where rownum<2;
LINE# OBJ# SQL_TEXT
---------- ---------- ---------------
-1 -1 9.0.0.0.0
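As long as the database is not shut down the change has no effect; bootstrap$ matters only at database startup. Continuing, if bootstrap$ is damaged or maliciously modified, the following error is received at startup: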
SQL> startup
ORACLE instance started.
Total System Global Area 218103808 bytes
Fixed Size 1266680 bytes
Variable Size 113249288 bytes
Database Buffers 100663296 bytes
Redo Buffers 2924544 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
Checking the alert log reveals more detailed messages:
Mon Nov 21 22:29:22 2011
Errors in file /opt/ora10g/admin/ORCL/udump/orcl_ora_503.trc:
ORA-00704: bootstrap process failure
ORA-00702: bootstrap verison '9.0.0.0.0' inconsistent with version '8.0.0.0.0'
Mon Nov 21 22:29:22 2011
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 503
ORA-1092 signalled during: ALTER DATABASE OPEN..
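The log gives detailed error information. In this situation the best approach is an incomplete recovery from backup; without a backup, recovery is very complex and difficult.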