  • Oracle RAC ASM disk header: backup, recovery, and rebuild examples

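    I. Background

           RAC ASM is so heavily encapsulated that it is hard to see how it works internally. When ASM runs into trouble, the problem is usually hard to fix, and even with a complete RMAN backup a recovery can take a long time.

           The most fragile part of ASM is the ASM disk header. If a disk header is logically corrupted, the whole disk group cannot be mounted, and the databases that depend on the ASM instance cannot start.

           Disk headers are especially prone to problems after nodes are added to or removed from a RAC. Rebuilding an entire disk group and then restoring with RMAN because of a damaged disk header wastes a lot of time, so the best practice is to back up the ASM disk headers regularly with dd.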

    Oracle ASM series summary

    http://blog.csdn.net/tianlesoftware/article/details/6364422

    Oracle ASM in detail

    http://blog.csdn.net/tianlesoftware/article/details/5314541

    Official documentation:

    Introduction to Automatic Storage Management (ASM)

    http://download.oracle.com/docs/cd/B28359_01/server.111/b31107/asmcon.htm

     

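    1.1 What is ASM (Automatic Storage Management)

           ASM is software that manages volume groups or file systems. It manages disks through an ASM instance, which closely resembles an Oracle instance: an ASM instance also consists of an SGA and background processes. ASM has far fewer tasks to perform, however, so its SGA is comparatively small.

    1.1.1 ASM instance

    The ASM instance maintains the following ASM metadata: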

           (1)The disks that belong to a disk group

           (2)The amount of space that is available in a disk group

           (3)The filenames of the files in a disk group

           (4)The location of disk group data file data extents

           (5)A redo log that records information about atomically changing data blocks

           By maintaining this ASM metadata, the ASM instance supports database instances at the file layout level. One ASM instance can serve several database instances.

    More precisely, ASM metadata falls into 3 kinds:

           (1) diskgroup metadata: files with NUMBER_KFFIL < 256 are ASM metadata and ASM log files. These files have high redundancy (3 copies) and block size = 4 KB.

                           1) ASM log files are used for ASM instance and crash recovery when a crash happens during metadata operations (see COD and ACD below)

                           2) at diskgroup creation, 6 metadata files are visible from x$kffil

           (2) disk metadata: disk headers (typically the first 2 AUs of each disk) are not listed in x$kffil (they appear as file number 0 in x$kfdat). They contain disk membership information. This part of the disk has to be zeroed out before the disk can be added to an ASM diskgroup as a new disk.

           (3) file metadata: 3 mirrored extents with file metadata, visible from x$kffxp and x$kfdat

    1.1.2 ASM Disk groups

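           A disk group is made up of several disks, each of which is one of our partitions. The metadata stored in ASM disk groups is exactly the information that the ASM instance manages.

           In most cases only a few disk groups are needed, typically two and rarely three.

    To protect the data on the disks, Oracle offers three redundancy types for disk groups:

           (1) external redundancy: Oracle does not manage mirroring; the external storage system provides it, for example through RAID.
           (2) normal redundancy (the default): Oracle provides 2-way mirroring to protect the data.

           (3) high redundancy: Oracle provides 3-way mirroring to protect the data.

           ASM's own redundancy is implemented through ASM failure groups. The mirroring algorithm does not mirror whole disks; it mirrors at the extent level. It is therefore unwise to use disks of different capacities in different failure groups, because Oracle may run into trouble when it allocates the next extent.

           Under normal redundancy, every extent ASM allocates has a primary copy and a secondary copy, and the algorithm guarantees that the two copies are always in different failure groups. That is the purpose of a failure group: even if every disk in one failure group is lost, the data remains intact.

           When Oracle allocates an extent, the extents in the different failure groups that hold the same data are called an extent set. When Oracle writes to a file, the primary copy may be in any failure group and the secondary copy is in another one. When Oracle reads, it reads from the primary copy unless the primary copy is unavailable. Because writes place the primary copy in no fixed order while reads always prefer the primary, reads end up spread across many disks.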
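           The placement rule described above can be illustrated with a small sketch. This is not ASM's real allocator; it is a toy model, assuming two hypothetical failure groups FG1 and FG2 with made-up disk names, that only demonstrates the two properties stated in the text: the two copies of an extent never share a failure group, and reads prefer the primary copy.

```python
import random

def place_extent(failure_groups, rng):
    """Pick a primary and a secondary copy from two different failure groups."""
    primary_fg, secondary_fg = rng.sample(sorted(failure_groups), 2)
    primary = (primary_fg, rng.choice(failure_groups[primary_fg]))
    secondary = (secondary_fg, rng.choice(failure_groups[secondary_fg]))
    return primary, secondary

def read_copy(extent_set, failed_groups):
    """Return the primary copy unless its failure group is down."""
    for failure_group, disk in extent_set:
        if failure_group not in failed_groups:
            return failure_group, disk
    raise OSError("all copies of this extent are unavailable")

# Hypothetical layout: two failure groups with two disks each.
groups = {"FG1": ["d1", "d2"], "FG2": ["d3", "d4"]}
rng = random.Random(0)
extents = [place_extent(groups, rng) for _ in range(1000)]

# Losing all of FG1 still leaves one readable copy of every extent.
survivors = [read_copy(extent, {"FG1"}) for extent in extents]
```

           Because the primary copy lands in either group at random, reads of healthy extents are spread over both groups, which is the load-spreading effect the paragraph describes.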

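           Disks that share a hardware module are likely to fail together, so when designing failure groups we usually put the disks in one tray of a large array into the same failure group. We can then pull a tray, taking that failure group offline, and replace it with a new tray and disks. The idea is the same as in RAID.

           The redundancy type is specified when the disk group is created and cannot be changed afterwards. To move from normal redundancy to high redundancy, the only option is to create a new disk group and move the files from the old disk group into it using RMAN or DBMS_FILE_TRANSFER.

           Failure groups always exist, even when none are specified explicitly at disk group creation. In that case each disk gets its own failure group, created by default with the same name as the disk.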

    On my blog, the post

           Oracle ASM views (V$) and data dictionary (X$)

           http://blog.csdn.net/tianlesoftware/article/details/6733039

           covers some of the ASM-related views. Disk group information can be viewed through v$asm_diskgroup.

    SYS@+ASM2(rac2)>  desc v$asm_diskgroup

     Name                                     Null?    Type

     ------------------------------------------------- ----------------------------

     GROUP_NUMBER                                       NUMBER

     NAME                                              VARCHAR2(30)

     SECTOR_SIZE                                        NUMBER

     BLOCK_SIZE                                         NUMBER

     ALLOCATION_UNIT_SIZE                               NUMBER

     STATE                                             VARCHAR2(11)

     TYPE                                              VARCHAR2(6)

     TOTAL_MB                                          NUMBER

     FREE_MB                                           NUMBER

     REQUIRED_MIRROR_FREE_MB                            NUMBER

     USABLE_FILE_MB                                     NUMBER

     OFFLINE_DISKS                                      NUMBER

     UNBALANCED                                        VARCHAR2(1)

     COMPATIBILITY                                      VARCHAR2(60)

     DATABASE_COMPATIBILITY                             VARCHAR2(60)

    SYS@+ASM2(rac2)> select group_number,name,allocation_unit_size,total_mb from v$asm_diskgroup;

    GROUP_NUMBER NAME                           ALLOCATION_UNIT_SIZE   TOTAL_MB

    ------------ -------------------------------------------------- ----------

              1 DATA                                       1048576      11993

              2 FRA                                        1048576       7993

    Here we have two disk groups with an AU size of 1 MB, which is the default. AUs are described in the next section.

    1.1.3 ASM Disks

           ASM disks make up a disk group; at the OS level each disk corresponds to a partition. Each ASM disk is divided into AUs, and a file extent consists of one or more AUs.

    Allocation Units

           Every ASM disk is divided into allocation units (AU). An AU is the fundamental unit of allocation within a disk group. A file extent consists of one or more AU. An ASM file consists of one or more file extents.

           When you create a disk group, you can set the ASM AU size to be between 1 MB and 64 MB in powers of two, such as, 1, 2, 4, 8, 16, 32, or 64. Larger AU sizes typically provide performance advantages for data warehouse applications that use large sequential reads.
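           As a quick illustration of the rule quoted above (powers of two between 1 MB and 64 MB), here is a minimal sketch. The helper names are hypothetical and exist only for this example; they are not part of any Oracle tool.

```python
MB = 1024 * 1024

def valid_au_sizes_mb(min_mb=1, max_mb=64):
    """List the AU sizes allowed by the rule: powers of two from 1 to 64 MB."""
    sizes = []
    size = min_mb
    while size <= max_mb:
        sizes.append(size)
        size *= 2
    return sizes

def is_valid_au_size(size_bytes):
    """Check a value such as ALLOCATION_UNIT_SIZE from v$asm_diskgroup."""
    return size_bytes % MB == 0 and (size_bytes // MB) in valid_au_sizes_mb()

# 1048576 is the ALLOCATION_UNIT_SIZE shown for DATA and FRA earlier.
default_ok = is_valid_au_size(1048576)
```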

          

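           The default AU size is 1 MB. It can be seen in the v$asm_diskgroup output in the previous section, and also through the parameter that sets the AU size: _asm_ausize

           For how to view this parameter, see section 8 of:

                  Oracle ASM views (V$) and data dictionary (X$)

                  http://blog.csdn.net/tianlesoftware/article/details/6733039

    Disk information can also be viewed through the v$asm_disk view: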

    SYS@+ASM2(rac2)> desc v$asm_disk

     Name                                     Null?    Type

     ------------------------------------------------- ----------------------------

     GROUP_NUMBER                                       NUMBER

     DISK_NUMBER                                        NUMBER

     COMPOUND_INDEX                                     NUMBER

     INCARNATION                                        NUMBER

     MOUNT_STATUS                                      VARCHAR2(7)

     HEADER_STATUS                                     VARCHAR2(12)

     MODE_STATUS                                       VARCHAR2(7)

     STATE                                              VARCHAR2(8)

     REDUNDANCY                                        VARCHAR2(7)

     LIBRARY                                           VARCHAR2(64)

     TOTAL_MB                                          NUMBER

     FREE_MB                                           NUMBER

     NAME                                              VARCHAR2(30)

     FAILGROUP                                         VARCHAR2(30)

     LABEL                                             VARCHAR2(31)

     PATH                                              VARCHAR2(256)

     UDID                                              VARCHAR2(64)

     PRODUCT                                           VARCHAR2(32)

     CREATE_DATE                                        DATE

     MOUNT_DATE                                        DATE

     REPAIR_TIMER                                       NUMBER

     READS                                             NUMBER

     WRITES                                            NUMBER

     READ_ERRS                                          NUMBER

     WRITE_ERRS                                         NUMBER

     READ_TIME                                         NUMBER

     WRITE_TIME                                         NUMBER

     BYTES_READ                                         NUMBER

     BYTES_WRITTEN                                      NUMBER

    SYS@+ASM2(rac2)> select group_number,disk_number,name,path from v$asm_disk;

    GROUP_NUMBER DISK_NUMBER NAME                           PATH

    ------------ ----------------------------------------- ------------------------

              1           0 DATA                           /dev/mapper/datap1

              2           0 FRA_0000                       /dev/mapper/frap1

    1.1.4 ASM file naming rules

           ASM file names have a fixed format: +group/dbname/file type/tag.file.incarnation
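           To make the format concrete, the sketch below splits a fully qualified name into its parts. The function and field names are hypothetical, chosen for this example only; the sample name is one of the files listed by ASMCMD later in this section. Alias names such as system01.dbf do not carry the numeric file and incarnation parts, so the sketch rejects them.

```python
from typing import NamedTuple

class AsmFileName(NamedTuple):
    group: str
    dbname: str
    file_type: str
    tag: str
    file_number: int
    incarnation: int

def parse_asm_name(name):
    """Split +group/dbname/file_type/tag.file.incarnation into fields."""
    if not name.startswith("+"):
        raise ValueError("an ASM file name starts with '+'")
    parts = name[1:].split("/")
    if len(parts) != 4:
        raise ValueError("expected +group/dbname/file_type/tag.file.incarnation")
    group, dbname, file_type, last = parts
    # Raises ValueError for aliases, which lack the two numeric suffixes.
    tag, file_number, incarnation = last.rsplit(".", 2)
    return AsmFileName(group, dbname, file_type, tag,
                       int(file_number), int(incarnation))

parsed = parse_asm_name("+DATA/ANQING/DATAFILE/USERS.273.751548233")
```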

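           The tablespaces created automatically with the database (system, undotbs, sysaux, users) map to real data files that use this default ASM naming format, and these names are recorded in the control file. Aliases are much more convenient. To make these automatically created tablespaces use aliases, we must create the aliases by hand and also recreate the control file, listing the alias names in its datafile clause. The database then uses the aliases.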

    SYS@anqing2(rac2)> select file_id,file_name,AUTOEXTENSIBLE from dba_data_files order by 1;

      FILE_ID FILE_NAME                                AUTOEXTENS

    -------------------------------------------------- ----------

            1 +DATA/anqing/datafile/system01.dbf       YES

            2 +DATA/anqing/datafile/undotbs01.dbf      YES

            3 +DATA/anqing/datafile/sysaux01.dbf       YES

            4 +DATA/anqing/datafile/users.273.75154823 YES

            5 +DATA/anqing/datafile/undotbs02.dbf      YES

            6 +DATA/anqing/datafile/system02.dbf       YES

            7 +DATA/anqing/datafile/dave01.dbf         YES

            8 +DATA/anqing/datafile/test01.dbf         YES

           I have used aliases here, so only the users tablespace shows its default ASM name. Let's verify this with ASMCMD:

    [oracle@rac2 ~]$ export ORACLE_SID=+ASM2

    [oracle@rac2 ~]$ asmcmd

    ASMCMD> ls

    DATA/

    FRA/

    ASMCMD> cd DATA

    ASMCMD> ls

    ANQING/

    DAVE/

    DB_UNKNOWN/

    RAC/

    ASMCMD> cd ANQING

    ASMCMD> ls

    CONTROLFILE/

    DATAFILE/

    ONLINELOG/

    PARAMETERFILE/

    TEMPFILE/

    ASMCMD> cd DATAFILE

    ASMCMD> pwd

    +DATA/ANQING/DATAFILE

    ASMCMD> ls

    DAVE.285.755349075

    SYSAUX.275.751548237

    SYSTEM.276.751548261

    SYSTEM.280.755038499

    TEST.286.755567335

    UNDOTBS1.274.751548233

    UNDOTBS2.281.751559213

    USERS.273.751548233

    dave01.dbf

    sysaux01.dbf

    system01.dbf

    system02.dbf

    test01.dbf

    undotbs01.dbf

    undotbs02.dbf

    ASMCMD>

    The listing above shows how the aliases correspond to the original names.

    Connect to the ASM instance and confirm with a query in sqlplus:

    [oracle@rac2 ~]$ export ORACLE_SID=+ASM2

    [oracle@rac2 ~]$ sqlplus / as sysdba;

    SQL*Plus: Release 10.2.0.4.0 - Production on Tue Aug 30 15:31:03 2011

    Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

    Connected to:

    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production

    With the Partitioning, Real Application Clusters, OLAP, Data Mining

    and Real Application Testing options

    SYS@+ASM2(rac2)> select name,file_number from v$asm_alias order by 2;

    NAME                                            FILE_NUMBER

    -----------------------------------------------------------

    SYSTEM.256.746634087                                     256

    Current.256.746634203                                    256

    SYSAUX.257.746634087                                     257

    group_1.257.746989321                                    257

    UNDOTBS1.258.746634089                                   258

    group_2.258.746989329                                    258

    group_3.259.746989339                                    259

    USERS.259.746634089                                      259

    Current.260.746634201                                    260

    group_4.260.746989347                                    260

    group_1.261.746989315                                    261

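    The material above briefly explains what ASM is and how ASM files are allocated. In short:

           The ASM instance manages ASM disk groups; a disk group is made up of disks; each disk is divided into AUs, and file extents are built from one or more AUs.

           That is the structure at the ASM level. A DB instance only sees the disk group level: when we create a datafile, it is placed in the chosen ASM disk group, and balancing and striping inside the disk group are left to the ASM instance.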

    1.2 What the ASM disk header contains

           Section 1.1 covered how ASM disks relate to ASM. The most fragile part of ASM is the ASM disk header. If a disk header is logically corrupted, the whole disk group cannot be mounted, and the databases that depend on the ASM instance cannot start.

    The contents of an ASM disk header can be inspected with the KFED or BBED commands:

           Oracle KFED and KFOD tools

           http://blog.csdn.net/tianlesoftware/article/details/6729950

           Oracle BBED tool

           http://blog.csdn.net/tianlesoftware/article/details/5006580

           Five practical Oracle bbed examples

           http://blog.csdn.net/tianlesoftware/article/details/6684505

    Use the following query to see the AUs used per file:

    SYS@+ASM2(rac2)> select group_kfdat group#, FNUM_KFDAT file#, sum(1) AU_used from x$kfdat where v_kfdat='V' group by group_kfdat, FNUM_KFDAT, v_kfdat;

       GROUP#      FILE#    AU_USED

    ---------- ---------- ----------

             1          0          2 -- this is where our disk header is stored

            1          1          2

            1          2          1

            1          3         85

            1          4          2

            1          5          1

            1          6          1

            1        256        522

            1        257        602

            1        258        337

            1        259          8

                  ......

            1        286         51

            1    1048575        103

             2          0          2 -- this is where our disk header is stored

            2          1          2

            2          2          1

            2          3         85

            2          4          2

       GROUP#      FILE#    AU_USED

    ---------- ---------- ----------

            2          5          1

             2         6          1

            2        400         15

            2        256         16

            2        257         56

         ......

            2        429          8

            2    1048575         71

    74 rows selected.

    SYS@+ASM2(rac2)>

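           The query shows that every disk group has entries for file# 0 through 6, together with the number of AUs each file occupies. These are the files we care about. The file numbers are explained below: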

    (1).  File#0, AU=0: disk header (disk name, etc), Allocation Table (AT) and Free Space Table (FST)

    (2).  File#0, AU=1: Partner Status Table (PST)

    (3).  File#1: File Directory (files and their extent pointers)

    (4).  File#2: Disk Directory

    (5).  File#3: Active Change Directory (ACD). The ACD is analogous to a redo log, where changes to the metadata are logged. Size = 42 MB * number of instances

    (6).  File#4: Continuing Operation Directory (COD). The COD is analogous to an undo tablespace. It maintains the state of active ASM operations such as disk or datafile drop/add. The COD log record is either committed or rolled back based on the success of the operation.

    (7).  File#5: Template directory

    (8).  File#6: Alias directory

    (9).  11g, File#9: Attribute Directory

    (10).  11g, File#12: Staleness registry, created when needed to track offline disks
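    The list above can be turned into a small lookup, which is handy when reading x$kfdat output. This is only a sketch built from the list itself; the function names are hypothetical. The ACD estimate uses the "42 MB * number of instances" rule quoted above, which for this two-node cluster gives 84 MB, close to the 85 one-megabyte AUs reported for file 3 in the query output.

```python
# File numbers and names copied from the list above.
ASM_METADATA_FILES = {
    1: "File Directory",
    2: "Disk Directory",
    3: "Active Change Directory (ACD)",
    4: "Continuing Operation Directory (COD)",
    5: "Template Directory",
    6: "Alias Directory",
    9: "Attribute Directory (11g)",
    12: "Staleness Registry (11g)",
}

def describe_file(file_number):
    """Label a file# value taken from x$kfdat."""
    if file_number == 0:
        return "disk header, AT, FST and PST"
    if file_number in ASM_METADATA_FILES:
        return ASM_METADATA_FILES[file_number]
    if file_number < 256:
        return "other ASM metadata"
    return "not a metadata file (file# >= 256)"

def expected_acd_mb(instances, per_instance_mb=42):
    """ACD size estimate from the rule: 42 MB per instance."""
    return instances * per_instance_mb

acd_for_two_nodes = expected_acd_mb(2)
```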

    The terms used here are explained below:

    (1).  PST - Partner Status Table. Maintains info on disk-to-diskgroup membership.

    (2).  COD - Continuing Operation Directory. The COD structure maintains the state of active ASM operations or changes, such as disk or datafile drop/add. The COD log record is either committed or rolled back based on the success of the operation. (source: Oracle whitepaper)

    (3).  ACD - Active Change Directory. The ACD is analogous to a redo log, where changes to the metadata are logged. The ACD log record is used to determine the point of recovery in the case of ASM operation failures or instance failures. (source: Oracle whitepaper)

    (4).  OSM - Oracle Storage Manager, legacy name, synonymous with ASM

    (5).  CSS - Cluster Synchronization Services. Part of Oracle Clusterware, mandatory with ASM even in single instance. CSS is used to heartbeat the health of the ASM instances.

    (6).  RBAL - Oracle background process. In an ASM instance, coordinates rebalancing operations. In a DB instance, opens and mounts diskgroups from the local ASM instance.

    (7).  ARBx - Oracle background processes. In an ASM instance, slaves for rebalancing operations

    (8).  PSPx - Oracle background processes. In an ASM instance, Process Spawners

    (9).  GMON - Oracle background processes. In an ASM instance, diskgroup monitor.

    (10).  ASMB - Oracle background process. In a DB instance, keeps a persistent (bequeath) DB connection to the local ASM instance. Provides heartbeat and ASM statistics. During a diskgroup rebalancing operation ASM communicates AU changes to the DB via this connection.

    (11).   O00x - Oracle background processes. Slaves used to connect from the DB to the ASM instance for 'short operations'.

           The KFED command shows the contents of a disk header. My earlier post on KFED has an example:

           Oracle KFED and KFOD tools

           http://blog.csdn.net/tianlesoftware/article/details/6729950

    Part of the output is reproduced here:

    [oracle@rac2 ~]$ kfed read /dev/mapper/datap1

    kfbh.endian:                         1 ; 0x000: 0x01

    kfbh.hard:                         130 ; 0x001: 0x82

    kfbh.type:                           1 ; 0x002:KFBTYP_DISKHEAD

    kfbh.datfmt:                         1 ; 0x003: 0x01

    kfbh.block.blk:                      0 ; 0x004: T=0 NUMB=0x0

    kfbh.block.obj:             2147483648 ; 0x008: TYPE=0x8NUMB=0x0

    kfbh.check:                 1508168608 ; 0x00c:0x59e4d3a0

    kfbh.fcn.base:                       0 ; 0x010: 0x00000000

    kfbh.fcn.wrap:                       0 ; 0x014: 0x00000000

    kfbh.spare1:                         0 ; 0x018: 0x00000000

    kfbh.spare2:                         0; 0x01c: 0x00000000

    kfdhdb.driver.provstr:    ORCLDISKDATA ; 0x000: length=12

    --> Disk volume label

    kfdhdb.driver.reserved[0]:  1096040772 ; 0x008: 0x41544144

    kfdhdb.driver.reserved[1]:           0 ; 0x00c: 0x00000000

    kfdhdb.driver.reserved[2]:           0 ; 0x010: 0x00000000

    kfdhdb.driver.reserved[3]:           0 ; 0x014: 0x00000000

    kfdhdb.driver.reserved[4]:           0 ; 0x018: 0x00000000

    kfdhdb.driver.reserved[5]:           0 ; 0x01c: 0x00000000

    kfdhdb.compat:               168820736 ; 0x020: 0x0a100000

    kfdhdb.dsknum:                       0 ; 0x024: 0x0000

    kfdhdb.grptyp:                       1 ; 0x026:KFDGTP_EXTERNAL

    --> This indicates Redundancy for Group. Check TYPE in query output.

    kfdhdb.hdrsts:                       3 ; 0x027:KFDHDR_MEMBER

    --> This indicates Disk Header status. Here it indicates it is member of Group.

    kfdhdb.dskname:                   DATA ; 0x028: length=4

    --> This indicates Disk Name

    kfdhdb.grpname:                   DATA ; 0x048: length=4

    --> This indicates the Group Name for the disk.

    kfdhdb.fgname:                    DATA ; 0x068: length=4

    --> This indicates the Failure Group Name.

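           For viewing the disk header with BBED, see:

                  Using Oracle BBED to view ASM disk header contents

                  http://blog.csdn.net/tianlesoftware/article/details/6739369

    II. Backing up and restoring ASM disk headers with dd

    2.1 How many bytes does a dd backup need?

           With KFED we can see the last field of the header:

                  kfdhdb.acdb.ub2spare:                0 ; 0x1de: 0x0000      

           Here 0x1de is 478 in decimal, so the last field starts at offset 478 and the populated part of the disk header is only a few hundred bytes. Even so, a dd backup of the disk header should copy 4096 bytes, i.e. 4 KB. Why 4 KB?

           Because of the hidden parameter _asm_blksize=4096, which is the size of one block.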

           The value of this hidden parameter can be read from the all_parameters view:

                  Oracle all_parameters view

                  http://blog.csdn.net/tianlesoftware/article/details/6641281

    SYS@anqing2(rac2)> select name,value from all_parameters where name='_asm_blksize';

    NAME       VALUE

    ------------------------------------------------

    _asm_blksize    4096
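           The arithmetic above is easy to check. The sketch below converts the kfed offset, confirms that the header fields fit inside one ASM block, and builds the dd argument list used in the rest of this section. The 2-byte width of the last field is an assumption based on its ub2 name, and dd_args is a hypothetical helper for this example.

```python
ASM_BLOCK_SIZE = 4096            # value of _asm_blksize shown above

last_field_offset = int("0x1de", 16)   # offset printed by kfed
UB2_WIDTH = 2                    # assumption: a "ub2" field is 2 bytes wide
header_end = last_field_offset + UB2_WIDTH

def dd_args(device, backup, block_size=ASM_BLOCK_SIZE):
    """Build the dd command line that copies exactly one ASM block."""
    return ["dd", f"if={device}", f"of={backup}", f"bs={block_size}", "count=1"]

command = dd_args("/dev/mapper/datap1", "/u01/datap1header")
```

           Copying a whole block rather than just the populated bytes keeps the backup aligned with the unit ASM itself reads and writes.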

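           Some status and checksum fields in the disk header do change, but the basic information is fixed, and that fixed information is enough for a recovery.

           One more important point: dd backups are best taken with the system down, whereas kfed can be used online.

    2.2  How to clear an ASM disk

           Sometimes a failed ASM disk can be neither dropped nor added. Typically its header status is wrong while the disk header still holds some disk group information. In that case we need to clear the disk header and then add the disk back to the disk group.

           The commands are as follows (back up the header first, then zero it):

                  dd if=<device name> of=<backup file> bs=4096 count=1
                  dd if=/dev/zero of=<device name> bs=4096 count=1

           A word of caution: use this command with great care.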

    2.3 Taking the dd backup

    SYS@anqing2(rac2)> select name,path from v$asm_disk;

    NAME            PATH

    -------------------------------------------------------------------------------

    DATA            /dev/mapper/datap1

    FRA_0000        /dev/mapper/frap1

    [oracle@rac2 ~]$ dd if=/dev/mapper/datap1 of=/u01/datap1header bs=4096 count=1

    1+0 records in

    1+0 records out

    4096 bytes (4.1 kB) copied, 0.000122762 seconds, 33.4 MB/s

    [oracle@rac2 ~]$ dd if=/dev/mapper/frap1 of=/u01/fraheader bs=4096 count=1;

    1+0 records in

    1+0 records out

    4096 bytes (4.1 kB) copied, 0.000325073 seconds, 12.6 MB/s
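    For scripted backups, the same copy can be done without dd. The following is a minimal sketch, assuming the header lives in the first 4096 bytes as described in 2.1. backup_header is a hypothetical helper, and the demo runs against a scratch file standing in for the device, so it never touches a real disk.

```python
import os
import tempfile

HEADER_BYTES = 4096  # one ASM block, as in the dd commands above

def backup_header(device_path, backup_path, size=HEADER_BYTES):
    """Copy the first `size` bytes of a device (or file) to backup_path."""
    with open(device_path, "rb") as dev:
        block = dev.read(size)
    if len(block) != size:
        raise OSError(f"short read: got {len(block)} of {size} bytes")
    with open(backup_path, "wb") as out:
        out.write(block)
    return block

# Demo on a scratch file; the first four bytes mimic the kfbh values
# (endian=1, hard=130, type=1, datfmt=1) seen in the kfed output.
workdir = tempfile.mkdtemp()
fake_device = os.path.join(workdir, "fake_disk")
with open(fake_device, "wb") as f:
    f.write(bytes([1, 130, 1, 1]) + bytes(8192 - 4))
backup_file = os.path.join(workdir, "fake_disk.hdr")
saved = backup_header(fake_device, backup_file)
```

    Against a real device the script would need the same read permission on /dev/mapper/datap1 that the oracle user has in the transcript above.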

    2.4 Stop the ASM instances

    SYS@anqing2(rac2)> select name,state,type from v$asm_diskgroup;

    NAME            STATE       TYPE

    --------------- ----------- ------

    DATA            CONNECTED   EXTERN

    FRA             CONNECTED   EXTERN

    [oracle@rac2 u01]$ sh crs_stat.sh

    Name                           Target     State     Host     

    ------------------------------ -------------------  -------  

    ora.anqing.anqing1.inst        ONLINE     ONLINE    rac1     

    ora.anqing.anqing2.inst        ONLINE     ONLINE    rac2     

    ora.anqing.db                  ONLINE     ONLINE    rac1     

    ora.rac1.ASM1.asm              ONLINE     ONLINE    rac1     

    ora.rac1.LISTENER_RAC1.lsnr    ONLINE    ONLINE     rac1     

    ora.rac1.gsd                   ONLINE     ONLINE    rac1     

    ora.rac1.ons                   ONLINE     ONLINE    rac1     

    ora.rac1.vip                   ONLINE     ONLINE    rac1     

    ora.rac2.ASM2.asm              ONLINE     ONLINE    rac2     

    ora.rac2.LISTENER_RAC2.lsnr    ONLINE    ONLINE     rac2     

    ora.rac2.gsd                   ONLINE     ONLINE    rac2     

    ora.rac2.ons                   ONLINE     ONLINE    rac2     

    ora.rac2.vip                   ONLINE     ONLINE    rac2     

    [oracle@rac2 u01]$ srvctl stop database -d anqing

    [oracle@rac2 u01]$ sh crs_stat.sh

    Name                           Target     State     Host     

    ------------------------------ -------------------  -------  

    ora.anqing.anqing1.inst        OFFLINE    OFFLINE             

    ora.anqing.anqing2.inst        OFFLINE    OFFLINE             

    ora.anqing.db                  OFFLINE    OFFLINE             

    ora.rac1.ASM1.asm              ONLINE     ONLINE    rac1     

    ora.rac1.LISTENER_RAC1.lsnr    ONLINE    ONLINE     rac1     

    ora.rac1.gsd                   ONLINE     ONLINE    rac1     

    ora.rac1.ons                   ONLINE     ONLINE    rac1     

    ora.rac1.vip                   ONLINE     ONLINE    rac1     

    ora.rac2.ASM2.asm              ONLINE     ONLINE    rac2     

    ora.rac2.LISTENER_RAC2.lsnr    ONLINE    ONLINE     rac2     

    ora.rac2.gsd                   ONLINE     ONLINE    rac2     

    ora.rac2.ons                   ONLINE     ONLINE    rac2     

    ora.rac2.vip                   ONLINE     ONLINE    rac2     

    [oracle@rac2 u01]$ srvctl stop asm -n rac1

    [oracle@rac2 u01]$ srvctl stop asm -n rac2

    [oracle@rac2 u01]$ sh crs_stat.sh        

    Name                           Target     State     Host     

    ------------------------------ -------------------  -------  

    ora.anqing.anqing1.inst        OFFLINE    OFFLINE             

    ora.anqing.anqing2.inst        OFFLINE    OFFLINE             

    ora.anqing.db                  OFFLINE    OFFLINE             

    ora.rac1.ASM1.asm              OFFLINE    OFFLINE             

    ora.rac1.LISTENER_RAC1.lsnr    ONLINE    ONLINE     rac1     

    ora.rac1.gsd                   ONLINE     ONLINE    rac1     

    ora.rac1.ons                   ONLINE     ONLINE    rac1     

    ora.rac1.vip                   ONLINE     ONLINE    rac1     

    ora.rac2.ASM2.asm              OFFLINE    OFFLINE             

    ora.rac2.LISTENER_RAC2.lsnr    ONLINE    ONLINE     rac2     

    ora.rac2.gsd                   ONLINE     ONLINE    rac2     

    ora.rac2.ons                   ONLINE     ONLINE    rac2     

    ora.rac2.vip                   ONLINE     ONLINE    rac2     

    [oracle@rac2 u01]$

    2.5 Simulate a disk header failure

           Use the method from section 2.2.

    [oracle@rac2 u01]$ dd if=/dev/zero of=/dev/mapper/datap1 bs=4096 count=1

    1+0 records in

    1+0 records out

    4096 bytes (4.1 kB) copied, 0.00558218 seconds, 734 kB/s

    2.6 Inspect the disk header with KFED

    [oracle@rac2 u01]$ kfed read /dev/mapper/datap1

    kfbh.endian:                          0 ; 0x000: 0x00

    kfbh.hard:                            0 ; 0x001: 0x00

    kfbh.type:                            0 ; 0x002:KFBTYP_INVALID

    kfbh.datfmt:                          0 ; 0x003: 0x00

    kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0

    kfbh.block.obj:                       0 ; 0x008: TYPE=0x0NUMB=0x0

    kfbh.check:                           0 ; 0x00c:0x00000000

    kfbh.fcn.base:                        0 ; 0x010: 0x00000000

    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

    kfbh.spare1:                          0 ; 0x018: 0x00000000

    kfbh.spare2:                          0 ; 0x01c: 0x00000000
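    The difference between a healthy and a wiped header is visible in the first few bytes. The sketch below relies only on what the kfed dumps in this post show: kfbh.type sits at offset 0x002 and reads 1 (KFBTYP_DISKHEAD) on a healthy disk and 0 (KFBTYP_INVALID) after zeroing. classify_header is a hypothetical helper, not a replacement for kfed, and it does not verify the checksum.

```python
KFBTYP_INVALID = 0    # value shown by kfed after the header was zeroed
KFBTYP_DISKHEAD = 1   # value shown by kfed for a healthy disk header
TYPE_OFFSET = 0x002   # offset of kfbh.type in the kfed output

def classify_header(block):
    """Give a rough verdict on a 4096-byte header block."""
    if len(block) <= TYPE_OFFSET:
        raise ValueError("block is too short to hold kfbh.type")
    if not any(block):
        return "zeroed"
    if block[TYPE_OFFSET] == KFBTYP_DISKHEAD:
        return "diskhead"
    return "unknown"

wiped = classify_header(bytes(4096))
healthy = classify_header(bytes([1, 130, 1, 1]) + bytes(4092))
```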

    2.7 Start the ASM instance

    [oracle@rac2 u01]$ export ORACLE_SID=+ASM2

    [oracle@rac2 u01]$ sqlplus / as sysdba;

    SQL*Plus: Release 10.2.0.4.0 - Production on Thu Sep 1 16:35:06 2011

    Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

    Connected to an idle instance.

    SQL> startup

    ASM instance started

    Total System Global Area   92274688 bytes

    Fixed Size                  1265960 bytes

    Variable Size              65842904 bytes

    ASM Cache                  25165824 bytes

    ORA-15032: not all alterations performed

    ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

    The errors show that the DATA disk group cannot be mounted, although the ASM instance itself has started.

    2.8 Restore from the earlier backup

    [oracle@rac2 u01]$ dd if=/u01/datap1header of=/dev/mapper/datap1 bs=4096 count=1

    1+0 records in

    1+0 records out

    4096 bytes (4.1 kB) copied, 0.00666105 seconds, 615 kB/s
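    Writing the wrong file over a disk header is destructive, so a scripted restore is worth guarding. The sketch below is one possible set of checks, not an established procedure: it refuses a backup that is not exactly 4096 bytes or whose kfbh.type byte is not 1, writes in place without truncating, and reads the block back. restore_header is a hypothetical helper, and the demo uses scratch files only.

```python
import os
import tempfile

HEADER_BYTES = 4096

def restore_header(backup_path, device_path, size=HEADER_BYTES):
    """Write a saved header block back to the start of a device (or file)."""
    with open(backup_path, "rb") as src:
        block = src.read()
    if len(block) != size:
        raise ValueError(f"backup is {len(block)} bytes, expected {size}")
    if block[2] != 1:  # kfbh.type at offset 0x002 should be KFBTYP_DISKHEAD
        raise ValueError("backup does not look like an ASM disk header")
    with open(device_path, "r+b") as dev:  # r+b: update in place, no truncate
        dev.seek(0)
        dev.write(block)
        dev.flush()
        os.fsync(dev.fileno())
    with open(device_path, "rb") as dev:
        return dev.read(size) == block

# Demo: a scratch "device" with a zeroed header followed by payload bytes.
workdir = tempfile.mkdtemp()
backup = os.path.join(workdir, "datap1header")
device = os.path.join(workdir, "fake_disk")
with open(backup, "wb") as f:
    f.write(bytes([1, 130, 1, 1]) + bytes(HEADER_BYTES - 4))
with open(device, "wb") as f:
    f.write(bytes(HEADER_BYTES) + b"DATA" * 4)
restored_ok = restore_header(backup, device)
```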

    2.9 Verify the disk header with KFED

    [oracle@rac2 u01]$ kfed read /dev/mapper/datap1

    kfbh.endian:                          1 ; 0x000: 0x01

    kfbh.hard:                          130 ; 0x001: 0x82

    kfbh.type:                            1 ; 0x002:KFBTYP_DISKHEAD

    kfbh.datfmt:                          1 ; 0x003: 0x01

    kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0

    kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8NUMB=0x0

    kfbh.check:                  1508168608 ; 0x00c:0x59e4d3a0

    kfbh.fcn.base:                        0 ; 0x010: 0x00000000

    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

    kfbh.spare1:                          0 ; 0x018: 0x00000000

    kfbh.spare2:                          0 ; 0x01c: 0x00000000

    ....

    The header is back to normal.

    2.10 Mount the DATA disk group

    SYS@+ASM2(rac2)> select name,state,type from v$asm_diskgroup;

    NAME                           STATE       TYPE

    ------------------------------ -----------------

    DATA                           DISMOUNTED

    FRA                            MOUNTED     EXTERN

    SYS@+ASM2(rac2)> alter diskgroup DATA mount;

    Diskgroup altered.

    SYS@+ASM2(rac2)> select name,state,type from v$asm_diskgroup;

    NAME                           STATE       TYPE

    ------------------------------ -----------------

    DATA                           MOUNTED     EXTERN

    FRA                           MOUNTED     EXTERN

    The mount succeeded, and the RAC can now be started normally.

    [oracle@rac2 u01]$ sh crs_stat.sh

    Name                           Target     State     Host     

    ------------------------------ -------------------  -------  

    ora.anqing.anqing1.inst        ONLINE    ONLINE     rac1     

    ora.anqing.anqing2.inst        ONLINE     ONLINE    rac2     

    ora.anqing.db                  ONLINE     ONLINE    rac2     

    ora.rac1.ASM1.asm              ONLINE     ONLINE    rac1     

    ora.rac1.LISTENER_RAC1.lsnr    ONLINE    ONLINE     rac1     

    ora.rac1.gsd                   ONLINE     ONLINE    rac1     

    ora.rac1.ons                   ONLINE     ONLINE    rac1     

    ora.rac1.vip                   ONLINE     ONLINE    rac1     

    ora.rac2.ASM2.asm              ONLINE    ONLINE     rac2     

    ora.rac2.LISTENER_RAC2.lsnr    ONLINE    ONLINE     rac2     

    ora.rac2.gsd                   ONLINE     ONLINE    rac2     

    ora.rac2.ons                   ONLINE     ONLINE    rac2     

    ora.rac2.vip                   ONLINE    ONLINE     rac2  

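    III. Backup and recovery with KFED

           This works like dd: export the ASM disk header first, then import it when needed. One thing to watch: between the export and the import, the information in the disk header may have changed, so check the following fields before importing.

    For example: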

    kfdhdb.dsknum: 0 ; 0x024: 0x0000
    kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
    kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
    kfdhdb.dskname: DATA_0000 ; 0x028: length=9
    kfdhdb.grpname: DATA ; 0x048: length=4
    kfdhdb.fgname: DATA_0000 ; 0x068: length=9
    kfdhdb.crestmp.hi: 32937833 ; 0x0a8: HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da
    kfdhdb.mntstmp.hi: 32937834 ; 0x0b0: HOUR=0xa DAYS=0x1b MNTH=0x5 YEAR=0x7da
    kfdhdb.secsize: 512 ; 0x0b8: 0x0200
    kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
    kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
    kfdhdb.dsksize: 51200 ; 0x0c4: 0x0000c800
    kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
    kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
    kfdhdb.grpstmp.hi: 32937833 ; 0x0e4: HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da
    kfdhdb.grpstmp.lo: 1704339456 ; 0x0e8: USEC=0x0 MSEC=0x18a SECS=0x19 MINS=0x19

    The fields above mean:

    dsknum: disk number
    grptyp: redundancy type of the disk group, here EXTERNAL REDUNDANCY
    The redundancy types are:
           NORMAL REDUNDANCY - Two-way mirroring, requiring two failure groups. 
           HIGH REDUNDANCY - Three-way mirroring, requiring three failure groups. 
           EXTERNAL REDUNDANCY - No mirroring for disks that are already protected using hardware mirroring or RAID. 
    hdrsts: disk header status
    dskname: disk name within ASM
    grpname: disk group name
    fgname: failure group name
    crestmp.hi: creation time (high word of the timestamp)
    mntstmp.hi: mount time (high word of the timestamp)
    blksize: block size of the disk header, 4096
    ausize: allocation unit size, 1 MB by default
    dsksize: disk size
    f1b1locn: File Directory blk 1 AU num 
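
    The timestamp words are packed bit fields, and kfed prints the decoded parts next to each raw value. The bit layout in the sketch below is not taken from Oracle documentation; it is inferred from the sample values in the dump above (for instance 32937833 next to HOUR=0x9 DAYS=0x1b MNTH=0x5 YEAR=0x7da) and reproduces them exactly. Treat it as an assumption that happens to fit the data shown here.

```python
def decode_stamp_hi(value):
    """Decode a *.hi timestamp word: year, month, day, hour."""
    return {
        "year": value >> 14,
        "month": (value >> 10) & 0xF,
        "day": (value >> 5) & 0x1F,
        "hour": value & 0x1F,
    }

def decode_stamp_lo(value):
    """Decode a *.lo timestamp word: minute, second, msec, usec."""
    return {
        "minute": (value >> 26) & 0x3F,
        "second": (value >> 20) & 0x3F,
        "msec": (value >> 10) & 0x3FF,
        "usec": value & 0x3FF,
    }

created = decode_stamp_hi(32937833)       # kfdhdb.crestmp.hi above
group_lo = decode_stamp_lo(1704339456)    # kfdhdb.grpstmp.lo above
```

    Decoded this way, the creation stamp reads 2010-05-27, hour 9, which matches the HOUR, DAYS, MNTH and YEAR values kfed printed.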

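           One point deserves emphasis: when a disk group has several disks that were all added at the same time, their disk headers are nearly identical. So when one disk header in a disk group is corrupted, you can export the header of another disk in the same group and import it onto the corrupted disk, after checking the disk-specific fields listed above.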
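
    Before reusing another disk's header, it helps to see exactly which fields differ. The sketch below parses kfed text dumps of the form "name: value ; offset: detail" and reports the differing fields. The helper names are hypothetical, and the second dump (DATA_0001) is invented for illustration; only the DATA_0000 values come from the listing above.

```python
def parse_kfed_text(text):
    """Map field name -> printed value for each 'name: value ; ...' line."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line or ";" not in line:
            continue
        name, rest = line.split(":", 1)
        fields[name.strip()] = rest.split(";", 1)[0].strip()
    return fields

def diff_headers(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

dump_a = """\
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.dskname: DATA_0000 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0000 ; 0x068: length=9
"""
# Hypothetical second disk in the same group, for illustration only.
dump_b = """\
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.dskname: DATA_0001 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0001 ; 0x068: length=9
"""
changes = diff_headers(parse_kfed_text(dump_a), parse_kfed_text(dump_b))
```

    In this made-up pair the group name matches while the disk number, disk name and failure group name differ, which are the kind of fields to review before an import.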

    3.1 Backing up the ASM disk header with KFED

    SYS@anqing2(rac2)> select path from v$asm_disk;

    PATH

    --------------------------------------------------------------------------------

    /dev/mapper/datap1

    /dev/mapper/frap1

    [oracle@rac2 u01]$ kfed read /dev/mapper/datap1 text=/u01/datap1disker

    [oracle@rac2 u01]$ ll datap1disker

    -rw-r--r-- 1 oracle oinstall 6607 Sep  2 10:48 datap1disker

    [oracle@rac2 u01]$ cat datap1disker

    kfbh.endian:                          1 ; 0x000: 0x01

    kfbh.hard:                          130 ; 0x001: 0x82

    kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

    kfbh.datfmt:                          1 ; 0x003: 0x01

    kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0

    kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0

    kfbh.check:                   868534624 ; 0x00c: 0x33c4c960

    kfbh.fcn.base:                        0 ; 0x010: 0x00000000

    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

    kfbh.spare1:                          0 ; 0x018: 0x00000000

    kfbh.spare2:                          0 ; 0x01c: 0x00000000

    kfdhdb.driver.provstr:     ORCLDISKDATA ; 0x000: length=12

    kfdhdb.driver.reserved[0]:   1096040772 ; 0x008: 0x41544144

    kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000

    kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000

    kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000

    kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000

    kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000

    kfdhdb.compat:                168820736 ; 0x020: 0x0a100000

    kfdhdb.dsknum:                        0 ; 0x024: 0x0000

    kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL

    kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

    kfdhdb.dskname:                    DATA ; 0x028: length=4

    kfdhdb.grpname:                    DATA ; 0x048: length=4

    kfdhdb.fgname:                     DATA ; 0x068: length=4

    kfdhdb.capname:                         ; 0x088: length=0

    kfdhdb.crestmp.hi:             32952076 ; 0x0a8: HOUR=0xc DAYS=0x18 MNTH=0x3 YEAR=0x7db

    kfdhdb.crestmp.lo:           3374491648 ; 0x0ac: USEC=0x0 MSEC=0xaa SECS=0x12 MINS=0x32

    kfdhdb.mntstmp.hi:             32957488 ; 0x0b0: HOUR=0x10 DAYS=0x1 MNTH=0x9 YEAR=0x7db

    kfdhdb.mntstmp.lo:           2804987904 ; 0x0b4: USEC=0x0 MSEC=0x2e SECS=0x33 MINS=0x29

    kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

    kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000

    kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000

    kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80

    kfdhdb.dsksize:                   11993 ; 0x0c4: 0x00002ed9

    kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002

    kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001

    kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002

    kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

    kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000

    kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000

    kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000

    kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000

    kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000

    kfdhdb.grpstmp.hi:             32952076 ; 0x0e4: HOUR=0xc DAYS=0x18 MNTH=0x3 YEAR=0x7db

    kfdhdb.grpstmp.lo:           3374396416 ; 0x0e8: USEC=0x0 MSEC=0x4d SECS=0x12 MINS=0x32

    .....
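The timestamp words in the dump (crestmp, mntstmp, grpstmp) are packed bit fields, and kfed prints the unpacked parts in the comment column. The sketch below decodes them in Python. The bit layout is inferred from the values shown in this dump, not taken from Oracle documentation, and the function names are hypothetical.

```python
def decode_hi(hi):
    """Split a *stmp.hi word into year, month, day and hour."""
    return {
        "year": hi >> 14,
        "month": (hi >> 10) & 0xF,
        "day": (hi >> 5) & 0x1F,
        "hour": hi & 0x1F,
    }


def decode_lo(lo):
    """Split a *stmp.lo word into minute, second, millisecond and microsecond."""
    return {
        "minute": (lo >> 26) & 0x3F,
        "second": (lo >> 20) & 0x3F,
        "msec": (lo >> 10) & 0x3FF,
        "usec": lo & 0x3FF,
    }


# crestmp.hi / crestmp.lo from the dump above
created_hi = decode_hi(32952076)
created_lo = decode_lo(3374491648)
print(created_hi, created_lo)
```

The decoded values agree with the kfed comments for that line (YEAR=0x7db, MNTH=0x3, DAYS=0x18, HOUR=0xc and MINS=0x32, SECS=0x12, MSEC=0xaa).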

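    3.2 Wiping the ASM disk header

           The first 4 KB of the disk header is wiped because leftover garbage bits cause the check (checksum) value to be computed incorrectly; once the header has been wiped, a merge produces a correct checksum. Without the wipe, the first 4 KB holds not only the merged header fields but also other corrupt data, so the merge results in a wrong checksum. Even if you patch the hexadecimal check value by hand, the disk group still cannot be mounted: v$asm_disk shows header_status as PROVISIONED (a wrong check value shows up as INCOMPATIBLE). The first 4 KB has to be wiped and then merged for the check value to come out right.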

    [oracle@rac2 u01]$ dd if=/dev/zero of=/dev/mapper/datap1 bs=4096 count=1

    1+0 records in

    1+0 records out

    4096 bytes (4.1 kB) copied, 0.000133662 seconds, 30.6 MB/s
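The same wipe can be expressed in a few lines of Python, with a raw copy of the first 4 KB saved before anything is overwritten. This is a sketch only: `backup_and_zero_header` is a hypothetical helper, and the demonstration runs against a scratch file standing in for a disk, never a real device.

```python
import os
import tempfile

HEADER_BYTES = 4096  # same amount as dd bs=4096 count=1


def backup_and_zero_header(device_path, backup_path):
    """Save the first 4 KB of device_path to backup_path, then zero it in place."""
    with open(device_path, "r+b") as dev:
        original = dev.read(HEADER_BYTES)
        with open(backup_path, "wb") as bak:
            bak.write(original)  # backup is written and closed before the wipe
        dev.seek(0)
        dev.write(b"\x00" * HEADER_BYTES)
    return original


# Demonstration on a scratch file (hypothetical names).
workdir = tempfile.mkdtemp()
fake_disk = os.path.join(workdir, "fake_disk.img")
backup = os.path.join(workdir, "header.bak")
with open(fake_disk, "wb") as f:
    f.write(b"H" * HEADER_BYTES + b"D" * 8192)

backup_and_zero_header(fake_disk, backup)
with open(fake_disk, "rb") as f:
    after = f.read()
with open(backup, "rb") as f:
    saved = f.read()
```

Only the first 4096 bytes change; the rest of the file is left untouched, which is the property the dd command relies on.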

    Note:

           I ran this dd while the database was open, so let's look at the state at that point.

    SYS@anqing2(rac2)> select name,state,offline_disks from v$asm_diskgroup;

    NAME                           STATE       OFFLINE_DISKS

    ------------------------------ ----------- -------------

    DATA                           CONNECTED               0

    FRA                            CONNECTED               0

    SYS@anqing2(rac2)> select mount_status,header_status,state,path from v$asm_disk;

    MOUNT_S HEADER_STATU STATE    PATH

    ------- ------------ ----------------------------------------------------------

    OPENED UNKNOWN      NORMAL   /dev/mapper/datap1

    OPENED UNKNOWN      NORMAL   /dev/mapper/frap1

    Run a transaction:

    SYS@anqing2(rac2)> create table d1 as select * from all_objects;

    Table created.

    SYS@anqing2(rac2)> select count(*) from d1;

     COUNT(*)

    ----------

        49868

    The transaction works normally too.

    Now restart the ASM instance.

    SYS@+ASM2(rac2)> shutdown immediate

    ASM diskgroups dismounted

    ASM instance shutdown

    SYS@+ASM2(rac2)> startup

    ASM instance started

    Total System Global Area   92274688 bytes

    Fixed Size                  1265960 bytes

    Variable Size              65842904 bytes

    ASM Cache                  25165824 bytes

    ORA-15032: not all alterations performed

    ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

    After the restart, the damage done by dd takes effect.

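    3.3 Restoring with KFED merge

           As mentioned earlier, before restoring with merge you should check the previously exported content, because it may have changed since the export.

    Here I simply merge it back with KFED and then mount the DATA disk group.

    [oracle@rac2 u01]$ kfed merge /dev/mapper/datap1 text=/u01/datap1disker

    SYS@+ASM2(rac2)> select name,state from v$asm_diskgroup;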

    NAME                           STATE

    ------------------------------ -----------

    DATA                           DISMOUNTED

    FRA                            MOUNTED

    SYS@+ASM2(rac2)> alter diskgroup DATA mount;

    Diskgroup altered.

    SYS@+ASM2(rac2)> select name,state from v$asm_diskgroup;

    NAME                           STATE

    ------------------------------ -----------

    DATA                           MOUNTED

    FRA                            MOUNTED

    The mount succeeded.

    4. Rebuilding the ASM Disk Header

    Oracle official document:

           Creating a New ASM Disk Header After Existing One Is Corrupted

           http://blog.csdn.net/tianlesoftware/article/details/6740716

           ASM is fragile in this area. If we have no backup of the disk header, or kfed merge also fails, the last resort is to rebuild the disk header. Note that a rebuild does not succeed in every case. If the rebuild fails, the only remaining option is to recreate the diskgroup and restore the whole database from backup.

           Oracle 11g introduced the AMDU tool, which can also be used with 10g. See MOS document [ID 553639.1] for details.

           AMDU is a tool introduced in 11g where it is possible to extract all the available metadata from one or more ASM disks, generate formatted block printouts from the dump output, extract one or more files from a diskgroup (mounted/unmounted) and write them to the OS file system.
           This tool is very important when dealing with internal errors related to the ASM metadata.
           Although this tool was released with 11g, it can be used with ASM 10g.

           In 11gR2, the asmcmd commands md_backup and md_restore can also be used for backups. For their usage, see eygle's blog:

           http://www.eygle.com/archives/2011/03/asm_md_backup_md_restore.html

    Querying x$kfdat shows the number of AUs used by each file#, as follows:

    SYS@+ASM2(rac2)> select group_kfdat group#, FNUM_KFDAT file#, sum(1) AU_used from x$kfdat where v_kfdat='V' group by group_kfdat, FNUM_KFDAT, v_kfdat;

       GROUP#     FILE#    AU_USED

    ---------- ---------- ----------

            1         0          2

            1         1          2

            1         2          1

            1         3         85

            1         4          2

            1         5          1

            1         6          1

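           When rebuilding a disk header, the pieces of information to focus on are the file directory and the disk directory.

    (1).  File#0, AU=0: disk header (disk name, etc), Allocation Table (AT) and Free Space Table (FST)

    (2).  File#0, AU=1: Partnership and Status Table (PST)

    (3).  File#1: File Directory (files and their extent pointers)

    (4).  File#2: Disk Directory

    Notes:

    1.  The KFED tool must be version 10.2.0.2 or later; earlier versions hit bug 5039964.

    2.  The approach to rebuilding a disk header is as follows:

           1). Find the file directory, then use the file directory to locate the disk directory;
           2). Use the disk directory to obtain the disk information, edit the disk header file by hand, and finally kfed merge it into the target disk to generate the disk header.

           3). The file directory normally sits at au=2 on one of the disks in the disk group. If disks have been dropped from or added to the disk group, the file directory is not necessarily at au=2 and has to be searched for manually.

    4.1 The official example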

           For this test we have 3 ASM disks in an external redundancy diskgroup. For the test we will wipe out the header for ASM disk 3 (data03):

           /ocfs02/asm/data01

           /ocfs02/asm/data02

           /ocfs02/asm/data03

    The test diskgroup contains 3 disks; the experiment destroys the disk header of data03.

    1. Make sure all ASM instances are shut down.

    2. Make a back up of the first 4k of the bad disk with dd:

           dd if=<bad disk> of=<file> bs=4096 count=1

    3. Check existing disks and see which one has "file 1 block 1":

    To find the disk with f1b1 run:

           kfed read <device name> | grep f1b1

    Example:

    $ kfed read /ocfs02/asm/data01 | grep f1b1

    kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

    $ kfed read /ocfs02/asm/data02 | grep f1b1

    kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000

           Since data01 has a non-zero value, data01 is the disk with "file 1 block 1".

           --Note the value here: a non-zero value means file 1 block 1 has been found.

           Confirm this by checking the following to see if you see "KFBTYP_LISTHEAD" in the 2nd allocation unit:

    kfed read <device name> aunum=2 | grep kfbh.type

    Also specify the ausize with AUSZ=# if using a non default allocation unit size.

    Example:

    $ kfed read /ocfs02/asm/data01 aunum=2 |grep kfbh.type

    kfbh.type: 5 ; 0x002: KFBTYP_LISTHEAD

           If the lost disk is the "file 1 block 1" disk then scan every AU of the bad disk till you find a header which claims to be FILE_DIRECTORY (KFBTYP_FILEDIR).

           Once you find that you can set f1b1locn to that AU number and continue... If the file directory cannot be found anywhere then we have no choice but to re-create the diskgroup and restore from a backup.
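The check in step 3 can be sketched as a small filter over the `kfed read` output of each surviving disk. `find_f1b1_disks` is a hypothetical helper; it expects the text to have been captured beforehand and does not call kfed itself. The sample strings are the two lines from the example above.

```python
import re

F1B1_RE = re.compile(r"kfdhdb\.f1b1locn:\s*(\d+)")


def find_f1b1_disks(dumps):
    """dumps maps disk path -> kfed text. Return [(path, au)] where f1b1locn is non-zero."""
    hits = []
    for path, text in dumps.items():
        m = F1B1_RE.search(text)
        if m and int(m.group(1)) != 0:
            hits.append((path, int(m.group(1))))
    return hits


dumps = {
    "/ocfs02/asm/data01": "kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002",
    "/ocfs02/asm/data02": "kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000",
}
f1b1 = find_f1b1_disks(dumps)
print(f1b1)
```

An empty result would correspond to the case described above, where the lost disk itself held file 1 block 1 and its AUs have to be scanned for KFBTYP_FILEDIR.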

    4. Make a copy of a good disk header with kfed that IS NOT the disk that contains f1b1 and is in the SAME diskgroup as the bad disk.

           --In the test above, f1b1 is on data01.

    In our example this is data02:

           kfed read <device name> > fix.txt

    Example:

    $ kfed read /ocfs02/asm/data02 > fix.txt

    5. Edit the fix.txt and change the following fields to the proper values (use the ASM alert log for reference):

           kfdhdb.dsknum
           kfdhdb.dskname
           kfdhdb.fgname

    Example:

    Check the alert log for proper names:

    NOTE: cache opening disk 0 of grp 1: DATA_0000 path:/ocfs02/asm/data01

    NOTE: cache opening disk 1 of grp 1: DATA_0001 path:/ocfs02/asm/data02

    NOTE: cache opening disk 2 of grp 1: DATA_0002 path:/ocfs02/asm/data03

    Old values from fix.txt:

           kfdhdb.dsknum: 1 ; 0x024: 0x0001

           kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

           kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

           kfdhdb.dskname: DATA_0001 ; 0x028: length=9

           kfdhdb.grpname: DATA ; 0x048: length=4

           kfdhdb.fgname: DATA_0001 ; 0x068: length=9

    New values from fix.txt:

           kfdhdb.dsknum: 2 ; 0x024: 0x0002

           kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL

           kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

           kfdhdb.dskname: DATA_0002 ; 0x028: length=9

           kfdhdb.grpname: DATA ; 0x048: length=4

           kfdhdb.fgname: DATA_0002 ; 0x068: length=9
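The edit in step 5 is usually done by hand in a text editor; the sketch below does the same substitution in Python so that the value and the comment column stay consistent. `set_field` is a hypothetical helper, and the sample text is the "old values" excerpt above. Whether kfed merge reads the comment column at all is not assumed here; the helper simply updates both.

```python
import re


def set_field(text, name, value):
    """Rewrite one 'name: value ; offset: comment' line of a kfed text dump."""
    head = rf"^([ \t]*{re.escape(name)}:[ \t]*)"
    if isinstance(value, int):
        # Numeric field: keep the hex comment in step, at its original width.
        pattern = head + r"\d+[ \t]*;[ \t]*(0x[0-9a-f]+):[ \t]*0x([0-9a-f]+)[ \t]*$"

        def repl(m):
            width = len(m.group(3))
            return f"{m.group(1)}{value} ; {m.group(2)}: 0x{value:0{width}x}"
    else:
        # String field: the comment carries the string length.
        pattern = head + r"\S*[ \t]*;[ \t]*(0x[0-9a-f]+):[ \t]*length=\d+[ \t]*$"

        def repl(m):
            return f"{m.group(1)}{value} ; {m.group(2)}: length={len(value)}"

    new_text, count = re.subn(pattern, repl, text, flags=re.MULTILINE)
    if count != 1:
        raise ValueError(f"expected one {name} line, found {count}")
    return new_text


fix_txt = """kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0001 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0001 ; 0x068: length=9
"""
fixed = set_field(fix_txt, "kfdhdb.dsknum", 2)
fixed = set_field(fixed, "kfdhdb.dskname", "DATA_0002")
fixed = set_field(fixed, "kfdhdb.fgname", "DATA_0002")
print(fixed)
```

Raising an error when a field is missing or duplicated is a deliberate choice: a silently unchanged fix.txt would be merged onto the wrong disk identity.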

    6. Find the disk directory by dumping aunum=2 and blknum=2 for the disk with f1b1:

          kfed read <device name> aunum=2 blknum=2 | more

    Example:

    $ kfed read /ocfs02/asm/data01 aunum=2 blknum=2 | more

    kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002

    kfffde[0].xptr.disk: 2 ; 0x4a4: 0x0002

    kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0

    kfffde[0].xptr.chk: 42 ; 0x4a7: 0x2a

    kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff

    kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff

    kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0

    kfffde[1].xptr.chk: 42 ; 0x4af: 0x2a

           After the initial file directory header, you will see the extent map. If the diskgroup is external redundancy then each entry refers to an extent of the file. For normal redundancy, every pair is an extent set, similarly for high redundancy [012] form the extent set. Here we see the disk directory is at au = 2 in disk number = 2.

           In this example, it turned out to be in that location on the second AU, but it is not guaranteed that it will always be there.
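Reading the extent map by eye works for one or two entries; for longer maps the pointers can be pulled out with a short script. This is a sketch with a hypothetical helper name, and it treats an AU value of 0xffffffff as an unused slot, which is how the sample output above appears to mark it.

```python
import re

XPTR_RE = re.compile(r"kfffde\[(\d+)\]\.xptr\.(au|disk):\s*(\d+)")
UNUSED_AU = 0xFFFFFFFF  # 4294967295 in the sample marks an unused extent slot


def extent_pointers(text):
    """Return [(disk, au), ...] for the used kfffde[n].xptr entries, in index order."""
    entries = {}
    for idx, key, value in XPTR_RE.findall(text):
        entries.setdefault(int(idx), {})[key] = int(value)
    return [
        (e["disk"], e["au"])
        for _, e in sorted(entries.items())
        if "disk" in e and "au" in e and e["au"] != UNUSED_AU
    ]


sample = """
kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002
kfffde[0].xptr.disk: 2 ; 0x4a4: 0x0002
kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff
"""
pointers = extent_pointers(sample)
print(pointers)
```

For the sample this yields a single pointer, disk 2 at AU 2, matching the statement that the disk directory is at au = 2 in disk number = 2.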

    7. Once the disk directory location is found, find the info for your disk number.

           kfed read <device name> aunum=2 blknum=0 | more

    Example:

    kfed read /ocfs02/asm/data02 aunum=2 blknum=0 | more

    kfbh.type: 6 ; 0x002: KFBTYP_DISKDIR

    ...

    kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0

    --A=1 marks an allocated entry; A=0 means the entry has been deleted.

    ...

    kfddde[2].dsknum: 2 ; 0x3b4: 0x0002

    kfddde[2].state: 2 ; 0x3b6: KFDSTA_NORMAL

    kfddde[2].ub1spare: 0 ; 0x3b7: 0x00

    kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9

    kfddde[2].fgname: DATA_0002 ; 0x3d8: length=9

    kfddde[2].crestmp.hi: 32885842 ; 0x3f8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

    kfddde[2].crestmp.lo: 3860343808 ; 0x3fc: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39

    kfddde[2].failstmp.hi: 0 ; 0x400: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0

    kfddde[2].failstmp.lo: 0 ; 0x404: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0

           The various kfddde entries are the disk directory entries. Only entries whose entry.incarn shows A=1 are allocated entries. You might find entries with dskname populated, but if A=0 then it means that entry was deleted.
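The A=1 rule can be applied mechanically when a disk directory block lists many entries. The sketch below uses a hypothetical helper and a made-up three-entry sample: the offsets and the deleted entry 1 are invented for illustration, since the output above elides most of the block. It reads the A flag from the kfed comment column rather than decoding the incarn value.

```python
import re

ENTRY_RE = re.compile(
    r"kfddde\[(\d+)\]\.([\w.]+):[ \t]*(\S*)[ \t]*;[ \t]*0x[0-9a-f]+:[ \t]*(.*)"
)


def allocated_disks(text):
    """Return {dsknum: dskname} for kfddde entries whose entry.incarn shows A=1."""
    entries = {}
    for idx, field, value, comment in ENTRY_RE.findall(text):
        e = entries.setdefault(int(idx), {})
        if field == "entry.incarn":
            e["allocated"] = "A=1" in comment.split()
        elif field in ("dsknum", "dskname"):
            e[field] = value
    return {
        int(e["dsknum"]): e["dskname"]
        for e in entries.values()
        if e.get("allocated") and "dsknum" in e and "dskname" in e
    }


# Hypothetical sample: entry 1 still has a name but is marked deleted (A=0).
sample = """
kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfddde[0].dsknum: 0 ; 0x034: 0x0000
kfddde[0].dskname: DATA_0000 ; 0x038: length=9
kfddde[1].entry.incarn: 2 ; 0x1e4: A=0 NUMM=0x1
kfddde[1].dsknum: 1 ; 0x1f4: 0x0001
kfddde[1].dskname: DATA_0001 ; 0x1f8: length=9
kfddde[2].entry.incarn: 1 ; 0x3a4: A=1 NUMM=0x0
kfddde[2].dsknum: 2 ; 0x3b4: 0x0002
kfddde[2].dskname: DATA_0002 ; 0x3b8: length=9
"""
live = allocated_disks(sample)
print(live)
```

The deleted entry is dropped even though its dskname is populated, which is exactly the trap the paragraph above warns about.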

    8. Now go back to fix.txt and adjust the crestmp.hi and crestmp.lo to match what the disk directory shows. If it is already the same then leave it.

    Example:

    Before:

    kfdhdb.crestmp.hi: 32879468 ; 0x0a8: HOUR=0xc DAYS=0x1b MNTH=0xc YEAR=0x7d6

    kfdhdb.crestmp.lo: 296378368 ; 0x0ac: USEC=0x0 MSEC=0x298 SECS=0x1a MINS=0x4

    kfdhdb.mntstmp.hi: 32879468 ; 0x0b0: HOUR=0xc DAYS=0x1b MNTH=0xc YEAR=0x7d6

    kfdhdb.mntstmp.lo: 309633024 ; 0x0b4: USEC=0x0 MSEC=0x128 SECS=0x27 MINS=0x4

    After:

    kfdhdb.crestmp.hi: 32885842 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

    kfdhdb.crestmp.lo: 3860343808 ; 0x0ac: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39

    kfdhdb.mntstmp.hi: 32885842 ; 0x0b0: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

    kfdhdb.mntstmp.lo: 3870944256 ; 0x0b4: USEC=0x0 MSEC=0x27b SECS=0x2b MINS=0x39
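In practice the crestmp values are copied straight from the disk directory entry, but it can be reassuring to cross-check that the numbers and the HOUR/DAYS/MNTH/YEAR comments agree before merging. The sketch below packs a datetime into the two words. The bit layout is inferred from the sample values in this article rather than from Oracle documentation, and `encode_stamp` is a hypothetical name.

```python
from datetime import datetime


def encode_stamp(ts):
    """Pack a datetime into the (hi, lo) words shown for crestmp/mntstmp."""
    hi = (ts.year << 14) | (ts.month << 10) | (ts.day << 5) | ts.hour
    msec, usec = divmod(ts.microsecond, 1000)
    lo = (ts.minute << 26) | (ts.second << 20) | (msec << 10) | usec
    return hi, lo


# YEAR=0x7d7 MNTH=0x3 DAYS=0x2 HOUR=0x12, MINS=0x39 SECS=0x21 MSEC=0x20b USEC=0x0
crestmp = encode_stamp(datetime(2007, 3, 2, 18, 57, 33, 523000))
print(crestmp)
```

The result reproduces the crestmp.hi and crestmp.lo values in the "After" block above.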

    9. Do a kfed merge to put the new header into the disk using fix.txt:

    kfed merge <device name> text=fix.txt

    Example:

    kfed merge /ocfs02/asm/data03 text=fix.txt

           If you are using ASMLIB, at this point you will need to run the following to fix the ASMLIB portion of the header:

           /etc/init.d/oracleasm force-renamedisk /dev/sdbg1 <ASMLIB Disk Name>
           /etc/init.d/oracleasm scandisks
           /etc/init.d/oracleasm listdisks

    10. Startup nomount the ASM instance:

    SQL> startup nomount;

    11. Check v$asm_disk.header_status to verify that the disk header is in a "MEMBER" state.

    Example:

    SQL> select path, header_status from v$asm_disk where path like '%data03%';

    PATH

    --------------------------------------------------------------------------------

    HEADER_STATU

    ------------

    /ocfs02/asm/data03

    MEMBER

    12. Mount the diskgroup.

           alter diskgroup <diskgroup name> mount;

           If the diskgroup fails to mount at this point, you may want to either consider re-creating the diskgroup and restoring or engaging BDE to assist.

           You may also want to try clearing the first 4k of the disk with dd then do a kfed merge again in case there are any extra characters causing problems (MAKE SURE YOU HAVE A BACKUP OF THE FIRST 4K FIRST):

           --In short: if the mount fails, first try wiping the first 4k and merging again; if that still fails, the only option is to recreate the diskgroup and restore the database.

    Example:

    dd if=<device name> of=<backup file> bs=4096 count=1
    dd if=/dev/zero of=<device name> bs=4096 count=1

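    4.2 Remarks

           Each diskgroup in my test environment has only one disk, so I could not test this. I can only recover from a backup; a rebuild is not possible.

    For a rebuild, the following header fields have to be filled in; in the official example above they come from the copied header of a healthy disk and the ASM alert log: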

    kfdhdb.dsknum:                        0 ; 0x024: 0x0000

    kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL

    kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

    kfdhdb.dskname:                    DATA ; 0x028: length=4

    kfdhdb.grpname:                    DATA ; 0x048: length=4

    kfdhdb.fgname:                     DATA ; 0x068: length=4

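    Obtain the following fields from the disk directory:

    kfdhdb.crestmp.hi: 32885842 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x3 YEAR=0x7d7

    kfdhdb.crestmp.lo: 3860343808 ; 0x0ac: USEC=0x0 MSEC=0x20b SECS=0x21 MINS=0x39

    After regenerating the disk header, restore it with kfed merge. For the detailed steps, follow the official example. In short, backups matter more than anything.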

    -------------------------------------------------------------------------------------------------------

    Blog: http://blog.csdn.net/tianlesoftware

    Weibo: http://weibo.com/tianlesoftware

    Email: dvd.dba@gmail.com

  • Original article: https://www.cnblogs.com/tianlesoftware/p/3609575.html