  • Recovering lost ASM disk partitions ---- 惜分飞

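    A friend reported that after they set up active-active replication on an xx storage array and rebooted the hosts, GI would not start. Analysis showed that the partition information on every disk from that array had been lost, so ASMLib could not discover the disks (the ASM disks had been created on partitions).
    The errors looked like this (disk partitions missing):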

    --fdisk -l (partial output)
    Disk /dev/mapper/datahds1: 1099.5 GB, 1099511627776 bytes
    255 heads, 63 sectors/track, 133674 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk identifier: 0x00000000
     
    --ls -l /dev/mapper/   output shows no partition entries
    lrwxrwxrwx 1 root root      7 May  6 03:44 datahds1 -> ../dm-1
    lrwxrwxrwx 1 root root      7 May  6 03:26 datahds2 -> ../dm-3
    lrwxrwxrwx 1 root root      7 May  6 03:26 datahds3 -> ../dm-8
    lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds1 -> ../dm-0
    lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds2 -> ../dm-2
    lrwxrwxrwx 1 root root      7 May  6 03:26 ocrhds3 -> ../dm-4

    The ASM log shows:

    SUCCESS: diskgroup DATADG was mounted
    NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
    SUCCESS: diskgroup OCRHDS was mounted
    ORA-15032: not all alterations performed
    ORA-15017: diskgroup "DATA" cannot be mounted
    ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

    Analyzing the system log:

    May  6 02:23:27 db2 kernel: sdb: unknown partition table
    May  6 02:23:27 db2 kernel: sde: unknown partition table
    May  6 02:23:27 db2 kernel: sdc: unknown partition table
    May  6 02:23:27 db2 kernel: sdf: unknown partition table
    May  6 02:23:27 db2 kernel: sdd: unknown partition table
    May  6 02:23:27 db2 kernel: sdj:Dev sdj: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdi: sdi1
    May  6 02:23:27 db2 kernel: sdk: sdk1
    May  6 02:23:27 db2 kernel: sdg: unknown partition table
    May  6 02:23:27 db2 kernel: sdl: sdl1
    May  6 02:23:27 db2 kernel: sdm:Dev sdm: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdo:Dev sdo: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdn:Dev sdn: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdp:Dev sdp: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sds:Dev sds: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdh:
    May  6 02:23:27 db2 kernel: sdt: sdt1
    May  6 02:23:27 db2 kernel: sdv:Dev sdv: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdq:Dev sdq: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sd 1:0:1:9: [sdr] Very big device. Trying to use READ CAPACITY(16).
    May  6 02:23:27 db2 kernel: sdr:Dev sdr: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sd 2:0:0:9: [sdab] Very big device. Trying to use READ CAPACITY(16).
    May  6 02:23:27 db2 kernel: sdab: unknown partition table
    May  6 02:23:27 db2 kernel: sdac: unknown partition table
    May  6 02:23:27 db2 kernel: sdw: sdw1
    May  6 02:23:27 db2 kernel: sdu:Dev sdu: unable to read RDB block 0
    May  6 02:23:27 db2 kernel: unable to read partition table
    May  6 02:23:27 db2 kernel: sdx: sdx1
    May  6 02:23:27 db2 kernel: sdy: sdy1
    May  6 02:23:27 db2 kernel: sdaa: sdaa1
    May  6 02:23:27 db2 kernel: sdz: sdz1
    May  6 02:23:27 db2 kernel: sdae: unknown partition table
    May  6 02:23:27 db2 kernel: sdaf: unknown partition table
    May  6 02:23:27 db2 kernel: sdag: unknown partition table
    May  6 02:23:27 db2 kernel: sdai:
    May  6 02:23:27 db2 kernel: sdah: unknown partition table
    May  6 02:23:27 db2 kernel: sdad: unknown partition table
    May  6 02:23:28 db2 mcelog: failed to prefill DIMM database from DMI data

    The error here is obvious: unknown partition table. The partition information on these disks is damaged, and fdisk cannot find any partitions.
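    With dozens of paths in the log, a small filter makes it easier to see which devices lost their partition table. The sketch below is a hypothetical helper (classify_partition_msgs is not an Oracle or Linux tool); it assumes the "kernel: sdX: ..." message format shown above and tags each device as no-table, unreadable, partitioned or none-listed.

```shell
#!/bin/sh
# classify_partition_msgs (hypothetical helper)
# Read kernel log lines on stdin and print "<device> <status>" for every
# "kernel: sdX: ..." partition-scan message:
#   no-table     kernel reported "unknown partition table"
#   unreadable   kernel could not read the device ("unable to read ...")
#   partitioned  kernel listed partitions (e.g. "sdi: sdi1")
#   none-listed  the line names the device but lists nothing after it
# Lines such as "kernel: unable to read partition table" or
# "kernel: sd 1:0:1:9: [sdr] ..." do not match and are skipped.
classify_partition_msgs() {
  awk '
    match($0, /kernel: sd[a-z]+:/) {
      dev  = substr($0, RSTART + 8, RLENGTH - 9)   # drop "kernel: " and the ":"
      rest = substr($0, RSTART + RLENGTH)          # text after "sdX:"
      if (rest ~ /unknown partition table/)  status = "no-table"
      else if (rest ~ /unable to read/)      status = "unreadable"
      else if (rest ~ /sd[a-z]+[0-9]+/)      status = "partitioned"
      else                                   status = "none-listed"
      print dev, status
    }
  '
}

# Example (not executed here):
# grep 'kernel: sd' /var/log/messages | classify_partition_msgs | sort -k2
```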

    partprobe did not help either:

     
    [root@db2 oracle]# partprobe /dev/mapper/ocrhds3
    [root@db2 oracle]#
    [root@db2 oracle]# ls -l /dev/mapper/ocrhds3*
    lrwxrwxrwx 1 root root 7 May  6 07:30 /dev/mapper/ocrhds3 -> ../dm-4

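    From the information above, the partition tables are most likely damaged. All we can do now is hope that, with some luck, the actual data inside the partitions is still intact.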
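    One way to confirm that the DOS partition table itself is gone is to read sector 0 directly: a valid MBR ends with the signature bytes 0x55 0xAA at offsets 510-511, and partition entry 1 stores its starting LBA little-endian at bytes 454-457 (table at 446, LBA field at +8). The helpers below are a hypothetical sketch with made-up names; first_part_start relies on od's host byte order, so it assumes a little-endian (x86) host.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting sector 0 (the MBR) of a disk or image.

# has_mbr_signature DEV: succeed if bytes 510-511 hold the MBR signature 55 aa
has_mbr_signature() {
  local sig
  sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "55aa" ]
}

# first_part_start DEV: print the starting LBA of partition entry 1
# (bytes 454-457). od decodes in host byte order, so this assumes x86.
first_part_start() {
  od -An -tu4 -j454 -N4 "$1" | tr -d ' \n'
  echo
}

# Example (not executed here):
# has_mbr_signature /dev/mapper/datahds1 && first_part_start /dev/mapper/datahds1
```

    A missing signature on /dev/mapper/datahds1 would be consistent with the kernel's "unknown partition table" message; a disk partitioned the traditional way would usually report 63 as the first partition's start.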

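    Analyzing the actual partition data on the disk: copy the first 50 MB, drop the first 32256 bytes (63 sectors × 512 bytes, the traditional start of the first partition with the 63 sectors/track geometry shown by fdisk above), and read what remains with kfed

    [root@db2 ~]$ dd if=/dev/mapper/datahds1 of=/tmp/datahds1.dd bs=1024k count=50
    [root@db2 ~]$ dd if=/tmp/datahds1.dd of=/tmp/xff01.dd  bs=32256 skip=1
    [grid@db2 ~]$ kfed read /tmp/xff01.dd |more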
    kfbh.endian:                          1 ; 0x000: 0x01
    kfbh.hard:                          130 ; 0x001: 0x82
    kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
    kfbh.datfmt:                          1 ; 0x003: 0x01
    kfbh.block.blk:                       0 ; 0x004: blk=0
    kfbh.block.obj:              2147483648 ; 0x008: disk=0
    kfbh.check:                  3110278718 ; 0x00c: 0xb963163e
    kfbh.fcn.base:                        0 ; 0x010: 0x00000000
    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
    kfbh.spare1:                          0 ; 0x018: 0x00000000
    kfbh.spare2:                          0 ; 0x01c: 0x00000000
    kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
    kfdhdb.driver.reserved[0]:   1146307656 ; 0x008: 0x44534448
    kfdhdb.driver.reserved[1]:    826364993 ; 0x00c: 0x31415441
    kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
    kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
    kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
    kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
    kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
    kfdhdb.dsknum:                        0 ; 0x024: 0x0000
    kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
    kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
    kfdhdb.dskname:             DATADG_0000 ; 0x028: length=11
    kfdhdb.grpname:                  DATADG ; 0x048: length=6
    kfdhdb.fgname:              DATADG_0000 ; 0x068: length=11
    kfdhdb.capname:                         ; 0x088: length=0
    kfdhdb.crestmp.hi:             33050696 ; 0x0a8: HOUR=0x8 DAYS=0x2 MNTH=0x4 YEAR=0x7e1
    kfdhdb.crestmp.lo:           3813740544 ; 0x0ac: USEC=0x0 MSEC=0x44 SECS=0x35 MINS=0x38
    kfdhdb.mntstmp.hi:             33050701 ; 0x0b0: HOUR=0xd DAYS=0x2 MNTH=0x4 YEAR=0x7e1
    kfdhdb.mntstmp.lo:            411385856 ; 0x0b4: USEC=0x0 MSEC=0x150 SECS=0x8 MINS=0x6

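    From this analysis we can tentatively conclude that the data in the partitions is probably intact: the ASM disk header is good, and since this kind of damage normally overwrites a disk from the front backwards, if the header survived, the chance that later blocks were overwritten is very small.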
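    The dd and kfed check above relies on knowing where the partition used to start. If that were not known (for example, a partition aligned at sector 2048 rather than 63), a sketch like the following could scan the 50 MB sample for a sector that looks like an ASMLib-stamped disk header: kfbh.type = 1 (KFBTYP_DISKHEAD) at byte 2, and the string ORCLDISK at byte 32, right after the 32-byte kfbh block header seen in the kfed output. find_asm_header is a hypothetical helper; disks labelled through udev instead of ASMLib do not carry ORCLDISK, so it would not find them.

```shell
#!/bin/sh
# find_asm_header IMAGE [MAX_SECTORS]  (hypothetical helper)
# Print the byte offset of every 512-byte sector in IMAGE (a raw dd sample
# of the disk) that looks like an ASMLib-stamped ASM disk header.
# Scans MAX_SECTORS sectors (default 4096, i.e. the first 2 MB), which covers
# both the old 63-sector and the 2048-sector partition alignments.
# Simple but slow: it spawns od for every sector.
find_asm_header() {
  local img="$1" max="${2:-4096}" s=0 off type tag
  while [ "$s" -lt "$max" ]; do
    off=$((s * 512))
    # kfbh.type is the byte at offset 2 of the block; 1 = KFBTYP_DISKHEAD
    type=$(od -An -tu1 -j $((off + 2)) -N1 "$img" 2>/dev/null | tr -d ' \n')
    if [ "$type" = "1" ]; then
      # kfdhdb.driver.provstr begins right after the 32-byte kfbh header
      tag=$(dd if="$img" bs=1 skip=$((off + 32)) count=8 2>/dev/null | tr -d '\0')
      if [ "$tag" = "ORCLDISK" ]; then
        echo "$off"
      fi
    fi
    s=$((s + 1))
  done
}

# Example (not executed here): offsets found here are candidate dd skip values
# find_asm_header /tmp/datahds1.dd
```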

    Prepare new disks and dd each partition's contents directly onto the new devices, skipping the first 32256 bytes (63 sectors × 512 bytes) so that the ASM disk header lands at offset 0 of each new device

    dd if=/dev/mapper/ocrhds1 of=/dev/mapper/ocrhdsnew1 skip=1 bs=32256
    dd if=/dev/mapper/ocrhds2 of=/dev/mapper/ocrhdsnew2 skip=1 bs=32256
    dd if=/dev/mapper/ocrhds3 of=/dev/mapper/ocrhdsnew3 skip=1 bs=32256
    dd if=/dev/mapper/datahds1 of=/dev/mapper/datahdsnew1 skip=1 bs=32256
    dd if=/dev/mapper/datahds2 of=/dev/mapper/datahdsnew2 skip=1 bs=32256
    dd if=/dev/mapper/datahds3 of=/dev/mapper/datahdsnew3 skip=1 bs=32256
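    The six dd commands differ only in device names, so they can be wrapped in a small function. This is a sketch, not the author's script: copy_partition is a hypothetical name, it refuses an offset that is not a whole number of 512-byte sectors, and it adds conv=fsync so the data is flushed before dd exits. It keeps the bs=OFFSET skip=1 form rather than iflag=skip_bytes, since older GNU dd releases may not support skip_bytes. It does not check sizes; each new device must be at least as large as the old partition.

```shell
#!/bin/sh
# copy_partition SRC DST [OFFSET]  (hypothetical helper)
# Copy everything in SRC from byte OFFSET onward to the start of DST, so the
# old partition contents become the whole new device.
# OFFSET defaults to 32256 = 63 sectors x 512 bytes.
copy_partition() {
  local src="$1" dst="$2" off="${3:-32256}"
  # The partition must start on a 512-byte sector boundary; refuse anything else.
  if [ "$off" -le 0 ] || [ $((off % 512)) -ne 0 ]; then
    echo "copy_partition: offset $off is not a positive multiple of 512" >&2
    return 1
  fi
  # bs=OFFSET skip=1 skips exactly OFFSET bytes of SRC;
  # conv=fsync flushes DST to stable storage before dd exits.
  dd if="$src" of="$dst" bs="$off" skip=1 conv=fsync
}

# Example (not executed here): the six copies above as one loop.
# for name in ocrhds1 ocrhds2 ocrhds3 datahds1 datahds2 datahds3; do
#   new=$(echo "$name" | sed 's/hds/hdsnew/')
#   copy_partition "/dev/mapper/$name" "/dev/mapper/$new"
# done
```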

    Rescan the disks with ASMLib

    [root@db1 disks]# oracleasm scandisks
    Reloading disk partitions: done
    Cleaning any stale ASM disks...
    Scanning system for ASM disks...
    Instantiating disk "HDSOCR3"
    Instantiating disk "HDSDATA2"
    Instantiating disk "HDSDATA1"
    Instantiating disk "HDSDATA3"
    Instantiating disk "HDSOCR1"
    Instantiating disk "HDSOCR2"
    [root@db1 disks]# ls -ltr
    total 0
    brw-rw---- 1 grid asmadmin  8, 160 May  6 13:49 HDSOCR3
    brw-rw---- 1 grid asmadmin  8, 192 May  6 13:49 HDSDATA2
    brw-rw---- 1 grid asmadmin  8, 176 May  6 13:49 HDSDATA1
    brw-rw---- 1 grid asmadmin  8, 208 May  6 13:49 HDSDATA3
    brw-rw---- 1 grid asmadmin  8, 128 May  6 13:49 HDSOCR1
    brw-rw---- 1 grid asmadmin  8, 144 May  6 13:49 HDSOCR2

    Verify the copied partitions with kfed

    [root@db2 tmp]# /oracle/app/11.2.0/grid_1/bin/kfed read /dev/oracleasm/disks/HDSDATA1
    kfbh.endian:                          1 ; 0x000: 0x01
    kfbh.hard:                          130 ; 0x001: 0x82
    kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
    kfbh.datfmt:                          1 ; 0x003: 0x01
    kfbh.block.blk:                       0 ; 0x004: blk=0
    kfbh.block.obj:              2147483648 ; 0x008: disk=0
    kfbh.check:                  3110278718 ; 0x00c: 0xb963163e
    kfbh.fcn.base:                        0 ; 0x010: 0x00000000
    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
    kfbh.spare1:                          0 ; 0x018: 0x00000000
    kfbh.spare2:                          0 ; 0x01c: 0x00000000
    kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
    kfdhdb.driver.reserved[0]:   1146307656 ; 0x008: 0x44534448
    kfdhdb.driver.reserved[1]:    826364993 ; 0x00c: 0x31415441
    kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
    kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
    kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
    kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
    kfdhdb.compat:                186646528 ; 0x020: 0x0b200000
    kfdhdb.dsknum:                        0 ; 0x024: 0x0000
    kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
    kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
    kfdhdb.dskname:             DATADG_0000 ; 0x028: length=11
    kfdhdb.grpname:                  DATADG ; 0x048: length=6
    kfdhdb.fgname:              DATADG_0000 ; 0x068: length=11
    kfdhdb.capname:                         ; 0x088: length=0
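    kfed shows the full header for one device. For a quick check across all six new devices, the ASMLib label can be read straight from the header: in the kfed output, kfdhdb.driver.provstr holds ORCLDISK at byte 32 (0x20 for kfbh plus 0x000), and the label (HDSDATA1) continues in the reserved words from byte 40. asm_label is a hypothetical helper and assumes the label fits in the 24 reserved bytes shown.

```shell
#!/bin/sh
# asm_label DEV  (hypothetical helper)
# Print the ASMLib label stored in the ASM disk header at the start of DEV.
# provstr ("ORCLDISK", 8 bytes) sits at byte 32; the label continues in the
# 24 reserved bytes starting at byte 40, padded with NULs.
asm_label() {
  local tag
  tag=$(dd if="$1" bs=1 skip=32 count=8 2>/dev/null | tr -d '\0')
  if [ "$tag" != "ORCLDISK" ]; then
    echo "asm_label: no ASMLib stamp found on $1" >&2
    return 1
  fi
  # Strip the NUL padding after the label, then end the line.
  dd if="$1" bs=1 skip=40 count=24 2>/dev/null | tr -d '\0'
  echo
}

# Example (not executed here): label of each new device
# for d in /dev/mapper/*hdsnew*; do echo "$d $(asm_label "$d")"; done
```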

    ASM and the database start normally

    [grid@db2 ~]$ asmcmd
    ASMCMD> lsdg
    State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
    MOUNTED  EXTERN  N         512   4096  1048576   3145710  2378034                0         2378034              0             N  DATADG/
    MOUNTED  NORMAL  N         512   4096  1048576     15342    14416             5114            4651              0             Y  OCRHDS/
    ASMCMD>
     
    [oracle@db2 ~]$ sqlplus  / as sysdba
     
    SQL*Plus: Release 11.2.0.4.0 Production on Sat May 6 13:54:21 2017
     
    Copyright (c) 1982, 2013, Oracle.  All rights reserved.
     
    Connected to an idle instance.
     
    SQL> startup
    ORACLE instance started.
     
    Total System Global Area 3.6077E+10 bytes
    Fixed Size                  2260648 bytes
    Variable Size            7247757656 bytes
    Database Buffers         2.8723E+10 bytes
    Redo Buffers              104382464 bytes
    Database mounted.
    Database opened.
    SQL>



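    With the recovery above, the ASM disks whose partitions were lost were brought back with zero data loss.
    If you run into a similar situation and cannot resolve it, please contact us for professional Oracle database recovery support.
    Phone: 13429648788    QQ: 107644445    E-Mail: dba@xifenfei.com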

  • Original article: https://www.cnblogs.com/xifenfei/p/10023465.html