  • NetApp Storage Solution and Health-Check Commands

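    I. MCC Overview

    Clustered MetroCluster (MCC) is the active-active storage solution provided by NetApp Data ONTAP. The original design stretched a single dual-controller FAS/V-Series system across data centers to form a geographically separated HA pair, with only one controller node per site. The two sites were connected through additional FC-VI cluster adapters, and the SAS disk shelves were attached through SAS-to-FC FibreBridge devices. Within 500 meters, or inside the same machine room, the sites were connected directly through Fibre Channel switches; beyond 500 meters (up to 100 km) they were connected through Fibre Channel and DWDM switches.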


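    MetroCluster later evolved from this architecture. A dual-controller FAS/V-Series array is placed at each of sites A and B. Controller A of array A pairs with controller A of array B, and controller B of array A pairs with controller B of array B, so each cross-site pair forms a cluster. This makes full use of the resources in both data centers, and both sites serve storage at the same time. Controllers A and B inside the same array, however, do not form a cluster. If either controller node of a cross-site pair fails, the hosts at the failed site must access the controller node at the remote site. If both nodes of a cross-site pair fail at the same time, service is interrupted.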

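    NetApp Data ONTAP 8.3 introduced a four-controller active-active solution that supports distances of up to 200 km. A four-controller MetroCluster is built from two HA pairs that form two local clusters, and the two clusters are then combined into a four-node configuration. Each controller keeps its in-memory write log in NVRAM, and NVRAM mirrors the log entries that have not yet been written to disk. This ensures that after a node failure the partner node in the HA pair can take over, and that after a site failure the remote HA pair can take over. When the log reaches a certain watermark, or a system operation triggers a flush, the data being written to disk is written synchronously to both sites through SyncMirror. If the disks at one site fail, the disks at the other site can still serve I/O, which enables site switchover without interrupting service.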
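
The log-mirroring and dual-write behavior described above can be sketched as a toy model. This is only an illustration of the idea, not NetApp's implementation: the `Controller` class, the `watermark` value, and the `take_over` helper are all hypothetical names invented for this sketch.

```python
class Controller:
    """Toy controller: buffers writes in an NVRAM-like log mirrored to partners."""

    def __init__(self, name, watermark=3):
        self.name = name
        self.watermark = watermark   # hypothetical flush threshold, in log entries
        self.nvlog = []              # writes accepted but not yet on disk
        self.mirrors = []            # partner-side copies of this controller's log
        self.plexes = {"local": {}, "remote": {}}  # one disk copy per site

    def add_partner(self):
        """Register a partner and return its copy of the log."""
        mirror = []
        self.mirrors.append(mirror)
        return mirror

    def write(self, key, value):
        entry = (key, value)
        self.nvlog.append(entry)
        for mirror in self.mirrors:  # every partner sees the entry before the flush
            mirror.append(entry)
        if len(self.nvlog) >= self.watermark:
            self.flush()

    def flush(self):
        """Write logged entries to both plexes, then clear the log everywhere."""
        for key, value in self.nvlog:
            for plex in self.plexes.values():
                plex[key] = value
        self.nvlog.clear()
        for mirror in self.mirrors:
            mirror.clear()


def take_over(surviving_plex, mirrored_log):
    """Rebuild the latest state from one surviving plex plus the mirrored log."""
    state = dict(surviving_plex)
    state.update(mirrored_log)
    return state


node = Controller("A1")
ha_mirror = node.add_partner()   # stands in for the local HA partner
dr_mirror = node.add_partner()   # stands in for the remote-site partner
for i in range(4):
    node.write(f"blk{i}", i)

# Three writes were flushed to both plexes; the fourth exists only in the logs.
recovered = take_over(node.plexes["remote"], dr_mirror)
print(recovered)
```

In this model the remote site can reconstruct every accepted write from its own plex plus its mirrored copy of the log, which is the property the paragraph above relies on for site switchover.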


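    MetroCluster protects data by using mirroring and clustering across two separate locations. Each cluster synchronously mirrors its data and its Storage Virtual Machine (SVM) configuration to the other cluster. When a disaster strikes one site, an administrator can activate the remote SVMs and take over service at the other site. In addition, the nodes in each cluster are configured as a local HA pair, which provides local failover.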


    NetApp MetroCluster is implemented by combining NetApp SyncMirror with the Cluster_remote and controller Clustered Failover features.

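        • Clustered Failover: provides high-availability failover between the primary storage and the disaster-recovery storage. The takeover decision is made by an administrator through a single command.

        • SyncMirror: keeps an up-to-date copy of the data on the remote storage, so that after a takeover the data can be served from the remote storage alone.

        • ClusterRemote: provides the management mechanism for declaring a disaster and initiating takeover by the remote storage.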

    II. Common MCC Health-Check Commands

    1. Check system health status

    cluster1::> system health status show
    Status
    ---------------
    ok

    2. Check cluster status

    cluster1::> cluster show              
    Node                  Health  Eligibility
    --------------------- ------- ------------
    cluster1-01           true    true
    cluster1-02           true    true
    2 entries were displayed.
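
When this check is run across many clusters, it can help to parse the table instead of reading it by eye. The following is a minimal sketch, assuming the output has already been captured as text (for example from an SSH session); the function name `unhealthy_nodes` is hypothetical, and the sample text is copied from the output above.

```python
# Sample text copied from the `cluster show` output above.
CLUSTER_SHOW = """\
Node                  Health  Eligibility
--------------------- ------- ------------
cluster1-01           true    true
cluster1-02           true    true
2 entries were displayed.
"""


def unhealthy_nodes(text):
    """Return the names of nodes whose Health or Eligibility is not 'true'."""
    bad = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("---"):
            in_table = True          # data rows start after the dashed separator
            continue
        if not in_table:
            continue
        fields = line.split()
        # Data rows have exactly three columns; the footer line does not.
        if len(fields) != 3 or fields[1] not in ("true", "false"):
            continue
        name, health, eligibility = fields
        if health != "true" or eligibility != "true":
            bad.append(name)
    return bad


print(unhealthy_nodes(CLUSTER_SHOW))
```

An empty list means every node reported `true` in both columns.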

    3. Check cluster statistics

    cluster1::> cluster statistics show
             Counter             Value         Delta
    ---------------- ----------------- -------------
           CPU Busy:                0%             -
         Operations:
              Total:                 0             -
                NFS:                 0             -
               CIFS:                 0             -
       Data Network:
               Busy:                0%             -
           Received:            5.78GB             -
               Sent:            13.7GB             -
    Cluster Network:
               Busy:                0%             -
           Received:             967KB             -
               Sent:             979KB             -
       Storage Disk:
               Read:            6.38PB             -
              Write:            6.26PB             -

    4. View aggregate (RAID) information

    cluster1::> aggr show
                                                                          
    
    Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
    --------- -------- --------- ----- ------- ------ ---------------- ------------
    aggr0_A1   953.8GB   247.3GB   74% online       1 cluster1-01      raid4,
                                                                       mirrored,
                                                                       normal
    aggr0_A2   953.8GB   247.3GB   74% online       1 cluster1-02      raid4,
                                                                       mirrored,
                                                                       normal
    aggr_data_A1 
               68.93TB   16.04TB   77% online      32 cluster1-01      mixed_raid_
                                                                       type,
                                                                       mirrored,
                                                                       hybrid,
                                                                       normal
    aggr_data_A2 
               68.93TB   14.77TB   79% online      31 cluster1-02      mixed_raid_
                                                                       type,
                                                                       mirrored,
                                                                       hybrid,
                                                                       normal
    4 entries were displayed.
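
Long aggregate names wrap onto their own line in this output, which makes simple column splitting unreliable. The sketch below shows one way to handle the wrapping and flag aggregates above a usage threshold. The 75% threshold and the function names are assumptions chosen for illustration, not NetApp recommendations.

```python
import re

# Sample text copied from the `aggr show` output above.
AGGR_SHOW = """\
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_A1   953.8GB   247.3GB   74% online       1 cluster1-01      raid4,
                                                                   mirrored,
                                                                   normal
aggr0_A2   953.8GB   247.3GB   74% online       1 cluster1-02      raid4,
                                                                   mirrored,
                                                                   normal
aggr_data_A1
           68.93TB   16.04TB   77% online      32 cluster1-01      mixed_raid_
                                                                   type,
                                                                   mirrored,
                                                                   hybrid,
                                                                   normal
aggr_data_A2
           68.93TB   14.77TB   79% online      31 cluster1-02      mixed_raid_
                                                                   type,
                                                                   mirrored,
                                                                   hybrid,
                                                                   normal
4 entries were displayed.
"""

# Optional name, then size, available, used percentage, and state.
ROW = re.compile(
    r"^\s*(?:(\S+)\s+)?([\d.]+[KMGTP]B)\s+([\d.]+[KMGTP]B)\s+(\d+)%\s+(\S+)"
)


def parse_aggregates(text):
    """Return one dict per aggregate, joining names that wrapped to their own line."""
    rows = []
    pending_name = None
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "name": match.group(1) or pending_name,
                "used_pct": int(match.group(4)),
                "state": match.group(5),
            })
            pending_name = None
        elif line and not line[0].isspace() and len(line.split()) == 1:
            pending_name = line.split()[0]   # wrapped name; fields follow on next line
    return rows


def over_threshold(rows, limit=75):
    """Names of aggregates whose Used% is at or above the (assumed) limit."""
    return [row["name"] for row in rows if row["used_pct"] >= limit]


print(over_threshold(parse_aggregates(AGGR_SHOW)))
```

With the sample above, the two data aggregates (77% and 79%) are reported, while the two root aggregates at 74% are not.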

    5. View node information

    cluster1::> node show
    Node      Health Eligibility Uptime        Model       Owner    Location  
    --------- ------ ----------- ------------- ----------- -------- ---------------
    cluster1-01 
              true   true        
                                369 days 19:12 FAS8040              gz_idc
    cluster1-02 
              true   true        
                                369 days 19:23 FAS8040              gz_idc
    2 entries were displayed.

    6. View version information

    cluster1::> version
    NetApp Release 8.3.2P9: Fri Jan 06 05:54:05 UTC 2017

    7. View serial numbers

    cluster1::> system license show
    
    Serial Number: 1-80-023992
    Owner: cluster1
    Package           Type    Description           Expiration
    ----------------- ------- --------------------- --------------------
    Base              license Cluster Base License  -
    
    Serial Number: 1-81-0000000000000451515******
    Package           Type    Description           Expiration
    ----------------- ------- --------------------- --------------------
    NFS               license NFS License           -
    iSCSI             license iSCSI License         -
    
    Serial Number: 1-81-0000000000000451515******
    Owner: cluster1-02
    Package           Type    Description           Expiration
    ----------------- ------- --------------------- --------------------
    NFS               license NFS License           -
    iSCSI             license iSCSI License         -
    5 entries were displayed.

    8. View subsystem health status

    cluster1::> system health subsystem show
    Subsystem         Health
    ----------------- ------------------
    SAS-connect       ok
    Environment       ok
    Memory            ok
    Service-Processor ok
    Switch-Health     ok
    CIFS-NDO          ok
    Motherboard       ok
    IO                ok
    MetroCluster      ok
    MetroCluster_Node ok
    FHM-Switch        ok
    FHM-Bridge        ok
    12 entries were displayed.

    9. View MCC cluster status and node status

    cluster1::> metrocluster show
    
    Configuration: fabric
    
    Cluster                        Configuration State    Mode
    ------------------------------ ---------------------- ------------------------
     Local: cluster1               configured             normal
    Remote: cluster1_dr            configured             normal
    
    cluster1::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster1
                  cluster1-01        configured     enabled   normal
                  cluster1-02        configured     enabled   normal
          cluster1_dr
                  cluster1_dr-01     configured     enabled   normal
                  cluster1_dr-02     configured     enabled   normal
    4 entries were displayed.
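
In a healthy configuration like the one shown above, every node reports `configured`, `enabled`, and `normal`. The sketch below parses the nested `metrocluster node show` layout and lists nodes that deviate from those three values. Treating exactly this triple as "healthy" is an assumption based on the sample output, and the function names are hypothetical.

```python
# Sample text copied from the `metrocluster node show` output above.
MCC_NODE_SHOW = """\
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster1
              cluster1-01        configured     enabled   normal
              cluster1-02        configured     enabled   normal
      cluster1_dr
              cluster1_dr-01     configured     enabled   normal
              cluster1_dr-02     configured     enabled   normal
4 entries were displayed.
"""

EXPECTED = ("configured", "enabled", "normal")  # assumed healthy values


def parse_mcc_nodes(text):
    """Return a list of dicts, one per node, tagged with its cluster name."""
    nodes = []
    cluster = None
    in_table = False
    for line in text.splitlines():
        if line.startswith("-----"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        tokens = line.split()
        if line[0].isspace() and len(tokens) == 4:
            # Indented four-column rows are nodes.
            node, state, mirroring, mode = tokens
            nodes.append({"cluster": cluster, "node": node, "state": state,
                          "mirroring": mirroring, "mode": mode})
        elif len(tokens) in (1, 2):
            # "1  cluster1" or an indented "cluster1_dr": the cluster is the last token.
            cluster = tokens[-1]
    return nodes


def deviations(nodes):
    """Names of nodes whose state, mirroring, or mode differs from EXPECTED."""
    return [n["node"] for n in nodes
            if (n["state"], n["mirroring"], n["mode"]) != EXPECTED]


nodes = parse_mcc_nodes(MCC_NODE_SHOW)
print(len(nodes), deviations(nodes))
```

The footer line is ignored because it is not indented, even though it also splits into four tokens.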

    10. View controller status

    cluster1::> system controller show
    Controller Name           System ID     Serial Number     Model    Status      
    ------------------------- ------------- ----------------- -------- ----------- 
    cluster1-01               536964819     451515******      FAS8040  ok
    cluster1-02               536961600     451515******      FAS8040  ok
    2 entries were displayed.

    11. View failed disks

    cluster1::> storage disk show -broken 
    There are no entries matching your query.

    12. View spare disks

    cluster1::> storage disk show -spare  
    Original Owner: cluster1-01                                           
      Checksum Compatibility: block
                                                                Usable Physical
        Disk            HA Shelf Bay Chan   Pool  Type    RPM     Size     Size Owner
        --------------- ------------ ---- ------ ----- ------ -------- -------- --------
        1.30.11         3a    30  11    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.30.13         3a    30  13    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.31.4          3a    31   4    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.32.20         4b    32  20    B  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.32.23         3a    32  23    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.33.0          3a    33   0    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.33.1          3a    33   1    A  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        1.33.10         4b    33  10    B  Pool0   SAS  10000   1.09TB   1.09TB cluster1-01
        2.42.22         3a    42  22    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
        2.42.23         4b    42  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
        2.43.2          4b    43   2    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
        2.43.22         3b    43  22    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
        2.43.23         4b    43  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
        3.11.21         4b    11  21    B  Pool0   SSD      -  372.4GB  372.6GB cluster1-01
        4.20.21         3a    20  21    A  Pool1   SSD      -  372.4GB  372.6GB cluster1-01
        4.21.14         3a    21  14    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-01
    Original Owner: cluster1-02
      Checksum Compatibility: block
                                                                Usable Physical
        Disk            HA Shelf Bay Chan   Pool  Type    RPM     Size     Size Owner
        --------------- ------------ ---- ------ ----- ------ -------- -------- --------
        2.44.23         3b    44  23    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-02
        3.12.21         4a    12  21    B  Pool0   SSD      -  372.4GB  372.6GB cluster1-02
        4.23.21         3b    23  21    A  Pool1   SSD      -  372.4GB  372.6GB cluster1-02
        5.60.23         3b    60  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-02
    20 entries were displayed.
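
With mirrored aggregates, spares are spread across Pool0 and Pool1 and across disk types, so a raw count of 20 says little on its own. The sketch below counts spares per owner, pool, and disk type, and lists combinations that have none. It uses only the `cluster1-02` portion of the output above as sample text; whether every pool and type combination actually needs a spare depends on the aggregate layout, so the "missing" list is a prompt for review rather than a verdict.

```python
import re
from collections import Counter

# Excerpt copied from the `storage disk show -spare` output above (cluster1-02 only).
SPARES = """\
Original Owner: cluster1-02
  Checksum Compatibility: block
                                                            Usable Physical
    Disk            HA Shelf Bay Chan   Pool  Type    RPM     Size     Size Owner
    --------------- ------------ ---- ------ ----- ------ -------- -------- --------
    2.44.23         3b    44  23    A  Pool1   SAS  10000   1.09TB   1.09TB cluster1-02
    3.12.21         4a    12  21    B  Pool0   SSD      -  372.4GB  372.6GB cluster1-02
    4.23.21         3b    23  21    A  Pool1   SSD      -  372.4GB  372.6GB cluster1-02
    5.60.23         3b    60  23    B  Pool1   SAS  10000   1.09TB   1.09TB cluster1-02
"""

DISK_NAME = re.compile(r"^\d+\.\d+\.\d+$")   # e.g. 2.44.23


def count_spares(text):
    """Count spare disks keyed by (owner, pool, disk type)."""
    counts = Counter()
    for line in text.splitlines():
        tokens = line.split()
        # Data rows have 11 columns and start with a stack.shelf.bay disk name.
        if len(tokens) == 11 and DISK_NAME.match(tokens[0]):
            pool, disk_type, owner = tokens[5], tokens[6], tokens[10]
            counts[(owner, pool, disk_type)] += 1
    return counts


def missing_combinations(counts, owner,
                         pools=("Pool0", "Pool1"), types=("SAS", "SSD")):
    """Pool and type pairs for which the owner has no spare at all."""
    return [(pool, t) for pool in pools for t in types
            if counts[(owner, pool, t)] == 0]


counts = count_spares(SPARES)
print(dict(counts))
print(missing_combinations(counts, "cluster1-02"))
```

For the sample, `cluster1-02` has SAS spares only in Pool1, so `("Pool0", "SAS")` is the one combination reported.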

    13. Check SAS bridges for faults

    cluster1::> storage bridge show
                                           Is        Monitor
    Bridge                   Symbolic Name Monitored Status  Vendor Model                 Bridge WWN
    ------------------------ ------------- --------- ------- ------ --------------------- ----------------
    ATTO_10.0.15.17          BRIDGE_B_1
                                           true      ok      Atto   FibreBridge 6500N     2000001086627bc0
    ATTO_10.0.15.18          BRIDGE_B_2
                                           true      ok      Atto   FibreBridge 6500N     2000001086630f0e
    ATTO_10.0.15.19          BRIDGE_B_3
                                           true      ok      Atto   FibreBridge 6500N     2000001086630edc
    ATTO_10.0.15.20          BRIDGE_B_4
                                           true      ok      Atto   FibreBridge 6500N     2000001086630ed2
    ATTO_10.0.15.6           BRIDGE_A_1
                                           true      ok      Atto   FibreBridge 6500N     2000001086630eb4
    ATTO_10.0.15.7           BRIDGE_A_2
                                           true      ok      Atto   FibreBridge 6500N     2000001086630efa
    ATTO_10.0.15.8           BRIDGE_A_3
                                           true      ok      Atto   FibreBridge 6500N     2000001086630f18
    ATTO_10.0.15.9           BRIDGE_A_4
                                           true      ok      Atto   FibreBridge 6500N     2000001086630ef0
    ATTO_FibreBridge6500N_10 -
                                           false     -       Atto   FibreBridge6500N      200000108663e514
    ATTO_FibreBridge6500N_11 -
                                           false     -       Atto   FibreBridge6500N      200000108663e3f2
    ATTO_FibreBridge6500N_12 -
                                           false     -       Atto   FibreBridge6500N      200000108663e488
    ATTO_FibreBridge6500N_13 -
                                           false     -       Atto   FibreBridge6500N      20000010866114ec
    ATTO_FibreBridge6500N_14 -
                                           false     -       Atto   FibreBridge6500N      2000001086627bc0
    ATTO_FibreBridge6500N_7  -
                                           false     -       Atto   FibreBridge6500N      2000001086630e96
    ATTO_FibreBridge6500N_9  -
                                           false     -       Atto   FibreBridge6500N      200000108663e4c4
    15 entries were displayed.
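
Each bridge occupies two lines in this output, and the list mixes monitored entries with unmonitored ones. In the sample above, WWN `2000001086627bc0` appears under both `ATTO_10.0.15.17` and `ATTO_FibreBridge6500N_14`. The sketch below joins the two-line records and reports unmonitored bridges and repeated WWNs. What a repeated WWN means (possibly a leftover entry for the same device) is a guess and is not stated in the source; the function names are hypothetical.

```python
from collections import defaultdict

# Excerpt copied from the `storage bridge show` output above (4 of 15 entries).
BRIDGES = """\
ATTO_10.0.15.17          BRIDGE_B_1
                                       true      ok      Atto   FibreBridge 6500N     2000001086627bc0
ATTO_10.0.15.18          BRIDGE_B_2
                                       true      ok      Atto   FibreBridge 6500N     2000001086630f0e
ATTO_FibreBridge6500N_14 -
                                       false     -       Atto   FibreBridge6500N      2000001086627bc0
ATTO_FibreBridge6500N_7  -
                                       false     -       Atto   FibreBridge6500N      2000001086630e96
"""


def parse_bridges(text):
    """Join each name line with the indented detail line that follows it."""
    bridges = []
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not line[0].isspace() and len(tokens) == 2:
            current = {"name": tokens[0], "symbolic_name": tokens[1]}
        elif line[0].isspace() and current is not None and len(tokens) >= 5:
            current.update(
                monitored=(tokens[0] == "true"),
                status=tokens[1],
                vendor=tokens[2],
                model=" ".join(tokens[3:-1]),   # the model may contain a space
                wwn=tokens[-1],
            )
            bridges.append(current)
            current = None
    return bridges


def duplicate_wwns(bridges):
    """Map each WWN that appears more than once to the bridge names using it."""
    by_wwn = defaultdict(list)
    for bridge in bridges:
        by_wwn[bridge["wwn"]].append(bridge["name"])
    return {wwn: names for wwn, names in by_wwn.items() if len(names) > 1}


bridges = parse_bridges(BRIDGES)
print([b["name"] for b in bridges if not b["monitored"]])
print(duplicate_wwns(bridges))
```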

    14. Check Fibre Channel switches for faults

    cluster1::> storage switch show
                          Symbolic                                Is        Monitor
    Switch                Name     Vendor  Model Switch WWN       Monitored Status
    --------------------- -------- ------- ----- ---------------- --------- -------
    Brocade_10.0.15.10
                          SW_A_1
                                   Brocade Brocade6505
                                                 100050eb1a88327f true      ok
    Brocade_10.0.15.11
                          SW_A_2
                                   Brocade Brocade6505
                                                 100050eb1a881582 true      ok
    Brocade_10.0.15.21
                          SW_B_3
                                   Brocade Brocade6505
                                                 100050eb1a882f69 true      ok
    Brocade_10.0.15.22
                          SW_B_4
                                   Brocade Brocade6505
                                                 100050eb1a881522 true      ok
    4 entries were displayed.

    15. View failover status

    cluster1::> storage failover show 
                                  Takeover          
    Node           Partner        Possible State Description  
    -------------- -------------- -------- -------------------------------------
    cluster1-01    cluster1-02    true     Connected to cluster1-02
    cluster1-02    cluster1-01    true     Connected to cluster1-01
    2 entries were displayed.

    16. View critical and error event logs

    cluster1::> event log show -severity critical 
    There are no entries matching your query.
    
    cluster1::> event log show -severity error
    Time                Node             Severity      Event
    ------------------- ---------------- ------------- ---------------------------
    3/6/2018 02:28:30   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (MANAGEMENT_LOG) INFO) for host (0) was not posted to NetApp. The system will drop the message.
    3/6/2018 01:28:18   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (PERFORMANCE DATA) INFO) for host (0) was not posted to NetApp. The system will drop the message.
    3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) cluster1, Serial Number 5589765F, Certificate Authority 'cluster1' and type server for Vserver cluster1 has expired.
    3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM2, Serial Number 55A03966, Certificate Authority 'SVM2' and type server for Vserver SVM2 has expired.
    3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM, Serial Number 559FFD76, Certificate Authority 'SVM' and type server for Vserver SVM has expired.
    3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM_DR, Serial Number 545845C16E278, Certificate Authority 'SVM_DR' and type server for Vserver SVM_DR-mc has expired.
    3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UCS_SVM2_DR, Serial Number 545845A7B01FA, Certificate Authority 'SVM2_DR' and type server for Vserver SVM2_DR-mc has expired.
    7 entries were displayed.
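
Error logs like the one above tend to repeat the same event many times, so a per-event count is often easier to review than the raw list. The sketch below groups entries by event name. It assumes each entry sits on a single line in the layout shown above, and it uses the first four entries as sample text; the `summarize` name is hypothetical.

```python
import re
from collections import Counter

# First four entries copied from the `event log show -severity error` output above.
EVENT_LOG = """\
3/6/2018 02:28:30   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (MANAGEMENT_LOG) INFO) for host (0) was not posted to NetApp. The system will drop the message.
3/6/2018 01:28:18   cluster1-02      ERROR         asup.post.drop: AutoSupport message (HA Group Notification from cluster1-02 (PERFORMANCE DATA) INFO) for host (0) was not posted to NetApp. The system will drop the message.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) cluster1, Serial Number 5589765F, Certificate Authority 'cluster1' and type server for Vserver cluster1 has expired.
3/6/2018 00:00:07   cluster1-02      ERROR         mgmtgwd.certificate.expired: A digital certificate with Fully Qualified Domain Name (FQDN) UC_SVM2, Serial Number 55A03966, Certificate Authority 'SVM2' and type server for Vserver SVM2 has expired.
"""

# Timestamp, node, severity, event name, then the free-text message.
ENTRY = re.compile(
    r"^(\d+/\d+/\d+ \d+:\d+:\d+)\s+(\S+)\s+(\S+)\s+([\w.]+):\s*(.*)$"
)


def summarize(text):
    """Count log entries per event name, ignoring header and footer lines."""
    counts = Counter()
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            counts[match.group(4)] += 1
    return counts


for event, count in summarize(EVENT_LOG).most_common():
    print(f"{count:3d}  {event}")
```

In the full output above, the seven errors reduce to two causes: AutoSupport messages that could not be posted, and expired server certificates.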

    17. View the status of the volumes in an aggregate

    cluster1::> vol show -aggregate aggr_data_A1

    18. View LUN information and LUN details

    cluster1::> lun show
    cluster1::> lun show -v

    19. View igroup (mapping) information and details

    cluster1::> igroup show
    cluster1::> igroup show -v

    20. View LUN mappings

    cluster1::> lun show -m

    21. Enter the nodeshell of a node

    cluster1::> run -node cluster1-01 
    Type 'exit' or 'Ctrl-D' to return to the CLI
    cluster1-01>

    22. View spare disks from the nodeshell

    cluster1-01> vol status -s
    
    Local spares
    
    Pool1 spare disks
    
    RAID Disk       Device                  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
    ---------       ------                  ------------- ---- ---- ---- ----- --------------    --------------
    Spare disks for block checksum
    spare           SW_B_3:6.126L41         3a    21  14  FC:A   1   SAS 10000 1142352/2339537408 1144641/2344225968 (not zeroed)
    spare           SW_B_3:7.126L75         3a    42  22  FC:A   1   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_B_3:7.126L101        3b    43  22  FC:A   1   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_B_4:7.126L76         4b    42  23  FC:B   1   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_B_4:7.126L29         4b    43  2   FC:B   1   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_B_4:7.126L50         4b    43  23  FC:B   1   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_B_3:6.126L22         3a    20  21  FC:A   1   SSD   N/A 381304/780910592  381554/781422768 
    
    Pool0 spare disks
    
    RAID Disk       Device                  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
    ---------       ------                  ------------- ---- ---- ---- ----- --------------    --------------
    Spare disks for block checksum
    spare           SW_A_1:7.126L12         3a    30  11  FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_1:7.126L14         3a    30  13  FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_1:7.126L31         3a    31  4   FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_1:7.126L76         3a    32  23  FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_1:7.126L79         3a    33  0   FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_1:7.126L80         3a    33  1   FC:A   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_2:7.126L73         4b    32  20  FC:B   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_2:7.126L37         4b    33  10  FC:B   0   SAS 10000 1142352/2339537408 1144641/2344225968 
    spare           SW_A_2:6.126L74         4b    11  21  FC:B   0   SSD   N/A 381304/780910592  381554/781422768

    23. View failed disks from the nodeshell

    cluster1-01> vol status -f
    
    Broken disks (empty)

    24. Show disks that have no ownership assigned

    cluster1-01> disk show -n
    
    disk show : No unassigned disks

    25. Assign disk ownership (commonly used after replacing a disk)

    cluster1-01> disk assign all 

    26. View location information for all disks

    cluster1-01> storage show disk -p 
  • Original article: https://www.cnblogs.com/cloudos/p/8515574.html