  • Filestream/Windows Share Causes AlwaysOn Failover to Fail

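    I recently worked on a case in which a customer performed a failover in an AlwaysOn environment. Afterwards, the availability group on every replica went into the RESOLVING state, and one or more dump files were generated on the replica where the failover was executed.

    Here is how the problem was investigated.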

    Environment

    ===

    SQL Server 2014 SP1 CU3

    Primary replica: p1

    Secondary replica: p2

    Secondary replica: p3

    P1 and P2 are in the same subnet.

    P3 is in a different subnet.

    All replicas use sync (synchronous-commit) availability mode.

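    From discussion with the customer, failing over between p1 and p2 worked normally, with no failure and no dump. The error occurred only when trying to make p3 the primary replica.

    The statement executed (on the replica being made primary) was: alter availability group groupName failover

    The errorlog recorded the following: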

    2015-12-14 09:57:47.18 spid52 ***Stack Dump being sent to F:\MSSQL12.DBAAGINS1\MSSQL\LOG\SQLDump0001.txt

    2015-12-14 09:57:47.18 spid52 * *******************************************************************************

    2015-12-14 09:57:47.18 spid52 *

    2015-12-14 09:57:47.18 spid52 * BEGIN STACK DUMP:

    2015-12-14 09:57:47.18 spid52 * 12/14/15 09:57:47 spid 52

    2015-12-14 09:57:47.18 spid52 *

    2015-12-14 09:57:47.18 spid52 * Location:     HadrFstrVnnUtils.cpp:479

    2015-12-14 09:57:47.18 spid52 * Expression:     SUCCEEDED (hr)

    2015-12-14 09:57:47.18 spid52 * SPID:         52

    2015-12-14 09:57:47.18 spid52 * Process ID:     5412

    2015-12-14 09:57:47.18 spid52 *

    2015-12-14 09:57:47.18 spid52 * Input Buffer 255 bytes -

    2015-12-14 09:57:47.18 spid52 * 16 00 00 00 12 00 00 00 02 00 00 00 00 00 00 00 00 00

    2015-12-14 09:57:47.18 spid52 * ÿÿ & ç 01 00 00 00 ff ff 0d 00 00 00 00 01 26 04 00 00 00 e7

    2015-12-14 09:57:47.18 spid52 * ÿÿ     þÿÿÿÿÿÿÿF ff ff 09 04 00 02 00 fe ff ff ff ff ff ff ff 46 00 00

    2015-12-14 09:57:47.18 spid52 * @ P 1 n v a r c 00 40 00 50 00 31 00 20 00 6e 00 76 00 61 00 72 00 63

    2015-12-14 09:57:47.18 spid52 * h a r ( 8 0 ) , @ 00 68 00 61 00 72 00 28 00 38 00 30 00 29 00 2c 00 40

    2015-12-14 09:57:47.18 spid52 * P 2 b i g i n t 00 50 00 32 00 20 00 62 00 69 00 67 00 69 00 6e 00 74

    2015-12-14 09:57:47.18 spid52 * , @ P 3 i n t 00 2c 00 40 00 50 00 33 00 20 00 69 00 6e 00 74 00 00

    2015-12-14 09:57:47.18 spid52 * çÿÿ     þÿÿÿÿ 00 00 00 00 00 e7 ff ff 09 04 00 02 00 fe ff ff ff ff

    2015-12-14 09:57:47.18 spid52 * ÿÿÿx e x e c s ff ff ff 78 00 00 00 65 00 78 00 65 00 63 00 20 00 73

    2015-12-14 09:57:47.18 spid52 * p _ a v a i l a b 00 70 00 5f 00 61 00 76 00 61 00 69 00 6c 00 61 00 62

    2015-12-14 09:57:47.18 spid52 * i l i t y _ g r o 00 69 00 6c 00 69 00 74 00 79 00 5f 00 67 00 72 00 6f

    2015-12-14 09:57:47.18 spid52 * u p _ c o m m a n 00 75 00 70 00 5f 00 63 00 6f 00 6d 00 6d 00 61 00 6e

    2015-12-14 09:57:47.18 spid52 * d _ i n t e r n a 00 64 00 5f 00 69 00 6e 00 74 00 65 00 72 00 6e 00 61

    2015-12-14 09:57:47.18 spid52 * l @ P 1 , 1 , 00 6c 00 20 00 40 00 50 00 31 00 2c 00 20 00 31 00 2c

    2015-12-14 09:57:47.18 spid52 * @ P 2 , @ P 3 00 20 00 40 00 50 00 32 00 2c 00 20 00 40 00 50 00 33

    2015-12-14 09:57:47.18 spid52 * ç       H 8 00 00 00 00 00 00 00 e7 a0 00 09 04 00 02 00 48 00 38

    2015-12-14 09:57:47.18 spid52 * e a 6 b e b 5 - 0 00 65 00 61 00 36 00 62 00 65 00 62 00 35 00 2d 00 30

    2015-12-14 09:57:47.18 spid52 * d e 3 - 4 f 7 1 - 00 64 00 65 00 33 00 2d 00 34 00 66 00 37 00 31 00 2d

    2015-12-14 09:57:47.18 spid52 * 9 0 b 5 - 3 5 d f 00 39 00 30 00 62 00 35 00 2d 00 33 00 35 00 64 00 66

    2015-12-14 09:57:47.18 spid52 * d 1 0 3 6 5 c 2 00 64 00 31 00 30 00 33 00 36 00 35 00 63 00 32 00 00

    2015-12-14 09:57:47.18 spid52 * & ø ¨ & © 00 26 08 08 f8 06 a8 0d 00 00 00 00 00 00 26 04 04 a9

    2015-12-14 09:57:47.18 spid52 * UM 03 55 4d

    2015-12-14 09:57:47.18 spid52 *
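    The readable text in the input buffer is UTF-16LE, which is why almost every other byte is 00. As a quick check, the Python sketch below decodes the hex bytes behind the "e x e c s p _ a v a i l a b ..." lines. The helper name is hypothetical, and reading the four bytes 78 00 00 00 that precede the text as a little-endian byte count (0x78 = 120 bytes, i.e. 60 characters) is an assumption based on the byte counts matching. The decoded request is exec sp_availability_group_command_internal @P1, 1, @P2, @P3, which is consistent with sqlmin!SpAvailabilityGroupCommand in the call stack.

```python
import struct

# Hex column of the stack-dump lines whose text column reads
# "x e x e c s" ... "@ P 2 , @ P 3", plus the first byte ("00") of the
# line after them, copied from the errorlog above.
SQL_TEXT_HEX = """
ff ff ff 78 00 00 00 65 00 78 00 65 00 63 00 20 00 73
00 70 00 5f 00 61 00 76 00 61 00 69 00 6c 00 61 00 62
00 69 00 6c 00 69 00 74 00 79 00 5f 00 67 00 72 00 6f
00 75 00 70 00 5f 00 63 00 6f 00 6d 00 6d 00 61 00 6e
00 64 00 5f 00 69 00 6e 00 74 00 65 00 72 00 6e 00 61
00 6c 00 20 00 40 00 50 00 31 00 2c 00 20 00 31 00 2c
00 20 00 40 00 50 00 32 00 2c 00 20 00 40 00 50 00 33
00
"""


def decode_length_prefixed_utf16(hex_dump: str, length_offset: int) -> str:
    """Read a 4-byte little-endian byte count at length_offset, then decode
    that many following bytes as UTF-16LE text.

    Assumption: 78 00 00 00 is such a length prefix (it matches the 120
    bytes of text that follow); this is not a full TDS parser.
    """
    raw = bytes.fromhex(hex_dump)  # Python 3.7+ skips spaces and newlines
    (byte_count,) = struct.unpack_from("<I", raw, length_offset)
    start = length_offset + 4
    return raw[start:start + byte_count].decode("utf-16-le")


if __name__ == "__main__":
    print(decode_length_prefixed_utf16(SQL_TEXT_HEX, length_offset=3))
```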

    So I started by analyzing the dump file. The call stack at the point where the dump was generated is:

    Callstack
    ===
    sqlmin!HadrFstrVnnUtils::GetRsFxEndpointPath+0x7e           
    sqlmin!HadrFstrVnnUtils::SetClusterResourceProperties+0x153 
    sqlmin!HadrFstrVnnUtils::RefreshWsfcConfig+0x299            
    sqlmin!CHadrArProxy::RefreshFilestreamInWsfc+0xff           
    sqlmin!CHadrArController::RefreshFilestreamInWsfc+0x4f      
    sqlmin!CFstrSubscriber::Publish+0x138                       
    sqlmin!CHadrPublisher::Publish+0x333                        
    sqlmin!CHadrArProxy::PublishRoleChangeEvent+0x19d           
    sqlmin!CHadrArProxy::Signal+0x469                           
    sqlmin!CHadrArController::Online+0x1b5                      
    sqlmin!CHadrArManager::OnlineAg+0x12d                       
    sqlmin!SpAvailabilityGroupCommand+0x2f5    

       

    After testing and investigation, I finally found the cause:

    Both p1 and p2 had Filestream and a Windows share configured, but p3 did not.

    Explanation:

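    Both AlwaysOn and SQL Server failover clustering have a concept called WSFC storage (kept in the registry), which holds shared information. In AlwaysOn, when certain configuration on the primary changes, the change is also written to WSFC storage and then synchronized to the secondary replicas.

    If the primary replica has Filestream enabled with a Windows share name, that information is stored in the WSFC store (the registry) and synchronized to all replicas.

    When a secondary replica receives the failover command, it reads its local WSFC store. If the store shows that Filestream and the Windows share are not enabled, a normal failover is performed. If they are enabled, the replica tries to obtain the corresponding Windows share. If the current replica does not have Filestream enabled, or has no Windows share configured, an exception is raised: the failover fails and a dump file is generated. This matches the call stack above, where the dump is raised in HadrFstrVnnUtils::GetRsFxEndpointPath while RefreshFilestreamInWsfc runs as part of the role change (CHadrArProxy::PublishRoleChangeEvent).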
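    The decision described above can be sketched as a small Python model. It only encodes this description: FilestreamConfig, failover_to and FailoverError are hypothetical names, FailoverError stands in for the failed SUCCEEDED(hr) check that produces the dump, and the share name "FSShare" is made up. SQL Server's actual internal checks may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of the behaviour described above; not SQL Server's code.
@dataclass
class FilestreamConfig:
    filestream_enabled: bool
    share_name: Optional[str]  # None: no Windows share configured


class FailoverError(Exception):
    """Stands in for the failed SUCCEEDED(hr) check that writes the dump."""


def failover_to(wsfc_store: FilestreamConfig, local: FilestreamConfig) -> str:
    """Simplified failover check on the replica that is becoming primary.

    wsfc_store: Filestream info that reached this replica via WSFC storage.
    local:      this replica's own Filestream configuration.
    """
    if not (wsfc_store.filestream_enabled and wsfc_store.share_name):
        # WSFC store says Filestream/share are not in use: normal failover.
        return "failover completed"
    if not local.filestream_enabled or not local.share_name:
        # Store says they are in use, but this replica cannot provide the share.
        raise FailoverError("Filestream Windows share not available on this replica")
    return "failover completed"


if __name__ == "__main__":
    published = FilestreamConfig(True, "FSShare")  # what p1 wrote to the WSFC store
    print(failover_to(published, FilestreamConfig(True, "FSShare")))  # p2
    try:
        failover_to(published, FilestreamConfig(False, None))  # p3
    except FailoverError as exc:
        print("failover failed:", exc)
```

    Modelling the replicated WSFC store and the local configuration as two separate inputs mirrors the point of the explanation: the failure comes from the mismatch between them, not from either setting alone.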

     

     

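    The problem can be reproduced as follows.

    Create an availability group with two replicas:

    P1 is the primary replica.

    P2 is a secondary replica.

    Both use synchronous-commit mode.

    Failover mode is manual on both.

    P1 is configured as follows:

    Filestream is enabled, and a Windows share name is set.

    If P2's configuration differs from P1's, failing over to P2 fails.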

     

     

    The fix is simple:

    Keep the configuration consistent across all replicas.

    If these features are not needed, disable them on every replica.

    Otherwise, enable them on every replica.
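    Checking for such drift before a failover can be scripted. The Python sketch below compares each replica's Filestream settings with those of a reference replica. It assumes the values were collected on every replica with SERVERPROPERTY('FilestreamConfiguredLevel') and SERVERPROPERTY('FilestreamShareName'); the function name and the sample values (level 2, share "FSShare") are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (FilestreamConfiguredLevel, FilestreamShareName) for one replica.
# Assumption: collected on each replica with
#   SELECT SERVERPROPERTY('FilestreamConfiguredLevel'),
#          SERVERPROPERTY('FilestreamShareName');
ReplicaFilestream = Tuple[int, Optional[str]]


def inconsistent_replicas(configs: Dict[str, ReplicaFilestream],
                          reference: str) -> List[str]:
    """Return the replicas whose Filestream settings differ from the
    reference replica's (for example the current primary), sorted by name."""
    expected = configs[reference]
    return sorted(name for name, cfg in configs.items() if cfg != expected)


if __name__ == "__main__":
    # Hypothetical values mirroring the case: p1 and p2 have Filestream and a
    # Windows share, p3 has neither.
    configs = {
        "p1": (2, "FSShare"),
        "p2": (2, "FSShare"),
        "p3": (0, None),
    }
    print(inconsistent_replicas(configs, reference="p1"))  # ['p3']
```

    Any replica it reports should be brought in line with the others (both features disabled everywhere, or enabled everywhere) before it is used as a failover target.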

  • Original post: https://www.cnblogs.com/stswordman/p/5098351.html