  • MongoDB cluster: config server query timeouts cause writes through mongos to fail

    Environment

    OS: CentOS 7.x
    DB: MongoDB 3.6.12
    Cluster topology: mongod-shard1 *3 + mongod-shard2 *3 + mongod-conf-shard *3 + mongos *3

    Application error log

    caused by :: NetworkInterfaceExceededTimeLimit: Operation time out on server ****:27018
    ....
    at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:107)
    

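    Reproducing the failure

    An insert into one collection fails with NetworkInterfaceExceededTimeLimit: Operation time out,
    while the same insert into another collection that does not exist yet succeeds.

    This suggests that the config server fails when it is queried for sharding metadata.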

    Troubleshooting

    2020-07-07T09:55:36.605+0800 D REPL     [conn52850] Required snapshot optime: { ts: Timestamp(1594086936, 7), t: 19 } is not yet part of the current 'committed' snapshot: { ts: Timestamp(1594086936, 3), t: 19 }
    2020-07-07T09:55:36.605+0800 D REPL     [conn35081] Required snapshot optime: { ts: Timestamp(1594086936, 7), t: 19 } is not yet part of the current 'committed' snapshot: { ts: Timestamp(1594086936, 3), t: 19 }
    2020-07-07T09:55:37.084+0800 D REPL     [conn72545] waitUntilOpTime: waiting for optime:{ ts: Timestamp(1594086683, 2), t: 20 } to be in a snapshot -- current snapshot: { ts: Timestamp(1594086936, 7), t: 19 }
    2020-07-07T09:55:37.187+0800 I COMMAND  [conn72537] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1594086804, 1), t: 20 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1594086903, 1), signature: { hash: BinData(0, CD6262BF59D2AAC318183C6109F3B31DEE2E1837), keyId: 6807014219125358676 } }, $configServerState: { opTime: { ts: Timestamp(1594086804, 1), t: 20 } }, $db: "config" }
    2020-07-07T09:55:37.187+0800 I COMMAND  [conn72537] command config.$cmd command: find { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1594086804, 1), t: 20 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1594086903, 1), signature: { hash: BinData(0, CD6262BF59D2AAC318183C6109F3B31DEE2E1837), keyId: 6807014219125358676 } }, $configServerState: { opTime: { ts: Timestamp(1594086804, 1), t: 20 } }, $db: "config" } numYields:0 reslen:517 locks:{} protocol:op_msg 30009ms
    2020-07-07T09:55:37.187+0800 I NETWORK  [conn72537] end connection *.*.*.*:45296 (34 connections now open)
    2020-07-07T09:55:40.425+0800 D REPL     [conn72539] Required snapshot optime: { ts: Timestamp(1594086940, 1), t: 19 } is not yet part of the current 'committed' snapshot: { ts: Timestamp(1594086936, 7), t: 19 }
    

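    The config server log contains the line "Command on database config timed out waiting for read concern to be satisfied."
    The root cause is unknown, but the log shows that a find on the config server timed out (30009ms against maxTimeMS: 30000), which is consistent with the error in the application log.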
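
To check how often this happens, the config server log can be scanned for that message. The following is a minimal sketch, assuming the plain-text log layout shown above (timestamp, severity, component, bracketed connection, message). The function name and the shortened sample lines are illustrative and not part of any MongoDB tooling.

```python
import re

# Layout assumed from the log excerpt above:
# <timestamp> <severity> <component> [<connection>] <message>
LOG_LINE = re.compile(
    r"^(?P<time>\S+)\s+(?P<severity>\w)\s+(?P<component>\w+)\s+"
    r"\[(?P<conn>[^\]]+)\]\s+(?P<message>.*)$"
)
TIMEOUT_TEXT = "timed out waiting for read concern to be satisfied"


def find_read_concern_timeouts(lines):
    """Return one dict per log line that reports a read-concern timeout."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or TIMEOUT_TEXT not in match["message"]:
            continue
        message = match["message"]
        collection = re.search(r'find: "([^"]+)"', message)
        max_time = re.search(r"maxTimeMS: (\d+)", message)
        hits.append({
            "time": match["time"],
            "conn": match["conn"],
            "collection": collection.group(1) if collection else None,
            "max_time_ms": int(max_time.group(1)) if max_time else None,
        })
    return hits


# Two lines taken from the excerpt above; the second is shortened for readability.
SAMPLE_LINES = [
    "2020-07-07T09:55:36.605+0800 D REPL     [conn52850] Required snapshot optime: "
    "{ ts: Timestamp(1594086936, 7), t: 19 } is not yet part of the current "
    "'committed' snapshot: { ts: Timestamp(1594086936, 3), t: 19 }",
    '2020-07-07T09:55:37.187+0800 I COMMAND  [conn72537] Command on database config '
    'timed out waiting for read concern to be satisfied. Command: { find: "shards", '
    'readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1594086804, 1), '
    't: 20 } }, maxTimeMS: 30000 }',
]

for hit in find_read_concern_timeouts(SAMPLE_LINES):
    print(hit)
```

On a real node the same function could be fed the lines of the config server log file. The path of that file depends on the deployment and is not given in the original post.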

    Restarting the config server PRIMARY node forced the config server replica set to elect a new primary from its SECONDARY nodes.
    After that, the failure cleared.
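
One possible reading of the D REPL lines in the config server log, offered as an assumption rather than a confirmed diagnosis: the requests ask for an optime in term 20 (t: 20), while the node's committed snapshot is still in term 19 (t: 19). If optimes are ordered by term first and timestamp second, such a read cannot be satisfied however far the timestamp advances within term 19. That would fit the observation that a new election fixed the problem. The sketch below only models that ordering with a hypothetical OpTime tuple. It does not talk to MongoDB.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    """Hypothetical model of a replication optime: term first, then timestamp.

    Field order matters: tuples compare left to right, so the term dominates.
    This ordering is an assumption used for illustration.
    """
    term: int
    seconds: int
    increment: int


def read_concern_satisfied(required: OpTime, committed: OpTime) -> bool:
    """True when the committed snapshot has reached the required optime."""
    return committed >= required


# Values copied from the waitUntilOpTime log line above.
required = OpTime(term=20, seconds=1594086683, increment=2)
committed = OpTime(term=19, seconds=1594086936, increment=7)

# The committed timestamp is newer, but its term is older.
print(read_concern_satisfied(required, committed))  # False
```

Under this model the wait ends only once the committed snapshot carries a term of at least 20, which a fresh election would bring about.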

  • Original post: https://www.cnblogs.com/TopGear/p/13259952.html