zoukankan      html  css  js  c++  java
  • Greenplum failed segment的恢复方法--primary与mirror都可修复

      当在使用greenplum过程中有不当的操作时,可能会出现segment节点宕掉的情况(比如在greenplum运行的过程中停掉其中几台segment节点的服务器),通过下面的方法可以恢复segment。

    下面是现场出现的故障情况:

    复制代码
    [gpadmin@tj-soc-c04-csfb1 ~]$ gpstate -m
    20161010:16:35:54:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-Starting gpstate with args: -m
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.2 build 1'
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.2 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Nov 12 2015 23:50:28'
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-Obtaining Segment details from master...
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:--------------------------------------------------------------
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:--Current GPDB mirror list and status
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:--Type = Group
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:--------------------------------------------------------------
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-   Mirror             Datadir                       Port    Status              Data Status       
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-tj-soc-c04-csfb2   /data1/gpdata/mirror/gpseg0   41000   Failed                                <<<<<<<<
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-tj-soc-c04-csfb2   /data1/gpdata/mirror/gpseg1   41001   Failed                                <<<<<<<<
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-tj-soc-c04-csfb3   /data1/gpdata/mirror/gpseg2   41000   Failed                                <<<<<<<<
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-tj-soc-c04-csfb3   /data1/gpdata/mirror/gpseg3   41001   Failed                                <<<<<<<<
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-   tj-soc-c04-csfb4   /data1/gpdata/mirror/gpseg4   41000   Acting as Primary   Change Tracking
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:-   tj-soc-c04-csfb4   /data1/gpdata/mirror/gpseg5   41001   Acting as Primary   Change Tracking
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-tj-soc-c04-csfb1   /data1/gpdata/mirror/gpseg6   41000   Failed                                <<<<<<<<
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-tj-soc-c04-csfb1   /data1/gpdata/mirror/gpseg7   41001   Failed                                <<<<<<<<
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[INFO]:--------------------------------------------------------------
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-2 segment(s) configured as mirror(s) are acting as primaries
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-6 segment(s) configured as mirror(s) have failed
    20161010:16:35:55:026100 gpstate:tj-soc-c04-csfb1:gpadmin-[WARNING]:-2 mirror segment(s) acting as primaries are in change tracking
    复制代码

    可以看到有6个节点Failed,有2个节点的Primary和Mirror交换了。

    一、首先需要停掉GP

    gpstop -M fast -a    这样会告诉你有几个节点DOWN了

    二、启动GP

    gpstart    启动数据库会忽略DOWN的节点

    三、生成一个恢复配置文件

    gprecoverseg -o ./recov    会在当前目录生成一个recov文件,里面包含了要恢复的节点信息

    recov文件内容如下:(注意:这个文件不是手动创建的,而是通过gprecoverseg -o ./recov命令生成的

    复制代码
    filespaceOrder=
    tj-soc-c04-csfb2:41000:/data1/gpdata/mirror/gpseg0 tj-soc-c04-csfb2:41001:/data1/gpdata/mirror/gpseg1 tj-soc-c04-csfb3:41000:/data1/gpdata/mirror/gpseg2 tj-soc-c04-csfb3:41001:/data1/gpdata/mirror/gpseg3 tj-soc-c04-csfb1:41000:/data1/gpdata/mirror/gpseg6 tj-soc-c04-csfb1:41001:/data1/gpdata/mirror/gpseg7
    复制代码

    四、使用恢复配置文件恢复节点

    $gprecoverseg -i ./recov

    恢复过程中可以用gpstate -m 查看恢复状态:Resynchronizing(表示正在恢复中),Synchronized(表示恢复完毕)

    五、调整Primary和Mirror

    上面的情况中有Primary和Mirror兑换的情况,所以需要把他们换回来,可以用下面的命令

    gprecoverseg -r

    等待所有的节点都是Synchronized后,segment就恢复好了

  • 相关阅读:
    python找出数组中第二大的数
    【高并发解决方案】5、如何设计一个秒杀系统
    如何找出单链表中的倒数第k个元素
    二叉树的前序,中序,后序遍历
    剑指Offer题解(Python版)
    python之gunicorn的配置
    python3实现字符串的全排列的方法(无重复字符)
    python实现斐波那契数列
    每天一个linux命令(56):netstat命令
    每天一个linux命令(55):traceroute命令
  • 原文地址:https://www.cnblogs.com/xibuhaohao/p/11418113.html
Copyright © 2011-2022 走看看