之前有两位朋友碰到过在对greenplum的系统构架更改后,出现全量恢复gprecoverseg -F也无法正常执行的情况。
报错信息为Unable to connect to database. Retrying 1
gprecoverseg failed. (Reason='Unable to connect to database and start transaction') exiting...
有幸拷得一份虚拟机上的所有文件。对其进行分析。
下面为大致分析的过程,感兴趣的能够看一下。
--查看content=0的primary此次启动的日志记录。与以下content=1的primary此次启动的日志记录对照,发现还是CHANGETRACKING有问题。并且gprecoverseg -F运行后没有产生日志记录,说明对应文件仅仅在数据库启动时生效。
--查看/data/primary/gpseg0/pg_changetracking下的文件,就这么三个文件。而造成上面gprecoverseg无法运行的正是当中的CT_METADATA。
[gpadmin@gpslave-1 pg_changetracking]$ ll
total 102900
-rw-------. 1 gpadmin gpadmin 128 Jul 27 22:49 CT_METADATA --二进制文件,就是这个出问题了的,关于这个文件的信息,我也没有查到。
-rw-------. 1 gpadmin gpadmin 322636 Jul 27 22:46 FILEREP_CONFIG_LOG --记录配置变化的文件。
-rw-------. 1 gpadmin gpadmin 105028298 Jul 27 22:46 FILEREP_LOG --记录数据变化的文件。
--对照content=0的primary和content=1的primary,这两个文件还是不同。但详细信息我也不是非常清楚。
[gpadmin@gpslave-1 pg_changetracking]$ od -tx1 -tc -Ax CT_METADATA
000000 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
报错信息为Unable to connect to database. Retrying 1
gprecoverseg failed. (Reason='Unable to connect to database and start transaction') exiting...
有幸拷得一份虚拟机上的所有文件。对其进行分析。
发现事实上出现这个问题仅仅须要改动pg_changetracking下的CT_METADATA。或者说从其它正常的主事例上拷贝一份到出问题的主事例上就可以。
下面为大致分析的过程,感兴趣的能够看一下。
--启动数据库。有一个mirror出错。
[gpadmin@gpmaster ~]$ gpstart -a 20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a 20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment... 20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.4.1 build 2' 20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150' 20150727:22:28:29:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode 20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era 20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Master Started... 20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master 20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<< 20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait... ............................................... 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Process results... 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:----------------------------------------------------- 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 3 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 1 <<<<<<<< 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:----------------------------------------------------- 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances, skipped 1 other segments 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:----------------------------------------------------- 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-**************************************************************************** 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-There are 1 segment(s) marked down in the database 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases. 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-**************************************************************************** 20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance gpmaster directory /data/master/gpseg-1 20150727:22:29:24:001922 gpstart:gpmaster:gpadmin-[INFO]:-Command pg_ctl reports Master gpmaster instance active 20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[INFO]:-No standby master configured. skipping... 20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Number of segments not attempted to start: 1 20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[INFO]:-Check status of database with gpstate utility
[gpadmin@gpmaster ~]$ gprecoverseg -F 20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F 20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.4.1 build 2' 20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10' 20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready 20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 1 20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready 20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 2 20150727:22:38:14:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready 20150727:22:38:14:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:38:15:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 3 20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready 20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 4 20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready 20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 5 20150727:22:38:30:002478 gprecoverseg:gpmaster:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason='Unable to connect to database and start transaction') exiting...
template1=# select * from gp_segment_configuration ; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+-----------+-----------+------------------+------------ 1 | -1 | p | p | s | u | 5432 | gpmaster | gpmaster | | 2 | 0 | p | p | c | u | 40000 | gpslave-1 | gpslave-1 | 41000 | 4 | 0 | m | m | s | d | 50000 | gpslave-2 | gpslave-2 | 51000 | 3 | 1 | p | p | s | u | 40000 | gpslave-2 | gpslave-2 | 41000 | 5 | 1 | m | m | s | u | 50000 | gpslave-1 | gpslave-1 | 51000 | (5 rows)
--又一次启动数据库。
[gpadmin@gpmaster ~]$ gpstart -a 20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a 20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment... 20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.4.1 build 2' 20150727:22:49:42:002958 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150' 20150727:22:49:42:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode 20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era 20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Master Started... 20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master 20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<< 20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-1 directory /data/mirror/gpseg1 <<<<< 20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait..... 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Process results... 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:----------------------------------------------------- 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 2 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 2 <<<<<<<< 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:----------------------------------------------------- 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances, skipped 2 other segments 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:----------------------------------------------------- 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-**************************************************************************** 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-There are 2 segment(s) marked down in the database 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases. 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-**************************************************************************** 20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance gpmaster directory /data/master/gpseg-1 20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-Command pg_ctl reports Master gpmaster instance active 20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-No standby master configured. skipping... 20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Number of segments not attempted to start: 2 20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-Check status of database with gpstate utility
template1=# select * from gp_segment_configuration ; dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts ------+---------+------+----------------+------+--------+-------+-----------+-----------+------------------+------------ 1 | -1 | p | p | s | u | 5432 | gpmaster | gpmaster | | 2 | 0 | p | p | c | u | 40000 | gpslave-1 | gpslave-1 | 41000 | 4 | 0 | m | m | s | d | 50000 | gpslave-2 | gpslave-2 | 51000 | 3 | 1 | p | p | c | u | 40000 | gpslave-2 | gpslave-2 | 41000 | 5 | 1 | m | m | s | d | 50000 | gpslave-1 | gpslave-1 | 51000 | (5 rows)
[gpadmin@gpmaster ~]$ gprecoverseg -F -v 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F -v 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set. 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Obtaining master's port from master data directory 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Read from postgresql.conf port=5432 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Read from postgresql.conf max_connections=200 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set. 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.4.1 build 2' 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Connecting to dbname=template1 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10' 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master... 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Connecting to dbname=template1 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Validating configuration... 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] got cmd: $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] got cmd: $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' gpslave-1 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000"' had result: cmd had rc=1 completed=True halted=False stdout='' stderr='mode: PrimarySegment segmentState: ChangeTrackingDisabled --content=0的primary的状态是ChangeTrackingDisabled dataState: InChangeTracking faultType: NotInitialized mode: PrimarySegment segmentState: ChangeTrackingDisabled dataState: InChangeTracking faultType: NotInitialized ' 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] finished cmd: Get segment status cmdStr='ssh -o 'StrictHostKeyChecking no' gpslave-2 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000"' had result: cmd had rc=1 completed=True halted=False stdout='' stderr='mode: PrimarySegment segmentState: Ready --content=1的primary的状态是Ready dataState: InChangeTracking faultType: NotInitialized mode: PrimarySegment segmentState: Ready dataState: InChangeTracking faultType: NotInitialized ' 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Encountered error Not ready to connect to database mode: PrimarySegment segmentState: ChangeTrackingDisabled --由此可见,之前无法运行全量恢复是因为content=0的primary有问题,与mirror无关 dataState: InChangeTracking faultType: NotInitialized mode: PrimarySegment segmentState: ChangeTrackingDisabled dataState: InChangeTracking faultType: NotInitialized 20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 1 ^C20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-WorkerPool haltWork() 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker2] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker3] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker4] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker5] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker6] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker7] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker8] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker9] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker10] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker11] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker12] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker13] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker14] haltWork 20150727:22:51:34:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker15] haltWork 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker5] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker3] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker6] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker4] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker2] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker7] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker9] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker10] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker12] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker8] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker11] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker14] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker13] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker15] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] stopping 20150727:22:51:37:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] stopping User Interrupted
--查看content=0的primary此次启动的日志记录。与以下content=1的primary此次启动的日志记录对照,发现还是CHANGETRACKING有问题。并且gprecoverseg -F运行后没有产生日志记录,说明对应文件仅仅在数据库启动时生效。
[gpadmin@gpslave-1 pg_log]$ cat gpdb-2015-07-27_224945.csv 2015-07-27 22:49:46.228113 CST,"gpadmin","postgres",p2367,th2061358880,"[local]",,2015-07-27 22:49:46 CST,0,,,seg-1,,,,,"FATAL","57M01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994, 2015-07-27 22:49:46.248685 CST,,,p2370,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","mirror transition, primary address(port) 'gpslave-1(41000)' mirror address(port) 'gpslave-2(51000)'",,,,,"mirroring role 'primary role' mirroring state 'change tracking' segment state 'not initialized' process name(pid) 'filerep main process(2370)' filerep state 'not initialized' ",,0,,"cdbfilerep.c",3472, 2015-07-27 22:49:46.252539 CST,,,p2371,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:'false' full resync:'true'",,,,,,,0,,"cdbresynchronizechangetracking.c",2545, 2015-07-27 22:49:46.252589 CST,,,p2371,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:'false' full resync:'true'",,,,,,,0,,"cdbresynchronizechangetracking.c",2582, 2015-07-27 22:49:46.254921 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2015-07-27 22:46:49 CST",,,,,,,0,,"xlog.c",6372, 2015-07-27 22:49:46.255156 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 0/8C75AD8",,,,,,,0,,"xlog.c",6560, 2015-07-27 22:49:46.255194 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","redo record is at 0/8C75AD8; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6617, 2015-07-27 22:49:46.255219 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","next transaction ID: 0/3051; next OID: 43252",,,,,,,0,,"xlog.c",6621, 2015-07-27 22:49:46.255241 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6624, 2015-07-27 22:49:46.255280 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/8C75B70",,,,,,,0,,"xlog.c",6988, 2015-07-27 22:49:46.255863 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Oldest active transaction from prepared transactions 3051",,,,,,,0,,"xlog.c",6033, 2015-07-27 22:49:46.292261 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",6084, 2015-07-27 22:49:46.292308 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:10:54",,,,,,,0,,"xlog.c",6095, 2015-07-27 22:49:46.294384 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Finished normal startup for clean shutdown case",,,,,,,0,,"xlog.c",7202, 2015-07-27 22:49:46.300763 CST,,,p2373,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Finished BASIC startup integrity checking",,,,,,,0,,"xlog.c",7647, 2015-07-27 22:49:46.303294 CST,,,p2365,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,,0,,"postmaster.c",4640, 2015-07-27 22:49:46.303322 CST,,,p2365,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,0,,"postmaster.c",4647, 2015-07-27 22:49:47.847981 CST,"gpadmin","template1",p2381,th2061358880,"192.168.22.110","5303",2015-07-27 22:49:47 CST,3054,con5,,seg0,,,x3054,sx1,"LOG","00000","gp_max_distributed_xid returning 1",,,,,,"select gp_max_distributed_xid()",0,,"cdbmaxdistributedxid.c",29, [gpadmin@gpslave-2 pg_log]$ cat gpdb-2015-07-27_224944.csv 2015-07-27 22:49:45.707976 CST,"gpadmin","postgres",p2302,th-240429280,"[local]",,2015-07-27 22:49:45 CST,0,,,seg-1,,,,,"FATAL","57M01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994, 2015-07-27 22:49:45.726077 CST,,,p2305,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","mirror transition, primary address(port) 'gpslave-2(41000)' mirror address(port) 'gpslave-1(51000)'",,,,,"mirroring role 'primary role' mirroring state 'change tracking' segment state 'not initialized' process name(pid) 'filerep main process(2305)' filerep state 'not initialized' ",,0,,"cdbfilerep.c",3472, 2015-07-27 22:49:45.735230 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","last checkpoint location for generating initial changetracking log 0/CC12DD8",,,,,,,0,,"xlog.c",11560, 2015-07-27 22:49:45.735376 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"WARNING","XX000","unable to read change tracking 'full log' file : Success (cdbresynchronizechangetracking.c:3382)",,,,,,,0,,"cdbresynchronizechangetracking.c",3382, 2015-07-27 22:49:45.735423 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","last changetracked location in changetracking full log 0/0",,,,,,,0,,"xlog.c",11572, 2015-07-27 22:49:45.735475 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","record with zero length at 0/CC12E70",,,,,,,0,,"xlog.c",4155, 2015-07-27 22:49:45.735505 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log",,,,,,,0,,"cdbresynchronizechangetracking.c",206, 2015-07-27 22:49:45.736987 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2015-07-27 22:46:49 CST",,,,,,,0,,"xlog.c",6372, 2015-07-27 22:49:45.737260 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 0/CC12DD8",,,,,,,0,,"xlog.c",6560, 2015-07-27 22:49:45.737298 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","redo record is at 0/CC12DD8; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6617, 2015-07-27 22:49:45.737322 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","next transaction ID: 0/3006; next OID: 43252",,,,,,,0,,"xlog.c",6621, 2015-07-27 22:49:45.737343 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6624, 2015-07-27 22:49:45.737382 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/CC12E70",,,,,,,0,,"xlog.c",6988, 2015-07-27 22:49:45.738000 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Oldest active transaction from prepared transactions 3006",,,,,,,0,,"xlog.c",6033, 2015-07-27 22:49:45.743458 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",6084, 2015-07-27 22:49:45.743497 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:10:54",,,,,,,0,,"xlog.c",6095, 2015-07-27 22:49:45.745424 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Finished normal startup for clean shutdown case",,,,,,,0,,"xlog.c",7202, 2015-07-27 22:49:45.751648 CST,,,p2308,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Finished BASIC startup integrity checking",,,,,,,0,,"xlog.c",7647, 2015-07-27 22:49:45.754195 CST,,,p2300,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,,0,,"postmaster.c",4640, 2015-07-27 22:49:45.754222 CST,,,p2300,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,0,,"postmaster.c",4647, 2015-07-27 22:49:47.296961 CST,"gpadmin","template1",p2316,th-240429280,"192.168.22.110","39217",2015-07-27 22:49:47 CST,3009,con5,,seg1,,,x3009,sx1,"LOG","00000","gp_max_distributed_xid returning 1",,,,,,"select gp_max_distributed_xid()",0,,"cdbmaxdistributedxid.c",29,
--查看/data/primary/gpseg0/pg_changetracking下的文件,就这么三个文件。而造成上面gprecoverseg无法运行的正是当中的CT_METADATA。
[gpadmin@gpslave-1 pg_changetracking]$ ll
total 102900
-rw-------. 1 gpadmin gpadmin 128 Jul 27 22:49 CT_METADATA --二进制文件,就是这个出问题了的,关于这个文件的信息,我也没有查到。
-rw-------. 1 gpadmin gpadmin 322636 Jul 27 22:46 FILEREP_CONFIG_LOG --记录配置变化的文件。
-rw-------. 1 gpadmin gpadmin 105028298 Jul 27 22:46 FILEREP_LOG --记录数据变化的文件。
--对照content=0的primary和content=1的primary,这两个文件还是不同。但详细信息我也不是非常清楚。
[gpadmin@gpslave-1 pg_changetracking]$ od -tx1 -tc -Ax CT_METADATA
000000 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00