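Last night I ran a catalog (metadata) check on a gpdb cluster. After running gpcheckcat, the persistent test reported a problem, and the output ended with the following summary: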
SUMMARY REPORT
===================================================================
Total runtime for 15 test(s): 0:00:11.36
Failed test(s) that are not reported here: persistent
See /home/gpadmin/gpAdminLogs/gpcheckcat_20171122.log for detail
Looking further into the gpcheckcat_20171122.log file, I found the following error messages:
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[INFO]:-[FAIL] gp_persistent_relation_node <=> filesystem
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-gp_persistent_relation_node <=> filesystem found 4 issue(s)
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-
  SELECT coalesce(a.tablespace_oid, b.tablespace_oid) as tablespace_oid,
         coalesce(a.database_oid, b.database_oid) as database_oid,
         coalesce(a.relfilenode_oid, b.relfilenode_oid) as relfilenode_oid,
         coalesce(a.segment_file_num, b.segment_file_num) as segment_file_num,
         a.relfilenode_oid is null as filesystem,
         b.relfilenode_oid is null as persistent,
         b.relkind, b.relstorage
  FROM gp_persistent_relation_node a
  FULL OUTER JOIN (
    SELECT p.*, c.relkind, c.relstorage
    FROM gp_persistent_relation_node_check() p
    LEFT OUTER JOIN pg_class c ON (p.relfilenode_oid = c.relfilenode)
    WHERE (p.segment_file_num = 0 or c.relstorage != 'h')
  ) b ON (a.tablespace_oid = b.tablespace_oid and
          a.database_oid = b.database_oid and
          a.relfilenode_oid = b.relfilenode_oid and
          a.segment_file_num = b.segment_file_num)
  WHERE (a.relfilenode_oid is null OR
         (a.persistent_state = 2 and b.relfilenode_oid is null)) and
        coalesce(a.database_oid, b.database_oid) in (
          SELECT oid FROM pg_database WHERE datname = current_database()
          UNION ALL
          SELECT 0
        );
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:---------
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-gpmdw:40000:/data/pri/gpseg0
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- tablespace_oid | database_oid | relfilenode_oid | segment_file_num | filesystem | persistent | relkind | relstorage
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- 1663 | 17146 | 9991234 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- 1663 | 17146 | 9998763 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- 1663 | 17146 | 9998764 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- 1663 | 17146 | 9998765 | 0 | t | f | None | None
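This shows that the files on the filesystem do not match what is recorded in gp_persistent_relation_node. Further inspection showed that the filesystem held some leftover files: a segment instance on this node had gone down earlier, and some disk files were not cleaned up when the transactions were rolled back.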
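To make the diagnosis above less error-prone than reading the table by eye, the flagged rows can be pulled out of the log with a small script. This is only a sketch: the function name filesystem_only_files and the regular expression are my own, written against the row layout seen in this particular log, and they may need adjusting for other gpcheckcat versions. It keeps only rows where filesystem = t and persistent = f, that is, files present on disk with no matching entry in gp_persistent_relation_node.

```python
import re

# Matches a data row of the result table printed by gpcheckcat, e.g.
#   "...-[ERROR]:- 1663 | 17146 | 9991234 | 0 | t | f | None | None"
# The header row does not match because its columns are not numeric.
ROW_RE = re.compile(
    r"\[ERROR\]:-\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)"
    r"\s*\|\s*([tf])\s*\|\s*([tf])\s*\|"
)


def filesystem_only_files(log_text):
    """Return (tablespace_oid, database_oid, relfilenode_oid, segment_file_num)
    tuples for rows flagged filesystem = t and persistent = f."""
    found = []
    for match in ROW_RE.finditer(log_text):
        ts, db, rel, seg, fs_only, persistent_only = match.groups()
        if fs_only == "t" and persistent_only == "f":
            found.append((int(ts), int(db), int(rel), int(seg)))
    return found


# Two rows copied from the log above, plus the header row.
sample = """\
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- tablespace_oid | database_oid | relfilenode_oid | segment_file_num | filesystem | persistent | relkind | relstorage
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- 1663 | 17146 | 9991234 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:- 1663 | 17146 | 9998763 | 0 | t | f | None | None
"""

if __name__ == "__main__":
    for row in filesystem_only_files(sample):
        print(row)
```

In practice the text would come from reading the gpcheckcat log file rather than from an inline sample.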
Fix:
Go to the /data/pri/gpseg0/base/17146 directory, locate the extra files 9991234, 9998763, 9998764 and 9998765, and move them to a backup directory.
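The move itself can be scripted so that nothing is touched until the file list has been reviewed. The sketch below is not part of the original procedure: move_orphans is a hypothetical helper, the backup path /data/backup/gpseg0_orphans is made up for illustration, and the script assumes it runs on the segment host as a user with access to the data directory. It defaults to a dry run and only moves files whose names exactly match the listed relfilenode OIDs.

```python
import shutil
from pathlib import Path


def move_orphans(db_dir, relfilenodes, backup_dir, dry_run=True):
    """Move files named exactly after each relfilenode OID from db_dir to
    backup_dir. With dry_run=True nothing is moved; the function only
    reports which files exist. Returns the list of source paths handled."""
    db_dir = Path(db_dir)
    backup_dir = Path(backup_dir)
    handled = []
    for oid in relfilenodes:
        src = db_dir / str(oid)
        if not src.is_file():
            continue  # already gone, or never existed on this segment
        if not dry_run:
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(backup_dir / src.name))
        handled.append(src)
    return handled


if __name__ == "__main__":
    # OIDs taken from the gpcheckcat report; the backup path is hypothetical.
    orphans = [9991234, 9998763, 9998764, 9998765]
    for path in move_orphans("/data/pri/gpseg0/base/17146", orphans,
                             "/data/backup/gpseg0_orphans"):
        print("would move", path)
```

Moving rather than deleting keeps the files recoverable in case the catalog check turns out to have been misread.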
Then rerun the catalog check; this time the result is clean.
[gpadmin@gpmdw gpcheckcat_log]$ cat gpcheckcat2.log
Connected as user 'gpadmin' to database 'testdb1', port '5432', gpdb version '4.3'
-------------------------------------------------------------------
Performing test 'unique_index_violation'
Total runtime for test 'unique_index_violation': 0:00:01.13
Performing test 'duplicate'
Total runtime for test 'duplicate': 0:00:01.67
Performing test 'missing_extraneous'
Total runtime for test 'missing_extraneous': 0:00:03.33
Performing test 'inconsistent'
Total runtime for test 'inconsistent': 0:00:02.65
Performing test 'foreign_key'
Total runtime for test 'foreign_key': 0:00:01.47
Performing test 'acl'
Total runtime for test 'acl': 0:00:00.05
Performing test 'persistent'
Total runtime for test 'persistent': 0:00:00.18
Performing test 'pgclass'
Total runtime for test 'pgclass': 0:00:00.02
Performing test 'namespace'
Total runtime for test 'namespace': 0:00:00.02
Performing test 'distribution_policy'
Total runtime for test 'distribution_policy': 0:00:00.00
Performing test 'dependency'
Total runtime for test 'dependency': 0:00:00.58
Performing test 'owner'
Total runtime for test 'owner': 0:00:00.06
Performing test 'part_integrity'
Total runtime for test 'part_integrity': 0:00:00.04
Performing test 'part_constraint'
Total runtime for test 'part_constraint': 0:00:00.08
Performing test 'duplicate_persistent'
Total runtime for test 'duplicate_persistent': 0:00:00.05

SUMMARY REPORT
===================================================================
Total runtime for 15 test(s): 0:00:11.38
Found no catalog issue