2021-09-10 17:22:42.417183T @ startup 00000 [2021-09-10 17:22:42 CST] 0 [9298] LOCATION: StartupXLOG, xlog.c:6347
2021-09-10 17:22:42.417206T @ startup XX000 [2021-09-10 17:22:42 CST] 0 [9298] FATAL: XX000: required WAL directory "pg_wal" does not exist
2021-09-10 17:22:42.417206T @ startup XX000 [2021-09-10 17:22:42 CST] 0 [9298] LOCATION: ValidateXLOGDirectoryStructure, xlog.c:4262
2021-09-10 17:22:42.417407T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG: 00000: startup process (PID 9298) exited with exit code 1
2021-09-10 17:22:42.417407T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION: LogChildExit, postmaster.c:3714
2021-09-10 17:22:42.417417T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG: 00000: aborting startup due to startup process failure
2021-09-10 17:22:42.417417T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION: reaper, postmaster.c:2969
2021-09-10 17:22:42.427171T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOG: 00000: database system is shut down
2021-09-10 17:22:42.427171T @ postmaster 00000 [2021-09-10 17:22:42 CST] 0 [9296] LOCATION: UnlinkLockFiles, miscinit.c:928
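Running pg_resetwal -f PGDATA reinitializes the WAL files, but the transaction log is discarded and the data can end up inconsistent: changes that had not been flushed to the data files by the last completed checkpoint may be lost, and in extreme cases some data blocks are lost. After the reset, the pg_wal directory looks like this: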
[zjh@lightdb1 pgsql13.2]$ cd data/pg_wal/
[zjh@lightdb1 pg_wal]$ ll
total 1048576
-rw------- 1 zjh zjh 1073741824 Sep 10 21:44 00000001000000BB00000001
drwx------ 2 zjh zjh          6 Sep 10 21:42 archive_status
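Then start PG again, take a backup, and rebuild the cluster.
To see how much data may have been lost, check the latest checkpoint entries in the pg_controldata output.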
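As a rough illustration of reading those entries, the sketch below pulls the checkpoint-related fields out of pg_controldata output. The field labels are the ones pg_controldata prints in recent PostgreSQL releases; the helper names (parse_controldata, latest_checkpoint, read_controldata) are hypothetical and exist only for this example. It assumes pg_controldata is on the PATH and that the server messages are in English.

```python
import subprocess

# Labels as printed by pg_controldata (English locale assumed).
CHECKPOINT_FIELDS = (
    "Latest checkpoint location",
    "Latest checkpoint's REDO location",
    "Latest checkpoint's REDO WAL file",
    "Time of latest checkpoint",
)


def parse_controldata(text):
    """Turn 'label: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def latest_checkpoint(text):
    """Return only the checkpoint-related fields that are present in the output."""
    fields = parse_controldata(text)
    return {name: fields[name] for name in CHECKPOINT_FIELDS if name in fields}


def read_controldata(pgdata):
    """Run pg_controldata against a data directory and return its stdout."""
    result = subprocess.run(
        ["pg_controldata", pgdata],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call such as latest_checkpoint(read_controldata(pgdata)) would then give the checkpoint location, the REDO WAL file, and the checkpoint time in one place. The checkpoint time is the most direct hint about how far back the lost changes may reach.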
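If the WAL size settings (for example max_wal_size) are large and you want to delete old WAL, pg_archivecleanup can remove the WAL files that precede the latest checkpoint, as follows: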
pg_archivecleanup /data1/zjh/coordinator/pg_wal/ 000000010000000900000023
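This removes the WAL files that precede 000000010000000900000023.
Indeed, the files with smaller names are gone, but the problem is that the earlier logs have still not been deleted.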
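To make "precede" concrete, here is a simplified sketch of the rule pg_archivecleanup applies when choosing files. It approximates the tool's behavior and is not its actual code: it considers only plain 24-character segment names, and it compares the part after the 8-character timeline prefix, since the tool ignores the timeline when deciding whether a segment is still needed. The function name removable_segments is hypothetical.

```python
import re

# A plain WAL segment name: 8 hex digits of timeline + 16 hex digits of position.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def removable_segments(names, oldest_kept):
    """Return the names that sort strictly before oldest_kept, ignoring the timeline."""
    if not SEGMENT_NAME.match(oldest_kept):
        raise ValueError("oldest_kept must be a 24-character WAL segment name")
    cutoff = oldest_kept[8:]
    return [
        name
        for name in names
        # Fixed-width uppercase hex compares correctly as plain strings.
        if SEGMENT_NAME.match(name) and name[8:] < cutoff
    ]


if __name__ == "__main__":
    listing = [
        "000000010000000900000022",
        "000000010000000900000023",
        "00000001000000BB00000001",
        "archive_status",
    ]
    for name in removable_segments(listing, "000000010000000900000023"):
        print(name)
```

Only files in the directory passed on the command line are candidates, and anything that is not a segment name (such as archive_status) is left alone. The tool's -n option performs a dry run that prints the files it would remove, which is a safer way to confirm the selection before deleting anything.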