zoukankan      html  css  js  c++  java
  • postgresql 9.x 的base backup + restore + pitr(point in time recovery)

    os: centos6.8
    postgresql:9.1.22

    base backup

    最初使用 tar 花费4个小时,之后改为 pigz 只花费了30分钟。

    # vi /backup_scripts/pg_basebackup.sh
    #!/bin/bash
    
    DATE=`date +%Y%m%d`
    
    #删除14天前的物理备份
    echo "#############################################"
    echo "delete expire base backup start...`date` 
    "
    
    find /backup/* -mtime +14 -not -wholename "*lost+found*"|xargs  -I  '{}'  rm  {}
    
    echo "delete expire base backup end...`date` 
    "
    
    #备份PG数据库
    echo "#############################################"
    echo "base backup start...`date` 
    "
    
    su - postgres -c "/usr/bin/psql -c "select pg_start_backup('$DATE');""
    
    #tar is a single thread command,too slow
    #tar --exclude=/var/lib/postgresql/9.1/main/postmaster.pid --exclude=/var/lib/postgresql/9.1/main/pg_xlog -Pzcf - /var/lib/postgresql/9.1/main/ /pg_tbs/ | split --bytes=10G - /backup/${DATE}_base.tar.gz.
    
    tar --exclude=/var/lib/postgresql/9.1/main/postmaster.pid --exclude=/var/lib/postgresql/9.1/main/pg_xlog -Pcf - /var/lib/postgresql/9.1/main/ /pg_tbs/ | pigz -p 16  | split --bytes=10G - /backup/${DATE}_base.tgz.
    
    su - postgres -c "/usr/bin/psql -c 'select pg_stop_backup();'"
    
    echo "base backup end...`date` 
    "
    
    #gpg加密
    echo "#############################################"
    echo "gpg start...`date` 
    "
    
    #并发加密文件,缩短执行时间
    for FILE in `ls -1 /backup/${DATE}_base.tgz.*`
    do
      (gpg -er zlfund -o ${FILE}.gpg ${FILE} )&
    done
    wait
    
    echo "gpg end...`date` 
    "
    
    #rsync异机备份
    echo "#############################################"
    echo "rsync start...`date` 
    "
    
    #同步文件
    for GPGFILE in `ls -1 /backup/${DATE}_base.tgz.*.gpg`
    do
      #拷贝到nas
      #rsync -auzP --bwlimit=30.0M ${FILE}.gpg /mnt/nas_dbbackup/basebackup/
      rsync -auP --bwlimit=60.0M ${GPGFILE} /mnt/nas_dbbackup/basebackup/
    done
    
    echo "rsync end...`date` 
    "
    
    #rm删除gpg加密文件
    echo "#############################################"
    echo "delete gpg start...`date` 
    "
    
    #删除加密文件
    rm /backup/${DATE}_base.tgz*.gpg
    
    echo "delete gpg end..`date` 
    "

    restore

    restore database

    # cd /u02/backup_gpg/
    # cp /mnt/nas_dbbackup/basebackup/20180520* /u02/backup_gpg/
    # chmod 777 ./*

    开8个并发窗口同时解密

    gpg -o /u02/backup/20180520_base.tgz.aa -d /u02/backup_gpg/20180520_base.tgz.aa.gpg
    gpg -o /u02/backup/20180520_base.tgz.ab -d /u02/backup_gpg/20180520_base.tgz.ab.gpg
    gpg -o /u02/backup/20180520_base.tgz.ac -d /u02/backup_gpg/20180520_base.tgz.ac.gpg
    gpg -o /u02/backup/20180520_base.tgz.ad -d /u02/backup_gpg/20180520_base.tgz.ad.gpg
    gpg -o /u02/backup/20180520_base.tgz.ae -d /u02/backup_gpg/20180520_base.tgz.ae.gpg
    gpg -o /u02/backup/20180520_base.tgz.af -d /u02/backup_gpg/20180520_base.tgz.af.gpg
    gpg -o /u02/backup/20180520_base.tgz.ag -d /u02/backup_gpg/20180520_base.tgz.ag.gpg
    gpg -o /u02/backup/20180520_base.tgz.ah -d /u02/backup_gpg/20180520_base.tgz.ah.gpg
    gpg -o /u02/backup/20180520_base.tgz.ai -d /u02/backup_gpg/20180520_base.tgz.ai.gpg

    使用 gpg 解密时需要输入解密密码

    开始解压

    cat /u02/backup/20180520_base.tgz.a* | tar -Pxv -I pigz

    结束后就会看到在如下两个路径被填充了文件

    /var/lib/postgresql/9.1/main
    /pg_tbs

    recovery database

    # su - postgres
    $ cd /var/lib/postgresql/9.1/main
    $ mkdir pg_xlog
    $ chmod 700 ./archive_status
    $ cd pg_xlog
    $ mkdir archive_status
    $ chmod 700 ./archive_status

    配置 recovery.conf,基于wal的 pitr 恢复

    $ cd $PGDATA
    $ vi recovery.conf
    restore_command = 'cp /mnt/nas_dbbackup/archivelog/%f %p'
    #archive_cleanup_command = ''
    #recovery_end_command = ''
    recovery_target_time = '2018-05-24 20:00:00+08'
    #recovery_target_xid = ''
    recovery_target_inclusive = 'true'
    recovery_target_timeline = 'latest'
    pause_at_recovery_target = 'true'
    standby_mode = on
    trigger_file = '/tmp/postgresql.trigger.5432'

    编辑启动,关闭脚本

    $ vi /u01/pgsql/pgsql9.1_start.sh
    pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"
    $ vi /u01/pgsql/pgsql9.1_stop.sh
    pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main

    启动pgsql

    $ /u01/pgsql/pgsql9.1_start.sh

    观察日志输出,会有类似下面的持续输出

    2018-05-24 11:10:09.259 CST,,,5283,,59f2a370.14a3,28,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000053"" from archive",,,,,,,,,""
    2018-05-24 11:10:10.792 CST,,,5283,,59f2a370.14a3,29,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000054"" from archive",,,,,,,,,""
    2018-05-24 11:10:12.201 CST,,,5283,,59f2a370.14a3,30,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000055"" from archive",,,,,,,,,""
    2018-05-24 11:10:13.592 CST,,,5283,,59f2a370.14a3,31,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000056"" from archive",,,,,,,,,""
    2018-05-24 11:10:15.144 CST,,,5283,,59f2a370.14a3,32,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000057"" from archive",,,,,,,,,""
    2018-05-24 11:10:16.074 CST,,,5283,,59f2a370.14a3,33,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000058"" from archive",,,,,,,,,""

    达到指定的 recovery_target_time 后,数据库库会暂停住,类似如下日志输出

    2018-05-24 11:11:40.809 CST,,,5283,,59f2a370.14a3,120,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD8000000AF"" from archive",,,,,,,,,""
    2018-05-24 11:11:40.961 CST,,,5283,,59f2a370.14a3,121,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"recovery stopping before commit of transaction 1348696203, time 2017-10-23 11:15:00.036417+08",,,,,,,,,""
    2018-05-24 11:11:40.961 CST,,,5283,,59f2a370.14a3,122,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,,""
    
    2018-05-24 11:11:40.961 CST,,,5544,,59f2b79c.15a8,1,,2018-05-24 11:09:36 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""

    这个时候是可以连接上去查询使用,如果需要读写,则需要做如下操作

    此时有两种选择
    1)touch /tmp/postgresql.trigger.5432

    2018-05-24 18:27:28.794 CST,,,6028,,5a8fec90.178c,1,,2018-05-24 18:27:28 CST,,0,LOG,00000,"database system was shut down in recovery at 2018-05-24 18:27:05 CST",,,,,,,,,""
    2018-05-24 18:27:28.801 CST,,,6028,,5a8fec90.178c,2,,2018-05-24 18:27:28 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC4000000AB"" (log file 3780, segment 171): No such file or directory",,,,,,,,,""
    2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,3,,2018-05-24 18:27:28 CST,,0,LOG,00000,"invalid primary checkpoint record",,,,,,,,,""
    2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,4,,2018-05-24 18:27:28 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC400000039"" (log file 3780, segment 57): No such file or directory",,,,,,,,,""
    2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,5,,2018-05-24 18:27:28 CST,,0,LOG,00000,"invalid secondary checkpoint record",,,,,,,,,""
    2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,6,,2018-05-24 18:27:28 CST,,0,PANIC,XX000,"could not locate a valid checkpoint record",,,,,,,,,""
    2018-05-24 18:27:54.801 CST,,,6026,,5a8fec8f.178a,1,,2018-05-24 18:27:27 CST,,0,LOG,00000,"startup process (PID 6028) was terminated by signal 6: Aborted",,,,,,,,,""
    2018-05-24 18:27:54.801 CST,,,6026,,5a8fec8f.178a,2,,2018-05-24 18:27:27 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""

    再启动时就会提示恢复需要若干个wal日志,可以把需要的wal日志拷贝到 $PGDATA/pg_xlog下。

    2)关闭pgsql,然后执行 pg_resetxlog

    $ /u01/pgsql/pgsql9.1_stop.sh
    
    $ which pg_resetxlog
    $ pg_resetxlog -f /var/lib/postgresql/9.1/main
    $ cd /var/lib/postgresql/9.1/main
    $ mv recovery.conf recovery.done
    
    $ /u01/pgsql/pgsql9.1_start.sh

    官方对pg_resetxlog 的描述还是很谨慎的。

    pg_resetxlog会清除预写式日志(WAL)并且有选择地重置存储在 pg_control文件中的一些其他控制信息。如果这些文件已经被损坏, 某些时候就需要这个功能。当服务器由于这样的损坏而无法启动时, 这只应该被用作最后的手段。

    在运行这个命令之后,就应该可以启动服务器, 但是记住数据库可能包含由于部分提交事务产生的不一致数据。 你应当立刻转储你的数据、运行initdb并且重新载入。重新载入后, 检查不一致并且根据需要修复之。

    这个工具只能被安装服务器的用户运行,因为它要求对数据目录的读写访问。 出于安全原因,你必须在命令行中指定数据目录。pg_resetxlog 不使用环境变量 PGDATA。

    如果 pg_resetxlog 抱怨它无法为 pg_control 决定合法数据,你可以通过指定-f(强制)选项强制它继续。 在这种情况下,丢失的数据将被替换为看似合理的值。可以期望大部分域是匹配的, 但是下一个 OID、下一个事务 ID 和纪元、下一个多事务 ID 和偏移以及 WAL 开始地址域可能还是需要人工协助。这些域可以使用下面讨论的选项设置。 如果你不能为所有这些域决定正确的值,-f还是可以被使用, 但是恢复的数据库还是值得怀疑:一次立即的转储和重新载入是势在必行的。 在你转储之前不要在该数据库中执行任何数据修改操作, 因为任何这样的动作都可能使破坏更严重。

    $ pg_resetxlog --help
    pg_resetxlog resets the PostgreSQL transaction log.
    
    Usage:
      pg_resetxlog [OPTION]... DATADIR
    
    Options:
      -e XIDEPOCH     set next transaction ID epoch
      -f              force update to be done
      -l TLI,FILE,SEG force minimum WAL starting location for new transaction log
      -m XID          set next multitransaction ID
      -n              no update, just show extracted control values (for testing)
      -o OID          set next OID
      -O OFFSET       set next multitransaction offset
      -x XID          set next transaction ID
      --help          show this help, then exit
      --version       output version information, then exit
    
    Report bugs to <pgsql-bugs@postgresql.org>.

    参考:
    http://postgres.cn/docs/9.6/app-pgresetxlog.html

  • 相关阅读:
    观察者模式
    hdu 4712 Hamming Distance bfs
    leetcode Sum Root to Leaf Numbers(所有路径之和)
    Oracle实用-01:绑定变量
    jQuery实现AJAX定时刷新局部页面实例
    给Ajax一个漂亮的嫁衣——Ajax系列之五(下)之序列化和反序列化
    jquery的ajax同步和异步
    报表技术
    告别.NET生成报表统计图的烦恼
    浅谈ASP.NET报表控件
  • 原文地址:https://www.cnblogs.com/ctypyb2002/p/9792979.html
Copyright © 2011-2022 走看看