zoukankan      html  css  js  c++  java
  • PostgreSQL基于时间点故障恢复PITR( point-in-time recovery )

    PostgreSQL在使用过程中经常会发生一些失误的操作,但往往是可以弥补的。但是如果真遇到了无法挽回的误操作,只能寄希望于有备份了。

    接下来的故障恢复也是基于有备份的情况,没有备份的情况,目前还没有想到怎么做

    1.首先在数据库中配置日志归档

    1)创建归档目录

    mkdir -p /var/lib/pgsql/pg10/archive/
    

    2)修改postgresql.conf文件

    wal_level=replica
    archive_mode = on  
    archive_command='test ! -f /var/lib/pgsql/9.10/archive/%f && cp %p /var/lib/pgsql/9.10/archive/%f'
    

     3)重启数据库

    pg_ctl restart
    

    2.对数据库进行全量备份,这里只是为了测试,就简单的对目录进行拷贝即可

    cp -r $PGDATA ~/pg10/full_back
    

    3.对数据库进行操作 并记录对应的日志号

    select txid_current();
     txid_current
    --------------
              557
    (1 row)
    
                  now
    -------------------------------
     2018-09-03 18:07:14.288787+08
    (1 row)
    
    delete first 100 tuple of run_command
    DELETE 99
     count
    -------
     99901
    (1 row)
    
    select txid_current();
     txid_current
    --------------
              559
    (1 row)
    
                  now
    -------------------------------
     2018-09-03 18:07:14.500745+08
    (1 row)
    
    delete last 100 tuple of run_command
    DELETE 100
     count
    -------
     99801
    (1 row)
    
    select txid_current();
     txid_current
    --------------
              561
    (1 row)
    
                  now
    -------------------------------
     2018-09-03 18:07:14.571154+08
    (1 row)
    checkpoint
    CHECKPOINT
    pg_switch_wal
     pg_switch_wal
    ---------------
     0/3005FA0
    (1 row)
    
    checkpoint
    CHECKPOINT
    pg_switch_wal
     pg_switch_wal
    ---------------
     0/40000E8
    (1 row)

    4.设置recovery.conf文件

    restore_command = 'cp /var/lib/pgsql/pg10/archive/%f %p' 
    recovery_target_xid = '557' 
    recovery_target_inclusive = false    
    recovery_target_timeline = 'latest'
    

    5.以为恢复成功了,结果发现系统只读,不能写,paused! 后续补充····,日志:

    -bash-4.1$ cat log/postgresql-2018-09-03_181007.log
    2018-09-03 18:10:07.160 CST [850] LOG:  database system was interrupted; last known up at 2018-09-03 18:07:12 CST
    2018-09-03 18:10:07.160 CST [850] LOG:  creating missing WAL directory "pg_wal/archive_status"
    cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory
    2018-09-03 18:10:07.431 CST [850] LOG:  starting point-in-time recovery to 2018-09-03 18:07:14.500745+08
    2018-09-03 18:10:07.448 CST [850] LOG:  restored log file "000000010000000000000002" from archive
    2018-09-03 18:10:07.596 CST [850] LOG:  redo starts at 0/2000028
    2018-09-03 18:10:07.613 CST [850] LOG:  consistent recovery state reached at 0/2003C30
    2018-09-03 18:10:07.613 CST [848] LOG:  database system is ready to accept read only connections
    

    看其他人使用过程中遇到文件不存在时,会自动创建一个新的时间线,然后恢复完成,而他们都是用的10以前版本,可能因此造成的吧。 

    6.经过多次分析,在data目录的pg_wal中也没有发现 “00000002.history”文件,于是尝试重新回放日志,终于成功:

    postgres=# select pg_wal_replay_resume();
     pg_wal_replay_resume
    ----------------------
    
    (1 row)
    
    postgres=# select pg_wal_replay_resume();
    ERROR:  recovery is not in progress
    HINT:  Recovery control functions can only be executed during recovery
    
    postgres=# select count(*) from run_command ;
     count
    -------
     99901
    (1 row)
    
    postgres=# insert into run_command values (1, 'test new');
    INSERT 0 1
    postgres=# q
    

    执行pg_wal_replay_resume()的日志:

    -bash-4.1$ cat log/postgresql-2018-09-03_181007.log
    2018-09-03 18:10:07.160 CST [850] LOG:  database system was interrupted; last known up at 2018-09-03 18:07:12 CST
    2018-09-03 18:10:07.160 CST [850] LOG:  creating missing WAL directory "pg_wal/archive_status"
    cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory
    2018-09-03 18:10:07.431 CST [850] LOG:  starting point-in-time recovery to 2018-09-03 18:07:14.500745+08
    2018-09-03 18:10:07.448 CST [850] LOG:  restored log file "000000010000000000000002" from archive
    2018-09-03 18:10:07.596 CST [850] LOG:  redo starts at 0/2000028
    2018-09-03 18:10:07.613 CST [850] LOG:  consistent recovery state reached at 0/2003C30
    2018-09-03 18:10:07.613 CST [848] LOG:  database system is ready to accept read only connections
    2018-09-03 18:10:07.646 CST [850] LOG:  restored log file "000000010000000000000003" from archive
    2018-09-03 18:10:07.779 CST [866] LOG:  duration: 28.273 ms  statement: select count(*) from run_command
    2018-09-03 18:10:07.797 CST [868] ERROR:  cannot execute INSERT in a read-only transaction
    2018-09-03 18:10:07.797 CST [868] STATEMENT:  insert into run_command values(1, 'test new')
    2018-09-03 18:10:07.804 CST [850] LOG:  recovery stopping before commit of transaction 560, time 2018-09-03 18:07:14.52735+08
    2018-09-03 18:10:07.804 CST [850] LOG:  recovery has paused
    2018-09-03 18:10:07.804 CST [850] HINT:  Execute pg_wal_replay_resume() to continue.
    2018-09-03 18:10:21.263 CST [870] LOG:  duration: 0.697 ms  statement: select pg_wal_replay_resume();
    2018-09-03 18:10:21.818 CST [850] LOG:  redo done at 0/3005E90
    2018-09-03 18:10:21.818 CST [850] LOG:  last completed transaction was at log time 2018-09-03 18:07:14.496615+08
    cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory
    2018-09-03 18:10:21.886 CST [850] LOG:  selected new timeline ID: 2
    cp: cannot stat `/var/lib/pgsql/pg10/archive/00000001.history': No such file or directory
    2018-09-03 18:10:22.145 CST [850] LOG:  archive recovery complete
    2018-09-03 18:10:22.476 CST [848] LOG:  database system is ready to accept connections
    2018-09-03 18:10:22.775 CST [870] ERROR:  recovery is not in progress
    2018-09-03 18:10:22.775 CST [870] HINT:  Recovery control functions can only be executed during recovery.
    2018-09-03 18:10:22.775 CST [870] STATEMENT:  select pg_wal_replay_resume();
    

    7.然后又尝试了使用时间和恢复点来回放,都没问题。

    8.附上recovery.conf文件的配置:

    在恢复过程中,用户可以通过使用recovery.conf文件来指定恢复的各个参数,如下:
    
    归档恢复设置
    restore_command:用于获取一个已归档段的XLOG日志文件的命令
    archive_cleanup_command:清除不在需要的XLOG日志文件的命令
    recovery_end_command:归档恢复结束后执行的命令
    
    恢复目标设置(默认情况下,数据库将会一直恢复到 WAL 日志的末尾)
    recovery_target = ’immediate’:在从一个在线备 份中恢复时,这意味着备份结束的那个点
    recovery_target_name (string):这个参数指定(pg_create_restore_point()所创建)的已命名的恢复点,将恢复到该恢复点
    recovery_target_time (timestamp):这个参数指定恢复到的时间戳
    recovery_target_xid (string):这个参数指定恢复到的事务 ID
    recovery_target_inclusive (boolean):指定是否在指定的恢复目标之后停止(true),或者在恢复目标之前停止 (false);适用于recovery_target_time或者recovery_target_xid被指定的情况;这个设置分别控制事务是否有准确的目标提交时间或 ID 是否将被包括在该恢复中;默认值为 true
    recovery_target_timeline (string):指定恢复到一个特定的时间线
    recovery_target_action (enum):指定在达到恢复目标时服务器应该立刻采取的动作,包括pause(暂停)、promote(接受连接)、shutdown(停止服务器),其中pause为默认动作
    
    备库参数设置
    standby_mode(boolean):为on表示作为一个备库,否则不为备库
    primary_conninfo (string):指定备库连接主库的连接字符串
    primary_slot_name (string):通过流复制指定主库的一个复制槽来复制主库数据,如果没有设置primary_conninfo,则此参数无效
    trigger_file (string):指定一个触发器文件,该文件存在可以结束备库的恢复,即升级备库为一个独立的主库
    recovery_min_apply_delay (integer):这个参数允许将恢复延迟一段固定的时间,如果没有指定单位则以毫秒为单位。
    如果recovery.conf中同时指定了recoveryTargetXid、recoveryTargetName、recoveryTargetTime时,PostgreSQL会按照RECOVERY_TARGET_XID> RECOVERY_TARGET_NAME > RECOVERY_TARGET_TIME的优先级来获取最终的目标恢复位点。
    
    如果在recovery.conf指定recovery_targetTimeLine为latest,则可以基于当前TimeLineID为起点寻找最新时间线:
    
    寻找当前TimeLineID的时间线历史文件“XXX.history”,如果存在则继续寻找,否则错误退出
    TimeLineID是线性增长的,将当前TimeLineID自增1寻找是否存在时间线历史文件,直到不存在对应的时间线历史文件为止,即可找到最新的时间线。
    

    后续准备找找如何在没有备份的情况下,恢复删除数据。。。。。。

  • 相关阅读:
    tomcat 远程部署项目
    Maven的使用
    FTP服务器的使用
    2017.10-2018.10工作思考
    Java面试题准备(二)
    移植数据库的心得
    重新安装开发环境之安装使用plsql碰到的两个问题
    Redis学习笔记(一)
    开发之没沟通误删客户数据库数据之找数据路
    Java面试准备(一)
  • 原文地址:https://www.cnblogs.com/kuang17/p/9566657.html
Copyright © 2011-2022 走看看