  • Simulating a failure: manually deleting a control file

    On Linux and UNIX, manually deleting one control file while the database is running does not interrupt the database, as shown below:

    #### Delete the controlfile

    $ rm control01.ctl
    

    After the deletion, the alert log reports no errors and the database keeps running normally.

    Run the following in the database:

    SQL> alter system checkpoint;
    SQL> alter system switch logfile;
    SQL> alter system switch logfile;
    SQL> alter system switch logfile;
    

    The corresponding alert log entries show that the database is still running normally:

    Sat May 14 07:52:39 2016
    Thread 1 advanced to log sequence 64 (LGWR switch)
      Current log# 1 seq# 64 mem# 0: /u01/app/oracle/oradata/db11/redo01.log
    Thread 1 advanced to log sequence 65 (LGWR switch)
      Current log# 2 seq# 65 mem# 0: /u01/app/oracle/oradata/db11/redo02.log
    Thread 1 advanced to log sequence 66 (LGWR switch)
      Current log# 3 seq# 66 mem# 0: /u01/app/oracle/oradata/db11/redo03.log
    

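    This works because the process still holds an open file handle on the deleted file, and that handle has not been released, as shown below: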

    $ ps -ef|grep ckpt|grep -v grep
    ora11     4616     1  0 07:51 ?        00:00:00 ora_ckpt_db11
    $ cd /proc/4616/fd
    $ ls -ltr |grep control
    lrwx------ 1 ora11 oinstall 64 May 14 07:55 257 -> /u01/app/oracle/oradata/db11/control02.ctl
    lrwx------ 1 ora11 oinstall 64 May 14 07:55 256 -> /u01/app/oracle/oradata/db11/control01.ctl (deleted)
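
    The behaviour seen in the listing above is ordinary POSIX file semantics rather than anything Oracle-specific, and it can be reproduced in a few lines. The following is a minimal sketch, assuming Python 3 on a Unix-like system; the temporary `.ctl` file and the function name `read_after_unlink` are hypothetical stand-ins, not part of the original test.

```python
import os
import tempfile


def read_after_unlink():
    """Delete a file while holding it open, then keep using the descriptor."""
    # Hypothetical stand-in for control01.ctl; any regular file behaves the same.
    fd, path = tempfile.mkstemp(suffix=".ctl")
    try:
        os.write(fd, b"control data")
        os.unlink(path)  # like `rm`: removes the name, not the open file

        # The directory entry is gone ...
        name_exists = os.path.exists(path)

        # ... but the open descriptor can still write and read the data.
        os.write(fd, b" (still writable)")
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 1024)

        # On Linux, /proc/self/fd/<fd> now points at "<path> (deleted)".
        link = None
        proc_link = f"/proc/self/fd/{fd}"
        if os.path.islink(proc_link):
            link = os.readlink(proc_link)
        return name_exists, data, link
    finally:
        os.close(fd)  # the storage is only reclaimed once the last handle closes


if __name__ == "__main__":
    print(read_after_unlink())
```

    The file's storage is released only when the last open descriptor is closed, which is why the database processes can keep using control01.ctl for as long as they hold it open.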
    

    #### Session 1: trace the process with strace

    $ strace -fr -o /tmp/4616.log -p 4616
    Process 4616 attached - interrupt to quit
    (strace stays attached and waits here until it is interrupted)
    

    #### Session 2: switch the redo logs

    SQL> alter system switch logfile;
    SQL> alter system switch logfile;
    
    The log switches complete normally:
    Sat May 14 07:58:33 2016
    Thread 1 advanced to log sequence 67 (LGWR switch)
      Current log# 1 seq# 67 mem# 0: /u01/app/oracle/oradata/db11/redo01.log
    Thread 1 advanced to log sequence 68 (LGWR switch)
      Current log# 2 seq# 68 mem# 0: /u01/app/oracle/oradata/db11/redo02.log
    

    #### Stop the trace in session 1 (Ctrl+C)

    $ strace -fr -o /tmp/4616.log -p 4616
    Process 4616 attached - interrupt to quit
    
    Process 4616 detached
    

    #### Now examine the log produced by session 1, /tmp/4616.log

    ...
    4616       0.000036 gettimeofday({1463183881, 895560}, NULL) = 0
    4616       0.000035 pwrite(256, "25302314214C232"..., 16384, 49152) = 16384
    4616       0.040894 gettimeofday({1463183881, 936492}, NULL) = 0
    4616       0.000044 gettimeofday({1463183881, 936533}, NULL) = 0
    4616       0.000079 pwrite(257, "25302314214C232"..., 16384, 49152) = 16384
    4616       0.003029 gettimeofday({1463183881, 939643}, NULL) = 0
    4616       0.000042 gettimeofday({1463183881, 939697}, NULL) = 0
    4616       0.000057 gettimeofday({1463183881, 939740}, NULL) = 0
    4616       0.000071 gettimeofday({1463183881, 939815}, NULL) = 0
    4616       0.000076 gettimeofday({1463183881, 939888}, NULL) = 0
    4616       0.000035 gettimeofday({1463183881, 939922}, NULL) = 0
    4616       0.000038 pread(256, "253021142123434 v~227300U"..., 16384, 16384) = 16384
    ...
    

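    Here:

    4616 is the ID of the traced process.

    The second column is a relative timestamp, e.g. 0.000036. With strace -r this is the time in seconds since the start of the previous system call.

    Now look at this line: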

    pread(256, "253021142123434 v~227300U"..., 16384, 16384) = 16384

    256 is the file descriptor, which maps to the deleted control01.ctl:

    $ ls -ltr |grep control
    lrwx------ 1 ora11 oinstall 64 May 14 07:55 257 -> /u01/app/oracle/oradata/db11/control02.ctl
    lrwx------ 1 ora11 oinstall 64 May 14 07:55 256 -> /u01/app/oracle/oradata/db11/control01.ctl (deleted)
    

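    The first 16384 is the number of bytes requested (one 16 KB block), the second 16384 is the byte offset within the file, and the third 16384 (the return value) is the number of bytes actually read.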
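
    To make the roles of the three numbers concrete, here is a small sketch using Python's `os.pwrite` and `os.pread`, which wrap the same system calls. It assumes Python 3 on a Unix-like system; the scratch file and the function name `pwrite_then_pread` are illustrative only. Note that Python's argument order differs slightly from the C call: the buffer is the return value of `os.pread`, not an argument.

```python
import os
import tempfile

BLOCK = 16384  # request size seen in the trace


def pwrite_then_pread():
    """Write and read one block at an explicit offset, as in the trace."""
    fd, path = tempfile.mkstemp()  # hypothetical scratch file
    try:
        # Like pwrite(256, ..., 16384, 49152): write BLOCK bytes at offset
        # 49152 (3 * 16384). The return value is the number of bytes written.
        written = os.pwrite(fd, b"\xab" * BLOCK, 3 * BLOCK)

        # Like pread(fd, buf, 16384, offset): os.pread(fd, length, offset)
        # returns the bytes that were read.
        block = os.pread(fd, BLOCK, 3 * BLOCK)

        # Reading at end-of-file returns fewer bytes than requested (zero here),
        # which is why the return value has to be checked separately.
        at_eof = os.pread(fd, BLOCK, 4 * BLOCK)

        # Neither call moves the descriptor's own file position.
        position = os.lseek(fd, 0, os.SEEK_CUR)
        return written, len(block), len(at_eof), position
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(pwrite_then_pread())
```

    Because the offset is passed explicitly on every call, several processes can share one file without coordinating a common file position.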

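    What does this trace tell us?

    1. Process information is available under /proc, for example /proc/4616/stat.

    2. On Linux, these file reads and writes are performed through the pread and pwrite system calls (shown as pread64/pwrite64 by some strace versions).

    3. Each pwrite is issued against both fd 256 and fd 257, so the same block is written to both control file copies, including the deleted one.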
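
    For a longer trace it may be easier to summarize the I/O per file descriptor than to read the log by eye. The following is a rough sketch, assuming the `strace -f -r` line layout shown above (pid, relative time, call); the regular expression and the names `LINE`, `SAMPLE` and `summarize_io` are my own and have not been checked against other strace output formats.

```python
import re
from collections import defaultdict

# pid, relative timestamp, syscall name, argument list, numeric return value
LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<delta>\d+\.\d+)\s+(?P<call>\w+)"
    r"\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)"
)

IO_CALLS = ("pread", "pwrite", "pread64", "pwrite64")

# A few lines copied from the trace above.
SAMPLE = [
    '4616       0.000036 gettimeofday({1463183881, 895560}, NULL) = 0',
    '4616       0.000035 pwrite(256, "25302314214C232"..., 16384, 49152) = 16384',
    '4616       0.000079 pwrite(257, "25302314214C232"..., 16384, 49152) = 16384',
    '4616       0.000038 pread(256, "253021142123434 v~227300U"..., 16384, 16384) = 16384',
]


def summarize_io(lines):
    """Return {(syscall, fd): total bytes transferred} for pread/pwrite calls."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE.match(line.strip())
        if not match or match["call"] not in IO_CALLS:
            continue
        fd = int(match["args"].split(",", 1)[0])  # first argument is the fd
        transferred = int(match["ret"])           # return value = bytes moved
        if transferred > 0:
            totals[(match["call"], fd)] += transferred
    return dict(totals)


if __name__ == "__main__":
    for (call, fd), total in sorted(summarize_io(SAMPLE).items()):
        print(f"{call} fd={fd}: {total} bytes")
```

    Run against the sample lines, the summary shows one 16384-byte pwrite to each of fd 256 and fd 257, matching the third observation above.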

  • Original article: https://www.cnblogs.com/abclife/p/5492798.html