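Background:
A customer reported that the application was unusable and only recovered after the database was restarted. We were asked to help find the cause.
1> Check the alert log for the period in question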
Thread <number> cannot allocate new log, sequence <number>
Checkpoint not complete

Thread 1 cannot allocate new log, sequence 279334
Checkpoint not complete
  Current log# 4 seq# 279333 mem# 0: /u01/oradata/orcl/redo04.log
  Current log# 4 seq# 279333 mem# 1: /u03/oradata/orcl/redo04.log
The hang can also be caused by waiting for a redo log to be archived, in which case the alert log shows messages like these:
ORACLE Instance <name> - Can not allocate log, archival required
Thread <number> cannot allocate new log, sequence <number>
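2> Root cause analysis:
When the current redo log group fills up, Oracle switches to the next group, and the log switch triggers a checkpoint.
The checkpoint makes the database writer (DBWR) flush the dirty blocks in the buffer cache to the data files on disk.
Until those dirty blocks have been written, the database cannot release the log group for reuse.
In ARCHIVELOG mode, the ARCH process must also archive the filled redo log before it can be reused. If redo is generated too quickly, LGWR fills all the other log groups before the checkpoint (CKPT) or the archiving has finished,
and then needs to write redo into a group that has not been released yet. At that point the database hangs and keeps writing messages like the ones above to alert.log.
Log switch frequency also varies with the workload, so this error usually appears during busy periods or when a large amount of DML is running.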
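To tell which of the two conditions is blocking log reuse, it can help to look at the state of each redo log group while the problem is occurring. The following is a minimal sketch against the standard V$LOG view; the interpretation in the comments is a rule of thumb, not an exhaustive diagnosis.

```sql
-- One row per online redo log group.
-- STATUS = 'ACTIVE': the group is still needed for crash recovery,
--                    i.e. the checkpoint covering it has not completed.
-- ARCHIVED = 'NO' on a group that is not CURRENT (ARCHIVELOG mode):
--                    the group is still waiting to be archived.
SELECT group#,
       thread#,
       sequence#,
       ROUND(bytes / 1024 / 1024) AS size_mb,
       members,
       archived,
       status
  FROM v$log
 ORDER BY thread#, sequence#;
```

If every group other than the CURRENT one is ACTIVE, LGWR has nowhere to switch to until a checkpoint finishes, which matches the "Checkpoint not complete" message.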
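3> Solutions:
1: Increase the size of the redo log files
Increasing the redo log file size is easy to do, but what size is reasonable?
1: Refer to the OPTIMAL_LOGFILE_SIZE column of V$INSTANCE_RECOVERY. This column can be NULL unless FAST_START_MTTR_TARGET is set to a value greater than 0. The column is documented as follows: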
Redo log file size (in megabytes) that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. It is recommended that the user configure all online redo logs to be at least this value.
The official documentation recommends the following:
You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine the size of your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.
Note, however, that the redo log file size affects the MTTR. In some cases, you may be able to refine your choice of the optimal FAST_START_MTTR_TARGET value by re-running the MTTR Advisor with your suggested optimal log file size.
SQL> SELECT OPTIMAL_LOGFILE_SIZE FROM V$INSTANCE_RECOVERY;
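If OPTIMAL_LOGFILE_SIZE comes back NULL, one possible sequence is to set FAST_START_MTTR_TARGET and query the view again after the instance has run under a representative workload. This is only a sketch: the value 300 (seconds) is purely illustrative and not a recommendation, and SCOPE = BOTH assumes the instance was started with an spfile.

```sql
-- Illustrative value only: target a crash recovery time of about 300 seconds.
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE = BOTH;

-- After some representative workload, look at the advice again.
SELECT target_mttr,
       estimated_mttr,
       optimal_logfile_size   -- in MB; may be NULL while FAST_START_MTTR_TARGET is not set
  FROM v$instance_recovery;

-- Compare the advice with the smallest online redo log currently configured.
SELECT MIN(bytes) / 1024 / 1024 AS smallest_log_mb
  FROM v$log;
```

If OPTIMAL_LOGFILE_SIZE is consistently larger than smallest_log_mb, the quoted documentation suggests making all online logs at least that large.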
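2: Judge from the number of log switches and the amount of redo generated
You can use the awr_redo_size_history script to find out how much archived log is generated per hour and per day. Then take the archived log volume of the periods with frequent switches and size the redo logs so that a switch happens roughly every 15 to 20 minutes. (If switching is extremely frequent in some period, this rule is almost unusable, because the resulting redo logs would be very large.) This is not a universal rule and has to be weighed against the actual workload, but it is a useful reference in most cases.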
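Where the awr_redo_size_history script is not at hand, the switch frequency can also be estimated directly from V$LOG_HISTORY. The sketch below counts log switches per hour; the 7-day window is an arbitrary assumption, and on RAC the counts cover all threads unless the query is filtered by THREAD#.

```sql
-- Number of log switches per hour over the last 7 days (window chosen for illustration).
SELECT TO_CHAR(TRUNC(first_time, 'HH24'), 'YYYY-MM-DD HH24') AS switch_hour,
       COUNT(*)                                              AS log_switches
  FROM v$log_history
 WHERE first_time > SYSDATE - 7
 GROUP BY TRUNC(first_time, 'HH24')
 ORDER BY TRUNC(first_time, 'HH24');
```

As a worked example with made-up numbers: if a peak hour shows 12 switches of 512 MB logs, that hour produced about 12 × 512 MB = 6 GB of redo. Switching once every 20 minutes means 3 switches per hour, so each log would need to be about 6 GB / 3 = 2 GB.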
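A script for calculating the redo log size, for reference only. Enter the peak hours in HH24:MI format (for example 09:00 and 18:00); the substitution variables are quoted so that they are compared as strings:
SELECT (SELECT ROUND(AVG(BYTES) / 1024 / 1024, 2) FROM V$LOG) AS "Redo size (MB)",
       ROUND((20 / AVERAGE_PERIOD) * (SELECT AVG(BYTES) FROM V$LOG) / 1024 / 1024, 2) AS "Recommended Size (MB)"
  FROM (SELECT AVG((NEXT_TIME - FIRST_TIME) * 24 * 60) AS AVERAGE_PERIOD  -- average minutes between switches
          FROM V$ARCHIVED_LOG
         WHERE FIRST_TIME > SYSDATE - 3
           AND TO_CHAR(FIRST_TIME, 'HH24:MI') BETWEEN '&START_OF_PEAK_HOURS' AND '&END_OF_PEAK_HOURS');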
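Online redo log files cannot be resized in place, so increasing their size in practice means adding new, larger groups and dropping the old ones once they are no longer needed. A minimal sketch follows. The group numbers, the 1G size, and the file paths (modelled on the ones in the alert log above) are all hypothetical and must be adapted to the actual system.

```sql
-- 1. Add new groups with the larger size, two members each on separate disks.
ALTER DATABASE ADD LOGFILE GROUP 5
  ('/u01/oradata/orcl/redo05.log', '/u03/oradata/orcl/redo05.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 6
  ('/u01/oradata/orcl/redo06.log', '/u03/oradata/orcl/redo06.log') SIZE 1G;

-- 2. Switch away from the old groups and force a checkpoint
--    so that they can become INACTIVE. Repeat the switch if needed.
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- 3. Check the state of the old groups.
SELECT group#, status, archived FROM v$log ORDER BY group#;

-- 4. Drop an old group only when it is INACTIVE
--    (and ARCHIVED = 'YES' in ARCHIVELOG mode). Repeat per old group.
ALTER DATABASE DROP LOGFILE GROUP 1;
```

Dropping a group does not delete its files at the operating system level unless Oracle Managed Files is in use, so the old files have to be removed manually afterwards.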
2: Increase the number of redo log groups
Adding log groups does not really solve the "Thread <number> cannot allocate new log, sequence <number> Checkpoint not complete" problem, but it can solve the following one:
ORACLE Instance <name> - Can not allocate log, archival required
Thread <number> cannot allocate new log, sequence <number>
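This happens because the ARCH process has not finished copying the redo log file to the archive destination (archival required). Because the logs switch too fast or there are too few groups, LGWR has to wait for ARCH to finish archiving before it can cycle around and overwrite the log group.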
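Before adding groups, it may be worth checking whether the archiver is merely behind or is actually stuck, since more groups only help in the first case. The queries below are a sketch using the standard V$LOG and V$ARCHIVE_DEST views; what counts as "too many" unarchived groups is a judgment call.

```sql
-- Filled groups that are still waiting for ARCH.
SELECT group#, thread#, sequence#, status
  FROM v$log
 WHERE archived = 'NO'
   AND status <> 'CURRENT';

-- State of the configured archive destinations; a non-empty ERROR column
-- (for example a full destination) points to a stuck archiver
-- rather than to too few log groups.
SELECT dest_id, dest_name, status, destination, error
  FROM v$archive_dest
 WHERE status <> 'INACTIVE';
```

If the destinations are healthy and groups only pile up during peaks, extra groups can be added with ALTER DATABASE ADD LOGFILE GROUP to give ARCH more time to catch up.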
3: Tune checkpoint
This one is harder. See the official document: Note 147468.1 Checkpoint Tuning and Troubleshooting Guide
4: Increase I/O speed for writing online REDO log/Archived REDO
This applies to:
Thread <number> cannot allocate new log, sequence <number>
Checkpoint not complete
- use ASYNC I/O if not already so
- use DBWR I/O slaves or multiple DBWR processes
Reference:
Oracle Database Performance Tuning Guide
Instance Tuning Using Performance Views
Consider Multiple Database Writer (DBWR) Processes or I/O Slaves
- consider the generic recommendations for REDO log files:
If the high I/O files are redo log files, then consider splitting the redo log files from the other files. Possible configurations can include the following:
1. Placing all redo logs on one disk without any other files. Also consider availability; members of the same group should be on different physical disks and controllers for recoverability purposes.
2. Placing each redo log group on a separate disk that does not store any other files.
3. Striping the redo log files across several disks, using an operating system striping tool. (Manual striping is not possible in this situation.)
4. Avoiding the use of RAID 5 for redo logs.
Reference:
Oracle Database Performance Tuning Guide
Redo Log Files
For
ORACLE Instance <name> - Can not allocate log, archival required
Thread <number> cannot allocate new log, sequence <number>
see the section "Archived Redo Logs" in the same document.
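Before acting on the asynchronous I/O and DBWR suggestions above, it may help to see how the instance is configured today. The query below is a sketch that reads the relevant initialization parameters from V$PARAMETER; the value in the commented ALTER SYSTEM statement is illustrative only.

```sql
-- Current I/O and database writer settings.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('disk_asynch_io',
                'filesystemio_options',
                'db_writer_processes',
                'dbwr_io_slaves')
 ORDER BY name;

-- Example of raising the number of DBWR processes (hypothetical value 4).
-- DB_WRITER_PROCESSES is a static parameter, so the change needs
-- SCOPE = SPFILE and an instance restart to take effect.
-- ALTER SYSTEM SET db_writer_processes = 4 SCOPE = SPFILE;
```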
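5: Find the SQL that generates large amounts of redo. If the SQL is unreasonable from a business or logic point of view, fix it; alternatively, set the related tables to NOLOGGING to reduce redo generation. Note that NOLOGGING only reduces redo for direct-path operations (for example INSERT /*+ APPEND */ or CREATE TABLE AS SELECT), not for conventional DML, and data loaded that way cannot be recovered from the archived logs.
To find out which SQL statements generate large amounts of redo, you can use the LogMiner tool, or refer to my blog post "How to find which SQL generates a large amount of redo" (如何定位那些SQL产生了大量的redo日志).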
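As a quick first pass before using LogMiner, the sessions that have generated the most redo can be listed from the session statistics. This is a sketch, not the method from the blog post mentioned above. It assumes Oracle 12c or later for FETCH FIRST (on older versions, wrap the query and filter on ROWNUM instead), and the top-10 cutoff is arbitrary.

```sql
-- Top 10 connected sessions by redo generated since logon.
SELECT s.sid,
       s.serial#,
       s.username,
       s.program,
       s.sql_id,                                   -- SQL currently executing, if any
       ROUND(st.value / 1024 / 1024, 2) AS redo_mb
  FROM v$sesstat  st
  JOIN v$statname sn ON sn.statistic# = st.statistic#
  JOIN v$session  s  ON s.sid = st.sid
 WHERE sn.name = 'redo size'
   AND st.value > 0
 ORDER BY st.value DESC
 FETCH FIRST 10 ROWS ONLY;
```

The 'redo size' statistic is cumulative since the session logged on, so long-lived connections tend to rank high. Running the query twice and comparing the values gives a better picture of who is generating redo right now.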