zoukankan      html  css  js  c++  java
  • ORACLE等待事件: log file parallel write

    log file parallel write概念介绍

    log file parallel write 事件是LGWR进程专属的等待事件,发生在LGWR将日志缓冲区(log_buffer)中的重做日志信息写入联机重做日志文件组的成员文件,LGWR在该事件上等待该写入过程的完成。该事件的等待表示重做日志所处的磁盘设备缓慢或存在争用。下面看看官方一些资料是如何解释log file parallel write等待事件的。

     

    log file parallel write

     

    Writing redo records to the redo log files from the log buffer.

     

    Wait Time: Time it takes for the I/Os to complete. Even though redo records are written in parallel, the parallel write is not complete until the last I/O is on disk.

     

     

    Parameters:

     

     

    ·         P1 = files

    ·         P2 = blocks

    ·         P3 = requests

    ·         files Number of files written to.

            If you have more than one log member per group then the files are written to in parallel (if possible). This is the number of redo log members (files) that the writes apply to.

    ·         blocks Number of blocks to be written.

            The number of REDO blocks being written to each log member. eg: A value of 10 indicates 10 redo blocks for each log member.

    ·         requests Number of I/O requests.

            The number of distinct IO requests. The number of blocks to be written is divided up into a number of IO requests.

     

     

    Wait Time:

     

     

            The actual wait time is the time taken for all the outstanding I/O requests to complete. Even though the writes may be issued in parallel, LGWR needs to wait for the last I/O to be on disk before the parallel write is considered complete. Hence the wait time depends on the time it takes the OS to complete all requests.

     

    注意:自己看到这个等待事件parallel write,以及网上有“LGWR进程对同一组多个重做日志文件‘同时’写,是通过异步I/O来实现的,因此等待事件log file parallel write应该是在同一组下有多个重做日志文件时才会出现 ”这种说法,其实不管重做日志是一个或多个成员,都会出现log file parallel write等待事件。只要有buffer log写到redo log里面就有这个等待,它不是说并行写多个redo文件,单个文件也是这样的。

     

    log file parallel write出现原因

     

    如果数据库出现了这个等待事件,意味着重做日志(redo log)所处的磁盘设备I/O缓慢或存在争用

     

    · 磁盘I/O性能比较差

    · REDO文件的分布导致了I/O争用,例如,同一组的REDO成员文件放在相同的磁盘上。

     

     

    查看log file parallel write等待事件

     

    SQL> select s.event            as event_name
      2        ,s.time_waited/100  as time_waited
      3        ,s.average_wait     as averge_wait
      4  from v$system_event s
      5  where s.event in ('log file parallel write','log file sync');
     
    EVENT_NAME                                                       TIME_WAITED AVERGE_WAIT
    ---------------------------------------------------------------- ----------- -----------
    log file parallel write                                             45650.75         .03
    log file sync                                                        54045.4         .04
     
    SQL> 

     

    上面time_waited、average_wait的单位为百分之一秒(in hundredths of a second), 'log file parallel write'事件的平均等待时间大于 10ms (or 1cs),这通常意味着存在较慢的I/O吞吐量。 LGWR进程写入慢会影响用户事务提交的时间。

     

    If the log file parallel write average wait time is greater than 10ms (or 1cs), this normally indicates slow I/O throughput.The cure is the same as for the db file parallel write waits. Enable asynchronous writes if your redo logs are on raw devices and the operating system supports asynchronous I/O. For redo logs on file systems, use synchronous direct writes.

     

    其它一些查看redo log相关信息的SQL

    SELECT b.name, 
           a.value, 
           Round(SYSDATE - c.startup_time) days_old 
    FROM   v$sysstat a, 
           v$statname b, 
           v$instance c 
    WHERE  a.statistic# = b.statistic# 
           AND b.name IN ( 'redo wastage', 'redo size' ); 
     
     
    SELECT sid, 
           value 
    FROM   v$sesstat 
    WHERE  statistic# = (SELECT statistic# 
                         FROM   v$statname 
                         WHERE  name = 'user commits') 
    ORDER  BY value DESC; 

     

    如何减少log file parallel write等待时间

     

    要减少log file parallerl write等待时间,可以从下面两个方面入手解决(其它细节参考下面英文资料):

    (1)将重做日志文件组放置到高速I/O磁盘上(不要将日志组成员文件放置IO不一样的磁盘上)。不要将重做日志放在RAID5的磁盘上。

    (2)尽可能的降低重做数量:

           1:尽可能使用nologging选项,例如索引的重建、重组。包括create table...as select...等操作

           2:热备可能生成大量的重做信息,所以热备份应该在非高峰时间运行。

           3:应用程序的commit的频率小一些也能有一些改善。

     

    Reducing Waits / Wait times:

    You might want to reduce "log file parallel write" wait times in order to reduce user waits which depend on LGWR.

    · Ensure tablespaces are NOT left in HOT BACKUP mode longer than needed.

    · Tablespaces in HOT BACKUP mode cause more redo to be generated for each change which can vastly increase the rate of redo generarion.

    · NOLOGGING / UNRECOVERABLE operations may be possible for certain operations to reduce the overall rate of redo generation

    · Redo log members should ideally be on high speed disks

    · eg: Historically RAID5 was not a good candidate for redo log members, although some of the RAID 5 drawbacks have been addressed more recently through larger intelligent disk caches, etc.

    · Redo log members should be on disks with little/no IO activity from other sources.

    · (including low activity from other sources against the same disk controller)

    · With file systems like ASM and ZFS, you may also have to experiment with the disk stripe size to make more friendly with the type of IO LGWR usually performs.

    · Check that the system has not become CPU (or memory) bound otherwise it may take LGWR more time to recognize IO has been completed.

    · The higher the value for CPU_COUNT, the more strands can be used for writing redo. Having an increased number of strands can lead to an increase the average waits on "log file parallel write" (which in turn can affect "log file sync" waits). Filesystems that do not support Concurrent IO (CIO) are more likely to be affected by this, compared to ASM or raw devices. However, it can also impact filesystems that support CIO. On servers with CPU_COUNT greater than 128 LGWR behaviour may need manually adjusting through underscore parameters. If you think you are encountering long "log file parallel write" due to high CPU counts then contact Oracle Support.

     

    ---------------------------------------------------------------------------------------------------------

    Unfortunately, you cannot spawn more than one LGWR process. In this case, it is critical that nothing else is sharing the mount point of the redo log files. Make sure the controller that services the mount point is not overloaded. Moving the redo logs to higher speed disks will also help.

    We strongly suggest that you avoid putting redo logs on RAID5 disks, but we also understand that many times you don’t have a choice or a say in it.

    Besides improving the I/O throughput, you can also work on lowering the amount of redo entries. This will provide some relief, but not the cure. Whenever possible, use the NOLOGGING option. Indexes should be created or rebuilt with the NOLOGGING option. CTAS operations should also use this option.

    Note:

           The NOLOGGING option doesn’t apply to normal DML operations such as inserts, updates, and deletes. Objects created with the NOLOGGING option are unrecoverable unless a backup is taken prior to the corruption. If you have to take an additional backup, then the I/Os that you save by not logging will be spent on backup.        Database in FORCE LOGGING mode will log all changes (except for changes in temporary tablespaces), regardless of the tablespace and object settings.

           A lower commit frequency at the expense of higher rollback segment usage can also provide some relief.

           A high commit frequency causes the LGWR process to be overactive and when coupled with slow I/O throughput will only magnify the log file parallel write waits.

           The application may be processing a large set of data in a loop and committing each change, which causes the log buffer to be flushed too frequently. In this case, modify the application to commit at a lower frequency. There could also be many short sessions that log in to the database, perform. a quick DML operation, and log out.

     

    下面提供一个用户做log file sync waits、log file parallel write troubleshoot的一个脚本。官方文档ID为1064487.1

    NAME: LFSDIAG.SQL
    -- ------------------------------------------------------------------------
    -- AUTHOR: Michael Polaski - Oracle Support Services
    -- ------------------------------------------------------------------------
    -- PURPOSE:
    -- This script is intended to provide a user friendly guide to troubleshoot
    -- log file sync waits. The script will look at important parameters involved
    -- in log file sync waits, log file sync wait histogram data, and the script
    -- will look at the worst average log file sync times in the active session
    -- history data and AWR data and dump information to help determine why those
    -- times were the highest. The script will create a file called
    -- lfsdiag_<timestamp>.out in your local directory.
     
    set echo off
    set feedback off
    column timecol new_value timestamp
    column spool_extension new_value suffix
    select to_char(sysdate,'Mondd_hh24mi') timecol,
    '.out' spool_extension from sys.dual;
    column output new_value dbname
    select value || '_' output
    from v$parameter where name = 'db_name';
    spool lfsdiag_&&dbname&&timestamp&&suffix
    set trim on
    set trims on
    set lines 140
    set pages 100
    set verify off
    alter session set optimizer_features_enable = '10.2.0.4';
     
    PROMPT LFSDIAG DATA FOR &&dbname&&timestamp
    PROMPT Note: All timings are in milliseconds (1000 milliseconds = 1 second)
     
    PROMPT
    PROMPT IMPORTANT PARAMETERS RELATING TO LOG FILE SYNC WAITS:
    column name format a40 wra
    column value format a40 wra
    select inst_id, name, value from gv$parameter
    where ((value is not null and name like '%log_archive%') or
    name like '%commit%' or name like '%event=%' or name like '%lgwr%')
    and name not in (select name from gv$parameter where (name like '%log_archive_dest_state%'
    and value = 'enable') or name = 'log_archive_format')
    order by 1,2,3;
     
    PROMPT
    PROMPT ASH THRESHOLD...
    PROMPT
    PROMPT This will be the threshold in milliseconds for average log file sync
    PROMPT times. This will be used for the next queries to look for the worst
    PROMPT 'log file sync' minutes. Any minutes that have an average log file
    PROMPT sync time greater than the threshold will be analyzed further.
    column threshold_in_ms new_value threshold format 999999999.999
    select min(threshold_in_ms) threshold_in_ms
    from (select inst_id, to_char(sample_time,'Mondd_hh24mi') minute,
    avg(time_waited)/1000 threshold_in_ms
    from gv$active_session_history
    where event = 'log file sync'
    group by inst_id,to_char(sample_time,'Mondd_hh24mi')
    order by 3 desc)
    where rownum <= 10;
     
    PROMPT
    PROMPT ASH WORST MINUTES FOR LOG FILE SYNC WAITS:
    PROMPT
    PROMPT APPROACH: These are the minutes where the avg log file sync time
    PROMPT was the highest (in milliseconds).
    column minute format a12 tru
    column event format a30 tru
    column program format a40 tru
    column total_wait_time format 999999999999.999
    column avg_time_waited format 999999999999.999
    select to_char(sample_time,'Mondd_hh24mi') minute, inst_id, event,
    sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
    avg(time_waited)/1000 AVG_TIME_WAITED
    from gv$active_session_history
    where event = 'log file sync'
    group by to_char(sample_time,'Mondd_hh24mi'), inst_id, event
    having avg(time_waited)/1000 > &&threshold
    order by 1,2;
     
    PROMPT
    PROMPT ASH LFS BACKGROUND PROCESS WAITS DURING WORST MINUTES:
    PROMPT
    PROMPT APPROACH: What is LGWR doing when 'log file sync' waits
    PROMPT are happening? LMS info may be relevent for broadcast
    PROMPT on commit and LNS data may be relevant for dataguard.
    PROMPT If more details are needed see the ASH DETAILS FOR WORST
    PROMPT MINUTES section at the bottom of the report.
    column inst format 999
    column minute format a12 tru
    column event format a30 tru
    column program format a40 wra
    select to_char(sample_time,'Mondd_hh24mi') minute, inst_id inst, program, event,
    sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
    avg(time_waited)/1000 AVG_TIME_WAITED
    from gv$active_session_history
    where to_char(sample_time,'Mondd_hh24mi') in (select to_char(sample_time,'Mondd_hh24mi')
    from gv$active_session_history
    where event = 'log file sync'
    group by to_char(sample_time,'Mondd_hh24mi'), inst_id
    having avg(time_waited)/1000 > &&threshold and sum(time_waited)/1000 > 1)
    and (program like '%LGWR%' or program like '%LMS%' or
    program like '%LNS%' or event = 'log file sync')
    group by to_char(sample_time,'Mondd_hh24mi'), inst_id, program, event
    order by 1,2,3,5 desc, 4;
     
    PROMPT
    PROMPT HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
    PROMPT
    PROMPT APPROACH: Look at the wait distribution for log file sync waits
    PROMPT by looking at "wait_time_milli". Look at the high wait times then
    PROMPT see if you can correlate those with other related wait events.
    column event format a40 wra
    select inst_id, event, wait_time_milli, wait_count
    from gv$event_histogram
    where event in ('log file sync','gcs log flush sync',
    'log file parallel write','wait for scn ack',
    'log file switch completion','gc cr grant 2-way',
    'gc buffer busy','gc current block 2-way') or
    event like '%LGWR%' or event like '%LNS%'
    order by 2 desc,1,3;
     
    PROMPT
    PROMPT ORDERED BY WAIT_TIME_MILLI
    select inst_id, event, wait_time_milli, wait_count
    from gv$event_histogram
    where event in ('log file sync','gcs log flush sync',
    'log file parallel write','wait for scn ack',
    'log file switch completion','gc cr grant 2-way',
    'gc buffer busy','gc current block 2-way')
    or event like '%LGWR%' or event like '%LNS%'
    order by 3,1,2 desc;
     
    PROMPT
    PROMPT REDO WRITE STATS
    PROMPT
    PROMPT "redo write time" in centiseconds (100 per second)
    PROMPT 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
    PROMPT 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
    column value format 99999999999999999999
    column milliseconds format 99999999999999.999
    select v.version, ss.inst_id, ss.name, ss.value,
    decode(substr(version,1,4),
    '11.1',decode (name,'redo write time',value*10,
    'redo write broadcast ack time',value*10),
    '11.2',decode (name,'redo write time',value*10,
    'redo write broadcast ack time',value/1000),
    decode (name,'redo write time',value*10)) milliseconds
    from gv$sysstat ss, v$instance v
    where name like 'redo write%' and value > 0
    order by 1,2,3;
     
    PROMPT
    PROMPT AWR WORST AVG LOG FILE SYNC SNAPS:
    PROMPT
    PROMPT APPROACH: These are the AWR snaps where the average 'log file sync'
    PROMPT times were the highest.
    column begin format a12 tru
    column end format a12 tru
    column name format a13 tru
    select dhs.snap_id, dhs.instance_number inst, to_char(dhs.begin_interval_time,'Mondd_hh24mi') BEGIN,
    to_char(dhs.end_interval_time,'Mondd_hh24mi') END,
    en.name, se.time_waited_micro/1000 total_wait_time, se.total_waits,
    se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from dba_hist_snapshot dhs, wrh$_system_event se, v$event_name en
    where (dhs.snap_id = se.snap_id and dhs.instance_number = se.instance_number)
    and se.event_id = en.event_id and en.name = 'log file sync' and
    dhs.snap_id in (select snap_id from (
    select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from wrh$_system_event se, v$event_name en
    where se.event_id = en.event_id and en.name = 'log file sync'
    order by avg_time_waited desc)
    where rownum < 4)
    order by 1,2;
     
    PROMPT
    PROMPT AWR REDO WRITE STATS
    PROMPT
    PROMPT "redo write time" in centiseconds (100 per second)
    PROMPT 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
    PROMPT 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
    column stat_name format a30 tru
    select v.version, ss.snap_id, ss.instance_number inst, sn.stat_name, ss.value,
    decode(substr(version,1,4),
    '11.1',decode (stat_name,'redo write time',value*10,
    'redo write broadcast ack time',value*10),
    '11.2',decode (stat_name,'redo write time',value*10,
    'redo write broadcast ack time',value/1000),
    decode (stat_name,'redo write time',value*10)) milliseconds
    from wrh$_sysstat ss, wrh$_stat_name sn, v$instance v
    where ss.stat_id = sn.stat_id
    and sn.stat_name like 'redo write%' and ss.value > 0
    and ss.snap_id in (select snap_id from (
    select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from wrh$_system_event se, v$event_name en
    where se.event_id = en.event_id and en.name = 'log file sync'
    order by avg_time_waited desc)
    where rownum < 4)
    order by 1,2,3;
     
    PROMPT
    PROMPT AWR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs:
    PROMPT
    PROMPT APPROACH: These are the AWR snaps where the average 'log file sync'
    PROMPT times were the highest. Look at related waits at those times.
    column name format a40 tru
    select se.snap_id, se.instance_number inst, en.name,
    se.total_waits, se.time_waited_micro/1000 total_wait_time,
    se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from wrh$_system_event se, v$event_name en
    where se.event_id = en.event_id and (en.name in
    ('log file sync','gcs log flush sync',
    'log file parallel write','wait for scn ack',
    'log file switch completion','gc cr grant 2-way',
    'gc buffer busy','gc current block 2-way')
    or en.name like '%LGWR%' or en.name like '%LNS%')
    and se.snap_id in (select snap_id from (
    select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from wrh$_system_event se, v$event_name en
    where se.event_id = en.event_id and en.name = 'log file sync'
    order by avg_time_waited desc)
    where rownum < 4)
    order by 1, 6 desc;
     
    PROMPT
    PROMPT AWR HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs:
    PROMPT Note: This query won't work on 10.2 - ORA-942
    PROMPT
    PROMPT APPROACH: Look at the wait distribution for log file sync waits
    PROMPT by looking at "wait_time_milli". Look at the high wait times then
    PROMPT see if you can correlate those with other related wait events.
    select eh.snap_id, eh.instance_number inst, en.name, eh.wait_time_milli, eh.wait_count
    from wrh$_event_histogram eh, v$event_name en
    where eh.event_id = en.event_id and
    (en.name in ('log file sync','gcs log flush sync',
    'log file parallel write','wait for scn ack',
    'log file switch completion','gc cr grant 2-way',
    'gc buffer busy','gc current block 2-way')
    or en.name like '%LGWR%' or en.name like '%LNS%')
    and snap_id in (select snap_id from (
    select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from wrh$_system_event se, v$event_name en
    where se.event_id = en.event_id and en.name = 'log file sync'
    order by avg_time_waited desc)
    where rownum < 4)
    order by 1,3 desc,2,4;
     
    PROMPT
    PROMPT ORDERED BY WAIT_TIME_MILLI
    PROMPT Note: This query won't work on 10.2 - ORA-942
    select eh.snap_id, eh.instance_number inst, en.name, eh.wait_time_milli, eh.wait_count
    from wrh$_event_histogram eh, v$event_name en
    where eh.event_id = en.event_id and
    (en.name in ('log file sync','gcs log flush sync',
    'log file parallel write','wait for scn ack',
    'log file switch completion','gc cr grant 2-way',
    'gc buffer busy','gc current block 2-way')
    or en.name like '%LGWR%' or en.name like '%LNS%')
    and snap_id in (select snap_id from (
    select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
    from wrh$_system_event se, v$event_name en
    where se.event_id = en.event_id and en.name = 'log file sync'
    order by avg_time_waited desc)
    where rownum < 4)
    order by 1,4,2,3 desc;
     
    PROMPT
    PROMPT ASH DETAILS FOR WORST MINUTES:
    PROMPT
    PROMPT APPROACH: If you cannot determine the problem from the data
    PROMPT above, you may need to look at the details of what each session
    PROMPT is doing during each 'bad' snap. Most likely you will want to
    PROMPT note the times of the high log file sync waits, look at what
    PROMPT LGWR is doing at those times, and go from there...
    column program format a45 wra
    column sample_time format a25 tru
    column event format a30 tru
    column time_waited format 999999.999
    column p1 format a40 tru
    column p2 format a40 tru
    column p3 format a40 tru
    select sample_time, inst_id inst, session_id, program, event, time_waited/1000 TIME_WAITED,
    p1text||': '||p1 p1,p2text||': '||p2 p2,p3text||': '||p3 p3
    from gv$active_session_history
    where to_char(sample_time,'Mondd_hh24mi') in (select
    to_char(sample_time,'Mondd_hh24mi')
    from gv$active_session_history
    where event = 'log file sync'
    group by to_char(sample_time,'Mondd_hh24mi'), inst_id
    having avg(time_waited)/1000 > &&threshold)
    and time_waited > 0.5
    order by 1,2,3,4,5;
     
    select to_char(sysdate,'Mondd hh24:mi:ss') TIME from dual;
     
    spool off
     
    PROMPT
    PROMPT OUTPUT FILE IS: lfsdiag_&&dbname&&timestamp&&suffix
    PROMPT

    参考资料:

    http://www.oraclecommunity.net/profiles/blogs/wait-events-log-file-sync-and-log-file-parallel-write
    http://blog.itpub.net/15880878/viewspace-722994/

  • 相关阅读:
    Kubernetes如何支持有状态服务的部署?
    Web 安全入门-书籍及建议
    docker kubernetes swarm spring cloud结合学习资源
    docker-compose 完整打包发布, 多服务,多节点SPRING CLOUD ,EUREKA 集群
    Kubernetes权威指南学习笔记(一)
    解决k8s出现pod服务一直处于ContainerCreating状态的问题的过程
    Spark操作—aggregate、aggregateByKey详解
    maven设置------settings.xml文件学习
    Scala详细环境安装与配置
    内存泄露
  • 原文地址:https://www.cnblogs.com/kerrycode/p/5849564.html
Copyright © 2011-2022 走看看