  • Redundant data in update statements

    Q:
     
    Hibernate generates UPDATE statements that include all columns, regardless of whether I'm changing the value in those columns, e.g.:
    tx.begin();
    Item i = em.find(Item.class, 12345);
    i.setA("a-value");
    tx.commit();

    issues this UPDATE statement:

    update Item set A = $1, B = $2, C = $3, D = $4 where id = $5

    So columns B, C and D are updated even though I didn't change them.

    Say, Items are updated frequently and all columns are indexed. The question is: does it make sense to optimize the Hibernate part to something like this:

    tx.begin();
    em.createQuery("update Item i set i.a = :a where i.id = :id")
        .setParameter("a", "a-value")
        .setParameter("id", 12345)
        .executeUpdate();
    tx.commit();

    What confuses me most is that the EXPLAIN plans of the 'unoptimized' and the 'optimized' query versions are identical!

     

    A:
     

    Due to PostgreSQL MVCC, an UPDATE is effectively a DELETE plus an INSERT. (To be precise, the "deleted" row is just invisible to any transaction starting after the delete and vacuumed later.) Therefore, on the database side, including index manipulation, there is in effect no difference between the two statements. The all-columns version increases network traffic a bit (depending on your data) and needs a bit more parsing.
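
    To see the "DELETE plus INSERT" behaviour for yourself, here is a minimal sketch. The table mvcc_demo is a hypothetical scratch table, not something from the question. It relies only on the system columns ctid (physical location of the row version) and xmin (ID of the transaction that wrote it), and assumes each statement runs in its own transaction (autocommit).

    ```sql
    -- Hypothetical scratch table, for illustration only.
    CREATE TEMP TABLE mvcc_demo (id int PRIMARY KEY, a text, b text);
    INSERT INTO mvcc_demo VALUES (1, 'a-value', 'b-value');

    -- Location and writing transaction of the current row version.
    SELECT ctid, xmin, * FROM mvcc_demo;

    -- Assign a single column the value it already holds.
    UPDATE mvcc_demo SET a = 'a-value' WHERE id = 1;

    -- Compare ctid and xmin with the first SELECT.
    SELECT ctid, xmin, * FROM mvcc_demo;
    ```

    Both ctid and xmin should differ between the two SELECTs: the whole row was written again as a new version, no matter how many columns the SET list named.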

    I studied HOT updates after araqnid's input and ran some tests. Updates on columns that don't actually change the value make no difference whatsoever as far as HOT updates are concerned. My answer holds. See details below.

    However, if you use per-column triggers (introduced with v9.0), this may have undesired side effects!

    I quote the manual on triggers:

    ... a command such as UPDATE ... SET x = x ... will fire a trigger on column x, even though the column's value did not change.
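
    A small sketch of that side effect, assuming a scratch table and a trigger function invented here for illustration (trig_demo, note_x_update and trig_demo_x are hypothetical names):

    ```sql
    -- Hypothetical scratch objects, for illustration only.
    CREATE TEMP TABLE trig_demo (id int PRIMARY KEY, x text, y text);
    INSERT INTO trig_demo VALUES (1, 'x-value', 'y-value');

    CREATE FUNCTION note_x_update() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        RAISE NOTICE 'trigger on column x fired: old=%, new=%', OLD.x, NEW.x;
        RETURN NEW;  -- return value is ignored for AFTER triggers
    END
    $$;

    -- Column trigger: fires whenever x appears in the SET list.
    CREATE TRIGGER trig_demo_x
        AFTER UPDATE OF x ON trig_demo
        FOR EACH ROW EXECUTE PROCEDURE note_x_update();

    -- x is listed but keeps its value: the NOTICE is raised anyway.
    UPDATE trig_demo SET x = x, y = 'changed' WHERE id = 1;

    -- x is not listed: no NOTICE.
    UPDATE trig_demo SET y = 'changed again' WHERE id = 1;

    -- Clean up the non-temporary function (dropping the table drops the trigger).
    DROP TABLE trig_demo;
    DROP FUNCTION note_x_update();
    ```

    With an all-columns UPDATE like the one Hibernate generates, every column trigger on the table behaves like the first UPDATE here. If that is a problem, a WHEN (OLD.x IS DISTINCT FROM NEW.x) condition on the trigger is one way to skip updates that leave the value unchanged.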

    Abstraction layers are for convenience. They are useful for SQL-illiterate developers or if the application needs to be portable between different RDBMS. On the downside, they can butcher performance and introduce additional points of failure. I avoid them wherever possible.

    Concerning HOT (Heap-only tuple) updates

    Heap-Only Tuples were introduced with Postgres 8.3, with important improvements in 8.3.4 and 8.4.9.
    The release notes for Postgres 8.3:

    UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs. Previously only VACUUM could reclaim space taken by dead tuples. With HOT dead tuple space can be automatically reclaimed at the time of INSERT or UPDATE if no changes are made to indexed columns. This allows for more consistent performance. Also, HOT avoids adding duplicate index entries.

    Emphasis mine, on "if no changes are made to indexed columns". And "no changes" includes cases where columns are updated with the same value they already hold. I tested that just now, as I wasn't sure.

    You don't have to take my word for it. See for yourself: Postgres provides a couple of functions to check statistics. Run your UPDATE with and without all columns and check whether it makes any difference.

    -- Number of rows HOT-updated in table:
    SELECT pg_stat_get_tuples_hot_updated('table_name'::regclass::oid);

    -- Number of rows HOT-updated in table, in the current transaction:
    SELECT pg_stat_get_xact_tuples_hot_updated('table_name'::regclass::oid);

    Or use pgAdmin. Select your table and inspect the "Statistics" tab in the main window.

    Be aware that HOT updates are only possible when there is room for the new tuple version on the same page. One simple way to force that condition is to test with a small table that holds only a few rows. Page size is typically 8k, so such a table leaves plenty of free space on its page.
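
    Put together, such a test could look like the sketch below. The table hot_demo and its index are hypothetical; the two statistics functions are the per-transaction counters for all updates and for HOT updates.

    ```sql
    -- Hypothetical scratch table: one row, so the page has plenty of free space.
    CREATE TEMP TABLE hot_demo (id int PRIMARY KEY, a text, b text);
    CREATE INDEX hot_demo_a_idx ON hot_demo (a);
    INSERT INTO hot_demo VALUES (1, 'a-value', 'b-value');

    BEGIN;

    -- Indexed column a is in the SET list but keeps its value.
    UPDATE hot_demo SET a = 'a-value', b = 'b-new' WHERE id = 1;

    SELECT pg_stat_get_xact_tuples_updated('hot_demo'::regclass::oid)     AS updated,
           pg_stat_get_xact_tuples_hot_updated('hot_demo'::regclass::oid) AS hot_updated;

    -- Indexed column a really changes this time.
    UPDATE hot_demo SET a = 'other-value' WHERE id = 1;

    SELECT pg_stat_get_xact_tuples_updated('hot_demo'::regclass::oid)     AS updated,
           pg_stat_get_xact_tuples_hot_updated('hot_demo'::regclass::oid) AS hot_updated;

    COMMIT;
    ```

    If the claim above holds, hot_updated should rise together with updated after the first UPDATE and fall behind it after the second, because only the second one changes an indexed value.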

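    araqnid's demonstration went as follows (the script is meant to be run in psql and uses functions from the pageinspect extension):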

    create temp table t1(t1_id serial primary key, reference varchar(16) not null unique, value varchar(16) not null);
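    -- Plain INSERT instead of COPY FROM stdin, so the data does not depend on tab delimiters.
    insert into t1(reference, value) values
        ('FOO', 'foo'),
        ('BAR', 'bar'),
        ('QUUX', 'quux');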
    
    create temp view t1_combined as
        select t1_id, reference, value, ctid, lp_flags, lp_off, case when t_ctid <> ctid then t_ctid end as t_ctid,
               t_xmin, xmin_visible, case when t_xmax::text <> '0' then t_xmax end as t_xmax, xmax_visible,
               xmin_visible and (xmax_visible is null or not xmax_visible or t_locked <> '') as visible, t_hot_updated, t_heap_only
        from (select *,
                     t_xmin_valid and txid_visible_in_snapshot(t_xmin::text::bigint, txid_current_snapshot()) as xmin_visible,
                     t_xmax_valid and txid_visible_in_snapshot(t_xmax::text::bigint, txid_current_snapshot()) as xmax_visible
              from (select ('(' || 0 || ',' || lp || ')')::tid as ctid,
                           lp, lp_off, case lp_flags when 0 then 'UNUSED' when 1 then 'NORMAL' when 2 then 'REDIRECT' when 3 then 'DEAD' end as lp_flags,
                           lp_len, t_xmin, t_xmax, t_field3, t_ctid, (t_infomask&1)<>0 as t_hasnull, (t_infomask&2)<>0 as t_hasvarwidth,
                           (t_infomask&4)<>0 as t_hasexternal, (t_infomask&8)<>0 as t_hasoid, (t_infomask&32)<>0 as t_combocid,
                           case t_infomask & 192 when 64 then 'EXCL' when 128 then 'SHARE' when 0 then '' when 192 then 'INVALID' end as t_locked,
                           (t_infomask&256)<>0 as t_xmin_committed, (t_infomask&512)=0 as t_xmin_valid,
                           (t_infomask&1024)<>0 as t_xmax_committed, (t_infomask&2048)=0 as t_xmax_valid,
                           (t_infomask&4096)<>0 as t_xmax_is_multi, (t_infomask&8192)<>0 as t_updated,
                           (t_infomask&16384)<>0 as t_moved_off, (t_infomask&32768)<>0 as t_moved_in,
                           t_infomask2&2047 as t_natts, (t_infomask2&16384)<>0 as t_hot_updated,
                           (t_infomask2&32768)<>0 as t_heap_only,
                           t_hoff, t_bits, t_oid
                    from heap_page_items(get_raw_page('t1', 0))) format_heap_page_items
             ) heap
             full outer join (select ctid, * from t1) t1 using (ctid);
    
    create temp view t1_indices as
        select ctid, pkey_content.itemoffset as pkey_itemoffset, pkey_content.data as pkey_data, auxkey_content.itemoffset as auxkey_itemoffset, auxkey_content.data as auxkey_data
        from bt_page_items('t1_pkey', 1) pkey_content
             full outer join bt_page_items('t1_reference_key', 1) auxkey_content using (ctid);
    
    \echo ********************************************************************************
    \echo * Initial table
    \echo
    select * from t1_combined;
    select * from t1_indices;
    
    \echo ********************************************************************************
    \echo * Update non-indexed column
    \echo * - index entries untouched
    \echo * - old tuple at ctid (0,1) has t_hot_updated set
    \echo * - new tuple at ctid (0,4) has t_heap_only set
    \echo * - t_ctid of (0,1) points to (0,4)
    \echo
    
    begin;
    update t1 set value = 'mumble' where t1_id = 1;
    end;
    
    select * from t1_combined;
    select * from t1_indices;
    
    \echo ********************************************************************************
    \echo * Update non-indexed column again
    \echo * - tuple at ctid (0,4) now just points to ctid (0,5) and is redundant
    \echo
    
    begin;
    update t1 set value = 'womble' where t1_id = 1;
    end;
    
    select * from t1_combined;
    select * from t1_indices;
    
    \echo ********************************************************************************
    \echo * Vacuum table
    \echo * - line pointer ctid (0,1) converted to REDIRECT since index entries still point to it
    \echo * - redundant tuple at ctid (0,4) reclaimed for reuse
    \echo
    
    vacuum t1;
    
    select * from t1_combined;
    select * from t1_indices;
    
    \echo ********************************************************************************
    \echo * Update indexed column
    \echo * - New index entries written for new tuple at ctid (0,4) which is now reused
    \echo
    
    update t1 set reference = 'WOMBLE' where t1_id = 1;
    
    select * from t1_combined;
    select * from t1_indices;
    
    \echo ********************************************************************************
    \echo * Update indexed column to contain same value
    \echo * - even though indexed column is mentioned in update, this makes a heap-only change
    \echo * - current version is now (0,6) but indices still indicate (0,4)
    \echo
    
    update t1 set reference = 'WOMBLE', value = 'womble2' where t1_id = 1;
    
    select * from t1_combined;
    select * from t1_indices;
    
    \echo ********************************************************************************
    \echo * Vacuum table
    \echo * - ctid (0,1) now reclaimed, index entries pointing to it removed
    \echo * - ctid (0,5) reclaimed too, it never had index entries pointing to it
    \echo
    
    vacuum t1;
    
    select * from t1_combined;
    select * from t1_indices;

    You can run the script yourself to see the results; they are not listed here.

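    Note:
    With HOT, even when an UPDATE touches an indexed column, no new index entry is created as long as the value written is the same as the existing one.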

    Reference: https://stackoverflow.com/questions/7806058/redundant-data-in-update-statements/7806610#7806610

  • Original article: https://www.cnblogs.com/xiaotengyi/p/7793827.html