  • In PostgreSQL, an index-only scan does not always scan only the index

    PostgreSQL introduced index-only scans in version 9.2. Unfortunately, not every index-only scan avoids visiting the table.

    postgres=# create table t1(a int,b int,c int);
    CREATE TABLE
    postgres=# insert into t1 select a.*,a.*,a.* from generate_series(1,1000000) a;
    INSERT 0 1000000
    postgres=# \d+ t1
                                        Table "public.t1"
     Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
    --------+---------+-----------+----------+---------+---------+--------------+-------------
     a      | integer |           |          |         | plain   |              | 
     b      | integer |           |          |         | plain   |              | 
     c      | integer |           |          |         | plain   |              | 
    
    postgres=# 
    

    A query like the following has no usable index, so the whole table has to be read to get the data:

    postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                    QUERY PLAN                                 
    ---------------------------------------------------------------------------
     Gather (actual time=1.069..70.557 rows=1 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       Buffers: shared hit=5406
       ->  Parallel Seq Scan on t1 (actual time=11.805..34.050 rows=0 loops=3)
             Filter: (b = 5)
             Rows Removed by Filter: 333333
             Buffers: shared hit=5406
     Planning Time: 0.414 ms
     Execution Time: 70.612 ms
    (10 rows)
    
    postgres=# 
    

    Here, PostgreSQL is right to choose a parallel sequential scan. Without an index, the only other option would be a serial sequential scan. Normally, we would create an index on the table:

    postgres=# create index i1 on t1(b);
    CREATE INDEX
    postgres=# \d t1
                     Table "public.t1"
     Column |  Type   | Collation | Nullable | Default
    --------+---------+-----------+----------+---------
     a      | integer |           |          | 
     b      | integer |           |          | 
     c      | integer |           |          | 
    Indexes:
        "i1" btree (b)
    

    Now the index can be used to return the data:

    postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                 QUERY PLAN                              
    ---------------------------------------------------------------------
     Index Scan using i1 on t1 (actual time=0.066..0.068 rows=1 loops=1)
       Index Cond: (b = 5)
       Buffers: shared hit=1 read=3
     Planning Time: 0.773 ms
     Execution Time: 0.128 ms
    (5 rows)
    
    postgres=# 
    

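    The plan shows that the index is used, but PostgreSQL still has to visit the table to get the value of column a. We can also create an index that contains all the columns we need: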

    postgres=# create index i2 on t1(b,a);
    CREATE INDEX
    postgres=# \d+ t1
                                        Table "public.t1"
     Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
    --------+---------+-----------+----------+---------+---------+--------------+-------------
     a      | integer |           |          |         | plain   |              | 
     b      | integer |           |          |         | plain   |              | 
     c      | integer |           |          |         | plain   |              | 
    Indexes:
        "i1" btree (b)
        "i2" btree (b, a)
    
    postgres=# 
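
    As a side note, PostgreSQL 11 and later can also attach non-key columns to a B-tree index with INCLUDE, so that column a is carried as payload without being part of the search key. The following is only a sketch of that alternative; the index name i3 is hypothetical and not part of the original walkthrough, and which index the planner picks afterwards may vary.

    ```sql
    -- Sketch only: i3 is a hypothetical index name used for illustration.
    -- INCLUDE requires PostgreSQL 11 or later.
    create index i3 on t1 (b) include (a);

    -- The query can now be answered from i3 as well as from i2;
    -- the planner decides which of the two it prefers.
    explain (analyze, buffers, costs off) select a from t1 where b = 5;
    ```

    The rest of the walkthrough continues with i2; everything said about Heap Fetches applies equally to an index built with INCLUDE.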
    

    Now look at how the same query executes:

    postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Index Only Scan using i2 on t1 (actual time=0.346..0.353 rows=1 loops=1)
       Index Cond: (b = 5)
       Heap Fetches: 1
       Buffers: shared hit=1 read=3
     Planning Time: 0.402 ms
     Execution Time: 0.401 ms
    (6 rows)
    
    postgres=# 
    

    But there is still a Heap Fetches: 1.

    Why? To answer that, first look at the files of table t1 on disk:

    postgres=# select pg_relation_filepath('t1');
     pg_relation_filepath 
    ----------------------
     base/13878/74982
    (1 row)
    
    postgres=# \! ls -l /pg/11/data/base/13878/74982*
    -rw------- 1 postgres postgres 44285952 Oct 31 15:12 /pg/11/data/base/13878/74982
    -rw------- 1 postgres postgres    32768 Oct 31 15:08 /pg/11/data/base/13878/74982_fsm
    postgres=# 
    

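    The table has a free space map file (_fsm) but no visibility map file (_vm) yet. Without a visibility map, PostgreSQL cannot know whether all rows in a page are visible to every transaction, so it has to visit the table to check the row. Once the visibility map has been created: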

    postgres=# vacuum t1;
    VACUUM
    postgres=# \! ls -l /pg/11/data/base/13878/74982*
    -rw------- 1 postgres postgres 44285952 Oct 31 15:12 /pg/11/data/base/13878/74982
    -rw------- 1 postgres postgres    32768 Oct 31 15:08 /pg/11/data/base/13878/74982_fsm
    -rw------- 1 postgres postgres     8192 Oct 31 15:39 /pg/11/data/base/13878/74982_vm
    postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Index Only Scan using i2 on t1 (actual time=0.044..0.045 rows=1 loops=1)
       Index Cond: (b = 5)
       Heap Fetches: 0
       Buffers: shared hit=4
     Planning Time: 0.230 ms
     Execution Time: 0.102 ms
    (6 rows)
    
    postgres=# 
    

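    This time the plan shows Heap Fetches: 0.

    Nothing was fetched from the table, so this is a true index-only scan (although the visibility map is still consulted).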
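
    How much of a table is currently marked all-visible can be estimated from the system catalog without any extension. A minimal sketch, assuming the table is still named t1; both values are estimates that VACUUM, ANALYZE and a few DDL commands refresh, so they can be out of date:

    ```sql
    -- relpages: estimated number of pages in the table
    -- relallvisible: estimated number of pages marked all-visible in the visibility map
    select relname, relpages, relallvisible
    from pg_class
    where relname = 't1';
    ```

    The planner uses relallvisible when costing index-only scans, so a value close to relpages makes such a plan more attractive.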

     

    To make this clearer, look at the physical location of the row:

    postgres=# select ctid,* from t1 where b=5;
     ctid  | a | b | c 
    -------+---+---+---
     (0,5) | 5 | 5 | 5
    (1 row)
    
    postgres=# 
    

    The row is in block 0, at item 5. Now check whether all rows in that block are visible to all transactions:

    postgres=# create extension pg_visibility;
    CREATE EXTENSION
    postgres=# select pg_visibility_map('t1'::regclass, 0);
     pg_visibility_map 
    -------------------
     (t,f)
    (1 row)
    
    postgres=# 
    

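    The first value, t, means the page is all-visible (the second value is the all-frozen flag). What happens if we update a row in another session?

    In session 2, run: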

    postgres=# update t1 set a=8 where b=5;
    UPDATE 1
    postgres=# 
    

    Back in the original session, check again:

    postgres=# select pg_visibility_map('t1'::regclass, 0);
     pg_visibility_map 
    -------------------
     (f,f)
    (1 row)
    
    postgres=# 
    

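    Two things can be seen here:

    1. Modifying the page cleared its all-visible bit in the visibility map.

    2. The index-only scan has to go back to the table: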

    postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Index Only Scan using i2 on t1 (actual time=0.080..0.082 rows=1 loops=1)
       Index Cond: (b = 5)
       Heap Fetches: 2
       Buffers: shared hit=6 dirtied=3
     Planning Time: 0.132 ms
     Execution Time: 0.120 ms
    (6 rows)
    
    postgres=# 
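
    To see which pages an index-only scan would currently have to visit, the pg_visibility extension can also report on the whole relation rather than a single block. A sketch, assuming the extension created earlier is still installed:

    ```sql
    -- List every page of t1 that is not marked all-visible.
    select blkno, all_visible, all_frozen
    from pg_visibility_map('t1'::regclass)
    where not all_visible
    order by blkno;

    -- Overall counts of all-visible and all-frozen pages.
    select * from pg_visibility_map_summary('t1'::regclass);
    ```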
    

    The question now is: why Heap Fetches: 2?

    First, every update in PostgreSQL creates a new row version:

    postgres=# select ctid,* from t1 where b=5;
       ctid    | a | b | c 
    -----------+---+---+---
     (5405,76) | 8 | 5 | 5
    (1 row)
    
    postgres=# 
    

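    The current row version now lives in a different block (even within the same block it would occupy a different slot), and that affects the index as well. Index i2 now holds two entries for b = 5: the old one, still pointing at the outdated row version in block 0, and a new one pointing at the current version. Neither page is marked all-visible any more, so each entry costs a heap visit, which gives two Heap Fetches. Note that this is not a HOT (heap-only tuple) update: HOT requires that no indexed column changes and that the new version fits in the same page, whereas here column a is part of i2 and the new version landed in another block.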
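
    Whether an update was a HOT (heap-only tuple) update can be checked from the statistics views. A minimal sketch, assuming statistics collection is enabled (the default) and the table is still named t1; the counters are cumulative and may lag slightly behind the most recent statement:

    ```sql
    -- n_tup_upd: total number of rows updated
    -- n_tup_hot_upd: how many of those updates were HOT updates
    select relname, n_tup_upd, n_tup_hot_upd
    from pg_stat_user_tables
    where relname = 't1';
    ```

    If n_tup_hot_upd does not increase after an update, the update was not HOT and new index entries had to be created.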

    On the next execution, the table is visited only once:

    postgres=# explain (analyze,buffers,costs off) select a from t1 where b = 5;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Index Only Scan using i2 on t1 (actual time=0.039..0.042 rows=1 loops=1)
       Index Cond: (b = 5)
       Heap Fetches: 1
       Buffers: shared hit=5
     Planning Time: 0.112 ms
     Execution Time: 0.071 ms
    (6 rows)
    
    postgres=# 
    

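    The original walkthrough leaves this open. A likely explanation is that the previous scan found the old row version dead for every transaction and flagged its index entry as dead, so later scans skip that entry and only the entry for the current row version still needs a heap visit.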

     

    The point to take away is that index-only scans do not always scan only the index.
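
    One way to get back to a scan that avoids the heap is to vacuum the table again so that the affected pages are marked all-visible once more. The following is a sketch under the assumption that no long-running transaction still needs the old row version; the block numbers are taken from the ctid values shown earlier.

    ```sql
    -- Rebuild the visibility information for t1.
    vacuum t1;

    -- Check the two pages touched by the update:
    -- block 0 held the old row version, block 5405 holds the current one.
    select pg_visibility_map('t1'::regclass, 0);
    select pg_visibility_map('t1'::regclass, 5405);

    -- Re-run the query and look at the Heap Fetches line.
    explain (analyze, buffers, costs off) select a from t1 where b = 5;
    ```

    If both pages are reported as all-visible, the plan should show Heap Fetches: 0 again. On a busy table this depends on how often autovacuum gets to run.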

     

  • Original article: https://www.cnblogs.com/abclife/p/13906623.html