zoukankan      html  css  js  c++  java
  • postgresql-从表中随机获取一条记录

    postgresql如何从表中高效的随机获取一条记录

    select C_BH from db_scld.t_scld_cprscjl  order by `random()` LIMIT 1;
    select  c_jdrybm from db_scld.t_jdry 
          where c_bmbm = v_scdd and c_sfyx ='1' and c_ryzszt not in ('05','12','11','07','09','13')  order by `random()` limit 1 
          
     db_jdsjpt=# explain analyze select C_BH from db_scld.t_scld_cprscjl  order by random() LIMIT 1;
                                                                 QUERY PLAN                                                         
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=61029.94..61029.94 rows=1 width=41) (actual time=587.193..587.193 rows=1 loops=1)
       ->  Sort  (cost=61029.94..63172.22 rows=856911 width=41) (actual time=587.185..587.185 rows=1 loops=1)
             Sort Key: (random())
             Sort Method: top-N heapsort  Memory: 25kB
             ->  Seq Scan on t_scld_cprscjl  (cost=0.00..56745.39 rows=856911 width=41) (actual time=0.019..380.139 rows=854682 loop
    s=1)
     Planning time: 1.179 ms
     Execution time: 587.242 ms
    (7 rows)
    --表总数量
     db_jdsjpt=# select count(*) from db_scld.t_scld_cprscjl;
     count  
    --------
     854682
    (1 row)
    
    

    随机获取一条记录random()

    random()耗时:Time: 389.818 ms

    --随机获取一条耗时
    db_jdsjpt=# select C_BH from db_scld.t_scld_cprscjl  order by random() LIMIT 1;
                   c_bh               
    ----------------------------------
     6d861b011c854040bf5b731f49d40b48
    (1 row)
    
    Time: 389.818 ms
    

    改写1

    offset耗时:Time: 60.022 ms

    --offset可以走索引,少了排序操作
    db_jdsjpt=# select C_BH from db_scld.t_scld_cprscjl  offset floor(random()*854682) LIMIT 1;
                   c_bh               
    ----------------------------------
     f90301bd8ac2485196ffae32ee70345c
    (1 row)
    
    Time: 60.022 ms
    
    db_jdsjpt=# explain analyze select C_BH from db_scld.t_scld_cprscjl  offset floor(random()*854682) LIMIT 1;
                                                                             QUERY PLAN                
                                    
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=3747.64..3747.68 rows=1 width=33) (actual time=30.758..30.759 rows=1 loops=1)
       ->  Index Only Scan using i_corscjl_cprscbh_ on t_scld_cprscjl  (cost=0.42..37472.65 rows=854682 width=33) (actual time=0.
    047..25.808 rows=81993 loops=1)
             Heap Fetches: 0
     Planning time: 0.228 ms
     Execution time: 30.802 ms
    (5 rows)
    
    Time: 31.779 ms
    
    

    改写2

    pg从9.5开始提供抽样函数

    使用tablesample抽样的过程中比例不能太低,否则可能获取不到结果,且不能带有过滤条件

    system耗时: Time: 0.639 ms

    system:随机性较差,效率高

    --改写后耗时
    db_jdsjpt=# select c_bh from db_scld.t_scld_cprscjl  tablesample system(0.1) limit 1;
                   c_bh               
    ----------------------------------
     e2fce25399db42f0bf49faf8e7214d5f
    (1 row)
    
    Time: 0.639 ms
    
    --system随机抽样以块为单位所以更快
    db_jdsjpt=# explain analyze  select c_bh from db_scld.t_scld_cprscjl  tablesample system(0.1) limit 1;
                                                          QUERY PLAN                                                      
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..0.23 rows=1 width=33) (actual time=0.105..0.105 rows=1 loops=1)
       ->  Sample Scan on t_scld_cprscjl  (cost=0.00..192.55 rows=855 width=33) (actual time=0.102..0.102 rows=1 loops=1)
             Sampling: system ('0.1'::real)
     Planning time: 0.190 ms
     Execution time: 0.134 ms
    (5 rows)
    
    Time: 1.182 ms
    
    

    改写3

    bernoulli:随机性更好,但效率比system要低

    bernoullih耗时:Time: 0.822 ms

    
    db_jdsjpt=# select c_bh from db_scld.t_scld_cprscjl  tablesample bernoulli(0.1) limit 1;
                   c_bh               
    ----------------------------------
     7ec30761ffd04bd9ad77797a33645a84
    (1 row)
    
    Time: 0.822 ms
    
    --bernoulli以行为单位进行抽样,比system效率低点
    db_jdsjpt=# explain analyze select c_bh from db_scld.t_scld_cprscjl  tablesample bernoulli(0.1) limit 1;
                                                           QUERY PLAN                                                       
    ---------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..53.85 rows=1 width=33) (actual time=1.410..1.411 rows=1 loops=1)
       ->  Sample Scan on t_scld_cprscjl  (cost=0.00..46042.55 rows=855 width=33) (actual time=1.408..1.408 rows=1 loops=1)
             Sampling: bernoulli ('0.1'::real)
     Planning time: 0.446 ms
     Execution time: 1.436 ms
    (5 rows)
    
    Time: 25.770 ms
    
    

    同理另外一条sql也可用同样的方式,且在c_bmbm字段上面加上索引

    当有条件的时候可以使用offset获取,offset的值也可以通过for循环传入

    db_jdsjpt=# select count(*) from db_scld.t_jdry;
     count  
    --------
     214819
    (1 row)
    
    db_jdsjpt=# select  c_jdrybm from db_scld.t_jdry  where c_bmbm = '4402222804' and c_sfyx ='1' and c_ryzszt not in ('05','12','11','07','09','13') offset floor(random()*214819) limit 1; 
     c_jdrybm 
    ----------
    (0 rows)
    
    Time: 1.924 ms
    

    对比

    方法 耗时
    order by random() 389.818 ms
    offset n 60.022 ms-240ms
    system() 0.639 ms
    bernoulli() 0.822 ms

    使用offset的时候和n的大小有关系,当n越大,扫描的索引块越多,就越大,但是相对于order by random()的方式仍然要快。

    注意

    system(0.1)等于百分之零点一,也就是抽样千分之一 854682*0.001=854,大概每次抽取854条记录

    --system
    db_jdsjpt=#  select count(*) from db_scld.t_scld_cprscjl  tablesample system(0.1) ;
     count 
    -------
       592
    (1 row)
    
    Time: 1.499 ms
    
    --bernoulli
    db_jdsjpt=#  select count(*) from db_scld.t_scld_cprscjl  tablesample bernoulli(0.1) ;
     count 
    -------
       840
    (1 row)
    
    Time: 86.037 ms
    这里可以看出bernoulli效率比system要低
    

    结语

    1.随机获取表中的一条数据,当表中数据较小时使用random感觉不明显,当数据量大时random由于每次都需要排序操作,导致随机获取一条的成本较高

    4.随机获取一条记录可以使用limit 1 offset n-1的方式,或者使用随机抽样的方式

    5.无论是使用limit 1 offset n还是使用tablesample随机抽样方式都需要知道表中的数据量,不能超过表数据量

  • 相关阅读:
    如何面试程序员?
    类似猪八戒网的网站
    存储过程
    一个不错的网站(博客制作参考)
    用触发器来实现级联更新级联删除
    用触发器进行级联删除
    数据库触发器详解
    浅谈数据库中的存储过程
    JDBC连接数据库
    Java递归函数
  • 原文地址:https://www.cnblogs.com/zhangfx01/p/10948200.html
Copyright © 2011-2022 走看看