linux提供的基础指标容易产生误导,很多人看了之后也不知道真正衡量的是什么。
wget https://ftp.postgresql.org/pub/source/v11.12/postgresql-11.12.tar.gz ./configure --enable-debug make make install adduser postgres mkdir /usr/local/pgsql/data chown postgres /usr/local/pgsql/data su - postgres /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
下载并安装linux-ftools:
linux-ftools提供的fincore工具可以显示文件是如何映射到文件系统缓存的。
wget https://download.opensuse.org/repositories/home:/bmanojlovic/RHEL_7/x86_64/linux-ftools-1.3-5.1.x86_64.rpm rpm -ivh linux-ftools-1.3-5.1.x86_64.rpm
pg_prewarm extension可以快速地将表加载到cache中。
cd contrib make & & make install /usr/local/pgsql/bin/psql postgres <<<" create extension pg_prewarm; "
准备数据,每个scale=64
$ /usr/local/pgsql/bin/pgbench -i -s 64 postgres dropping old tables... NOTICE: table "pgbench_accounts" does not exist, skipping NOTICE: table "pgbench_branches" does not exist, skipping NOTICE: table "pgbench_history" does not exist, skipping NOTICE: table "pgbench_tellers" does not exist, skipping creating tables... generating data... 100000 of 6400000 tuples (1%) done (elapsed 0.08 s, remaining 5.25 s) 200000 of 6400000 tuples (3%) done (elapsed 0.22 s, remaining 6.70 s) 300000 of 6400000 tuples (4%) done (elapsed 0.36 s, remaining 7.22 s) 400000 of 6400000 tuples (6%) done (elapsed 0.46 s, remaining 6.86 s) 500000 of 6400000 tuples (7%) done (elapsed 0.60 s, remaining 7.03 s) 600000 of 6400000 tuples (9%) done (elapsed 0.72 s, remaining 6.99 s) 700000 of 6400000 tuples (10%) done (elapsed 0.81 s, remaining 6.63 s) 800000 of 6400000 tuples (12%) done (elapsed 0.94 s, remaining 6.59 s) 900000 of 6400000 tuples (14%) done (elapsed 1.04 s, remaining 6.35 s) 1000000 of 6400000 tuples (15%) done (elapsed 1.16 s, remaining 6.25 s) 1100000 of 6400000 tuples (17%) done (elapsed 1.30 s, remaining 6.27 s) 1200000 of 6400000 tuples (18%) done (elapsed 1.40 s, remaining 6.06 s) 1300000 of 6400000 tuples (20%) done (elapsed 1.52 s, remaining 5.98 s) 1400000 of 6400000 tuples (21%) done (elapsed 1.67 s, remaining 5.96 s) 1500000 of 6400000 tuples (23%) done (elapsed 1.76 s, remaining 5.76 s) 1600000 of 6400000 tuples (25%) done (elapsed 1.89 s, remaining 5.67 s) 1700000 of 6400000 tuples (26%) done (elapsed 2.05 s, remaining 5.66 s) 1800000 of 6400000 tuples (28%) done (elapsed 2.13 s, remaining 5.45 s) 1900000 of 6400000 tuples (29%) done (elapsed 2.27 s, remaining 5.38 s) 2000000 of 6400000 tuples (31%) done (elapsed 2.41 s, remaining 5.30 s) 2100000 of 6400000 tuples (32%) done (elapsed 2.50 s, remaining 5.12 s) 2200000 of 6400000 tuples (34%) done (elapsed 2.63 s, remaining 5.02 s) 2300000 of 6400000 tuples (35%) done (elapsed 2.73 s, remaining 4.87 s) 2400000 of 6400000 tuples (37%) done (elapsed 2.85 s, remaining 4.76 s) 2500000 of 6400000 tuples (39%) done (elapsed 2.99 s, remaining 4.66 s) 2600000 of 6400000 tuples (40%) done (elapsed 3.09 s, remaining 4.51 s) 2700000 of 6400000 tuples (42%) done (elapsed 3.20 s, remaining 4.38 s) 2800000 of 6400000 tuples (43%) done (elapsed 3.34 s, remaining 4.30 s) 2900000 of 6400000 tuples (45%) done (elapsed 3.43 s, remaining 4.13 s) 3000000 of 6400000 tuples (46%) done (elapsed 3.56 s, remaining 4.04 s) 3100000 of 6400000 tuples (48%) done (elapsed 3.64 s, remaining 3.88 s) 3200000 of 6400000 tuples (50%) done (elapsed 3.78 s, remaining 3.78 s) 3300000 of 6400000 tuples (51%) done (elapsed 3.91 s, remaining 3.67 s) 3400000 of 6400000 tuples (53%) done (elapsed 4.01 s, remaining 3.54 s) 3500000 of 6400000 tuples (54%) done (elapsed 4.14 s, remaining 3.43 s) 3600000 of 6400000 tuples (56%) done (elapsed 4.28 s, remaining 3.33 s) 3700000 of 6400000 tuples (57%) done (elapsed 4.38 s, remaining 3.20 s) 3800000 of 6400000 tuples (59%) done (elapsed 4.50 s, remaining 3.08 s) 3900000 of 6400000 tuples (60%) done (elapsed 4.65 s, remaining 2.98 s) 4000000 of 6400000 tuples (62%) done (elapsed 4.75 s, remaining 2.85 s) 4100000 of 6400000 tuples (64%) done (elapsed 4.89 s, remaining 2.74 s) 4200000 of 6400000 tuples (65%) done (elapsed 4.99 s, remaining 2.61 s) 4300000 of 6400000 tuples (67%) done (elapsed 5.14 s, remaining 2.51 s) 4400000 of 6400000 tuples (68%) done (elapsed 5.29 s, remaining 2.40 s) 4500000 of 6400000 tuples (70%) done (elapsed 5.43 s, remaining 2.29 s) 4600000 of 6400000 tuples (71%) done (elapsed 5.53 s, remaining 2.16 s) 4700000 of 6400000 tuples (73%) done (elapsed 5.67 s, remaining 2.05 s) 4800000 of 6400000 tuples (75%) done (elapsed 5.80 s, remaining 1.93 s) 4900000 of 6400000 tuples (76%) done (elapsed 5.89 s, remaining 1.80 s) 5000000 of 6400000 tuples (78%) done (elapsed 6.02 s, remaining 1.69 s) 5100000 of 6400000 tuples (79%) done (elapsed 6.12 s, remaining 1.56 s) 5200000 of 6400000 tuples (81%) done (elapsed 6.24 s, remaining 1.44 s) 5300000 of 6400000 tuples (82%) done (elapsed 6.37 s, remaining 1.32 s) 5400000 of 6400000 tuples (84%) done (elapsed 6.47 s, remaining 1.20 s) 5500000 of 6400000 tuples (85%) done (elapsed 6.59 s, remaining 1.08 s) 5600000 of 6400000 tuples (87%) done (elapsed 6.68 s, remaining 0.95 s) 5700000 of 6400000 tuples (89%) done (elapsed 6.82 s, remaining 0.84 s) 5800000 of 6400000 tuples (90%) done (elapsed 6.96 s, remaining 0.72 s) 5900000 of 6400000 tuples (92%) done (elapsed 7.10 s, remaining 0.60 s) 6000000 of 6400000 tuples (93%) done (elapsed 7.20 s, remaining 0.48 s) 6100000 of 6400000 tuples (95%) done (elapsed 7.33 s, remaining 0.36 s) 6200000 of 6400000 tuples (96%) done (elapsed 7.42 s, remaining 0.24 s) 6300000 of 6400000 tuples (98%) done (elapsed 7.54 s, remaining 0.12 s) 6400000 of 6400000 tuples (100%) done (elapsed 7.69 s, remaining 0.00 s) vacuuming... creating primary keys... done.
大概1.8G
$ du -sh data/ 1.8G data/ $ grep ^shared data/postgresql.conf shared_buffers = 2048MB
为了做实验,这里我重启了一下操作系统。
$ free -mw total used free shared buffers cache available Mem: 15885 478 14927 9 2 477 15230 Swap: 10239 0 10239
free命令会显示所有的空闲的、已经使用的物理内存和swap内存,包含内核使用的buffers和caches。信息来自/proc/meminfo 其中:
·total:全部的内存(/proc/meminfo中的MemTotal和SwapTotal)
重启后,系统几乎没有使用共享内存。477MB被cache了,14927MB是空闲的。
$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start waiting for server to start.... done server started $ free -mw total used free shared buffers cache available Mem: 15885 480 14853 72 2 549 15165 Swap: 10239 0 10239
启动postgresql实例后,几乎没有什么改变。因为linux采用的是lazy分配SHM的方式:shared buffers在使用的时候才会分配,因此这里只是增加了一点点。
$ /usr/local/pgsql/bin/psql postgres <<<" select pg_size_pretty(current_setting('block_size')::int*pg_prewarm('pgbench_accounts','buffer')); " pg_size_pretty ---------------- 820 MB (1 row) $ free -mw total used free shared buffers cache available Mem: 15885 479 13207 892 2 2196 14344 Swap: 10239 0 10239 $
在postgresql中,会在用户空间(userspace)分配一个共享段用于管理对页的并发写和频繁的读。但是与大多数RDBMS不同,这里没有进一步的优化,比如读的预取,写的重新排序。这意味着我们必须有足够的空间用于文件系统缓存。频繁读取的数据没有放入shared_buffers中,就需要从该共享段进行缓存。这就是postgresql的双缓存,双缓存都被统计在free命令的"cache"部分。
为了更好的预测性能。必须要理解文件系统缓存。可以借助工具linux-ftools。以下是linux-fincore获得的缓存的文件的信息:
$ linux-fincore -s $(find /usr/local/pgsql/data/base -type f) | grep ^total total cached size: 860,930,048
这里是821MB。不幸的是,你需要传递文件列表进行详细检查,所以这里只用了total。
$ linux-fincore -s -S 1048576 $(find /usr/local/pgsql/data/base -type f) filename size total_pages min_cached page cached_pages cached_size cached_perc -------- ---- ----------- --------------- ------------ ----------- ----------- /usr/local/pgsql/data/base/13287/16400 859,496,448 209,838 0 209,838 859,496,448 100.00 /usr/local/pgsql/data/base/13287/16408 143,777,792 35,102 -1 0 0 0.00 --- total cached size: 859,496,448
也可以借助pg_buffercache extension,快速地获得文件系统的大小
$ /usr/local/pgsql/bin/psql postgres <<<" select relname,relkind,current_setting('block_size')::int*relpages/1024/1024 as size_MB,current_setting('block_size')::int*buffers/1024/1024 as shared_mb,relfilenode,current_setting('data_directory')||'/'||pg_relation_filepath(c.oid) as file_path from pg_class c left outer join (select relfilenode, count(*) as buffers from pg_buffercache group by relfilenode) b using(relfilenode) where relpages>100;" | awk '/[/]base[/]/{"linux-fincore -s "$NF"* | grep ^total | cut -d: -f2" | getline f;printf "%6dM %s ",gensub(/,/,"","g",f)/1024/1024,$0;next}{printf "%6s %s ","",$0}' relname | relkind | size_mb | shared_mb | relfilenode | file_path -----------------------+---------+---------+-----------+-------------+---------------------------------------- 819M pgbench_accounts | r | 819 | 819 | 16400 | /usr/local/pgsql/data/base/13287/16400 0M pgbench_accounts_pkey | i | 137 | | 16408 | /usr/local/pgsql/data/base/13287/16408 (2 rows) $
释放刷新cache
# echo 3 > /proc/sys/vm/drop_caches; # free -mw total used free shared buffers cache available Mem: 15885 465 14322 894 0 1096 14362 Swap: 10239 0 10239
shared buffer仍然保持着。
$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile stop waiting for server to shut down.... done server stopped $ free -mw total used free shared buffers cache available Mem: 15885 463 15210 9 0 211 15251 Swap: 10239 0 10239 $
如果你还想知道,这cache中的211MB是哪些内容
$ find / -type f -exec linux-fincore {} ; 2>/dev/null | awk '/^[/]/{gsub(/,/,"");m=$(NF-1)/1024/1024;gsub(/ */," ");if(m>1)printf "%10d MB %s ",m,$0}' | sort -h | tee all.log | sort -n | tail -10 2 MB /usr/lib64/dri/vmwgfx_dri.so 11038656 2695 0 708 2899968 26.27 2 MB /usr/lib64/gnome-shell/libgnome-shell.so 2844392 695 0 625 2560000 89.93 2 MB /usr/lib64/libc-2.17.so 2156352 527 0 527 2158592 100.00 2 MB /usr/lib64/libgtk-3.so.0.2200.26 7503704 1832 0 592 2424832 32.31 6 MB /usr/lib64/gjs/libmozjs-52.so 24401928 5958 0 1646 6742016 27.63 6 MB /usr/local/pgsql/bin/postgres 33821416 8258 0 1559 6385664 18.88 8 MB /usr/lib/locale/locale-archive 106176928 25923 0 2048 8388608 7.90 12 MB /usr/lib64/libLLVM-5.0-rhel.so 37261600 9098 0 3181 13029376 34.96 13 MB /usr/bin/containerd 49102360 11988 0 3516 14401536 29.33 25 MB /usr/bin/dockerd 105055544 25649 0 6503 26636288 25.35
这需要执行一些时间,但是可以显示cache的都是哪些文件。
到此,就可以理解cache统计信息的含义了。
我还添加了性能指标“tps”——每秒的事务量。在shared_buffers的范围内运行这个仅查询的工作负载会显示出稳定的性能(在单线程中约为13500 tps)。
当需要从文件系统缓存中拷贝一些页面时,性能会缓慢降低。 当到达超出物理RAM容量的范围时,物理读取就会涉及到,tps就会降低很多。 这里有14GB的RAM,当scale达到了570,除了4GB的shared_buffers外,还有8GB用于频繁读取数据。