文章目录
- 环境
- 症状
- 问题原因
- 解决方案
环境
系统平台:N/A
版本:4.1.1
症状
在业务并不繁忙的情况下发现服务器负载较高,执行SQL的速度明显缓慢。检查数据库TOP SQL,发现与表test有关,查看该表的统计值,live_tuple只有91966条,dead tuple为702346条。
highgo=# select n_live_tup,n_dead_tup from pg_stat_all_tables where relname = 'test';n_live_tup|n_dead_tup------------+------------91966|702346(1row)手动对该表执行VACUUM同样无法回收dead tuple,报错如下:
highgo=# vacuum(verbose,analyze) test; INFO: 00000: vacuuming "public.test" INFO: 00000: index "index_test_content_id" now contains 146076 row versions in 698 pages DETAIL: 0 index row versions were removed. 175 index pages have been deleted, 126 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: 00000: index "test_pkey" now contains 146076 row versions in 568 pages DETAIL: 0 index row versions were removed. 77 index pages have been deleted, 68 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: 00000: "test": found 0 removable, 789485 nonremovable row versions in 10178 out of 10276 pages DETAIL: 702346 dead row versions cannot be removed yet. There were 59707 unused item pointers. Skipped 0 pages due to buffer pins. 0 pages are entirely empty. CPU 0.04s/0.15u sec elapsed 0.20 sec. INFO: 00000: vacuuming "pg_toast.pg_toast_16737" INFO: 00000: index "pg_toast_16737_index" now contains 0 row versions in 1 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.13 sec. INFO: 00000: "pg_toast_16737": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. Skipped 0 pages due to buffer pins. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.59 sec. INFO: 00000: analyzing "public.test" INFO: 00000: "test": scanned 10276 of 10276 pages, containing 90959 live rows and 702346 dead rows; 30000 rows in sample, 90959 estimated total row问题原因
当我们开启一个事务后,为了实现多版本并发控制,数据库会阻止vacuum清理未结束的长事务之后产生的所有dead tuple,即使与该事务无关的表也是如此。对于未结束的长事务而言,之后产生的所有dead tuple对于该事务都是可见的,所以即使手动执行vacuum也无法进行清理。
- Live transactions performing a write operation in any table will prevent vacuuming dead rows
generated by commited transactions that started after first live transaction in any other table.
解决方案
检查数据库中存在的长事务:
highgo=#selectusename,datname,state,waiting,xact_start,now(),query,pid,now()-xact_startastimefrompg_stat_activitywherenow()-xact_start>interval'5 sec'andquery!~'^COPY'andstate<>'idle'orderbyxact_start;-[RECORD1]-----------------------------------------------------------------------------------------------------------------------------------------------------------------usename|highgo datname|highgo state|idleintransactionwaiting|f xact_start|2018-07-0211:09:22.607884+08now|2018-07-0416:59:43.212717+08query|selectxxxxxxxx pid|17924time|2day05:50:26.909971检查该SQL执行未结束的原因,如需尽快恢复前台应用,可以在确认安全的情况下使用如下命令,将该长事务终止。
SELECTpg_terminate_backend(17924);再次对该表手动VACUUM:
highgo=# vacuum(verbose,analyze) test;
INFO: 00000: vacuuming “public.test”
INFO: 00000: scanned index “index_test_content_id” to remove 62482 row versions
DETAIL: CPU 0.01s/0.09u sec elapsed 0.26 sec
INFO: 00000: scanned index “test_pkey” to remove 62482 row versions
DETAIL: CPU 0.01s/0.08u sec elapsed 0.19 sec
INFO: 00000: “test”: removed 62482 row versions in 11041 pages
DETAIL: CPU 0.11s/1.30u sec elapsed 1.52 sec
INFO: 00000: index “index_test_content_id” now contains 91106 row versions in 698 pages
DETAIL: 62482 index row versions were removed.
346 index pages have been deleted, 152 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: 00000: index “test_pkey” now contains 91106 row versions in 568 pages
DETAIL: 62482 index row versions were removed.
244 index pages have been deleted, 50 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: 00000: “test”: found 781909 removable, 87305 nonremovable row versions in 11179 out of 11215 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 777839 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.32s/2.97u sec elapsed 3.96 sec.
INFO: 00000: vacuuming “pg_toast.pg_toast_16737”
INFO: 00000: index “pg_toast_16737_index” now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: 00000: “pg_toast_16737”: found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: 00000: analyzing “public.test”
INFO: 00000: “test”: scanned 11215 of 11215 pages, containing 91106 live rows and 0 dead rows; 30000 rows in sample, 91106 estimated total rows
VACUUM
测试问题SQL,确保该问题不会再次发生。