PostgreSQL 高度技術者育成テキスト ch4 (性能監視とパフォーマンス) (2/2)
LPI-Japan OSS-DB Gold 認定教材 PostgreSQL 高度技術者育成テキスト
- 作者:河原 翔
- 発売日: 2014/10/27
- メディア: オンデマンド (ペーパーバック)
性能監視とパフォーマンスチューニング
実行計画のチューニング
- Oracle等に比べるとまだよわい
INDEXチューニング
- 略
ANALYZE
SELECT name,setting,context,unit FROM pg_settings WHERE name='autovacuum_naptime';
name | setting | context | unit --------------------+---------+---------+------ autovacuum_naptime | 60 | sighup | s (1 row)
- autovacuum/autoanalyzeは
autovacuum_naptime
の間隔でしか行われない - ので、大量行の挿入直後などは手動ANALYZEが必要なことも
work_mem
EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=30832.00..59603.01 rows=1000000 width=16) (actual time=162.829..637.762 rows=1000000 loops=1) Hash Cond: (t1.id = t2.id) -> Seq Scan on tbl t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..55.651 rows=1000000 loops=1) -> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=162.626..162.626 rows=1000000 loops=1) Buckets: 131072 Batches: 16 Memory Usage: 3251kB -> Seq Scan on tbl t2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.004..58.547 rows=1000000 loops=1) Planning Time: 0.185 ms Execution Time: 661.598 ms (8 rows)
- Batches: 16とある
Buckets: 131072 Batches: 16 Memory Usage: 3251kB
- HashのBatchが1より大きい場合、一時ファイルが生成されている
work_mem
を増やすことで回避できうる- デフォルト
4MB
- デフォルト
SET work_mem='64MB'; EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=26925.00..43975.01 rows=1000000 width=16) (actual time=217.614..557.272 rows=1000000 loops=1) Hash Cond: (t1.id = t2.id) -> Seq Scan on tbl t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..56.497 rows=1000000 loops=1) -> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=216.507..216.507 rows=1000000 loops=1) Buckets: 1048576 Batches: 1 Memory Usage: 43739kB -> Seq Scan on tbl t2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.011..58.557 rows=1000000 loops=1) Planning Time: 0.117 ms Execution Time: 582.205 ms (8 rows)
- 一時ファイルのログ出力
- 指定のサイズよりも大きな一時ファイルをロギングする
- 0でぜんぶ
SELECT * FROM pg_settings WHERE name='log_temp_files';
-[ RECORD 1 ]---+--------------------------------------------------------------------- name | log_temp_files setting | -1 unit | kB category | Reporting and Logging / What to Log short_desc | Log the use of temporary files larger than this number of kilobytes. extra_desc | Zero logs all files. The default is -1 (turning this feature off). context | superuser vartype | integer source | default min_val | -1 max_val | 2147483647 enumvals | boot_val | -1 reset_val | -1 sourcefile | sourceline | pending_restart | f
SET log_temp_files=0; RESET work_mem; -- reset to 4MB (default) EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=30832.00..59603.01 rows=1000000 width=16) (actual time=158.349..681.531 rows=1000000 loops=1) Hash Cond: (t1.id = t2.id) -> Seq Scan on tbl t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..55.873 rows=1000000 loops=1) -> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=158.150..158.150 rows=1000000 loops=1) Buckets: 131072 Batches: 16 Memory Usage: 3251kB -> Seq Scan on tbl t2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.004..56.045 rows=1000000 loops=1) Planning Time: 0.132 ms Execution Time: 706.652 ms (8 rows)
Buckets: 131072 Batches: 16 Memory Usage: 3251kB
- Batchesが16なので、1つはメモリ上、残り15は一時ファイルが書き出されている
- 一時ファイルのログが出力される
2020-02-29 03:57:57.064 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.40", size 1523980 2020-02-29 03:57:57.064 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.083 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.55", size 1523980 2020-02-29 03:57:57.083 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.091 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.42", size 1522212 2020-02-29 03:57:57.091 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.113 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.57", size 1522212 2020-02-29 03:57:57.113 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.116 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.37", size 1520872 2020-02-29 03:57:57.116 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.141 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.52", size 1520872 2020-02-29 03:57:57.141 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.146 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.34", size 1531004 2020-02-29 03:57:57.146 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.167 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.49", size 1531004 2020-02-29 03:57:57.167 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.172 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.39", size 1528456 2020-02-29 03:57:57.172 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.193 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.54", size 1528456 2020-02-29 03:57:57.193 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.197 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.36", size 1529940 2020-02-29 03:57:57.197 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.218 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.51", size 1529940 2020-02-29 03:57:57.218 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.223 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.35", size 1520384 2020-02-29 03:57:57.223 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.246 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.50", size 1520384 2020-02-29 03:57:57.246 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.252 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.44", size 1522900 2020-02-29 03:57:57.252 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.274 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.59", size 1522900 2020-02-29 03:57:57.274 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.279 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.30", size 1520396 2020-02-29 03:57:57.279 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.300 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.45", size 1520396 2020-02-29 03:57:57.300 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.306 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.33", size 1526832 2020-02-29 03:57:57.306 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.327 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.48", size 1526832 2020-02-29 03:57:57.327 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.331 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.41", size 1523340 2020-02-29 03:57:57.331 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.352 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.56", size 1523340 2020-02-29 03:57:57.352 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.358 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.38", size 1522840 2020-02-29 03:57:57.358 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.379 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.53", size 1522840 2020-02-29 03:57:57.379 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.383 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.43", size 1538092 2020-02-29 03:57:57.383 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.401 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.58", size 1538092 2020-02-29 03:57:57.401 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.407 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.32", size 1525620 2020-02-29 03:57:57.407 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.425 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.47", size 1525620 2020-02-29 03:57:57.425 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.429 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.31", size 1514544 2020-02-29 03:57:57.429 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id; 2020-02-29 03:57:57.446 UTC [565] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp565.46", size 1514544 2020-02-29 03:57:57.446 UTC [565] STATEMENT: EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
effective_cache_size
- 利用可能なディスクキャッシュの推定値
Sets the planner's assumption about the total size of the data caches.
That is, the total size of the caches (kernel cache and shared buffers) used for PostgreSQL data files. This is measured in disk pages, which are normally 8 kB each.
プランナコスト定数
postgresql.conf
# - Planner Cost Constants - #seq_page_cost = 1.0 # measured on an arbitrary scale #random_page_cost = 4.0 # same scale as above #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above #cpu_operator_cost = 0.0025 # same scale as above #parallel_tuple_cost = 0.1 # same scale as above #parallel_setup_cost = 1000.0 # same scale as above #jit_above_cost = 100000 # perform JIT compilation if available # and query more expensive than this; # -1 disables #jit_inline_above_cost = 500000 # inline small functions if query is # more expensive than this; -1 disables #jit_optimize_above_cost = 500000 # use expensive JIT optimizations if # query is more expensive than this; # -1 disables #min_parallel_table_scan_size = 8MB #min_parallel_index_scan_size = 512kB #effective_cache_size = 4GB
#random_page_cost = 4.0 # same scale as above
- SSDなどではもっと1に近づける
プランナメソッド設定の変更
postgresql.conf
# - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_indexonlyscan = on #enable_material = on #enable_mergejoin = on #enable_nestloop = on #enable_parallel_append = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on #enable_partitionwise_join = off #enable_partitionwise_aggregate = off #enable_parallel_hash = on #enable_partition_pruning = on
- プランタイプを個別に有効/無効にする
- Hash Joinが選ばれている例
EXPLAIN ANALYZE SELECT * FROM tbl t1, tbl t2 WHERE t1.id = t2.id;
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=30832.00..59603.01 rows=1000000 width=16) (actual time=160.068..668.456 rows=1000000 loops=1) Hash Cond: (t1.id = t2.id) -> Seq Scan on tbl t1 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..54.869 rows=1000000 loops=1) -> Hash (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=159.871..159.871 rows=1000000 loops=1) Buckets: 131072 Batches: 16 Memory Usage: 3251kB -> Seq Scan on tbl t2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.003..56.715 rows=1000000 loops=1) Planning Time: 0.116 ms Execution Time: 693.012 ms (8 rows)
- Hash Joinを無効化してみる
SET enable_hashjoin=off; EXPLAIN ANALYZE SELECT * FROM tbl t1, tbl t2 WHERE t1.id = t2.id;
- Merge Joinになった
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=0.85..75816.85 rows=1000000 width=16) (actual time=0.011..422.687 rows=1000000 loops=1) Merge Cond: (t1.id = t2.id) -> Index Scan using tbl_pkey on tbl t1 (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.005..115.109 rows=1000000 loops=1) -> Index Scan using tbl_pkey on tbl t2 (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.004..103.403 rows=1000000 loops=1) Planning Time: 0.116 ms Execution Time: 446.814 ms (6 rows)
- 内部的には、当該プランタイプのコストに巨大な値が設定されている
- TID Scanとは
SELECT ctid,* FROM tbl LIMIT 1;
ctid | id | col -------+----+----- (0,1) | 1 | 1 (1 row)
- Oracleの
ROWID
みたいなやつ- これによる検索は速い
- が、VACUUM FULLで変わってしまう
テーブル結合最適化処理の実行制御
- プランナによるクエリ書き換え
- 公式
SELECT * FROM x, y, (SELECT * FROM a, b, c WHERE something) AS ss WHERE somethingelse;
- がこう書き換わったりする
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
- 結合数が増えてくると、実行計画の作成時間が無視できなくなってくる
EXPLAIN ANALYZE SELECT * FROM a a1, a a2, a a3, a a4, a a5, a a6, a a7, a a8, a a9, a a10, a a11, a a12;
... Planning Time: 8.445 ms
- FROM項目数に上限を設けることでこれを回避する
- 実行計画の質は下がるかも
- 公式
from_collapse_limit
- サブクエリのFROMを外側のクエリのFROMに移動する際の項目数上限 (上述の例)
join_collapse_limi
- JOINをFROMに書き換える際のFROM項目数上限
geqo_threshold
よりも小さな値を設定すること- デフォルト値12
- これを超えると、総当たりからGEQOに切り替わる
EXPLAIN ANALYZE SELECT * FROM a a1, a a2, a a3, a a4, a a5, a a6, a a7, a a8, a a9, a a10, a a11, a a12, a a13;
... Planning Time: 6.791 ms
8.445 ms
から短くなった
EXPLAINからのチューニング例
SET enable_bitmapscan = off; EXPLAIN ANALYZE SELECT * FROM tbl WHERE tbl.id < 10 OR tbl.id > 9999990;
QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Gather (cost=1000.00..11676.00 rows=10 width=8) (actual time=0.325..72.681 rows=9 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on tbl (cost=0.00..10675.00 rows=4 width=8) (actual time=12.758..34.594 rows=3 loops=3) Filter: ((id < 10) OR (id > 9999990)) Rows Removed by Filter: 333330 Planning Time: 0.080 ms Execution Time: 72.695 ms (8 rows)
- 子ノードから順に問題箇所を探す
- 初期ノードと総コストの差が大きい箇所
- costとactual timeの比率がおかしい箇所
- プランナコスト定数に誤りがある可能性
- 上の例では、明らかにSeq Scanが問題箇所
- インデックスが効いたらこんな感じになる
EXPLAIN ANALYZE (SELECT * FROM tbl WHERE tbl.id < 10) UNION ALL (SELECT * FROM tbl WHERE tbl.id > 999990);
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Append (cost=0.42..17.53 rows=21 width=8) (actual time=0.003..0.008 rows=19 loops=1) -> Index Scan using tbl_pkey on tbl (cost=0.42..8.60 rows=10 width=8) (actual time=0.002..0.003 rows=9 loops=1) Index Cond: (id < 10) -> Index Scan using tbl_pkey on tbl tbl_1 (cost=0.42..8.62 rows=11 width=8) (actual time=0.002..0.003 rows=10 loops=1) Index Cond: (id > 999990) Planning Time: 0.089 ms Execution Time: 0.019 ms (7 rows)
- デフォルト設定ならビットマップスキャンが効くところなんですけど
RESET enable_bitmap_scan; EXPLAIN ANALYZE SELECT * FROM tbl WHERE tbl.id < 10 OR tbl.id > 9999990;
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on tbl (cost=8.94..47.66 rows=10 width=8) (actual time=0.007..0.007 rows=9 loops=1) Recheck Cond: ((id < 10) OR (id > 9999990)) Heap Blocks: exact=1 -> BitmapOr (cost=8.94..8.94 rows=10 width=0) (actual time=0.005..0.005 rows=0 loops=1) -> Bitmap Index Scan on tbl_pkey (cost=0.00..4.50 rows=10 width=0) (actual time=0.002..0.003 rows=9 loops=1) Index Cond: (id < 10) -> Bitmap Index Scan on tbl_pkey (cost=0.00..4.43 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1) Index Cond: (id > 9999990) Planning Time: 0.084 ms Execution Time: 0.023 ms (10 rows)
I/Oチューニング
WALチューニング
WAL書き込み量・頻度
wal_buffers
- WALをディスクに書き込む1回あたりのデータ量の上限
- 大きくするとI/O回数が減る
- デフォルト(
-1
)でshared_buffers
の1/32
wal_writer_delay
- wal writerのwal出力の間隔
- 公式
wal_writer_delay (integer) Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time given by wal_writer_delay, unless woken up sooner by an asynchronously committing transaction.
WALの同期/非同期書き込み・適用
#synchronous_commit = on # synchronization level; # off, local, remote_write, remote_apply, or on
- ストリーミングレプリケーションとかで設定するやつ
- 非同期 (プライマリはスタンバイのWAL処理を待たない)
off
- クライアントはプライマリのWAL処理を待たない
local
- クライアントはプライマリのWAL処理を待つ
- 同期 (プライマリはスタンバイのWAL処理を待つ)
remote_write
- リモートのメモリに乗るまで待つ
on
- リモートのディスクにWALが書き込まれるまで待つ
remote_apply
- リモートでWALが適用されるまで待つ
- 非同期 (プライマリはスタンバイのWAL処理を待たない)
commitをまとめるやつ
SELECT * FROM pg_settings where name in ('commit_siblings', 'commit_delay');
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart -----------------+---------+------+----------------------------+-------------------------------------------------------------------------------------+------------+-----------+---------+---------+---------+---------+----------+----------+-----------+------------+------------+----------------- commit_delay | 0 | | Write-Ahead Log / Settings | Sets the delay in microseconds between transaction commit and flushing WAL to disk. | | superuser | integer | default | 0 | 100000 | | 0 | 0 | | | f commit_siblings | 5 | | Write-Ahead Log / Settings | Sets the minimum concurrent open transactions before performing commit_delay. | | user | integer | default | 0 | 1000 | | 5 | 5 | | | f (2 rows)
- commitをまとめて書き込む
性能とdurabilityを引き換えにするやつ
fsync
- 有効化すると、OSのfsync()システムコール等を発行し、更新が物理的にディスクに書き込まれたかの確証を得ようと試みる
- OSやH/Wがクラッシュしたときの備え
- 有効化すると、OSのfsync()システムコール等を発行し、更新が物理的にディスクに書き込まれたかの確証を得ようと試みる
full_page_writes
- 有効化すると、チェックポイント後、最初の変更をページに加える際、ページ全体をWALに書き込む
- クラッシュリカバリ用情報
- 有効化すると、チェックポイント後、最初の変更をページに加える際、ページ全体をWALに書き込む
- 無効にするとdurabilityと引き換えに性能向上する可能性がある
wal_sync_method
fsync
有効時、実際に利用するOSのシステムコール
SELECT * FROM pg_settings WHERE name = 'wal_sync_method';
-[ RECORD 1 ]---+--------------------------------------------------------- name | wal_sync_method setting | fdatasync unit | category | Write-Ahead Log / Settings short_desc | Selects the method used for forcing WAL updates to disk. extra_desc | context | sighup vartype | enum source | default min_val | max_val | enumvals | {fsync,fdatasync,open_sync,open_datasync} boot_val | fdatasync reset_val | fdatasync sourcefile | sourceline | pending_restart | f
wal_sync_method (enum) Method used for forcing WAL updates out to disk. If fsync is off then this setting is irrelevant, since WAL file updates will not be forced out at all. Possible values are: * open_datasync (write WAL files with open() option O_DSYNC) * fdatasync (call fdatasync() at each commit) * fsync (call fsync() at each commit) * fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache) * open_sync (write WAL files with open() option O_SYNC)
チェックポイントチューニング
SELECT * FROM pg_settings WHERE name LIKE 'checkpoint_%';
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart ------------------------------+---------+------+-------------------------------+------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+---------+---------+------------+----------+----------+-----------+------------+------------+----------------- checkpoint_completion_target | 0.5 | | Write-Ahead Log / Checkpoints | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval. | | sighup | real | default | 0 | 1 | | 0.5 | 0.5 | | | f checkpoint_flush_after | 32 | 8kB | Write-Ahead Log / Checkpoints | Number of pages after which previously performed writes are flushed to disk. | | sighup | integer | default | 0 | 256 | | 32 | 32 | | | f checkpoint_timeout | 300 | s | Write-Ahead Log / Checkpoints | Sets the maximum time between automatic WAL checkpoints. | | sighup | integer | default | 30 | 86400 | | 300 | 300 | | | f checkpoint_warning | 30 | s | Write-Ahead Log / Checkpoints | Enables warnings if checkpoint segments are filled more frequently than this. | Write a message to the server log if checkpoints caused by the filling of checkpoint segment files happens more frequently than this number of seconds. Zero turns off the warning. | sighup | integer | default | 0 | 2147483647 | | 30 | 30 | | | f (4 rows)
max_wal_size
- WALファイルサイズがこの設定値を超えると自動checkpointが実行される
checkpoint_timeout
- 自動チェックポイントの最大間隔時間
checkpoint_completion_target
- 目標完了時間
checkpoint_timeout
に対する割合
checkpoint_warning
- パラメータの設定値未満の間隔で自動checkpointが発生した場合、
max_wal_size
の増加を進めるログを出力する- 昔は
checkpoint_segments
- 昔は
- パラメータの設定値未満の間隔で自動checkpointが発生した場合、
2020-02-29 05:32:28.224 UTC [57] LOG: checkpoints are occurring too frequently (13 seconds apart) 2020-02-29 05:32:28.224 UTC [57] HINT: Consider increasing the configuration parameter "max_wal_size".
バックグラウンドライタチューニング
SELECT * FROM pg_settings WHERE name LIKE 'bgwriter_%';
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart -------------------------+---------+------+------------------------------------+------------------------------------------------------------------------------+------------+---------+---------+---------+---------+------------+----------+----------+-----------+------------+------------+----------------- bgwriter_delay | 200 | ms | Resource Usage / Background Writer | Background writer sleep time between rounds. | | sighup | integer | default | 10 | 10000 | | 200 | 200 | | | f bgwriter_flush_after | 64 | 8kB | Resource Usage / Background Writer | Number of pages after which previously performed writes are flushed to disk. | | sighup | integer | default | 0 | 256 | | 64 | 64 | | | f bgwriter_lru_maxpages | 100 | | Resource Usage / Background Writer | Background writer maximum number of LRU pages to flush per round. | | sighup | integer | default | 0 | 1073741823 | | 100 | 100 | | | f bgwriter_lru_multiplier | 2 | | Resource Usage / Background Writer | Multiple of the average buffer usage to free per round. | | sighup | real | default | 0 | 10 | | 2 | 2 | | | f (4 rows)
bgwriter_delay
- backgroud writerプロセスの起動周期
- 1周期で書き込むdirtyページ数の上限
bgwriter_lru_maxpages, bgwriter_lru_multiplier x 最近の周期で書き込んだページ平均数の平均
bgwriter_lru_multiplier
- 1.0: 必要なぶんちょうど
- ~1.0: スパイク等に備えてクッション
- 1.0~: dirty pageの書き戻しをサーバープロセスに意図的に委ねる
- background writerによる書き込みを増やすと
- checkpointerの負荷は下がる
- システム全体としての書き込み量は増加する可能性がある
- checkpointerのみで済むところが、background writerによる書き込みも生ずる
ディスク先読み
effective_io_concurrency
SELECT * FROM pg_settings WHERE name='effective_io_concurrency';
-[ RECORD 1 ]---+----------------------------------------------------------------------------------------- name | effective_io_concurrency setting | 1 unit | category | Resource Usage / Asynchronous Behavior short_desc | Number of simultaneous requests that can be handled efficiently by the disk subsystem. extra_desc | For RAID arrays, this should be approximately the number of drive spindles in the array. context | user vartype | integer source | default min_val | 0 max_val | 1000 enumvals | boot_val | 1 reset_val | 1 sourcefile | sourceline | pending_restart | f
- 大きくすると、Bitmap Heap Scanが速くなる
For RAID arrays, this should be approximately the number of drive spindles in the array.
ディスクI/Oの分散
テーブル空間
- アクセス頻度の低いデータを、安価で低速なストレージに移す
- アクセス頻度の高いデータを、高価で高速なストレージに移す
統計情報データの出力先変更
SELECT * FROM pg_settings WHERE name='stats_temp_directory';
-[ RECORD 1 ]---+-------------------------------------------------------------- name | stats_temp_directory setting | pg_stat_tmp unit | category | Statistics / Query and Index Statistics Collector short_desc | Writes temporary statistics files to the specified directory. extra_desc | context | sighup vartype | string source | default min_val | max_val | enumvals | boot_val | pg_stat_tmp reset_val | pg_stat_tmp sourcefile | sourceline | pending_restart | f
ls -lA pg_stat_tmp
total 28 -rw------- 1 postgres postgres 2033 Feb 29 06:10 db_0.stat -rw------- 1 postgres postgres 14370 Feb 29 06:10 db_13408.stat -rw------- 1 postgres postgres 639 Feb 29 06:10 global.stat -rw------- 1 postgres postgres 1923 Feb 29 06:10 pgss_query_texts.stat
- 統計情報データ一時的な格納先
- RAMディスク等にすると速い
メモリのチューニング
shared_buffers
- 大きすぎはよくない
- かえってswapが頻発する
- 設定値が小さく「共有バッファのキャッシュミス」した場合でも、ディスクアクセスが生じるとは限らない
- Linuxでは余ったメモリ領域をディスクキャッシュとして利用するため
maintenance_work_mem
- VACUUMとかREINDEXとかで使うやつ
#maintenance_work_mem = 64MB # min 1MB #autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
autovacuum_work_mem
を設定しない場合、自動バキュームでもこの設定値が利用されるautovacuum_max_workers
を掛けたぶんが確保されうることに注意する
ロックのチューニング
pg_locks
SELECT * FROM pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+-----------------+---------+---------- relation | 13408 | 12143 | | | | | | | | 3/86 | 565 | AccessShareLock | t | t virtualxid | | | | | 3/86 | | | | | 3/86 | 565 | ExclusiveLock | t | t (2 rows)
BEGIN; SELECT * FROM tbl WHERE id = 1 FOR UPDATE;
id | col ----+----- 1 | 1 (1 row)
SELECT * FROM pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+-----------------+---------+---------- relation | 13408 | 12143 | | | | | | | | 3/87 | 565 | AccessShareLock | t | t relation | 13408 | 16398 | | | | | | | | 3/87 | 565 | RowShareLock | t | t relation | 13408 | 16395 | | | | | | | | 3/87 | 565 | RowShareLock | t | t virtualxid | | | | | 3/87 | | | | | 3/87 | 565 | ExclusiveLock | t | t transactionid | | | | | | 506 | | | | 3/87 | 565 | ExclusiveLock | t | f (5 rows)
ABORT;
- より詳細な情報を得るためには、他のとビューやテーブルとJOINして使う
BEGIN; SELECT * FROM tbl WHERE id = 1 FOR UPDATE; SELECT class.relname,locks.*,activity.* FROM pg_locks locks JOIN pg_stat_activity activity ON locks.database=activity.datid JOIN pg_class class ON locks.relation = class.oid WHERE class.relname='tbl';
-[ RECORD 1 ]------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- relname | tbl locktype | relation database | 13408 relation | 16395 page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | 3/104 pid | 565 mode | RowShareLock granted | t fastpath | t datid | 13408 datname | postgres pid | 565 usesysid | 10 usename | postgres application_name | psql client_addr | client_hostname | client_port | -1 backend_start | 2020-02-29 03:24:18.912878+00 xact_start | 2020-02-29 05:17:29.375437+00 query_start | 2020-02-29 05:17:40.801137+00 state_change | 2020-02-29 05:17:40.801138+00 wait_event_type | wait_event | state | active backend_xid | 509 backend_xmin | 509 query | select class.relname,locks.*,activity.* from pg_locks locks join pg_stat_activity activity on locks.database=activity.datid join pg_class class on locks.relation = class.oid where class.relname='tbl'; backend_type | client backend
deadlock_timeout
- デッドロック検査は高コストなのでなるべく避けたい
SELECT name,setting,context,unit FROM pg_settings WHERE name='deadlock_timeout';
name | setting | context | unit ------------------+---------+-----------+------ deadlock_timeout | 1000 | superuser | ms (1 row)
スロークエリの検出
log_min_duration_statement
#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements # and their durations, > 0 logs only # statements running at least this number # of milliseconds
- 指定の値よりも時間がかかるSELECT文を投げるとロギングされる
INSERT INTO tbl VALUES (generate_series(1,10000000)); SET log_min_duration_statement='100ms'; SELECT AVG(id) FROM tbl;
2020-02-29 05:34:06.609 UTC [565] LOG: duration: 286.595 ms statement: SELECT AVG(id) FROM tbl;
- INSERTとかは出ない
auto_explain
- モジュールのロードが必要
- 接続中のセッションのみ
LOAD 'auto_explain';
- 全セッション
- #shared_preload_libraries = '' # (change requires restart) + shared_preload_libraries = 'auto_explain';
- スロークエリの実行計画をログ出力する例
LOAD 'auto_explain'; SET auto_explain.log_min_duration = '100ms'; SELECT AVG(id) FROM tbl;
2020-02-29 05:38:04.622 UTC [565] LOG: duration: 279.341 ms plan: Query Text: SELECT AVG(id) FROM tbl; Finalize Aggregate (cost=97331.80..97331.81 rows=1 width=32) -> Gather (cost=97331.58..97331.79 rows=2 width=32) Workers Planned: 2 -> Partial Aggregate (cost=96331.58..96331.59 rows=1 width=32) -> Parallel Seq Scan on tbl (cost=0.00..85914.87 rows=4166687 width=4) 2020-02-29 05:38:04.622 UTC [565] LOG: duration: 279.552 ms statement: SELECT AVG(id) FROM tbl;
pg_stat_statements
shared_preload_libraries
の設定、およびCREATE EXTENSION
で有効化する
ALTER SYSTEM SET shared_preload_libraries='pg_stat_statements';
- PostgreSQL restart
CREATE EXTENSION pg_stat_statements; SELECT extname,extversion FROM pg_extension;
extname | extversion --------------------+------------ plpgsql | 1.0 pg_stat_statements | 1.7 (2 rows)
- 実行された全てのSQL文の実行時の統計情報を記録する
pg_stat_statements
システムビューで取得
\d+ pg_stat_statements;
View "public.pg_stat_statements" Column | Type | Collation | Nullable | Default | Storage | Description ---------------------+------------------+-----------+----------+---------+----------+------------- userid | oid | | | | plain | dbid | oid | | | | plain | queryid | bigint | | | | plain | query | text | | | | extended | calls | bigint | | | | plain | total_time | double precision | | | | plain | min_time | double precision | | | | plain | max_time | double precision | | | | plain | mean_time | double precision | | | | plain | stddev_time | double precision | | | | plain | rows | bigint | | | | plain | shared_blks_hit | bigint | | | | plain | shared_blks_read | bigint | | | | plain | shared_blks_dirtied | bigint | | | | plain | shared_blks_written | bigint | | | | plain | local_blks_hit | bigint | | | | plain | local_blks_read | bigint | | | | plain | local_blks_dirtied | bigint | | | | plain | local_blks_written | bigint | | | | plain | temp_blks_read | bigint | | | | plain | temp_blks_written | bigint | | | | plain | blk_read_time | double precision | | | | plain | blk_write_time | double precision | | | | plain | View definition: SELECT pg_stat_statements.userid, pg_stat_statements.dbid, pg_stat_statements.queryid, pg_stat_statements.query, pg_stat_statements.calls, pg_stat_statements.total_time, pg_stat_statements.min_time, pg_stat_statements.max_time, pg_stat_statements.mean_time, pg_stat_statements.stddev_time, pg_stat_statements.rows, pg_stat_statements.shared_blks_hit, pg_stat_statements.shared_blks_read, pg_stat_statements.shared_blks_dirtied, pg_stat_statements.shared_blks_written, pg_stat_statements.local_blks_hit, pg_stat_statements.local_blks_read, pg_stat_statements.local_blks_dirtied, pg_stat_statements.local_blks_written, pg_stat_statements.temp_blks_read, pg_stat_statements.temp_blks_written, pg_stat_statements.blk_read_time, pg_stat_statements.blk_write_time FROM pg_stat_statements(true) pg_stat_statements(userid, dbid, queryid, query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time, blk_write_time);
- クエリ実行時間、共有バッファヒット率等がわかる
SELECT * FROM pg_stat_statements WHERE query ~ 'AVG';
-[ RECORD 1 ]-------+------------------------ userid | 10 dbid | 13408 queryid | -2183878846915592520 query | SELECT AVG(id) FROM tbl calls | 1 total_time | 261.81870000000004 min_time | 261.81870000000004 max_time | 261.81870000000004 mean_time | 261.81870000000004 stddev_time | 0 rows | 1 shared_blks_hit | 0 shared_blks_read | 44248 shared_blks_dirtied | 0 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_dirtied | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 blk_read_time | 0 blk_write_time | 0
local_*
- 一時テーブルで使うやつ
CREATE TEMPORARY TABLE tmptbl (col int); INSERT INTO tmptbl VALUES (generate_series(1,100)); SELECT * FROM tmptbl; SELECT * FROM pg_stat_statements WHERE query = 'SELECT * FROM tmptbl';
-[ RECORD 1 ]-------+--------------------- userid | 10 dbid | 13408 queryid | -4296624328652239193 query | SELECT * FROM tmptbl calls | 1 total_time | 0.0196 min_time | 0.0196 max_time | 0.0196 mean_time | 0.0196 stddev_time | 0 rows | 100 shared_blks_hit | 0 shared_blks_read | 0 shared_blks_dirtied | 0 shared_blks_written | 0 local_blks_hit | 1 local_blks_read | 0 local_blks_dirtied | 0 local_blks_written | 0 temp_blks_read | 0 temp_blks_written | 0 blk_read_time | 0 blk_write_time | 0
temp_blks
- 一時ファイル
- HashのBatchが1より大きい場合とかの
- 一時ファイル
EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=308312.08..595999.70 rows=10000048 width=16) (actual time=1770.283..5999.249 rows=10000000 loops=1) Hash Cond: (t1.id = t2.id) -> Seq Scan on tbl t1 (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.007..624.598 rows=10000000 loops=1) -> Hash (cost=144248.48..144248.48 rows=10000048 width=8) (actual time=1767.762..1767.762 rows=10000000 loops=1) Buckets: 131072 Batches: 256 Memory Usage: 2400kB -> Seq Scan on tbl t2 (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.004..630.673 rows=10000000 loops=1) Planning Time: 0.171 ms JIT: Functions: 10 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.568 ms, Inlining 1.255 ms, Optimization 28.625 ms, Emission 13.454 ms, Total 43.902 ms Execution Time: 6247.816 ms (12 rows)
- 一時ファイル使いまくり
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
SELECT * FROM pg_stat_statements WHERE query='EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id';
-[ RECORD 1 ]-------+-------------------------------------------------------------- userid | 10 dbid | 13408 queryid | -8182901821120542956 query | EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id calls | 2 total_time | 12790.9215 min_time | 6248.0437 max_time | 6542.8778 mean_time | 6395.46075 stddev_time | 147.41705000000002 rows | 0 shared_blks_hit | 17266 shared_blks_read | 159784 shared_blks_dirtied | 0 shared_blks_written | 0 local_blks_hit | 0 local_blks_read | 0 local_blks_dirtied | 0 local_blks_written | 0 temp_blks_read | 117232 temp_blks_written | 117232 blk_read_time | 0 blk_write_time | 0