勉強日記

チラ裏

PostgreSQL 高度技術者育成テキスト ch4 (性能監視とパフォーマンス) (2/2)

LPI-Japan OSS-DB Gold 認定教材 PostgreSQL 高度技術者育成テキスト

LPI-Japan OSS-DB Gold 認定教材 PostgreSQL 高度技術者育成テキスト

  • 作者:河原 翔
  • 発売日: 2014/10/27
  • メディア: オンデマンド (ペーパーバック)


性能監視とパフォーマンスチューニング

実行計画のチューニング

  • Oracle等に比べるとまだよわい

INDEXチューニング

ANALYZE

SELECT name,setting,context,unit FROM pg_settings WHERE name='autovacuum_naptime';
        name        | setting | context | unit 
--------------------+---------+---------+------
 autovacuum_naptime | 60      | sighup  | s
(1 row)
  • autovacuum/autoanalyzeはautovacuum_naptimeの間隔でしか行われない
  • ので、大量行の挿入直後などは手動ANALYZEが必要なことも

work_mem

EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=30832.00..59603.01 rows=1000000 width=16) (actual time=162.829..637.762 rows=1000000 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on tbl t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..55.651 rows=1000000 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=162.626..162.626 rows=1000000 loops=1)
         Buckets: 131072  Batches: 16  Memory Usage: 3251kB
         ->  Seq Scan on tbl t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.004..58.547 rows=1000000 loops=1)
 Planning Time: 0.185 ms
 Execution Time: 661.598 ms
(8 rows)
  • Batches: 16とある
          Buckets: 131072  Batches: 16  Memory Usage: 3251kB
  • HashのBatchが1より大きい場合、一時ファイルが生成されている
  • work_memを増やすことで回避できうる
    • デフォルト4MB
SET work_mem='64MB';
EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=26925.00..43975.01 rows=1000000 width=16) (actual time=217.614..557.272 rows=1000000 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on tbl t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..56.497 rows=1000000 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=216.507..216.507 rows=1000000 loops=1)
         Buckets: 1048576  Batches: 1  Memory Usage: 43739kB
         ->  Seq Scan on tbl t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.011..58.557 rows=1000000 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 582.205 ms
(8 rows)
  • 一時ファイルのログ出力
    • 指定のサイズよりも大きな一時ファイルをロギングする
    • 0でぜんぶ
SELECT * FROM pg_settings WHERE name='log_temp_files';
-[ RECORD 1 ]---+---------------------------------------------------------------------
name            | log_temp_files
setting         | -1
unit            | kB
category        | Reporting and Logging / What to Log
short_desc      | Log the use of temporary files larger than this number of kilobytes.
extra_desc      | Zero logs all files. The default is -1 (turning this feature off).
context         | superuser
vartype         | integer
source          | default
min_val         | -1
max_val         | 2147483647
enumvals        | 
boot_val        | -1
reset_val       | -1
sourcefile      | 
sourceline      | 
pending_restart | f
SET log_temp_files=0;
RESET work_mem; -- reset to 4MB (default)

EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=30832.00..59603.01 rows=1000000 width=16) (actual time=158.349..681.531 rows=1000000 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on tbl t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..55.873 rows=1000000 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=158.150..158.150 rows=1000000 loops=1)
         Buckets: 131072  Batches: 16  Memory Usage: 3251kB
         ->  Seq Scan on tbl t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.004..56.045 rows=1000000 loops=1)
 Planning Time: 0.132 ms
 Execution Time: 706.652 ms
(8 rows)
         Buckets: 131072  Batches: 16  Memory Usage: 3251kB
  • Batchesが16なので、1つはメモリ上、残り15は一時ファイルが書き出されている
  • 一時ファイルのログが出力される
2020-02-29 03:57:57.064 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.40", size 1523980
2020-02-29 03:57:57.064 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.083 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.55", size 1523980
2020-02-29 03:57:57.083 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.091 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.42", size 1522212
2020-02-29 03:57:57.091 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.113 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.57", size 1522212
2020-02-29 03:57:57.113 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.116 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.37", size 1520872
2020-02-29 03:57:57.116 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.141 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.52", size 1520872
2020-02-29 03:57:57.141 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.146 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.34", size 1531004
2020-02-29 03:57:57.146 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.167 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.49", size 1531004
2020-02-29 03:57:57.167 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.172 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.39", size 1528456
2020-02-29 03:57:57.172 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.193 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.54", size 1528456
2020-02-29 03:57:57.193 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.197 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.36", size 1529940
2020-02-29 03:57:57.197 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.218 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.51", size 1529940
2020-02-29 03:57:57.218 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.223 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.35", size 1520384
2020-02-29 03:57:57.223 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.246 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.50", size 1520384
2020-02-29 03:57:57.246 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.252 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.44", size 1522900
2020-02-29 03:57:57.252 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.274 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.59", size 1522900
2020-02-29 03:57:57.274 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.279 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.30", size 1520396
2020-02-29 03:57:57.279 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.300 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.45", size 1520396
2020-02-29 03:57:57.300 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.306 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.33", size 1526832
2020-02-29 03:57:57.306 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.327 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.48", size 1526832
2020-02-29 03:57:57.327 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.331 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.41", size 1523340
2020-02-29 03:57:57.331 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.352 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.56", size 1523340
2020-02-29 03:57:57.352 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.358 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.38", size 1522840
2020-02-29 03:57:57.358 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.379 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.53", size 1522840
2020-02-29 03:57:57.379 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.383 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.43", size 1538092
2020-02-29 03:57:57.383 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.401 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.58", size 1538092
2020-02-29 03:57:57.401 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.407 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.32", size 1525620
2020-02-29 03:57:57.407 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.425 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.47", size 1525620
2020-02-29 03:57:57.425 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.429 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.31", size 1514544
2020-02-29 03:57:57.429 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
2020-02-29 03:57:57.446 UTC [565] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp565.46", size 1514544
2020-02-29 03:57:57.446 UTC [565] STATEMENT:  EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;

effective_cache_size

  • 利用可能なディスクキャッシュの推定値

Sets the planner's assumption about the total size of the data caches.
That is, the total size of the caches (kernel cache and shared buffers) used for PostgreSQL data files. This is measured in disk pages, which are normally 8 kB each.

プランナコスト定数

postgresql.conf

# - Planner Cost Constants -

#seq_page_cost = 1.0            # measured on an arbitrary scale
#random_page_cost = 4.0         # same scale as above
#cpu_tuple_cost = 0.01          # same scale as above
#cpu_index_tuple_cost = 0.005       # same scale as above
#cpu_operator_cost = 0.0025     # same scale as above
#parallel_tuple_cost = 0.1      # same scale as above
#parallel_setup_cost = 1000.0   # same scale as above

#jit_above_cost = 100000        # perform JIT compilation if available
                    # and query more expensive than this;
                    # -1 disables
#jit_inline_above_cost = 500000     # inline small functions if query is
                    # more expensive than this; -1 disables
#jit_optimize_above_cost = 500000   # use expensive JIT optimizations if
                    # query is more expensive than this;
                    # -1 disables

#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
#random_page_cost = 4.0          # same scale as above
  • SSDなどではもっと1に近づける

プランナメソッド設定の変更

postgresql.conf

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_parallel_append = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
#enable_partitionwise_join = off
#enable_partitionwise_aggregate = off
#enable_parallel_hash = on
#enable_partition_pruning = on
  • プランタイプを個別に有効/無効にする
  • Hash Joinが選ばれている例
EXPLAIN ANALYZE SELECT * FROM tbl t1, tbl t2 WHERE t1.id = t2.id;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=30832.00..59603.01 rows=1000000 width=16) (actual time=160.068..668.456 rows=1000000 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on tbl t1  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.006..54.869 rows=1000000 loops=1)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8) (actual time=159.871..159.871 rows=1000000 loops=1)
         Buckets: 131072  Batches: 16  Memory Usage: 3251kB
         ->  Seq Scan on tbl t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.003..56.715 rows=1000000 loops=1)
 Planning Time: 0.116 ms
 Execution Time: 693.012 ms
(8 rows)
  • Hash Joinを無効化してみる
SET enable_hashjoin=off;

EXPLAIN ANALYZE SELECT * FROM tbl t1, tbl t2 WHERE t1.id = t2.id;
  • Merge Joinになった
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.85..75816.85 rows=1000000 width=16) (actual time=0.011..422.687 rows=1000000 loops=1)
   Merge Cond: (t1.id = t2.id)
   ->  Index Scan using tbl_pkey on tbl t1  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.005..115.109 rows=1000000 loops=1)
   ->  Index Scan using tbl_pkey on tbl t2  (cost=0.42..30408.42 rows=1000000 width=8) (actual time=0.004..103.403 rows=1000000 loops=1)
 Planning Time: 0.116 ms
 Execution Time: 446.814 ms
(6 rows)
  • 内部的には、当該プランタイプのコストに巨大な値が設定されている
  • TID Scanとは
SELECT ctid,* FROM tbl LIMIT 1;
 ctid  | id | col 
-------+----+-----
 (0,1) |  1 |   1
(1 row)
  • OracleROWIDみたいなやつ
    • これによる検索は速い
    • が、VACUUM FULLで変わってしまう

テーブル結合最適化処理の実行制御

  • プランナによるクエリ書き換え
  • 公式
SELECT *
FROM x, y,
    (SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;
  • がこう書き換わったりする
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  • 結合数が増えてくると、実行計画の作成時間が無視できなくなってくる
EXPLAIN ANALYZE SELECT * FROM a a1, a a2, a a3, a a4, a a5, a a6, a a7, a a8, a a9, a a10, a a11, a a12;
...
 Planning Time: 8.445 ms
  • FROM項目数に上限を設けることでこれを回避する
    • 実行計画の質は下がるかも
  • 公式
    • from_collapse_limit
      • サブクエリのFROMを外側のクエリのFROMに移動する際の項目数上限 (上述の例)
    • join_collapse_limi
      • JOINをFROMに書き換える際のFROM項目数上限
  • geqo_thresholdよりも小さな値を設定すること
    • デフォルト値12
    • これを超えると、総当たりからGEQOに切り替わる
EXPLAIN ANALYZE SELECT * FROM a a1, a a2, a a3, a a4, a a5, a a6, a a7, a a8, a a9, a a10, a a11, a a12, a a13;
...
 Planning Time: 6.791 ms
  • 8.445 msから短くなった

EXPLAINからのチューニング例

SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT * FROM tbl WHERE tbl.id < 10 OR tbl.id > 9999990;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..11676.00 rows=10 width=8) (actual time=0.325..72.681 rows=9 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbl  (cost=0.00..10675.00 rows=4 width=8) (actual time=12.758..34.594 rows=3 loops=3)
         Filter: ((id < 10) OR (id > 9999990))
         Rows Removed by Filter: 333330
 Planning Time: 0.080 ms
 Execution Time: 72.695 ms
(8 rows)
  • 子ノードから順に問題箇所を探す
    • 初期ノードと総コストの差が大きい箇所
    • costとactual timeの比率がおかしい箇所
      • プランナコスト定数に誤りがある可能性
  • 上の例では、明らかにSeq Scanが問題箇所
  • インデックスが効いたらこんな感じになる
EXPLAIN ANALYZE (SELECT * FROM tbl WHERE tbl.id < 10) UNION ALL (SELECT * FROM tbl WHERE tbl.id > 999990);
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.42..17.53 rows=21 width=8) (actual time=0.003..0.008 rows=19 loops=1)
   ->  Index Scan using tbl_pkey on tbl  (cost=0.42..8.60 rows=10 width=8) (actual time=0.002..0.003 rows=9 loops=1)
         Index Cond: (id < 10)
   ->  Index Scan using tbl_pkey on tbl tbl_1  (cost=0.42..8.62 rows=11 width=8) (actual time=0.002..0.003 rows=10 loops=1)
         Index Cond: (id > 999990)
 Planning Time: 0.089 ms
 Execution Time: 0.019 ms
(7 rows)
  • デフォルト設定ならビットマップスキャンが効くところなんですけど
RESET enable_bitmap_scan;
EXPLAIN ANALYZE SELECT * FROM tbl WHERE tbl.id < 10 OR tbl.id > 9999990;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=8.94..47.66 rows=10 width=8) (actual time=0.007..0.007 rows=9 loops=1)
   Recheck Cond: ((id < 10) OR (id > 9999990))
   Heap Blocks: exact=1
   ->  BitmapOr  (cost=8.94..8.94 rows=10 width=0) (actual time=0.005..0.005 rows=0 loops=1)
         ->  Bitmap Index Scan on tbl_pkey  (cost=0.00..4.50 rows=10 width=0) (actual time=0.002..0.003 rows=9 loops=1)
               Index Cond: (id < 10)
         ->  Bitmap Index Scan on tbl_pkey  (cost=0.00..4.43 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (id > 9999990)
 Planning Time: 0.084 ms
 Execution Time: 0.023 ms
(10 rows)

I/Oチューニング

WALチューニング

WAL書き込み量・頻度

  • wal_buffers
    • WALをディスクに書き込む1回あたりのデータ量の上限
    • 大きくするとI/O回数が減る
    • デフォルト(-1)でshared_buffersの1/32
  • wal_writer_delay
    • wal writerのwal出力の間隔
  • 公式
wal_writer_delay (integer) 

 Specifies how often the WAL writer flushes WAL, in time terms. After flushing WAL the writer sleeps for the length of time
 given by wal_writer_delay, unless woken up sooner by an asynchronously committing transaction.

WALの同期/非同期書き込み・適用

#synchronous_commit = on     # synchronization level;
                    # off, local, remote_write, remote_apply, or on
  • ストリーミングレプリケーションとかで設定するやつ
    • 非同期 (プライマリはスタンバイのWAL処理を待たない)
      • off
        • クライアントはプライマリのWAL処理を待たない
      • local
        • クライアントはプライマリのWAL処理を待つ
    • 同期 (プライマリはスタンバイのWAL処理を待つ)
      • remote_write
        • リモートのメモリに乗るまで待つ
      • on
        • リモートのディスクにWALが書き込まれるまで待つ
      • remote_apply
        • リモートでWALが適用されるまで待つ

commitをまとめるやつ

SELECT * FROM pg_settings where name in ('commit_siblings', 'commit_delay');
      name       | setting | unit |          category          |                                     short_desc                                      | extra_desc |  context  | vartype | source  | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart 
-----------------+---------+------+----------------------------+-------------------------------------------------------------------------------------+------------+-----------+---------+---------+---------+---------+----------+----------+-----------+------------+------------+-----------------
 commit_delay    | 0       |      | Write-Ahead Log / Settings | Sets the delay in microseconds between transaction commit and flushing WAL to disk. |            | superuser | integer | default | 0       | 100000  |          | 0        | 0         |            |            | f
 commit_siblings | 5       |      | Write-Ahead Log / Settings | Sets the minimum concurrent open transactions before performing commit_delay.       |            | user      | integer | default | 0       | 1000    |          | 5        | 5         |            |            | f
(2 rows)
  • commitをまとめて書き込む
    • commit_siblings
      • この設定値以上のトランザクションが同時に発生した場合のみcommitをまとめる
      • 同時に複数のトランザクションが発生しなければ、そもそも(ほぼ)同時にcommitが行われる可能性が低いため
    • commit_delay
      • commitまとめ待ち時間

性能とdurabilityを引き換えにするやつ

  • fsync
    • 有効化すると、OSのfsync()システムコール等を発行し、更新が物理的にディスクに書き込まれたかの確証を得ようと試みる
      • OSやH/Wがクラッシュしたときの備え
  • full_page_writes
    • 有効化すると、チェックポイント後、最初の変更をページに加える際、ページ全体をWALに書き込む
  • 無効にするとdurabilityと引き換えに性能向上する可能性がある
  • wal_sync_method
SELECT * FROM pg_settings WHERE name = 'wal_sync_method';
-[ RECORD 1 ]---+---------------------------------------------------------
name            | wal_sync_method
setting         | fdatasync
unit            | 
category        | Write-Ahead Log / Settings
short_desc      | Selects the method used for forcing WAL updates to disk.
extra_desc      | 
context         | sighup
vartype         | enum
source          | default
min_val         | 
max_val         | 
enumvals        | {fsync,fdatasync,open_sync,open_datasync}
boot_val        | fdatasync
reset_val       | fdatasync
sourcefile      | 
sourceline      | 
pending_restart | f
wal_sync_method (enum) 

 Method used for forcing WAL updates out to disk. If fsync is off then this setting is irrelevant, since WAL file updates will not
 be forced out at all. Possible values are:

 * open_datasync (write WAL files with open() option O_DSYNC)

 * fdatasync (call fdatasync() at each commit)

 * fsync (call fsync() at each commit)

 * fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache)

 * open_sync (write WAL files with open() option O_SYNC)

チェックポイントチューニング

SELECT * FROM pg_settings WHERE name LIKE 'checkpoint_%';
             name             | setting | unit |           category            |                                        short_desc                                        |                                                                                     extra_desc                                                                                      | context | vartype | source  | min_val |  max_val   | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart 
------------------------------+---------+------+-------------------------------+------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+---------+---------+------------+----------+----------+-----------+------------+------------+-----------------
 checkpoint_completion_target | 0.5     |      | Write-Ahead Log / Checkpoints | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval. |                                                                                                                                                                                     | sighup  | real    | default | 0       | 1          |          | 0.5      | 0.5       |            |            | f
 checkpoint_flush_after       | 32      | 8kB  | Write-Ahead Log / Checkpoints | Number of pages after which previously performed writes are flushed to disk.             |                                                                                                                                                                                     | sighup  | integer | default | 0       | 256        |          | 32       | 32        |            |            | f
 checkpoint_timeout           | 300     | s    | Write-Ahead Log / Checkpoints | Sets the maximum time between automatic WAL checkpoints.                                 |                                                                                                                                                                                     | sighup  | integer | default | 30      | 86400      |          | 300      | 300       |            |            | f
 checkpoint_warning           | 30      | s    | Write-Ahead Log / Checkpoints | Enables warnings if checkpoint segments are filled more frequently than this.            | Write a message to the server log if checkpoints caused by the filling of checkpoint segment files happens more frequently than this number of seconds. Zero turns off the warning. | sighup  | integer | default | 0       | 2147483647 |          | 30       | 30        |            |            | f
(4 rows)
  • max_wal_size
    • WALファイルサイズがこの設定値を超えると自動checkpointが実行される
  • checkpoint_timeout
    • 自動チェックポイントの最大間隔時間
  • checkpoint_completion_target
    • 目標完了時間
    • checkpoint_timeoutに対する割合
  • checkpoint_warning
    • パラメータの設定値未満の間隔で自動checkpointが発生した場合、max_wal_sizeの増加を進めるログを出力する
      • 昔はcheckpoint_segments
2020-02-29 05:32:28.224 UTC [57] LOG:  checkpoints are occurring too frequently (13 seconds apart)
2020-02-29 05:32:28.224 UTC [57] HINT:  Consider increasing the configuration parameter "max_wal_size".

バックグラウンドライタチューニング

SELECT * FROM pg_settings WHERE name LIKE 'bgwriter_%';
          name           | setting | unit |              category              |                                  short_desc                                  | extra_desc | context | vartype | source  | min_val |  max_val   | enumvals | boot_val | reset_val | sourcefile | sourceline | pending_restart 
-------------------------+---------+------+------------------------------------+------------------------------------------------------------------------------+------------+---------+---------+---------+---------+------------+----------+----------+-----------+------------+------------+-----------------
 bgwriter_delay          | 200     | ms   | Resource Usage / Background Writer | Background writer sleep time between rounds.                                 |            | sighup  | integer | default | 10      | 10000      |          | 200      | 200       |            |            | f
 bgwriter_flush_after    | 64      | 8kB  | Resource Usage / Background Writer | Number of pages after which previously performed writes are flushed to disk. |            | sighup  | integer | default | 0       | 256        |          | 64       | 64        |            |            | f
 bgwriter_lru_maxpages   | 100     |      | Resource Usage / Background Writer | Background writer maximum number of LRU pages to flush per round.            |            | sighup  | integer | default | 0       | 1073741823 |          | 100      | 100       |            |            | f
 bgwriter_lru_multiplier | 2       |      | Resource Usage / Background Writer | Multiple of the average buffer usage to free per round.                      |            | sighup  | real    | default | 0       | 10         |          | 2        | 2         |            |            | f
(4 rows)
  • bgwriter_delay
    • backgroud writerプロセスの起動周期
  • 1周期で書き込むdirtyページ数の上限
bgwriter_lru_maxpages,
bgwriter_lru_multiplier x 最近の周期で書き込んだページ平均数の平均
  • bgwriter_lru_multiplier
    • 1.0: 必要なぶんちょうど
    • ~1.0: スパイク等に備えてクッション
    • 1.0~: dirty pageの書き戻しをサーバープロセスに意図的に委ねる
  • background writerによる書き込みを増やすと
    • checkpointerの負荷は下がる
    • システム全体としての書き込み量は増加する可能性がある
      • checkpointerのみで済むところが、background writerによる書き込みも生ずる

ディスク先読み

effective_io_concurrency

SELECT * FROM pg_settings WHERE name='effective_io_concurrency';
-[ RECORD 1 ]---+-----------------------------------------------------------------------------------------
name            | effective_io_concurrency
setting         | 1
unit            | 
category        | Resource Usage / Asynchronous Behavior
short_desc      | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
extra_desc      | For RAID arrays, this should be approximately the number of drive spindles in the array.
context         | user
vartype         | integer
source          | default
min_val         | 0
max_val         | 1000
enumvals        | 
boot_val        | 1
reset_val       | 1
sourcefile      | 
sourceline      | 
pending_restart | f
  • 大きくすると、Bitmap Heap Scanが速くなる

For RAID arrays, this should be approximately the number of drive spindles in the array.

  • RAIDの場合は個々のドライブ数

ディスクI/Oの分散

テーブル空間

  • アクセス頻度の低いデータを、安価で低速なストレージに移す
  • アクセス頻度の高いデータを、高価で高速なストレージに移す

統計情報データの出力先変更

SELECT * FROM pg_settings WHERE name='stats_temp_directory';
-[ RECORD 1 ]---+--------------------------------------------------------------
name            | stats_temp_directory
setting         | pg_stat_tmp
unit            | 
category        | Statistics / Query and Index Statistics Collector
short_desc      | Writes temporary statistics files to the specified directory.
extra_desc      | 
context         | sighup
vartype         | string
source          | default
min_val         | 
max_val         | 
enumvals        | 
boot_val        | pg_stat_tmp
reset_val       | pg_stat_tmp
sourcefile      | 
sourceline      | 
pending_restart | f
ls -lA pg_stat_tmp
total 28
-rw------- 1 postgres postgres  2033 Feb 29 06:10 db_0.stat
-rw------- 1 postgres postgres 14370 Feb 29 06:10 db_13408.stat
-rw------- 1 postgres postgres   639 Feb 29 06:10 global.stat
-rw------- 1 postgres postgres  1923 Feb 29 06:10 pgss_query_texts.stat
  • 統計情報データ一時的な格納先
  • RAMディスク等にすると速い

メモリのチューニング

shared_buffers

  • 大きすぎはよくない
    • かえってswapが頻発する
  • 設定値が小さく「共有バッファのキャッシュミス」した場合でも、ディスクアクセスが生じるとは限らない
    • Linuxでは余ったメモリ領域をディスクキャッシュとして利用するため

maintenance_work_mem

  • VACUUMとかREINDEXとかで使うやつ
#maintenance_work_mem = 64MB     # min 1MB
#autovacuum_work_mem = -1       # min 1MB, or -1 to use maintenance_work_mem
  • autovacuum_work_memを設定しない場合、自動バキュームでもこの設定値が利用される
    • autovacuum_max_workersを掛けたぶんが確保されうることに注意する

ロックのチューニング

pg_locks

SELECT * FROM pg_locks;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |      mode       | granted | fastpath 
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+-----------------+---------+----------
 relation   |    13408 |    12143 |      |       |            |               |         |       |          | 3/86               | 565 | AccessShareLock | t       | t
 virtualxid |          |          |      |       | 3/86       |               |         |       |          | 3/86               | 565 | ExclusiveLock   | t       | t
(2 rows)
BEGIN;
SELECT * FROM tbl WHERE id = 1 FOR UPDATE;
 id | col 
----+-----
  1 |   1
(1 row)
SELECT * FROM pg_locks;
   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |      mode       | granted | fastpath 
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+-----------------+---------+----------
 relation      |    13408 |    12143 |      |       |            |               |         |       |          | 3/87               | 565 | AccessShareLock | t       | t
 relation      |    13408 |    16398 |      |       |            |               |         |       |          | 3/87               | 565 | RowShareLock    | t       | t
 relation      |    13408 |    16395 |      |       |            |               |         |       |          | 3/87               | 565 | RowShareLock    | t       | t
 virtualxid    |          |          |      |       | 3/87       |               |         |       |          | 3/87               | 565 | ExclusiveLock   | t       | t
 transactionid |          |          |      |       |            |           506 |         |       |          | 3/87               | 565 | ExclusiveLock   | t       | f
(5 rows)
ABORT;
  • より詳細な情報を得るためには、他のとビューやテーブルとJOINして使う
BEGIN;
SELECT * FROM tbl WHERE id = 1 FOR UPDATE;

SELECT class.relname,locks.*,activity.*
  FROM pg_locks locks
  JOIN pg_stat_activity activity 
    ON locks.database=activity.datid
  JOIN pg_class class 
    ON locks.relation = class.oid
 WHERE class.relname='tbl';
-[ RECORD 1 ]------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
relname            | tbl
locktype           | relation
database           | 13408
relation           | 16395
page               | 
tuple              | 
virtualxid         | 
transactionid      | 
classid            | 
objid              | 
objsubid           | 
virtualtransaction | 3/104
pid                | 565
mode               | RowShareLock
granted            | t
fastpath           | t
datid              | 13408
datname            | postgres
pid                | 565
usesysid           | 10
usename            | postgres
application_name   | psql
client_addr        | 
client_hostname    | 
client_port        | -1
backend_start      | 2020-02-29 03:24:18.912878+00
xact_start         | 2020-02-29 05:17:29.375437+00
query_start        | 2020-02-29 05:17:40.801137+00
state_change       | 2020-02-29 05:17:40.801138+00
wait_event_type    | 
wait_event         | 
state              | active
backend_xid        | 509
backend_xmin       | 509
query              | select class.relname,locks.*,activity.* from pg_locks locks join pg_stat_activity activity on locks.database=activity.datid join pg_class class on locks.relation = class.oid where class.relname='tbl';
backend_type       | client backend

deadlock_timeout

SELECT name,setting,context,unit FROM pg_settings WHERE name='deadlock_timeout';
       name       | setting |  context  | unit 
------------------+---------+-----------+------
 deadlock_timeout | 1000    | superuser | ms
(1 row)

スロークエリの検出

log_min_duration_statement

#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
                    # and their durations, > 0 logs only
                    # statements running at least this number
                    # of milliseconds
  • 指定の値よりも時間がかかるSELECT文を投げるとロギングされる
INSERT INTO tbl VALUES (generate_series(1,10000000));
SET log_min_duration_statement='100ms';

SELECT AVG(id) FROM tbl;
2020-02-29 05:34:06.609 UTC [565] LOG:  duration: 286.595 ms  statement: SELECT AVG(id) FROM tbl;
  • INSERTとかは出ない

auto_explain

  • モジュールのロードが必要
  • 接続中のセッションのみ
LOAD 'auto_explain';
  • 全セッション
- #shared_preload_libraries = ''  # (change requires restart)
+ shared_preload_libraries = 'auto_explain';
  • スロークエリの実行計画をログ出力する例
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '100ms';
SELECT AVG(id) FROM tbl;
2020-02-29 05:38:04.622 UTC [565] LOG:  duration: 279.341 ms  plan:
    Query Text: SELECT AVG(id) FROM tbl;
    Finalize Aggregate  (cost=97331.80..97331.81 rows=1 width=32)
      ->  Gather  (cost=97331.58..97331.79 rows=2 width=32)
            Workers Planned: 2
            ->  Partial Aggregate  (cost=96331.58..96331.59 rows=1 width=32)
                  ->  Parallel Seq Scan on tbl  (cost=0.00..85914.87 rows=4166687 width=4)
2020-02-29 05:38:04.622 UTC [565] LOG:  duration: 279.552 ms  statement: SELECT AVG(id) FROM tbl;

pg_stat_statements

  • shared_preload_librariesの設定、およびCREATE EXTENSIONで有効化する
ALTER SYSTEM SET shared_preload_libraries='pg_stat_statements';
CREATE EXTENSION pg_stat_statements;
SELECT extname,extversion FROM pg_extension;
      extname       | extversion 
--------------------+------------
 plpgsql            | 1.0
 pg_stat_statements | 1.7
(2 rows)
  • 実行された全てのSQL文の実行時の統計情報を記録する
  • pg_stat_statementsシステムビューで取得
 \d+ pg_stat_statements;
                                 View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default | Storage  | Description 
---------------------+------------------+-----------+----------+---------+----------+-------------
 userid              | oid              |           |          |         | plain    | 
 dbid                | oid              |           |          |         | plain    | 
 queryid             | bigint           |           |          |         | plain    | 
 query               | text             |           |          |         | extended | 
 calls               | bigint           |           |          |         | plain    | 
 total_time          | double precision |           |          |         | plain    | 
 min_time            | double precision |           |          |         | plain    | 
 max_time            | double precision |           |          |         | plain    | 
 mean_time           | double precision |           |          |         | plain    | 
 stddev_time         | double precision |           |          |         | plain    | 
 rows                | bigint           |           |          |         | plain    | 
 shared_blks_hit     | bigint           |           |          |         | plain    | 
 shared_blks_read    | bigint           |           |          |         | plain    | 
 shared_blks_dirtied | bigint           |           |          |         | plain    | 
 shared_blks_written | bigint           |           |          |         | plain    | 
 local_blks_hit      | bigint           |           |          |         | plain    | 
 local_blks_read     | bigint           |           |          |         | plain    | 
 local_blks_dirtied  | bigint           |           |          |         | plain    | 
 local_blks_written  | bigint           |           |          |         | plain    | 
 temp_blks_read      | bigint           |           |          |         | plain    | 
 temp_blks_written   | bigint           |           |          |         | plain    | 
 blk_read_time       | double precision |           |          |         | plain    | 
 blk_write_time      | double precision |           |          |         | plain    | 
View definition:
 SELECT pg_stat_statements.userid,
    pg_stat_statements.dbid,
    pg_stat_statements.queryid,
    pg_stat_statements.query,
    pg_stat_statements.calls,
    pg_stat_statements.total_time,
    pg_stat_statements.min_time,
    pg_stat_statements.max_time,
    pg_stat_statements.mean_time,
    pg_stat_statements.stddev_time,
    pg_stat_statements.rows,
    pg_stat_statements.shared_blks_hit,
    pg_stat_statements.shared_blks_read,
    pg_stat_statements.shared_blks_dirtied,
    pg_stat_statements.shared_blks_written,
    pg_stat_statements.local_blks_hit,
    pg_stat_statements.local_blks_read,
    pg_stat_statements.local_blks_dirtied,
    pg_stat_statements.local_blks_written,
    pg_stat_statements.temp_blks_read,
    pg_stat_statements.temp_blks_written,
    pg_stat_statements.blk_read_time,
    pg_stat_statements.blk_write_time
   FROM pg_stat_statements(true) pg_stat_statements(userid, dbid, queryid, query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time, blk_write_time);
  • クエリ実行時間、共有バッファヒット率等がわかる
SELECT * FROM pg_stat_statements WHERE query ~ 'AVG';
-[ RECORD 1 ]-------+------------------------
userid              | 10
dbid                | 13408
queryid             | -2183878846915592520
query               | SELECT AVG(id) FROM tbl
calls               | 1
total_time          | 261.81870000000004
min_time            | 261.81870000000004
max_time            | 261.81870000000004
mean_time           | 261.81870000000004
stddev_time         | 0
rows                | 1
shared_blks_hit     | 0
shared_blks_read    | 44248
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
  • local_*
    • 一時テーブルで使うやつ
CREATE TEMPORARY TABLE tmptbl (col int);
INSERT INTO tmptbl VALUES (generate_series(1,100));
SELECT * FROM tmptbl;

SELECT * FROM pg_stat_statements WHERE query = 'SELECT * FROM tmptbl';
-[ RECORD 1 ]-------+---------------------
userid              | 10
dbid                | 13408
queryid             | -4296624328652239193
query               | SELECT * FROM tmptbl
calls               | 1
total_time          | 0.0196
min_time            | 0.0196
max_time            | 0.0196
mean_time           | 0.0196
stddev_time         | 0
rows                | 100
shared_blks_hit     | 0
shared_blks_read    | 0
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 1
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
  • temp_blks
    • 一時ファイル
      • HashのBatchが1より大きい場合とかの
EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=308312.08..595999.70 rows=10000048 width=16) (actual time=1770.283..5999.249 rows=10000000 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on tbl t1  (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.007..624.598 rows=10000000 loops=1)
   ->  Hash  (cost=144248.48..144248.48 rows=10000048 width=8) (actual time=1767.762..1767.762 rows=10000000 loops=1)
         Buckets: 131072  Batches: 256  Memory Usage: 2400kB
         ->  Seq Scan on tbl t2  (cost=0.00..144248.48 rows=10000048 width=8) (actual time=0.004..630.673 rows=10000000 loops=1)
 Planning Time: 0.171 ms
 JIT:
   Functions: 10
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.568 ms, Inlining 1.255 ms, Optimization 28.625 ms, Emission 13.454 ms, Total 43.902 ms
 Execution Time: 6247.816 ms
(12 rows)
  • 一時ファイル使いまくり
         Buckets: 131072  Batches: 256  Memory Usage: 2400kB
SELECT * FROM pg_stat_statements WHERE query='EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id';
-[ RECORD 1 ]-------+--------------------------------------------------------------
userid              | 10
dbid                | 13408
queryid             | -8182901821120542956
query               | EXPLAIN ANALYZE SELECT * FROM tbl t1,tbl t2 WHERE t1.id=t2.id
calls               | 2
total_time          | 12790.9215
min_time            | 6248.0437
max_time            | 6542.8778
mean_time           | 6395.46075
stddev_time         | 147.41705000000002
rows                | 0
shared_blks_hit     | 17266
shared_blks_read    | 159784
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 117232
temp_blks_written   | 117232
blk_read_time       | 0
blk_write_time      | 0