gihyo.jp
最適な実行計画が選ばれない
PstgreSQLが原因となる場合
コスト基準値の設定
# - Planner Cost Constants -
#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
postgres=# SELECT name,setting,context FROM pg_settings WHERE name SIMILAR TO '%_page_cost';
name | setting | context
------------------+---------+---------
random_page_cost | 4 | user
seq_page_cost | 1 | user
(2 rows)
- PostgreSQLはHDDへのランダムアクセスが最もコストが高いと想定している
- SSDなどではパフォーマンス特性が変わってくるので適宜調整する
統計の取得頻度
統計情報の取得粒度
- サンプル数と精度のトレードオフ
- データの母数に対してサンプル数が少なすぎると精度は落ちる
- サンプリングしすぎるとオーバヘッドが大きくなる
ディスク性能
- H/W構成によっては、PostgreSQLの想定とは異なるパフォーマンス特性になる
- I/O分散が適切でないと特定のディスクに負荷が偏る
ネットワーク性能
- 実行計画作成においてネットワーク性能は一切考慮されない
- 工夫が求められる
アプリケーション
- プリペアドステートメントは「一般的な実行計画」が保存される
- 実行時のパラメータとの相性が悪いと遅くなる
バッチ処理や瞬間的な大量アクセス
実行計画の取得方法
EXPLAIN文
- 公式
- プランナが作成した実行計画を表示する
- プランナが考慮しないもの
postgres=# CREATE TABLE tbl (col int);
CREATE TABLE
postgres=# INSERT INTO tbl VALUES(generate_series(1,10000000));
INSERT 0 10000000
postgres=# EXPLAIN SELECT * FROM tbl;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on tbl (cost=0.00..157080.40 rows=11283240 width=4)
(1 row)
ANALYZEオプション
- 実行時間の実測も行うやつ
- トランザクション外で更新クエリについて行うと、実際にデータが書き換わってしまうので注意する
postgres=# EXPLAIN ANALYSE SELECT * FROM tbl;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on tbl (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.049..587.751 rows=10000000 loops=1)
Planning Time: 0.020 ms
Execution Time: 826.740 ms
(3 rows)
postgres=# EXPLAIN ANALYSE DELETE FROM tbl where col % 2 = 0;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Delete on tbl (cost=0.00..194247.65 rows=50000 width=6) (actual time=5721.469..5721.469 rows=0 loops=1)
-> Seq Scan on tbl (cost=0.00..194247.65 rows=50000 width=6) (actual time=38.069..1014.578 rows=5000000 loops=1)
Filter: ((col % 2) = 0)
Rows Removed by Filter: 5000000
Planning Time: 0.065 ms
JIT:
Functions: 3
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.627 ms, Inlining 0.000 ms, Optimization 7.159 ms, Emission 30.389 ms, Total 38.175 ms
Execution Time: 5837.949 ms
(10 rows)
postgres=# SELECT COUNT(*) FROM tbl;
count
---------
5000000
(1 row)
postgres=# TRUNCATE tbl;
TRUNCATE TABLE
postgres=# EXPLAIN ANALYSE INSERT INTO tbl VALUES(generate_series(1,10000000));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Insert on tbl (cost=0.00..50000.02 rows=10000000 width=4) (actual time=7175.631..7175.631 rows=0 loops=1)
-> ProjectSet (cost=0.00..50000.02 rows=10000000 width=4) (actual time=0.002..604.079 rows=10000000 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.017 ms
Execution Time: 7175.647 ms
(5 rows)
postgres=# EXPLAIN ANALYSE UPDATE tbl SET col = col+1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Update on tbl (cost=0.00..169247.71 rows=9999977 width=10) (actual time=17314.592..17314.592 rows=0 loops=1)
-> Seq Scan on tbl (cost=0.00..169247.71 rows=9999977 width=10) (actual time=1.721..1261.544 rows=10000000 loops=1)
Planning Time: 0.045 ms
JIT:
Functions: 2
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.274 ms, Inlining 0.000 ms, Optimization 0.166 ms, Emission 1.438 ms, Total 1.878 ms
Execution Time: 17314.926 ms
(8 rows)
postgres=# SELECT MAX(col) FROM tbl;
max
----------
10000001
(1 row)
VERBOSEオプション
postgres=# EXPLAIN VERBOSE SELECT * FROM tbl;
QUERY PLAN
---------------------------------------------------------------------
Seq Scan on public.tbl (cost=0.00..144247.77 rows=9999977 width=4)
Output: col
(2 rows)
COSTSオプション
postgres=# EXPLAIN SELECT * FROM tbl;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on tbl (cost=0.00..144248.48 rows=10000048 width=4)
(1 row)
postgres=# EXPLAIN (COSTS false) SELECT * FROM tbl;
QUERY PLAN
-----------------
Seq Scan on tbl
(1 row)
COSTS
オプション有効で追加される情報
初期コスト
..総コスト
rows
: 行数
w
: 平均行長
BUFFERSオプション
postgres=# EXPLAIN (ANALYSE, BUFFERS) SELECT * FROM tbl;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on tbl (cost=0.00..144248.48 rows=10000048 width=4) (actual time=10.275..612.028 rows=10000000 loops=1)
Buffers: shared hit=16296 read=27952
Planning Time: 0.021 ms
Execution Time: 916.477 ms
(4 rows)
ANALYSE
(クエリを実際に実行して実行時間も計測)と併用
- ページ読み込み元の内訳
TIMIMGオプション
postgres=# EXPLAIN (ANALYSE,TIMING) SELECT count(*),max(col),min(col) FROM tbl;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=118165.24..118165.25 rows=1 width=16) (actual time=445.287..445.287 rows=1 loops=1)
-> Gather (cost=118165.02..118165.23 rows=2 width=16) (actual time=445.168..447.885 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=117165.02..117165.03 rows=1 width=16) (actual time=424.189..424.190 rows=1 loops=3)
-> Parallel Seq Scan on tbl (cost=0.00..85914.87 rows=4166687 width=4) (actual time=0.150..208.200 rows=3333333 loops=3)
Planning Time: 0.040 ms
JIT:
Functions: 11
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.228 ms, Inlining 0.000 ms, Optimization 0.550 ms, Emission 9.848 ms, Total 11.625 ms
Execution Time: 448.540 ms
(12 rows)
postgres=# EXPLAIN (ANALYSE,TIMING false) SELECT count(*),max(col),min(col) FROM tbl;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=118165.24..118165.25 rows=1 width=16) (actual rows=1 loops=1)
-> Gather (cost=118165.02..118165.23 rows=2 width=16) (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=117165.02..117165.03 rows=1 width=16) (actual rows=1 loops=3)
-> Parallel Seq Scan on tbl (cost=0.00..85914.87 rows=4166687 width=4) (actual rows=3333333 loops=3)
Planning Time: 0.041 ms
JIT:
Functions: 11
Options: Inlining false, Optimization false, Expressions true, Deforming true
Execution Time: 315.999 ms
(11 rows)
actual time=0.150..208.200
postgres=# EXPLAIN (SUMMARY) SELECT * FROM tbl;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on tbl (cost=0.00..166385.51 rows=10001351 width=4)
Planning Time: 0.046 ms
(2 rows)
postgres=# EXPLAIN SELECT * FROM tbl;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on tbl (cost=0.00..166385.51 rows=10001351 width=4)
(1 row)
- デフォルト
TEXT
- ほか
XML
,JSON
,YAML
を選択できる
postgres=# EXPLAIN (ANALYSE,FORMAT XML) SELECT * FROM tbl;
QUERY PLAN
----------------------------------------------------------
<explain xmlns="http://www.postgresql.org/2009/explain">+
<Query> +
<Plan> +
<Node-Type>Seq Scan</Node-Type> +
<Parallel-Aware>false</Parallel-Aware> +
<Relation-Name>tbl</Relation-Name> +
<Alias>tbl</Alias> +
<Startup-Cost>0.00</Startup-Cost> +
<Total-Cost>144248.48</Total-Cost> +
<Plan-Rows>10000048</Plan-Rows> +
<Plan-Width>4</Plan-Width> +
<Actual-Startup-Time>0.303</Actual-Startup-Time> +
<Actual-Total-Time>582.801</Actual-Total-Time> +
<Actual-Rows>10000000</Actual-Rows> +
<Actual-Loops>1</Actual-Loops> +
</Plan> +
<Planning-Time>0.020</Planning-Time> +
<Triggers> +
</Triggers> +
<Execution-Time>822.827</Execution-Time> +
</Query> +
</explain>
(1 row)
postgres=# EXPLAIN (ANALYSE,FORMAT JSON) SELECT * FROM tbl;
QUERY PLAN
-------------------------------------
[ +
{ +
"Plan": { +
"Node Type": "Seq Scan", +
"Parallel Aware": false, +
"Relation Name": "tbl", +
"Alias": "tbl", +
"Startup Cost": 0.00, +
"Total Cost": 144248.48, +
"Plan Rows": 10000048, +
"Plan Width": 4, +
"Actual Startup Time": 0.406,+
"Actual Total Time": 590.028,+
"Actual Rows": 10000000, +
"Actual Loops": 1 +
}, +
"Planning Time": 0.035, +
"Triggers": [ +
], +
"Execution Time": 831.016 +
} +
]
(1 row)
postgres=# EXPLAIN (ANALYSE,FORMAT YAML) SELECT * FROM tbl;
QUERY PLAN
--------------------------------
- Plan: +
Node Type: "Seq Scan" +
Parallel Aware: false +
Relation Name: "tbl" +
Alias: "tbl" +
Startup Cost: 0.00 +
Total Cost: 144248.48 +
Plan Rows: 10000048 +
Plan Width: 4 +
Actual Startup Time: 0.312+
Actual Total Time: 593.356+
Actual Rows: 10000000 +
Actual Loops: 1 +
Planning Time: 0.019 +
Triggers: +
Execution Time: 836.354
(1 row)
ANALYZEオプション
- 統計情報取得
- デフォルト
autovacuum
で一緒に実行される
#------------------------------------------------------------------------------
# AUTOVACUUM
#------------------------------------------------------------------------------
#autovacuum = on # Enable autovacuum subprocess? 'on'
# requires track_counts to also be on.
autovacuum
有効にするならtrack_counts
(統計情報の収集)も有効化
#------------------------------------------------------------------------------
# STATISTICS
#------------------------------------------------------------------------------
# - Query and Index Statistics Collector -
#track_activities = on
#track_counts = on
#autovacuum_vacuum_threshold = 50 # min number of row updates before
# vacuum
#autovacuum_analyze_threshold = 50 # min number of row updates before
# analyze
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
VERBOSEオプション
postgres=# ANALYSE VERBOSE tbl;
INFO: analyzing "public.tbl"
INFO: "tbl": scanned 30000 of 44248 pages, containing 6779952 live rows and 0 dead rows; 30000 rows in sample, 9999977 estimated total rows
ANALYZE
- 進捗が出力される
- 統計情報取得のサンプリング数はデフォルト30000
統計情報取得のためのパラメータ設定
postgres=# SELECT name,setting,context FROM pg_settings WHERE name SIMILAR TO 'autovacuum_(vacuum|analyze)_(scale_factor|threshold)';
name | setting | context
---------------------------------+---------+---------
autovacuum_analyze_scale_factor | 0.1 | sighup
autovacuum_analyze_threshold | 50 | sighup
autovacuum_vacuum_scale_factor | 0.2 | sighup
autovacuum_vacuum_threshold | 50 | sighup
(4 rows)
postgres=# SELECT name,setting,context,min_val,max_val FROM pg_settings WHERE name='default_statistics_target';
name | setting | context | min_val | max_val
---------------------------+---------+---------+---------+---------
default_statistics_target | 100 | user | 1 | 10000
(1 row)
- 統計情報の更新に必要な更新行数設定
autovacuum_analyze_scale_factor
- テーブルに対してどれだけの割合が更新されたら統計情報を更新するか
autovacuum_analyzethreshold
- 例
autovacuum_analyze_scale_factor
= 0.1
autovacuum_analyzethreshold
= 50
- 1000行のテーブルならば、
(50 + 1000 * 0.1) = 150
行以上で統計情報が自動更新されうる
default_statistics_target
- (サンプリング数 / 300)を指定する
- デフォルトの100ならば、サンプリング行数は30000行
Column: システムカタログ「pg_stats」
postgres=# SELECT tablename,attname,n_distinct,correlation,most_common_vals,most_common_freqs FROM pg_stats WHERE tablename='tbl';
tablename | attname | n_distinct | correlation | most_common_vals | most_common_freqs
-----------+---------+------------+-------------+------------------+-------------------
tbl | col | -1 | 1 | |
(1 row)
CREATE TABLE
して連番を入れたてのテーブルだと、correlation
= 1
- 歯抜けに削除して再度挿入してみる
postgres=# DELETE FROM tbl WHERE col%2 = 0;
DELETE 5000000
postgres=# INSERT INTO tbl VALUES (generate_series(2,10000000,2));
INSERT 0 5000000
postgres=# SELECT tablename,attname,n_distinct,correlation,most_common_vals,most_common_freqs FROM pg_stats WHERE tablename='tbl';
tablename | attname | n_distinct | correlation | most_common_vals | most_common_freqs
-----------+---------+------------+-------------+------------------+-------------------
tbl | col | -1 | 1 | |
(1 row)
postgres=# ANALYZE VERBOSE;
INFO: analyzing "public.tbl"
INFO: "tbl": scanned 30000 of 66372 pages, containing 4520476 live rows and 2259524 dead rows; 30000 rows in sample, 10001101 estimated total rows
ANALYZE
postgres=# SELECT tablename,attname,n_distinct,correlation,most_common_vals,most_common_freqs FROM pg_stats WHERE tablename='tbl';
tablename | attname | n_distinct | correlation | most_common_vals | most_common_freqs
-----------+---------+------------+-------------+------------------+-------------------
tbl | col | -1 | 0.5053877 | |
(1 row)
correlation
が1
から約0.5
に下がった
- 行の物理的な並び順と論理的な並び順とがあまり一致しなくなったということ
- 0に近いほどランダムアクセスになりやすい