内部構造から学ぶPostgreSQL 設計・運用計画の鉄則 ch8 監視計画
監視とは
- 公式
- 計画する内容:
- 監視項目
- OS
- PostgreSQLサーバプロセス
- 監視間隔
- SLA等に照らして必要十分な間隔で計測する
- 監視内容により監視間隔も変わってくる
- 死活は数秒とか
- 性能は数分とか
- 閾値
- 想定、実績、閾値をもって、「異常」の定義をする
監視項目の選定
- 最終目的は「データベースが健全に動作しているか」の監視
- そのためにブレークダウン
- サーバに問題が起きていないか
- PostgreSQLに問題が起きていないか
サーバに問題が起きていないか?を監視する
- Unixの監視ツールをつかう
sar
- CPU,メモリ,I/O,ネットワークぜんぶ
- sysstatデーモンが動いていて
/var/log/sysstat/sa<日>
にログを書き出すやつ - sarクライアントで取り出したりする
iostat
- I/O
- CPU
Linux 4.19.84-microsoft-standard (DESKTOP-2PJLLS0) 02/02/20 _x86_64_ (6 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.41 0.00 0.14 0.00 0.00 99.44 Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 0.56 0.01 283.69 165 4195632 sdb 3.89 49.66 43.73 734501 646704
vmstat
- virtual memory
- CPUも
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 0 24867800 136884 835760 0 0 8 54 16 43 0 0 99 0 0
Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 localhost:33154 localhost:6010 ESTABLISHED tcp 0 0 localhost:6010 localhost:33144 ESTABLISHED ... Active UNIX domain sockets (w/o servers) Proto RefCnt Flags Type State I-Node Path unix 3 [ ] STREAM CONNECTED 1564 unix 3 [ ] STREAM CONNECTED 1563 ...
PostgreSQLに問題が起きていないか?を監視する
必要なプロセスは正常に動作しているか
ps
コマンドpg_stat_activity
ビュー- 公式
- 1行あたり1サーバプロセス
- 実行中のクエリ情報とか
postgres=# TABLE pg_stat_activity; datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type -------+----------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------+------------------------------ | | 60 | | | | | | | 2020-02-02 05:55:26.707822+00 | | | | Activity | AutoVacuumMain | | | | | autovacuum launcher | | 62 | 10 | postgres | | | | | 2020-02-02 05:55:26.708161+00 | | | | Activity | LogicalLauncherMain | | | | | logical replication launcher 13408 | postgres | 674 | 10 | postgres | psql | | | -1 | 2020-02-02 07:52:49.013246+00 | 2020-02-02 07:53:08.898073+00 | 2020-02-02 07:53:08.898073+00 | 2020-02-02 07:53:08.898074+00 | | | active | | 489 | TABLE pg_stat_activity; | client backend | | 58 | | | | | | | 2020-02-02 05:55:26.707524+00 | | | | Activity | BgWriterHibernate | | | | | background writer | | 57 | | | | | | | 2020-02-02 05:55:26.707363+00 | | | | Activity | CheckpointerMain | | | | | checkpointer | | 59 | | | | | | | 2020-02-02 05:55:26.707688+00 | | | | Activity | WalWriterMain | | | | | walwriter (6 rows)
postgres=# TABLE pg_stat_archiver; archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time | stats_reset ----------------+-------------------+--------------------+--------------+-----------------+------------------+------------------------------- 0 | | | 0 | | | 2020-02-02 05:55:26.500914+00 (1 row)
pg_stat_progress_vacuum
ビュー- 公式
- 1行あたりVACUUMを実行しているバックエンド1プロセス
- autovacuumワーカ含む
postgres=# TABLE pg_stat_progress_vacuum; pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples -----+-------+---------+-------+-------+-----------------+-------------------+--------------------+--------------------+-----------------+----------------- (0 rows)
- C-S接続確認
ディスク容量に問題はないか
df
,du
コマンドpg_database_size(<database_name>)
関数
postgres=# SELECT pg_database_size('postgres'); pg_database_size ------------------ 8135535 (1 row)
pg_total_relation_size(<relation_size>)
関数
postgres=# CREATE TABLE tbl (col int); CREATE TABLE postgres=# INSERT INTO tbl values (1); INSERT 0 1 postgres=# SELECT pg_total_relation_size('tbl'); pg_total_relation_size ------------------------ 8192 (1 row)
- 1ページ(8kB)に収まっている
- テーブル、インデックスの容量やスキャン履歴も監視対象
postgres=# CREATE INDEX idx ON tbl USING btree (col); CREATE INDEX
pg_stat_user_tables
ビュー
postgres=# TABLE pg_stat_user_tables; relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count -------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- 16386 | public | tbl | 1 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 2 | | | | | 0 | 0 | 0 | 0 (1 row)
pg_stat_user_indexes
ビュー
postgres=# TABLE pg_stat_user_indexes; relid | indexrelid | schemaname | relname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch -------+------------+------------+---------+--------------+----------+--------------+--------------- 16386 | 16389 | public | tbl | idx | 0 | 0 | 0 (1 row)
想定通りの性能を出せているか
log_min_duration_statement
でスロークエリをロギングできる
#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements # and their durations, > 0 logs only # statements running at least this number # of milliseconds
- 当該設定項目について
postgres=# SELECT name,setting,context,unit,boot_val FROM pg_settings WHERE name='log_min_duration_statement'; name | setting | context | unit | boot_val ----------------------------+---------+-----------+------+---------- log_min_duration_statement | -1 | superuser | ms | -1 (1 row)
- スーパーユーザのSET文で設定変更可能
- ミリ秒単位
- -1で無効
- 0以上ならば、設定値ミリ秒よりも長時間かかったクエリをロギング
postgres=# SET log_min_duration_statement=0; SET postgres=# SELECT * FROM tbl; col ----- 1 2 (2 rows)
- stderr(デフォルトのログ出力先)を見てみる
2020-02-02 08:20:34.384 UTC [674] LOG: duration: 0.041 ms statement: SET log_min_duration_statement=0; 2020-02-02 08:20:44.917 UTC [674] LOG: duration: 0.101 ms statement: SELECT * FROM tbl;
log_min_duration_statement
の設定を変更したSET
文自体のログも出るよう
PostgreSQLのログに異常を知らせる通知はないか
- デフォルトで最低限なので適宜設定する
PostgreSQLログの設定
- 実運用ではデータベースに触れることが許されないことがある
- システムカタログを参照できない
- のでログが大事
PostgreSQLログをどこに出力するか
postgresql.conf
#------------------------------------------------------------------------------ # REPORTING AND LOGGING #------------------------------------------------------------------------------ ... # - Where to Log - #log_destination = 'stderr' # Valid values are combinations of # stderr, csvlog, syslog, and eventlog, # depending on platform. csvlog # requires logging_collector to be on. # This is used when logging to stderr: #logging_collector = off # Enable capturing of stderr and csvlog # into log files. Required to be on for # csvlogs. # (change requires restart) # These are only used if logging_collector is on: #log_directory = 'log' # directory where log files are written, # can be absolute or relative to PGDATA #log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern, # can include strftime() escapes #log_file_mode = 0600 # creation mode for log files, # begin with 0 to use octal notation #log_truncate_on_rotation = off # If on, an existing log file with the # same name as the new log file will be # truncated rather than appended to. # But such truncation only occurs on # time-driven rotation, not on restarts # or size-driven rotation. Default is # off, meaning append to existing files # in all cases. #log_rotation_age = 1d # Automatic rotation of logfiles will # happen after that time. 0 disables. #log_rotation_size = 10MB # Automatic rotation of logfiles will # happen after that much log output. # 0 disables. # These are relevant when logging to syslog: #syslog_facility = 'LOCAL0' #syslog_ident = 'postgres' #syslog_sequence_numbers = on #syslog_split_messages = on # This is only relevant when logging to eventlog (win32): # (change requires restart) #event_source = 'PostgreSQL'
PostgreSQLログをいつ出力するか
postgresql.conf
#------------------------------------------------------------------------------ # CLIENT CONNECTION DEFAULTS #------------------------------------------------------------------------------ # - Statement Behavior - #client_min_messages = notice # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # log # notice # warning # error ... #------------------------------------------------------------------------------ # REPORTING AND LOGGING #------------------------------------------------------------------------------ ... # - When to Log - #log_min_messages = warning # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # info # notice # warning # error # log # fatal # panic #log_min_error_statement = error # values in order of decreasing detail: # debug5 # debug4 # debug3 # debug2 # debug1 # info # notice # warning # error # log # fatal # panic (effectively off) #log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements # and their durations, > 0 logs only # statements running at least this number # of milliseconds #log_transaction_sample_rate = 0.0 # Fraction of transactions whose statements # are logged regardless of their duration. 1.0 logs all # statements from all transactions, 0.0 never logs.
PostgreSQLログに何を出力するか
postgresql.conf
#------------------------------------------------------------------------------ # AUTOVACUUM #------------------------------------------------------------------------------ ... #log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and # their durations, > 0 logs only # actions running at least this number # of milliseconds. ... #------------------------------------------------------------------------------ # REPORTING AND LOGGING #------------------------------------------------------------------------------ # - What to Log - #debug_print_parse = off #debug_print_rewritten = off #debug_print_plan = off #debug_pretty_print = on #log_checkpoints = off #log_connections = off #log_disconnections = off #log_duration = off #log_error_verbosity = default # terse, default, or verbose messages #log_hostname = off #log_line_prefix = '%m [%p] ' # special values: # %a = application name # %u = user name # %d = database name # %r = remote host and port # %h = remote host # %p = process ID # %t = timestamp without milliseconds # %m = timestamp with milliseconds # %n = timestamp with milliseconds (as a Unix epoch) # %i = command tag # %e = SQL state # %c = session ID # %l = session line number # %s = session start timestamp # %v = virtual transaction ID # %x = transaction ID (0 if none) # %q = stop here in non-session # processes # %% = '%' # e.g. '<%u%%%d> ' #log_lock_waits = off # log lock waits >= deadlock_timeout #log_statement = 'none' # none, ddl, mod, all #log_replication_commands = off #log_temp_files = -1 # log temporary files equal or larger # than the specified size in kilobytes; # -1 disables, 0 logs all temp files log_timezone = 'Etc/UTC'
- ログ出力内容例
2020-02-02 08:20:44.917 UTC [674] LOG: duration: 0.101 ms statement: SELECT * FROM tbl;
- prefixはこの部分:
2020-02-02 08:20:44.917 UTC [674]
- デフォルトは
'%m [%p] '
%m
: millisec単位のタイムスタンプ (TZ)%p
: PID
PostgreSQLログをどのように保持するか
- ログローテート
- syslog使用時
- syslogのログローテート機能を使う
- syslog不使用時
- PostgreSQLのログローテート機能を使う
- syslog使用時
このへん
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern, # can include strftime() escapes ... #log_truncate_on_rotation = off # If on, an existing log file with the # same name as the new log file will be # truncated rather than appended to. # But such truncation only occurs on # time-driven rotation, not on restarts # or size-driven rotation. Default is # off, meaning append to existing files # in all cases. #log_rotation_age = 1d # Automatic rotation of logfiles will # happen after that time. 0 disables. #log_rotation_size = 10MB # Automatic rotation of logfiles will # happen after that much log output. # 0 disables.
- 間隔、もしくはログファイルサイズ、もしくはその両方でログローテートできる