勉強日記

チラ裏

内部構造から学ぶPostgreSQL 設計・運用計画の鉄則 ch8 監視計画

gihyo.jp


監視とは

  • 公式
  • 計画する内容:
  • 監視項目
  • 監視間隔
    • SLA等に照らして必要十分な間隔で計測する
    • 監視内容により監視間隔も変わってくる
      • 死活は数秒とか
      • 性能は数分とか
  • 閾値
    • 想定、実績、閾値をもって、「異常」の定義をする

監視項目の選定

  • 最終目的は「データベースが健全に動作しているか」の監視
  • そのためにブレークダウン
    • サーバに問題が起きていないか
    • PostgreSQLに問題が起きていないか

サーバに問題が起きていないか?を監視する

  • Unixの監視ツールをつかう
  • sar
    • CPU,メモリ,I/O,ネットワークぜんぶ
    • sysstatデーモンが動いていて/var/log/sysstat/sa<日>にログを書き出すやつ
    • sarクライアントで取り出したりする
  • iostat
    • I/O
    • CPU
Linux 4.19.84-microsoft-standard (DESKTOP-2PJLLS0)   02/02/20    _x86_64_    (6 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.41    0.00    0.14    0.00    0.00   99.44

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.56         0.01       283.69        165    4195632
sdb               3.89        49.66        43.73     734501     646704
  • vmstat
    • virtual memory
    • CPUも
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 24867800 136884 835760    0    0     8    54   16   43  0  0 99  0  0
  • top
    • 各プロセスのCPU、メモリ
  • netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 localhost:33154         localhost:6010          ESTABLISHED
tcp        0      0 localhost:6010          localhost:33144         ESTABLISHED
...
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  3      [ ]         STREAM     CONNECTED     1564     
unix  3      [ ]         STREAM     CONNECTED     1563     
...

PostgreSQLに問題が起きていないか?を監視する

必要なプロセスは正常に動作しているか

  • psコマンド
  • pg_stat_activityビュー
    • 公式
    • 1行あたり1サーバプロセス
      • 実行中のクエリ情報とか
postgres=# TABLE pg_stat_activity;
 datid | datname  | pid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin |          query          |         backend_type         
-------+----------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------+------------------------------
       |          |  60 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707822+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |                         | autovacuum launcher
       |          |  62 |       10 | postgres |                  |             |                 |             | 2020-02-02 05:55:26.708161+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |                         | logical replication launcher
 13408 | postgres | 674 |       10 | postgres | psql             |             |                 |          -1 | 2020-02-02 07:52:49.013246+00 | 2020-02-02 07:53:08.898073+00 | 2020-02-02 07:53:08.898073+00 | 2020-02-02 07:53:08.898074+00 |                 |                     | active |             |          489 | TABLE pg_stat_activity; | client backend
       |          |  58 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707524+00 |                               |                               |                               | Activity        | BgWriterHibernate   |        |             |              |                         | background writer
       |          |  57 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707363+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |                         | checkpointer
       |          |  59 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707688+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |                         | walwriter
(6 rows)
postgres=# TABLE pg_stat_archiver;
 archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time |          stats_reset          
----------------+-------------------+--------------------+--------------+-----------------+------------------+-------------------------------
              0 |                   |                    |            0 |                 |                  | 2020-02-02 05:55:26.500914+00
(1 row)
  • pg_stat_progress_vacuumビュー
    • 公式
    • 1行あたりVACUUMを実行しているバックエンド1プロセス
      • autovacuumワーカ含む
postgres=# TABLE pg_stat_progress_vacuum;
 pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-----+-------+---------+-------+-------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
(0 rows)
  • C-S接続確認

ディスク容量に問題はないか

  • df, duコマンド
  • pg_database_size(<database_name>)関数
postgres=# SELECT pg_database_size('postgres');
 pg_database_size 
------------------
          8135535
(1 row)
  • pg_total_relation_size(<relation_size>)関数
postgres=# CREATE TABLE tbl (col int);
CREATE TABLE

postgres=# INSERT INTO tbl values (1);
INSERT 0 1

postgres=# SELECT pg_total_relation_size('tbl');
 pg_total_relation_size 
------------------------
                   8192
(1 row)
  • 1ページ(8kB)に収まっている
  • テーブル、インデックスの容量やスキャン履歴も監視対象
postgres=# CREATE INDEX idx ON tbl USING btree (col);
CREATE INDEX
  • pg_stat_user_tablesビュー
postgres=# TABLE pg_stat_user_tables;
 relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
 16386 | public     | tbl     |        1 |            2 |        0 |             0 |         2 |         0 |         0 |             0 |          2 |          0 |                   2 |             |                 |              |                  |            0 |                0 |             0 |                 0
(1 row)
  • pg_stat_user_indexesビュー
postgres=# TABLE pg_stat_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch 
-------+------------+------------+---------+--------------+----------+--------------+---------------
 16386 |      16389 | public     | tbl     | idx          |        0 |            0 |             0
(1 row)

想定通りの性能を出せているか

  • log_min_duration_statementでスロークエリをロギングできる
#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
                    # and their durations, > 0 logs only
                    # statements running at least this number
                    # of milliseconds
  • 当該設定項目について
postgres=# SELECT name,setting,context,unit,boot_val FROM pg_settings WHERE name='log_min_duration_statement';
            name            | setting |  context  | unit | boot_val 
----------------------------+---------+-----------+------+----------
 log_min_duration_statement | -1      | superuser | ms   | -1
(1 row)
  • スーパーユーザのSET文で設定変更可能
  • ミリ秒単位
  • -1で無効
  • 0以上ならば、設定値ミリ秒よりも長時間かかったクエリをロギング
postgres=# SET log_min_duration_statement=0;
SET


postgres=# SELECT * FROM tbl;
 col 
-----
   1
   2
(2 rows)
  • stderr(デフォルトのログ出力先)を見てみる
2020-02-02 08:20:34.384 UTC [674] LOG:  duration: 0.041 ms  statement: SET log_min_duration_statement=0;
2020-02-02 08:20:44.917 UTC [674] LOG:  duration: 0.101 ms  statement: SELECT * FROM tbl;
  • log_min_duration_statementの設定を変更したSET文自体のログも出るよう

PostgreSQLのログに異常を知らせる通知はないか

  • デフォルトで最低限なので適宜設定する

PostgreSQLログの設定

  • 実運用ではデータベースに触れることが許されないことがある
    • システムカタログを参照できない
  • のでログが大事

PostgreSQLログをどこに出力するか

postgresql.conf

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

...

# - Where to Log -

#log_destination = 'stderr'     # Valid values are combinations of
                    # stderr, csvlog, syslog, and eventlog,
                    # depending on platform.  csvlog
                    # requires logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off        # Enable capturing of stderr and csvlog
                    # into log files. Required to be on for
                    # csvlogs.
                    # (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'log'          # directory where log files are written,
                    # can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'    # log file name pattern,
                    # can include strftime() escapes
#log_file_mode = 0600           # creation mode for log files,
                    # begin with 0 to use octal notation
#log_truncate_on_rotation = off     # If on, an existing log file with the
                    # same name as the new log file will be
                    # truncated rather than appended to.
                    # But such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation.  Default is
                    # off, meaning append to existing files
                    # in all cases.
#log_rotation_age = 1d          # Automatic rotation of logfiles will
                    # happen after that time.  0 disables.
#log_rotation_size = 10MB       # Automatic rotation of logfiles will
                    # happen after that much log output.
                    # 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
# (change requires restart)
#event_source = 'PostgreSQL'

PostgreSQLログをいつ出力するか

postgresql.conf

#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#client_min_messages = notice       # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   log
                    #   notice
                    #   warning
                    #   error

...

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

...

# - When to Log -

#log_min_messages = warning     # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic

#log_min_error_statement = error    # values in order of decreasing detail:
                    #   debug5
                    #   debug4
                    #   debug3
                    #   debug2
                    #   debug1
                    #   info
                    #   notice
                    #   warning
                    #   error
                    #   log
                    #   fatal
                    #   panic (effectively off)

#log_min_duration_statement = -1    # -1 is disabled, 0 logs all statements
                    # and their durations, > 0 logs only
                    # statements running at least this number
                    # of milliseconds

#log_transaction_sample_rate = 0.0  # Fraction of transactions whose statements
                    # are logged regardless of their duration. 1.0 logs all
                    # statements from all transactions, 0.0 never logs.

PostgreSQLログに何を出力するか

postgresql.conf

#------------------------------------------------------------------------------
# AUTOVACUUM
#------------------------------------------------------------------------------

...

#log_autovacuum_min_duration = -1   # -1 disables, 0 logs all actions and
                    # their durations, > 0 logs only
                    # actions running at least this number
                    # of milliseconds.

...

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default      # terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '%m [%p] '       # special values:
                    #   %a = application name
                    #   %u = user name
                    #   %d = database name
                    #   %r = remote host and port
                    #   %h = remote host
                    #   %p = process ID
                    #   %t = timestamp without milliseconds
                    #   %m = timestamp with milliseconds
                    #   %n = timestamp with milliseconds (as a Unix epoch)
                    #   %i = command tag
                    #   %e = SQL state
                    #   %c = session ID
                    #   %l = session line number
                    #   %s = session start timestamp
                    #   %v = virtual transaction ID
                    #   %x = transaction ID (0 if none)
                    #   %q = stop here in non-session
                    #        processes
                    #   %% = '%'
                    # e.g. '<%u%%%d> '
#log_lock_waits = off           # log lock waits >= deadlock_timeout
#log_statement = 'none'         # none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1            # log temporary files equal or larger
                    # than the specified size in kilobytes;
                    # -1 disables, 0 logs all temp files
log_timezone = 'Etc/UTC'
  • ログ出力内容例
2020-02-02 08:20:44.917 UTC [674] LOG:  duration: 0.101 ms  statement: SELECT * FROM tbl;
  • prefixはこの部分:
2020-02-02 08:20:44.917 UTC [674]
  • デフォルトは'%m [%p] '
    • %m: millisec単位のタイムスタンプ (TZ)
    • %p: PID

PostgreSQLログをどのように保持するか

  • ログローテート
    • syslog使用時
      • syslogのログローテート機能を使う
    • syslog不使用時
      • PostgreSQLのログローテート機能を使う

このへん

#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
                    # can include strftime() escapes
...
#log_truncate_on_rotation = off     # If on, an existing log file with the
                    # same name as the new log file will be
                    # truncated rather than appended to.
                    # But such truncation only occurs on
                    # time-driven rotation, not on restarts
                    # or size-driven rotation.  Default is
                    # off, meaning append to existing files
                    # in all cases.
#log_rotation_age = 1d          # Automatic rotation of logfiles will
                    # happen after that time.  0 disables.
#log_rotation_size = 10MB       # Automatic rotation of logfiles will
                    # happen after that much log output.
                    # 0 disables.
  • 間隔、もしくはログファイルサイズ、もしくはその両方でログローテートできる

異常時の判断基準

  • 何をもって異常とするか
    • 想定と実績と閾値で判断する
    • そのための閾値を決定する