gihyo.jp
PITRの仕組み
- WALレコード適用によるリカバリが前提
- 【補】ベースバックアップの適用はリストア、その後のWALレコード適用によるロールフォワードはリカバリ…だったはず
WALレコード適用までの流れ
pg_control
ファイルを読み込む
- PostgreSQL起動時に一番最初に読み込まれるやつ
- 情報
- 前回正しくシャットダウンされたか
- その時の各種ID値
- など
recovery.conf
ファイルを読み込む
restore_command
やrestore_target_time
などの読み込み
- 【補】PostgreSQL12では
postgresql.conf
に統合された
backup_label
ファイルを読み込む
pg_control
ファイル更新、backup_label
削除
- 必要なWALを繰り返し適用する
pg_controlファイル
pg_control
ファイルの中身
global/
にある
- バイナリファイルなので、専用のコマンドで読み取る
pg_controldata
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6793684186596712485
Database cluster state: in production
pg_control last modified: Sat 15 Feb 2020 02:52:49 PM UTC
Latest checkpoint location: 0/2000060
Latest checkpoint's REDO location: 0/2000028
Latest checkpoint's REDO WAL file: 000000010000000000000002
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:487
Latest checkpoint's NextOID: 13409
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 480
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 487
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Sat 15 Feb 2020 02:52:49 PM UTC
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 63ab0ea985b039f171ead9ad040aba73e6d2c3e827582317c5ee7bc6e236676d
recovery.confファイル
- PostgreSQL12で無くなった
- 公式
recovery.conf
があると起動不可能になる
recovery.signal
、standby.signal
ファイルを使おう
スタンバイサーバの設定
standby_mode
設定はもうない
primary_conninfo
- primaryへの接続情報
- ユーザとか認証とか
- secondaryのアプリケーション名とか
promote_trigger_file
- この設定で指定したファイルが存在すればプライマリに昇格
#archive_command = '' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#restore_command = '' # command to use to restore an archived logfile segment
# placeholders: %p = path of file to restore
# %f = file name only
# e.g. 'cp /mnt/server/archivedir/%f %p'
# (change requires restart)
- confファイルのコメントに書いていないこととして、
%r
を使える
- 公式
- 「リカバリ中のリスタートポイントのWALレコードを含むWALファイル名」に展開される
If restartwalfile is specified, normally by using the %r
macro, then all WAL files logically preceding this file
will be removed from archivelocation.
archive_cleanup_command
- リスタートポイントごとに実行
pg_archivecleanup
コマンドと組み合わせて不要なアーカイブファイルを削除できる
- 公式
archive_cleanup_command = 'pg_archivecleanup /mnt/server/archivedir %r'
recovery_end_command
- リカバリ完了時に実行するコマンド
- やはり
%r
とか使える
#recovery_end_command = '' # command to execute at completion of recovery
#recovery_target = '' # 'immediate' to end recovery as soon as a
# consistent state is reached
# (change requires restart)
#recovery_target_name = '' # the named restore point to which recovery will proceed
# (change requires restart)
#recovery_target_time = '' # the time stamp up to which recovery will proceed
# (change requires restart)
#recovery_target_xid = '' # the transaction ID up to which recovery will proceed
# (change requires restart)
#recovery_target_lsn = '' # the WAL LSN up to which recovery will proceed
# (change requires restart)
#recovery_target_inclusive = on # Specifies whether to stop:
# just after the specified recovery target (on)
# just before the recovery target (off)
# (change requires restart)
#recovery_target_timeline = 'latest' # 'current', 'latest', or timeline ID
# (change requires restart)
#recovery_target_action = 'pause' # 'pause', 'promote', 'shutdown'
# (change requires restart)
recover_target_name
- あらかじめ作っておいたリストアポイント名を指定する
postgres=# SELECT pg_create_restore_point('restart');
pg_create_restore_point
-------------------------
0/50001B0
(1 row)
pg_waldump ./pg_wal/000000010000000000000005
...
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/05000098, prev 0/05000060, desc: CHECKPOINT_ONLINE redo 0/5000060; tli 1; prev tli 1; fpw true; xid 0:487; oid 13409; multi 1; offset 0; oldest xid 480 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 487; online
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/05000110, prev 0/05000098, desc: RUNNING_XACTS nextXid 487 latestCompletedXid 486 oldestRunningXid 487
rmgr: XLOG len (rec/tot): 98/ 98, tx: 0, lsn: 0/05000148, prev 0/05000110, desc: RESTORE_POINT restart
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/050001B0, prev 0/05000148, desc: RUNNING_XACTS nextXid 487 latestCompletedXid 486 oldestRunningXid 487
...
Column: タイムラインとリカバリ
- タイムラインID: walファイル名の先頭8文字のやつ
ls -F pg_wal/
000000010000000000000004.00000028.backup
000000010000000000000005
000000010000000000000006
2020-02-10 15:53:12.604 UTC [29] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
2020-02-10 15:53:15.994 UTC [30] LOG: connection received: host=172.24.0.5 port=60854
2020-02-10 15:53:15.994 UTC [30] LOG: connection authorized: user=postgres database=postgres application_name=psql
2020-02-10 15:53:15.998 UTC [17] LOG: received promote request
2020-02-10 15:53:15.998 UTC [17] LOG: redo done at 0/3000060
2020-02-10 15:53:16.028 UTC [17] LOG: selected new timeline ID: 2
2020-02-10 15:53:16.139 UTC [17] LOG: archive recovery complete
2020-02-10 15:53:16.155 UTC [16] LOG: database system is ready to accept connections
recovery_target_timeline
- リカバリ手順を誤り、latest以外のタイムライン上の特定の時刻に戻したくなった場合などに利用する
バックアップ/リカバリの運用手順
バックアップ手順
mkdir /backup /archive
psql -c "SELECT pg_start_backup('backup')"
pg_start_backup
-----------------
0/7000028
(1 row)
- walとpostmaster.pidを除いてベースバックアップ
rsync -av --delete --exclude=pg_wal/* --exclude=postmaster.pid $PGDATA/* /backup/data
psql -c "SELECT pg_stop_backup()"
NOTICE: all required WAL segments have been archived
pg_stop_backup
----------------
0/7000138
(1 row)
pg_ctl stop
waiting for server to shut down...2020-02-15 16:37:51.748 UTC [992] LOG: received fast shutdown request
.2020-02-15 16:37:51.757 UTC [992] LOG: aborting any active transactions
2020-02-15 16:37:51.758 UTC [992] LOG: background worker "logical replication launcher" (PID 1000) exited with exit code 1
2020-02-15 16:37:51.758 UTC [994] LOG: shutting down
2020-02-15 16:37:51.801 UTC [992] LOG: database system is shut down
done
server stopped
- ベースバックアップ以降のwalファイル、データベースクラスタを退避する
- walファイル: ロールフォワード用
- データベースクラスタ: 設定ファイルとかテーブルスペースとか
cp -r ${PGDATA}/pg_wal /tmp/
mv ${PGDATA} ${PGDATA}.temp
rsync -av /backup/data/* ${PGDATA}
cp -R /tmp/pg_wal/* ${PGDATA}/pg_wal/
postgresql.conf
(旧recovery.conf
)設定する
restore_command='cp /archive/%f %p'
touch ${PGDATA}/recovery.signal
pg_ctl start
waiting for server to start....2020-02-15 16:47:30.463 UTC [1357] LOG: starting PostgreSQL 12.1 (Debian 12.1-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-02-15 16:47:30.463 UTC [1357] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-02-15 16:47:30.463 UTC [1357] LOG: listening on IPv6 address "::", port 5432
2020-02-15 16:47:30.468 UTC [1357] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-02-15 16:47:30.486 UTC [1358] LOG: database system was interrupted; last known up at 2020-02-15 16:36:53 UTC
cp: cannot stat '/archive/00000002.history': No such file or directory
2020-02-15 16:47:30.595 UTC [1358] LOG: starting archive recovery
2020-02-15 16:47:30.613 UTC [1358] LOG: restored log file "000000010000000000000007" from archive
2020-02-15 16:47:30.636 UTC [1358] LOG: redo starts at 0/7000028
2020-02-15 16:47:30.638 UTC [1358] LOG: consistent recovery state reached at 0/7000138
2020-02-15 16:47:30.638 UTC [1357] LOG: database system is ready to accept read only connections
done
server started
postgres@9c29c6378da7:/$ 2020-02-15 16:47:30.650 UTC [1358] LOG: restored log file "000000010000000000000008" from archive
cp: cannot stat '/archive/000000010000000000000009': No such file or directory
2020-02-15 16:47:30.664 UTC [1358] LOG: invalid record length at 0/90000A0: wanted 24, got 0
2020-02-15 16:47:30.664 UTC [1358] LOG: redo done at 0/9000028
cp: cannot stat '/archive/000000010000000000000009': No such file or directory
cp: cannot stat '/archive/00000002.history': No such file or directory
2020-02-15 16:47:30.671 UTC [1358] LOG: selected new timeline ID: 2
2020-02-15 16:47:30.788 UTC [1358] LOG: archive recovery complete
cp: cannot stat '/archive/00000001.history': No such file or directory
2020-02-15 16:47:30.828 UTC [1357] LOG: database system is ready to accept connections