gihyo.jp
サーバログの確認
2020-02-23 17:50:32.592 UTC [111] LOG: logical replication apply worker for subscription "mysub" has started
2020-02-23 17:50:32.595 UTC [113] LOG: starting logical decoding for slot "mysub"
- 例: パブリッシャの
wal_level
が足りてないとエラーが出てたりする
2020-02-23 17:59:04.958 UTC [12] FATAL: logical replication slot "mysub" exists, but wal_level < logical
2020-02-23 17:59:04.958 UTC [12] HINT: Change wal_level to be logical or higher.
2020-02-23 18:16:43.852 UTC [114] LOG: standby "mysub" is now a synchronous standby with priority 1
プロセスの確認
docker-compose exec pub ps auxfww | grep walsender
postgres 113 0.0 0.0 214648 13376 ? Ss 17:50 0:00 postgres: walsender postgres 192.168.240.3(42344) idle
- サブスクライバ
- logical replication workerの動作を確認
docker-compose exec sub ps auxfww | grep 'logical replication'
postgres 110 0.0 0.0 214100 6796 ? Ss 17:50 0:00 postgres: logical replication launcher
postgres 111 0.0 0.0 215604 14984 ? Ss 17:50 0:00 postgres: logical replication worker for subscription 16390
- パブリッシャ側:
pg_stat_replication
testdb=# SELECT application_name,state,sent_lsn,write_lsn,flush_lsn,replay_lsn,sync_priority,sync_state from pg_stat_replication;
-[ RECORD 1 ]----+----------
application_name | mysub
state | streaming
sent_lsn | 0/1660800
write_lsn | 0/1660800
flush_lsn | 0/1660800
replay_lsn | 0/1660800
sync_priority | 0
sync_state | async
testdb=# SELECT application_name,state,sent_lsn,write_lsn,flush_lsn,replay_lsn,sync_priority,sync_state from pg_stat_replication;
-[ RECORD 1 ]----+----------
application_name | mysub
state | streaming
sent_lsn | 0/16606B8
write_lsn | 0/16606B8
flush_lsn | 0/16606B8
replay_lsn | 0/16606B8
sync_priority | 1
sync_state | sync
- サブスクライバ側:
pg_stat_subscription
testdb=# SELECT * FROM pg_stat_subscription;
-[ RECORD 1 ]---------+------------------------------
subid | 16390
subname | mysub
pid | 112
relid |
received_lsn | 0/16606B8
last_msg_send_time | 2020-02-23 18:22:34.793896+00
last_msg_receipt_time | 2020-02-23 18:22:34.793973+00
latest_end_lsn | 0/16606B8
latest_end_time | 2020-02-23 18:22:34.793896+00
- フェイルオーバー
- 同期レプリケーションなら
synchronous_standby_names
の設定が必要
- 非同期ならそれも不要
- レプリケーションスロットの対処は必要
SELECT pg_drop_replication_slot('mysub');
ERROR: replication slot "mysub" is active for PID 114
docker-compose stop sub
logical_replication_sub_1 exited with code 0
SELECT pg_drop_replication_slot('mysub');
- スロットがない状態でsubを起動(復旧)するとエラーログが出る
docker-compose start sub
2020-02-23 18:40:57.794 UTC [29] LOG: logical replication apply worker for subscription "mysub" has started
2020-02-23 18:40:57.800 UTC [29] ERROR: could not start WAL streaming: ERROR: replication slot "mysub" does not exist
2020-02-23 18:40:57.801 UTC [1] LOG: background worker "logical replication worker" (PID 29) exited with exit code 1
SELECT pg_create_logical_replication_slot('mysub', 'pgoutput');
pg_create_logical_replication_slot
------------------------------------
(mysub,0/16606F0)
(1 row)
2020-02-23 18:41:02.809 UTC [30] LOG: logical replication apply worker for subscription "mysub" has started
コンフリクトの対処
INSERT INTO testtbl VALUES(1,1);
TABLE testtbl;
i | j
---+---
1 | 1
(1 row)
INSERT INTO testtbl VALUES(1,2);
INSERT INTO testtbl VALUES(2,2);
TABLE testtbl;
i | j
---+---
1 | 2
2 | 2
(2 rows)
- コンフリクトがおきる
- 同期レプリケーションだとsubからcommitが帰ってこない
- 以降、コンフリクトを解消するまでpubの変更の一切がsubに反映されない ((2,2)の行)
- subのログ
2020-02-23 18:51:10.560 UTC [30] ERROR: duplicate key value violates unique constraint "testtbl_pkey"
2020-02-23 18:51:10.560 UTC [30] DETAIL: Key (i)=(1) already exists.
2020-02-23 18:51:10.561 UTC [1] LOG: background worker "logical replication worker" (PID 30) exited with exit code 1
- ロジカルレプリケーションのコンフリクトを解消するには、DBAが介入してpub/subどちらかの変更を選択する
解消方法A: pub側をとる
2020-02-23 18:51:10.560 UTC [30] ERROR: duplicate key value violates unique constraint "testtbl_pkey"
2020-02-23 18:51:10.560 UTC [30] DETAIL: Key (i)=(1) already exists.
DELETE FROM testtbl WHERE i=1;
TABLE testtbl;
i | j
---+---
1 | 2
2 | 2
(2 rows)
- コンフリクトは解消され、溜まっていたpubの変更が順次反映される
解消方法B: sub側をとる
- 当該更新のwal適用をスキップする
- 【pub側】現在のLSNを取得
SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/1669ED0
(1 row)
SELECT external_id FROM pg_replication_origin_status;
external_id
-------------
pg_16390
(1 row)
SELECT pg_replication_origin_advance('pg_16390','0/1669ED0');
pg_replication_origin_advance
-------------------------------
(1 row)
TABLE testtbl;
i | j
---+---
1 | 1
(1 row)
(2,2)
挿入後のlsnまでスキップしたので、(2,2)
の追加もスキップされる
- 【宿題】
(2,2)
を失わないスキップ先lsnの調べ方を調べる