開發與維運

PostgreSQL pg_rewind,時間線修復,腦裂修復,flashback – 從庫開啟讀寫後,回退為只讀從庫。異步主從發生角色切換後,主庫rewind為新主庫的從庫

PostgreSQL pg_rewind,時間線修復,腦裂修復,flashback - 從庫開啟讀寫後,回退為只讀從庫。異步主從發生角色切換後,主庫rewind為新主庫的從庫

作者

digoal

日期

2019-01-28

標籤

PostgreSQL , pg_rewind , 主從切換 , 時間線修復 , 腦裂修復 , 從庫開啟讀寫後,回退為只讀從庫 , 異步主從發生角色切換後,主庫rewind為新主庫的從庫


背景

1、PG物理流複製的從庫,當激活後,可以開啟讀寫,使用pg_rewind可以將從庫回退為只讀從庫的角色。而不需要重建整個從庫。

2、當異步主從發生角色切換後,主庫的wal目錄中可能還有沒完全同步到從庫的內容,因此老的主庫無法直接切換為新主庫的從庫。使用pg_rewind可以修復老的主庫,使之成為新主庫的只讀從庫。而不需要重建整個從庫。

3、如果沒有pg_rewind,遇到以上情況,需要完全重建從庫。或者你可以使用存儲層快照,回退回腦裂以前的狀態。又或者可以使用文件系統快照,回退回腦裂以前的狀態。

原理與修復步驟

1、使用pg_rewind功能的前提條件:必須開啟full page write,必須開啟wal hint或者data block checksum。

2、需要被修復的庫:從激活點開始,所有的WAL必須存在pg_wal目錄中。如果WAL已經被覆蓋,只要有歸檔,拷貝到pg_wal目錄即可。

3、新的主庫,從激活點開始,產生的所有WAL必須存在pg_wal目錄中,或者已歸檔,並且被修復的庫可以使用restore_command訪問到這部分WAL。

4、修改(source db)新主庫或老主庫配置,允許連接。

5、修復時,連接新主庫,得到切換點。或連接老主庫,同時比對當前要修復的新主庫的TL與老主庫進行比對,得到切換點。

6、解析需要被修復的庫的從切換點到現在所有的WAL。同時連接source db(新主庫(或老主庫)),進行回退操作(被修改或刪除的BLOCK從source db獲取並覆蓋,新增的BLOCK,直接抹除。)回退到切換點的狀態。

7、修改被修復庫(target db)的recovery.conf, postgresql.conf配置。

8、啟動target db,連接source db接收WAL,或restore_command配置接收WAL,從切換點開始所有WAL,進行apply。

9、target db現在是source db的從庫。

以EDB PG 11為例講解

環境部署

《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 遷移到 PG, PPAS (支持跨版本升級)》

export PS1="$USER@`/bin/hostname -s`-> "      
export PGPORT=4000  
export PGDATA=/data04/ppas11/pg_root4000  
export LANG=en_US.utf8      
export PGHOME=/usr/edb/as11  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH      
export DATE=`date +"%Y%m%d%H%M"`    
export PATH=$PGHOME/bin:$PATH:.      
export MANPATH=$PGHOME/share/man:$MANPATH      
export PGHOST=127.0.0.1  
export PGUSER=postgres   
export PGDATABASE=postgres      
alias rm='rm -i'      
alias ll='ls -lh'      
unalias vi    

1、初始化數據庫集群

initdb -D /data04/ppas11/pg_root4000 -E UTF8 --lc-collate=C --lc-ctype=en_US.UTF8 -U postgres -k --redwood-like   

2、配置recovery.done

cd $PGDATA  
  
cp $PGHOME/share/recovery.conf.sample ./  
  
mv recovery.conf.sample recovery.done  
  
vi recovery.done  
  
restore_command = 'cp /data04/ppas11/wal/%f %p'  
recovery_target_timeline = 'latest'  
standby_mode = on  
primary_conninfo = 'host=localhost port=4000 user=postgres'  

3、配置postgresql.conf

要使用rewind功能:

必須開啟full_page_writes

必須開啟data_checksums或wal_log_hints

postgresql.conf  
  
listen_addresses = '0.0.0.0'  
port = 4000  
max_connections = 8000  
superuser_reserved_connections = 13  
unix_socket_directories = '.,/tmp'  
unix_socket_permissions = 0700  
tcp_keepalives_idle = 60  
tcp_keepalives_interval = 10  
tcp_keepalives_count = 10  
shared_buffers = 16GB  
max_prepared_transactions = 8000  
maintenance_work_mem = 1GB  
autovacuum_work_mem = 1GB  
dynamic_shared_memory_type = posix  
vacuum_cost_delay = 0  
bgwriter_delay = 10ms  
bgwriter_lru_maxpages = 1000  
bgwriter_lru_multiplier = 10.0  
effective_io_concurrency = 0  
max_worker_processes = 128  
max_parallel_maintenance_workers = 8  
max_parallel_workers_per_gather = 8  
max_parallel_workers = 24  
wal_level = replica  
synchronous_commit = off  
full_page_writes = on  
wal_compression = on  
wal_buffers = 32MB  
wal_writer_delay = 10ms  
checkpoint_timeout = 25min  
max_wal_size = 32GB  
min_wal_size = 8GB  
checkpoint_completion_target = 0.2  
archive_mode = on  
archive_command = 'cp -n %p /data04/ppas11/wal/%f'  
max_wal_senders = 16  
wal_keep_segments = 4096  
max_replication_slots = 16  
hot_standby = on  
max_standby_archive_delay = 300s  
max_standby_streaming_delay = 300s  
wal_receiver_status_interval = 1s  
wal_receiver_timeout = 10s  
random_page_cost = 1.1  
effective_cache_size = 400GB  
log_destination = 'csvlog'  
logging_collector = on  
log_directory = 'log'  
log_filename = 'edb-%a.log'  
log_truncate_on_rotation = on  
log_rotation_age = 1d  
log_rotation_size = 0  
log_min_duration_statement = 1s  
log_checkpoints = on  
log_error_verbosity = verbose  
log_line_prefix = '%t '  
log_lock_waits = on  
log_statement = 'ddl'  
log_timezone = 'PRC'  
autovacuum = on  
log_autovacuum_min_duration = 0  
autovacuum_max_workers = 6  
autovacuum_freeze_max_age = 1200000000  
autovacuum_multixact_freeze_max_age = 1400000000  
autovacuum_vacuum_cost_delay = 0  
statement_timeout = 0  
lock_timeout = 0  
idle_in_transaction_session_timeout = 0  
vacuum_freeze_table_age = 1150000000  
vacuum_multixact_freeze_table_age = 1150000000  
datestyle = 'redwood,show_time'  
timezone = 'PRC'  
lc_messages = 'en_US.utf8'  
lc_monetary = 'en_US.utf8'  
lc_numeric = 'en_US.utf8'  
lc_time = 'en_US.utf8'  
default_text_search_config = 'pg_catalog.english'  
shared_preload_libraries = 'auto_explain,pg_stat_statements,$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq'  
edb_redwood_date = on  
edb_redwood_greatest_least = on  
edb_redwood_strings = on  
db_dialect = 'redwood'                
edb_dynatune = 66  
edb_dynatune_profile = oltp  
timed_statistics = off  

4、配置pg_hba.conf,允許流複製

local   all             all                                     trust  
host    all             all             127.0.0.1/32            trust  
host    all             all             ::1/128                 trust  
local   replication     all                                     trust  
host    replication     all             127.0.0.1/32            trust  
host    replication     all             ::1/128                 trust  
host all all 0.0.0.0/0 md5  

5、配置歸檔目錄

mkdir /data04/ppas11/wal    
chown enterprisedb:enterprisedb /data04/ppas11/wal  

6、創建從庫

pg_basebackup -h 127.0.0.1 -p 4000 -D /data04/ppas11/pg_root4001 -F p -c fast  

7、配置從庫

cd /data04/ppas11/pg_root4001  
  
mv recovery.done recovery.conf  
vi postgresql.conf  
  
port = 4001  

8、啟動從庫

pg_ctl start -D /data04/ppas11/pg_root4001  

9、壓測主庫

pgbench -i -s 1000  
  
pgbench -M prepared -v -r -P 1 -c 24 -j 24 -T 300  

10、檢查歸檔

postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            240 | 0000000100000000000000F0 | 28-JAN-19 15:08:43.276965 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)  
  
postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            248 | 0000000100000000000000F8 | 28-JAN-19 15:08:45.120134 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)  

11、檢查從庫延遲

postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+---------------------------------  
pid              | 8124  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 62988  
backend_start    | 28-JAN-19 15:07:34.084542 +08:00  
backend_xmin     |   
state            | streaming  
sent_lsn         | 1/88BC2000  
write_lsn        | 1/88BC2000  
flush_lsn        | 1/88BC2000  
replay_lsn       | 1/88077D48  
write_lag        | 00:00:00.001417  
flush_lag        | 00:00:00.002221  
replay_lag       | 00:00:00.097657  
sync_priority    | 0  
sync_state       | async  

例子1,從庫激活後產生讀寫,使用pg_rewind修復從庫,回退到只讀從庫

1、激活從庫

pg_ctl promote -D /data04/ppas11/pg_root4001  

2、寫從庫

pgbench -M prepared -v -r -P 1 -c 4 -j 4 -T 120 -p 4001  

此時從庫已經和主庫不在一個時間線,無法直接變成當前主庫的從庫

enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i time  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Time of latest checkpoint:            Mon 28 Jan 2019 03:56:38 PM CST  
Min recovery ending loc's timeline:   2  
track_commit_timestamp setting:       off  
Date/time type storage:               64-bit integers  
  
enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i time  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Time of latest checkpoint:            Mon 28 Jan 2019 05:11:38 PM CST  
Min recovery ending loc's timeline:   0  
track_commit_timestamp setting:       off  
Date/time type storage:               64-bit integers  

3、修復從庫,使之繼續成為當前主庫的從庫

4、查看切換點

cd /data04/ppas11/pg_root4001  
  
ll pg_wal/*.history  
-rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 pg_wal/00000002.history  
  
cat pg_wal/00000002.history  
1       6/48C62000      no recovery target specified  

5、從庫激活時間開始產生的WAL必須全部在pg_wal目錄中。

-rw------- 1 enterprisedb enterprisedb   42 Jan 28 17:15 00000002.history  
-rw------- 1 enterprisedb enterprisedb  16M Jan 28 17:16 000000020000000600000048  
............  

000000020000000600000048開始,所有的wal必須存在從庫pg_wal目錄中。如果已經覆蓋了,必須從歸檔目錄拷貝到從庫pg_wal目錄中。

6、從庫激活時,主庫從這個時間點開始所有的WAL還在pg_wal目錄,或者從庫可以使用restore_command獲得(recovery.conf)。

recovery.conf  
  
restore_command = 'cp /data04/ppas11/wal/%f %p'  

7、pg_rewind命令幫助

https://www.postgresql.org/docs/11/app-pgrewind.html

pg_rewind --help  
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.  
  
Usage:  
  pg_rewind [OPTION]...  
  
Options:  
  -D, --target-pgdata=DIRECTORY  existing data directory to modify  
      --source-pgdata=DIRECTORY  source data directory to synchronize with  
      --source-server=CONNSTR    source server to synchronize with  
  -n, --dry-run                  stop before modifying anything  
  -P, --progress                 write progress messages  
      --debug                    write a lot of debug messages  
  -V, --version                  output version information, then exit  
  -?, --help                     show this help, then exit  
  
Report bugs to <[email protected]>.  

8、停庫(被修復的庫,停庫)

pg_ctl stop -m fast -D /data04/ppas11/pg_root4001  

9、嘗試修復

pg_rewind -n -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"  
  
servers diverged at WAL location 6/48C62000 on timeline 1  
rewinding from last common checkpoint at 5/5A8CD30 on timeline 1  
Done!  

10、嘗試正常,說明可以修復,實施修復

pg_rewind -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"  
  
servers diverged at WAL location 6/48C62000 on timeline 1  
rewinding from last common checkpoint at 5/5A8CD30 on timeline 1  
Done!  

11、已修復,改配置

cd /data04/ppas11/pg_root4001  
  
vi postgresql.conf  
port = 4001  
mv recovery.done recovery.conf  
  
vi recovery.conf  
  
restore_command = 'cp /data04/ppas11/wal/%f %p'   
recovery_target_timeline = 'latest'   
standby_mode = on   
primary_conninfo = 'host=localhost port=4000 user=postgres'   

12、刪除歸檔中錯誤時間線上產生的文件否則會在啟動修復後的從庫後,走到00000002時間線上,這是不想看到的。

mkdir /data04/ppas11/wal/error_tl_2  
  
mv /data04/ppas11/wal/00000002* /data04/ppas11/wal/error_tl_2  

13、啟動從庫

pg_ctl start -D /data04/ppas11/pg_root4001  

14、建議對主庫做一個檢查點,從庫收到檢查點後,重啟後不需要應用太多WAL,而是從新檢查點開始恢復

psql  
checkpoint;  

15、壓測主庫

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000  

16、查看歸檔狀態

postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
           1756 | 0000000100000006000000DC | 28-JAN-19 17:41:57.562425 +08:00 |            0 |                 |                  | 28-JAN-19 15:01:17.883338 +08:00  
(1 row)  

17、查看從庫健康、延遲,觀察修復後的情況

postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+--------------------------------  
pid              | 13179  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 63198  
backend_start    | 28-JAN-19 17:47:29.85308 +08:00  
backend_xmin     |   
state            | catchup  
sent_lsn         | 7/DDE80000  
write_lsn        | 7/DC000000  
flush_lsn        | 7/DC000000  
replay_lsn       | 7/26A8DCB0  
write_lag        | 00:00:18.373263  
flush_lag        | 00:00:18.373263  
replay_lag       | 00:00:18.373263  
sync_priority    | 0  
sync_state       | async  

例子2,從庫激活成為新主庫後,老主庫依舊有讀寫,使用pg_rewind修復老主庫,將老主庫降級為新主庫的從庫

1、激活從庫

pg_ctl promote -D /data04/ppas11/pg_root4001  

2、寫從庫

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001  

3、寫主庫

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000  

此時老主庫已經和新的主庫不在一個時間線

enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i timeline  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Min recovery ending loc's timeline:   0  
enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i timeline  
Latest checkpoint's TimeLineID:       1  
Latest checkpoint's PrevTimeLineID:   1  
Min recovery ending loc's timeline:   2  
  
  
enterprisedb@pg11-test-> cd /data04/ppas11/pg_root4001/pg_wal  
enterprisedb@pg11-test-> cat 00000002.history   
1       8/48DE2318      no recovery target specified  
  
enterprisedb@pg11-test-> ll *.partial  
-rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:48 000000010000000800000048.partial  

4、修復老主庫,變成從庫

4.1、從庫激活時,老主庫從這個時間點開始所有的WAL,必須全部在pg_wal目錄中。

000000010000000800000048 開始的所有WAL必須存在pg_wal,如果已經覆蓋了,必須從WAL歸檔拷貝到pg_wal目錄

4.2、從庫激活時間開始產生的所有WAL,老主庫必須可以使用restore_command獲得(recovery.conf)。

recovery.conf  
  
restore_command = 'cp /data04/ppas11/wal/%f %p'  

5、關閉老主庫

pg_ctl stop -m fast -D /data04/ppas11/pg_root4000  

6、嘗試修復老主庫

pg_rewind -n -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"  
  
servers diverged at WAL location 8/48DE2318 on timeline 1  
rewinding from last common checkpoint at 6/CCCEF770 on timeline 1  
Done!  

7、嘗試成功,可以修復,實施修復

pg_rewind -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"  

8、修復完成後,改配置

cd /data04/ppas11/pg_root4000  
  
vi postgresql.conf  
port = 4000  
mv recovery.done recovery.conf  
  
vi recovery.conf  
  
restore_command = 'cp /data04/ppas11/wal/%f %p'   
recovery_target_timeline = 'latest'   
standby_mode = on   
primary_conninfo = 'host=localhost port=4001 user=postgres'    

9、啟動老主庫

pg_ctl start -D /data04/ppas11/pg_root4000  

10、建議對新主庫做一個檢查點,從庫收到檢查點後,重啟後不需要應用太多WAL,而是從新檢查點開始恢復

checkpoint;  

11、壓測新主庫

pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001  

12、查看歸檔狀態

psql -p 4001  
  
  
postgres=# select * from pg_stat_archiver ;  
 archived_count |    last_archived_wal     |        last_archived_time        | failed_count | last_failed_wal | last_failed_time |           stats_reset              
----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------  
            406 | 0000000200000009000000DB | 28-JAN-19 21:18:22.976118 +08:00 |            0 |                 |                  | 28-JAN-19 17:47:29.847488 +08:00  
(1 row)  

13、查看從庫健康、延遲

psql -p 4001  
  
postgres=# select * from pg_stat_replication ;  
-[ RECORD 1 ]----+---------------------------------  
pid              | 17675  
usesysid         | 10  
usename          | postgres  
application_name | walreceiver  
client_addr      | 127.0.0.1  
client_hostname  |   
client_port      | 60530  
backend_start    | 28-JAN-19 21:18:36.472197 +08:00  
backend_xmin     |   
state            | streaming  
sent_lsn         | 9/E8361C18  
write_lsn        | 9/E8361C18  
flush_lsn        | 9/E8361C18  
replay_lsn       | 9/D235B520  
write_lag        | 00:00:00.000101  
flush_lag        | 00:00:00.000184  
replay_lag       | 00:00:03.028098  
sync_priority    | 0  
sync_state       | async  

小結

1 適合場景

1、PG物理流複製的從庫,當激活後,可以開啟讀寫,使用pg_rewind可以將從庫回退為只讀從庫的角色。而不需要重建整個從庫。

2、當異步主從發生角色切換後,主庫的wal目錄中可能還有沒完全同步到從庫的內容,因此老的主庫無法直接切換為新主庫的從庫。使用pg_rewind可以修復老的主庫,使之成為新主庫的只讀從庫。而不需要重建整個從庫。

如果沒有pg_rewind,遇到以上情況,需要完全重建從庫,如果庫佔用空間很大,重建非常耗時,也非常耗費上游數據庫的資源(讀)。

2 前提

要使用rewind功能:

1、必須開啟full_page_writes

2、必須開啟data_checksums或wal_log_hints

initdb -k 開啟data_checksums  

3 原理與修復流程

1、使用pg_rewind功能的前提條件:必須開啟full page write,必須開啟wal hint或者data block checksum。

2、需要被修復的庫:從激活點開始,所有的WAL必須存在pg_wal目錄中。如果WAL已經被覆蓋,只要有歸檔,拷貝到pg_wal目錄即可。

3、新的主庫,從激活點開始,產生的所有WAL必須存在pg_wal目錄中,或者已歸檔,並且被修復的庫可以使用restore_command訪問到這部分WAL。

4、修改(source db)新主庫或老主庫配置,允許連接。

5、修復時,連接新主庫,得到切換點。或連接老主庫,同時比對當前要修復的新主庫的TL與老主庫進行比對,得到切換點。

6、解析需要被修復的庫的從切換點到現在所有的WAL。同時連接source db(新主庫(或老主庫)),進行回退操作(被修改或刪除的BLOCK從source db獲取並覆蓋,新增的BLOCK,直接抹除。)回退到切換點的狀態。

7、修改被修復庫(target db)的recovery.conf, postgresql.conf配置。

8、啟動target db,連接source db接收WAL,或restore_command配置接收WAL,從切換點開始所有WAL,進行apply。

9、target db現在是source db的從庫。

參考

https://www.postgresql.org/docs/11/app-pgrewind.html

《PostgreSQL primary-standby failback tools : pg_rewind》

《PostgreSQL 9.5 new feature - pg_rewind fast sync Split Brain Primary & Standby》

《PostgreSQL 9.5 add pg_rewind for Fast align for PostgreSQL unaligned primary & standby》

《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 遷移到 PG, PPAS (支持跨版本升級)》

免費領取阿里雲RDS PostgreSQL實例、ECS虛擬機

大量阿里雲PG解決方案: 任意維度實時圈人; 時序數據實時處理; 時間、空間、業務 多維數據實時透視; 獨立事件相關性分析; 海量關係實時圖式搜索; 社交業務案例; 流式數據實時處理案例; 物聯網; 全文檢索; 模糊、正則查詢案例; 圖像識別; 向量相似檢索; 數據清洗、採樣、脫敏、批處理、合併; GIS 地理信息空間數據應用; 金融業務; 異步消息應用案例; 海量數據 冷熱分離; 倒排索引案例; 海量數據OLAP處理應用;

德哥的 / digoal's PostgreSQL文章入口 - 努力做成PG資源最豐富的個人blog

德哥的微信 / digoal's wechat

Leave a Reply

Your email address will not be published. Required fields are marked *