pg_auto_failover, part 3: automated failover



os: ubuntu 16.04
db: postgresql 10.9

pg_auto_failover is an open-source PostgreSQL high-availability tool from Citus. It currently supports only PostgreSQL 10 and later.

pg_auto_failover is an extension and service for PostgreSQL that monitors and manages automated failover for a Postgres cluster. It is optimized for simplicity and correctness and supports Postgres 10 and newer.

IP plan

192.168.56.101 node1
192.168.56.102 node2
192.168.56.103 node3


node1 is the monitor node; node2 and node3 are the PostgreSQL primary and standby.

monitor

Check the state on node1:

$ pg_autoctl show state --pgdata /data/pg10/main/
 Name |   Port | Group |  Node |     Current State |    Assigned State
------+--------+-------+-------+-------------------+------------------
node2 |   5432 |     0 |    10 |           primary |           primary
node3 |   5432 |     0 |    11 |         secondary |         secondary
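
The `pg_autoctl show state` output is a plain-text table, so a script that wants to wait for the group to settle can parse it. Below is a minimal Python sketch, assuming the column layout shown above (`Name | Port | Group | Node | Current State | Assigned State`); `parse_show_state` and `is_converged` are hypothetical helper names, not part of pg_autoctl. Here a group counts as settled when every node's current state equals the state the monitor assigned to it.

```python
def parse_show_state(text):
    """Parse the text table printed by `pg_autoctl show state` into a list of dicts.

    Assumes (hypothetically) a header row, a dashed separator row, then one
    row per node, with columns separated by '|'.
    """
    rows = [line for line in text.strip().splitlines() if line.strip()]
    header = [col.strip() for col in rows[0].split("|")]
    nodes = []
    for line in rows[2:]:  # skip the header and the '---+---' separator
        values = [col.strip() for col in line.split("|")]
        nodes.append(dict(zip(header, values)))
    return nodes


def is_converged(nodes):
    """True when every node has reached the state the monitor assigned to it."""
    return all(n["Current State"] == n["Assigned State"] for n in nodes)


SAMPLE = """
 Name |   Port | Group |  Node |     Current State |    Assigned State
------+--------+-------+-------+-------------------+------------------
node2 |   5432 |     0 |    10 |           primary |           primary
node3 |   5432 |     0 |    11 |         secondary |         secondary
"""

nodes = parse_show_state(SAMPLE)
print([(n["Name"], n["Current State"]) for n in nodes])  # [('node2', 'primary'), ('node3', 'secondary')]
print(is_converged(nodes))  # True
```

During the failover shown further down (node2 `primary` / `demoted`), the same check returns False until both nodes have converged again.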


$ pg_autoctl show uri --formation default --pgdata /data/pg10/main/
postgres://node2:5432,node3:5432/postgres?target_session_attrs=read-write
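
This URI lists both nodes. With `target_session_attrs=read-write`, libpq (PostgreSQL 10 and later) tries the hosts in the order given and keeps the first connection that accepts writes, so the application does not need to know which node is currently the primary. A small Python sketch, standard library only, that splits such a URI into its host list, database name and parameters (`split_multihost_uri` is a hypothetical helper name):

```python
from urllib.parse import urlsplit, parse_qs


def split_multihost_uri(uri):
    """Split a libpq multi-host URI into (hosts, dbname, params).

    urlsplit() cannot parse the port of a netloc such as
    'node2:5432,node3:5432', so the host list is split by hand.
    This sketch assumes no user@ part and no IPv6 literals in the URI.
    """
    parts = urlsplit(uri)
    hosts = []
    for hostport in parts.netloc.split(","):
        host, _, port = hostport.partition(":")
        hosts.append((host, int(port) if port else 5432))  # 5432 is libpq's default port
    dbname = parts.path.lstrip("/")
    # parse_qs returns a list per key; keep the last value for each key
    params = {k: v[-1] for k, v in parse_qs(parts.query).items()}
    return hosts, dbname, params


hosts, dbname, params = split_multihost_uri(
    "postgres://node2:5432,node3:5432/postgres?target_session_attrs=read-write"
)
print(hosts)   # [('node2', 5432), ('node3', 5432)]
print(dbname)  # postgres
print(params)  # {'target_session_attrs': 'read-write'}
```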

Power off the PostgreSQL primary node2 to simulate a crash.

Output of the monitor on node1:

18:49:13 INFO  Setting goal state of node2:5432 to draining and node3:5432 to prepare_promotion after node2:5432 became unhealthy.
18:49:13 INFO  New state for node3:5432 in formation "default": secondary/prepare_promotion
18:49:13 INFO  New state for node2:5432 in formation "default": primary/draining

18:49:14 INFO  Node node3:5432 reported new state prepare_promotion
18:49:14 INFO  New state for node3:5432 in formation "default": prepare_promotion/prepare_promotion
18:49:14 INFO  Setting goal state of node2:5432 to demote_timeout and node3:5432 to stop_replication after node3:5432 converged to prepare_promotion.
18:49:14 INFO  New state for node3:5432 in formation "default": prepare_promotion/stop_replication
18:49:14 INFO  New state for node2:5432 in formation "default": primary/demote_timeout
18:49:14 INFO  Node node3:5432 reported new state stop_replication
18:49:14 INFO  New state for node3:5432 in formation "default": stop_replication/stop_replication

18:49:43 INFO  Setting goal state of node3:5432 to wait_primary and node2:5432 to demoted after the demote timeout expired.
18:49:43 INFO  New state for node3:5432 in formation "default": stop_replication/wait_primary
18:49:43 INFO  New state for node2:5432 in formation "default": primary/demoted

18:49:44 INFO  Node node3:5432 reported new state wait_primary
18:49:44 INFO  New state for node3:5432 in formation "default": wait_primary/wait_primary
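
Reading the timestamps: the monitor assigned node2 the demote_timeout state at 18:49:14 and, having heard nothing from node2's keeper, declared it demoted "after the demote timeout expired" at 18:49:43. That wait is about 29 seconds, consistent with a demote timeout of roughly 30 seconds; the exact setting and its default on your monitor version are assumptions to verify in the pg_auto_failover documentation. A quick Python check of the gaps, using timestamps copied from the log above:

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two HH:MM:SS log timestamps on the same day."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the monitor log above
events = {
    "node2 marked unhealthy": "18:49:13",
    "node2 assigned demote_timeout": "18:49:14",
    "demote timeout expired": "18:49:43",
    "node3 reports wait_primary": "18:49:44",
}

print(seconds_between(events["node2 assigned demote_timeout"], events["demote timeout expired"]))  # 29.0
print(seconds_between(events["node2 marked unhealthy"], events["node3 reports wait_primary"]))  # 31.0
```

So in this test, write traffic was unavailable for roughly half a minute, and most of that time was spent waiting for the old primary to demote itself, which it never could because it had lost power.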

$ pg_autoctl show state --pgdata /data/pg10/main/
 Name |   Port | Group |  Node |     Current State |    Assigned State
------+--------+-------+-------+-------------------+------------------
node2 |   5432 |     0 |    10 |           primary |           demoted
node3 |   5432 |     0 |    11 |      wait_primary |      wait_primary

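node3 is in wait_primary rather than primary because the group uses synchronous replication, which needs at least one available standby. Until a standby is back, node3 accepts writes with synchronous replication disabled (see "Disabling synchronous replication" in the keeper log).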
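
The rule above can be modelled in a few lines. This is a simplified, hypothetical sketch of the decision as it appears in this test, not pg_auto_failover's actual code: a promoted node without a usable standby stays in wait_primary with synchronous replication off, and moves to primary only once a standby has caught up.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    name: str
    healthy: bool
    caught_up: bool


def primary_goal_state(standbys):
    """Hypothetical model of the goal state given to a promoted node.

    Returns (goal_state, synchronous_replication_enabled).
    """
    ready = [s for s in standbys if s.healthy and s.caught_up]
    if ready:
        # at least one standby can acknowledge commits: full primary
        return "primary", True
    # no usable standby: keep accepting writes without waiting on sync replication
    return "wait_primary", False


print(primary_goal_state([Standby("node2", healthy=False, caught_up=False)]))  # ('wait_primary', False)
print(primary_goal_state([Standby("node2", healthy=True, caught_up=False)]))   # ('wait_primary', False)
print(primary_goal_state([Standby("node2", healthy=True, caught_up=True)]))    # ('primary', True)
```

The middle case matches the window later in this test when node2 is back in catchingup but node3 is still wait_primary.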

Output of the keeper on node3:

18:48:52 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:48:57 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:49:02 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:49:07 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.

18:49:12 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
18:49:12 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
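
The keeper reports to the monitor every 5 seconds (18:48:52, 18:48:57, ...). The first report after the power loss carries WAL delta -1, right after the keeper found pg_stat_wal_receiver empty, meaning the standby no longer had a WAL receiver streaming from a primary. A minimal Python sketch, assuming the exact log line format shown here, that extracts time, state and WAL delta from the node_active lines and finds the first report with a negative delta (the regex and function names are illustrative):

```python
import re

# Assumes the keeper log format shown above; other pg_autoctl versions may differ.
NODE_ACTIVE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d) INFO  Calling node_active for node \S+ "
    r"with current state: (?P<state>\w+), .*WAL delta is (?P<delta>-?\d+)\.$"
)


def parse_node_active(lines):
    """Yield (time, state, wal_delta) for each node_active report in a keeper log."""
    for line in lines:
        m = NODE_ACTIVE.match(line.strip())
        if m:
            yield m["time"], m["state"], int(m["delta"])


def first_negative_delta(reports):
    """Return the first report whose WAL delta is negative, or None."""
    for report in reports:
        if report[2] < 0:
            return report
    return None


log = '''\
18:49:07 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:49:12 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
18:49:12 INFO  Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
'''
reports = list(parse_node_active(log.splitlines()))
print(reports)  # [('18:49:07', 'secondary', 0), ('18:49:12', 'secondary', -1)]
print(first_negative_delta(reports))  # ('18:49:12', 'secondary', -1)
```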
18:49:12 INFO  FSM transition from "secondary" to "prepare_promotion": Stop traffic to primary, wait for it to finish draining.
18:49:12 INFO  Transition complete: current state is now "prepare_promotion"
18:49:13 INFO  Calling node_active for node default/11/0 with current state: prepare_promotion, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:13 INFO  FSM transition from "prepare_promotion" to "stop_replication": Prevent against split-brain situations.
18:49:13 INFO  Prevent writes to the promoted standby while the primary is not demoted yet, by making the service incompatible with target_session_attrs = read-write
18:49:13 INFO  Setting default_transaction_read_only to on
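
Why turning on default_transaction_read_only prevents split brain: with target_session_attrs=read-write, libpq connects to each listed host in turn, runs `SHOW transaction_read_only`, and moves on to the next host if the answer is `on`. A node with default_transaction_read_only on therefore looks read-only to such clients, so between 18:49:13 and 18:49:43 no application using the URI from `pg_autoctl show uri` can write to node3. The Python sketch below only simulates that host selection; the dictionaries and sets standing in for real servers are assumptions of the example.

```python
def pick_read_write_host(candidates, read_only_by_host, reachable):
    """Simulate libpq's target_session_attrs=read-write host selection.

    candidates: hosts in the order they appear in the connection URI.
    read_only_by_host: what `SHOW transaction_read_only` would return on each host.
    reachable: hosts that accept connections (simulated).
    Returns the chosen host, or None when no host qualifies (connection fails).
    """
    for host in candidates:
        if host not in reachable:
            continue  # connection refused or timed out: try the next host
        if read_only_by_host[host] == "on":
            continue  # read-only session: libpq closes it and tries the next host
        return host
    return None


uri_hosts = ["node2", "node3"]

# stop_replication window: node2 is down, node3 has default_transaction_read_only = on
print(pick_read_write_host(uri_hosts, {"node2": "off", "node3": "on"}, {"node3"}))   # None
# after 18:49:43: node3 is wait_primary and read-only has been switched off
print(pick_read_write_host(uri_hosts, {"node2": "off", "node3": "off"}, {"node3"}))  # node3
```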

18:49:13 INFO  Promoting postgres

18:49:13 INFO  Other node in the HA group is node2:5432
18:49:13 INFO  Create replication slot "pgautofailover_standby"

18:49:13 INFO  Disabling synchronous replication

18:49:13 INFO  Transition complete: current state is now "stop_replication"
18:49:13 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:18 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:23 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:28 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:33 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:38 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:43 INFO  Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:43 INFO  FSM transition from "stop_replication" to "wait_primary": Confirmed promotion with the monitor
18:49:43 INFO  Setting default_transaction_read_only to off
18:49:43 INFO  Transition complete: current state is now "wait_primary"

18:49:43 INFO  Calling node_active for node default/11/0 with current state: wait_primary, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:48 INFO  Calling node_active for node default/11/0 with current state: wait_primary, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:53 INFO  Calling node_active for node default/11/0 with current state: wait_primary, PostgreSQL is running, sync_state is "", WAL delta is -1.
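
Taken together, the monitor and keeper logs show node3 walking secondary → prepare_promotion → stop_replication → wait_primary, and on to primary once node2 rejoins later in the test. A hedged Python sketch that encodes only the transitions observed in this test and checks a sequence of states against them; the real keeper state machine has more states and edges than listed here.

```python
# Transitions observed in this test's keeper and monitor logs; deliberately not exhaustive.
OBSERVED_TRANSITIONS = {
    "secondary": {"prepare_promotion"},
    "prepare_promotion": {"stop_replication"},
    "stop_replication": {"wait_primary"},
    "wait_primary": {"primary"},
    "primary": {"demoted"},
    "demoted": {"catchingup"},
    "catchingup": {"secondary"},
}


def check_path(states):
    """Return the first (from, to) step missing from the table, or None if all are known."""
    for src, dst in zip(states, states[1:]):
        if dst not in OBSERVED_TRANSITIONS.get(src, set()):
            return src, dst
    return None


node3 = ["secondary", "prepare_promotion", "stop_replication", "wait_primary", "primary"]
node2 = ["primary", "demoted", "catchingup", "secondary"]
print(check_path(node3), check_path(node2))  # None None
print(check_path(["secondary", "primary"]))  # ('secondary', 'primary')
```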

Power node2, the old PostgreSQL primary, back on.

Start the keeper on node2:

$ pg_autoctl run --pgdata /data/pg10/main

18:57:29 INFO  Managing PostgreSQL installation at "/data/pg10/main"
18:57:29 ERROR Failed to signal pid 2351, read from Postgres pid file.
18:57:29 INFO  Is PostgreSQL at "/data/pg10/main" up and running?
18:57:29 ERROR Given --pgport 5432 doesn't match PostgreSQL port 0 from "/data/pg10/main/postmaster.pid"
18:57:29 FATAL Failed to discover PostgreSQL setup, please fix previous errors.
18:57:29 INFO  pg_autoctl service is starting
18:57:29 ERROR Failed to signal pid 2351, read from Postgres pid file.
18:57:29 INFO  Is PostgreSQL at "/data/pg10/main" up and running?
18:57:29 INFO  Calling node_active for node default/10/0 with current state: primary, PostgreSQL is not running, sync_state is "", WAL delta is -1.
18:57:29 INFO  Postgres is not running, starting postgres
18:57:29 INFO   /usr/bin/pg_ctl --pgdata /data/pg10/main --options "-p 5432" --options "-h *" --wait start
18:57:30 WARN  PostgreSQL was not running, restarted with pid 1961
18:57:30 INFO  FSM transition from "primary" to "demoted": A failover occurred, no longer primary
18:57:31 INFO  Transition complete: current state is now "demoted"
18:57:31 INFO  Calling node_active for node default/10/0 with current state: demoted, PostgreSQL is not running, sync_state is "", WAL delta is -1.
18:57:31 INFO  FSM transition from "demoted" to "catchingup": A new primary is available. First, try to rewind. If that fails, do a pg_basebackup.
18:57:31 INFO  The primary node returned by the monitor is node3:5432
18:57:31 INFO  Rewinding PostgreSQL to follow new primary node3:5432
18:57:31 INFO  pg_ctl: no server running

18:57:31 INFO  pg_ctl stop failed, but PostgreSQL is not running anyway
18:57:31 INFO  Running /usr/bin/pg_rewind --target-pgdata "/data/pg10/main" --source-server " host='node3' port=5432 user='pgautofailover_replicator' dbname='postgres'" --progress ...
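
The keeper spells out its recovery strategy: "First, try to rewind. If that fails, do a pg_basebackup." pg_rewind copies only the blocks that diverged, so it is much cheaper than a full base backup, but it requires the target cluster to have data checksums enabled at initdb time or wal_log_hints = on. Below is a hypothetical Python sketch of that fallback, not pg_autoctl's code; the command runner is injected so the logic can be exercised without a database, and the pg_basebackup arguments are illustrative assumptions.

```python
import subprocess
from types import SimpleNamespace


def rejoin_as_standby(pgdata, host, port, user, run=subprocess.run):
    """Hypothetical sketch: try pg_rewind first, fall back to pg_basebackup.

    Returns the name of the method that succeeded; `run` is injectable for testing.
    """
    source = f"host='{host}' port={port} user='{user}' dbname='postgres'"
    rewind = ["pg_rewind", "--target-pgdata", pgdata, "--source-server", source, "--progress"]
    if run(rewind).returncode == 0:
        return "pg_rewind"
    # Fallback. A real implementation must first move the old data directory
    # aside, because pg_basebackup refuses to write into a non-empty directory.
    basebackup = ["pg_basebackup", "--pgdata", pgdata, "--host", host,
                  "--port", str(port), "--username", user, "--wal-method=stream"]
    if run(basebackup).returncode == 0:
        return "pg_basebackup"
    raise RuntimeError("could not rejoin the new primary")


def fake_run(argv):
    # Pretend pg_rewind fails (for example, wal_log_hints was off) and pg_basebackup works.
    return SimpleNamespace(returncode=1 if argv[0] == "pg_rewind" else 0)


print(rejoin_as_standby("/data/pg10/main", "node3", 5432, "pgautofailover_replicator", run=fake_run))
# pg_basebackup
```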

Output of the monitor on node1:

18:57:31 INFO  Node node2:5432 reported new state demoted
18:57:31 INFO  New state for node2:5432 in formation "default": demoted/demoted
18:57:31 INFO  Setting goal state of node2:5432 to catchingup after it converged to demotion and node3:5432 converged to wait_primary.
18:57:31 INFO  New state for node2:5432 in formation "default": demoted/catchingup
18:57:34 INFO  Node node2:5432 reported new state catchingup
18:57:34 INFO  New state for node2:5432 in formation "default": catchingup/catchingup
18:57:39 INFO  Setting goal state of node3:5432 to primary and node2:5432 to secondary after node2:5432 caught up.
18:57:39 INFO  New state for node2:5432 in formation "default": catchingup/secondary
18:57:39 INFO  New state for node3:5432 in formation "default": wait_primary/primary
18:57:40 INFO  Node node2:5432 reported new state secondary
18:57:40 INFO  New state for node2:5432 in formation "default": secondary/secondary
18:57:41 INFO  Node node3:5432 reported new state primary
18:57:41 INFO  New state for node3:5432 in formation "default": primary/primary
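
The monitor moved node2 from catchingup to secondary "after node2:5432 caught up", that is, once node2's WAL position was close enough to node3's. PostgreSQL prints WAL positions (pg_lsn) as two hexadecimal halves `hi/lo`, standing for the 64-bit byte offset (hi << 32) + lo, so lag in bytes is a plain subtraction. A small Python sketch; the LSN values and the 16 MB threshold below are made-up examples, not values from this test or a documented pg_auto_failover default.

```python
def lsn_to_int(lsn):
    """Convert a pg_lsn string such as '0/3000060' to its 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def lag_bytes(primary_lsn, standby_lsn):
    """Bytes the standby is behind, like SELECT pg_wal_lsn_diff(primary, standby)."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


THRESHOLD = 16 * 1024 * 1024  # hypothetical "caught up" threshold: 16 MB

print(lag_bytes("1/0", "0/FF000000"))                    # 16777216
print(lag_bytes("0/3000060", "0/3000000") <= THRESHOLD)  # True
```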

$ pg_autoctl show state --pgdata /data/pg10/main/
 Name |   Port | Group |  Node |     Current State |    Assigned State
------+--------+-------+-------+-------------------+------------------
node2 |   5432 |     0 |    10 |         secondary |         secondary
node3 |   5432 |     0 |    11 |           primary |           primary

The old primary, which had crashed, became a standby of the new primary via pg_rewind.
The new primary's state also changed from wait_primary to primary.

This is the expected result.

