os: ubuntu 16.04
db: postgresql 10.9
pg_auto_failover is an open-source PostgreSQL high-availability tool from Citus. It currently supports only PostgreSQL 10 and later.
pg_auto_failover is an extension and service for PostgreSQL that monitors and manages automated failover for a Postgres cluster. It is optimized for simplicity and correctness and supports Postgres 10 and newer.
IP plan
192.168.56.101 node1
192.168.56.102 node2
192.168.56.103 node3
node1 is the monitor node; node2 and node3 are the PostgreSQL primary and standby.
monitor
Check the state on node1
$ pg_autoctl show state --pgdata /data/pg10/main/
Name | Port | Group | Node | Current State | Assigned State
------+--------+-------+-------+-------------------+------------------
node2 | 5432 | 0 | 10 | primary | primary
node3 | 5432 | 0 | 11 | secondary | secondary
$ pg_autoctl show uri --formation default --pgdata /data/pg10/main/
postgres://node2:5432,node3:5432/postgres?target_session_attrs=read-write
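The URI above lists both data nodes and sets target_session_attrs=read-write. With libpq, the hosts are tried in order and only a server that accepts writes is kept, so a client using this URI ends up on whichever node is currently primary. As a minimal sketch, the URI can be split into its parts as follows; the function name parse_multihost_uri is a hypothetical helper, and the 5432 fallback assumes the standard PostgreSQL default port.

```python
from urllib.parse import urlsplit, parse_qs


def parse_multihost_uri(uri):
    """Split a multi-host Postgres URI into hosts, database name and parameters."""
    parts = urlsplit(uri)
    hosts = []
    # The netloc holds a comma-separated list of host:port pairs.
    for entry in parts.netloc.split(","):
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else 5432))
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"hosts": hosts, "dbname": parts.path.lstrip("/"), "params": params}


uri = "postgres://node2:5432,node3:5432/postgres?target_session_attrs=read-write"
info = parse_multihost_uri(uri)
print(info["hosts"])
print(info["params"]["target_session_attrs"])
```

The netloc is split by hand because urllib's own hostname and port accessors expect a single host.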
Power off node2, the PostgreSQL primary, to simulate a crash
Observe the output on the monitor node, node1
18:49:13 INFO Setting goal state of node2:5432 to draining and node3:5432 to prepare_promotion after node2:5432 became unhealthy.
18:49:13 INFO New state for node3:5432 in formation "default": secondary/prepare_promotion
18:49:13 INFO New state for node2:5432 in formation "default": primary/draining
18:49:14 INFO Node node3:5432 reported new state prepare_promotion
18:49:14 INFO New state for node3:5432 in formation "default": prepare_promotion/prepare_promotion
18:49:14 INFO Setting goal state of node2:5432 to demote_timeout and node3:5432 to stop_replication after node3:5432 converged to prepare_promotion.
18:49:14 INFO New state for node3:5432 in formation "default": prepare_promotion/stop_replication
18:49:14 INFO New state for node2:5432 in formation "default": primary/demote_timeout
18:49:14 INFO Node node3:5432 reported new state stop_replication
18:49:14 INFO New state for node3:5432 in formation "default": stop_replication/stop_replication
18:49:43 INFO Setting goal state of node3:5432 to wait_primary and node2:5432 to demoted after the demote timeout expired.
18:49:43 INFO New state for node3:5432 in formation "default": stop_replication/wait_primary
18:49:43 INFO New state for node2:5432 in formation "default": primary/demoted
18:49:44 INFO Node node3:5432 reported new state wait_primary
18:49:44 INFO New state for node3:5432 in formation "default": wait_primary/wait_primary
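The monitor output above can be turned into a timeline to see how long the failover took. The sketch below is one way to do that, assuming the "New state for ..." lines keep exactly the layout shown here and reading the state pair as current/assigned, matching the two columns of show state. The names parse_transitions and seconds_until are hypothetical helpers, and the sample is a subset of the lines above.

```python
import re
from datetime import datetime

# Matches monitor lines such as:
# 18:49:13 INFO New state for node3:5432 in formation "default": secondary/prepare_promotion
STATE_LINE = re.compile(
    r'^(\d{2}:\d{2}:\d{2}) INFO New state for (\S+) in formation "([^"]+)": (\w+)/(\w+)$'
)


def parse_transitions(log_text):
    """Return (time, node, current_state, assigned_state) tuples from monitor output."""
    events = []
    for line in log_text.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((ts, match.group(2), match.group(4), match.group(5)))
    return events


def seconds_until(events, node, state):
    """Seconds from the first event until node has current == assigned == state."""
    start = events[0][0]
    for ts, name, current, assigned in events:
        if name == node and current == assigned == state:
            return int((ts - start).total_seconds())
    return None


sample = """\
18:49:13 INFO New state for node3:5432 in formation "default": secondary/prepare_promotion
18:49:13 INFO New state for node2:5432 in formation "default": primary/draining
18:49:14 INFO New state for node3:5432 in formation "default": prepare_promotion/prepare_promotion
18:49:43 INFO New state for node3:5432 in formation "default": stop_replication/wait_primary
18:49:44 INFO New state for node3:5432 in formation "default": wait_primary/wait_primary
"""

events = parse_transitions(sample)
print(seconds_until(events, "node3:5432", "wait_primary"))  # prints 31
```

In this run most of the 31 seconds is the wait between 18:49:14 and 18:49:43, which ends when the monitor logs that the demote timeout expired.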
$ pg_autoctl show state --pgdata /data/pg10/main/
Name | Port | Group | Node | Current State | Assigned State
------+--------+-------+-------+-------------------+------------------
node2 | 5432 | 0 | 10 | primary | demoted
node3 | 5432 | 0 | 11 | wait_primary | wait_primary
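node3 stays in wait_primary rather than primary because the formation uses synchronous replication, which needs at least one standby to be available, and with node2 down there is none. In the keeper output that follows, node3 logs "Disabling synchronous replication" during this transition.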
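A node whose Current State differs from its Assigned State has not yet applied what the monitor asked for, which is the case for the powered-off node2 here. As a rough sketch, the table printed by show state can be parsed to list such nodes. This assumes the pipe-separated layout shown in this post; parse_show_state and unconverged are hypothetical helper names.

```python
def parse_show_state(text):
    """Parse the pipe-separated table into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= set("-+"):
            continue  # skip the ------+------ separator line
        cells = [cell.strip() for cell in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def unconverged(rows):
    """Names of nodes whose current state differs from the assigned state."""
    return [r["Name"] for r in rows if r["Current State"] != r["Assigned State"]]


table = """\
Name | Port | Group | Node | Current State | Assigned State
------+--------+-------+-------+-------------------+------------------
node2 | 5432 | 0 | 10 | primary | demoted
node3 | 5432 | 0 | 11 | wait_primary | wait_primary
"""

rows = parse_show_state(table)
print(unconverged(rows))  # prints ['node2']
```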
Observe the output of the keeper on node3
18:48:52 INFO Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:48:57 INFO Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:49:02 INFO Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:49:07 INFO Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is 0.
18:49:12 ERROR PostgreSQL cannot reach the primary server: the system view pg_stat_wal_receiver has no rows.
18:49:12 INFO Calling node_active for node default/11/0 with current state: secondary, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:12 INFO FSM transition from "secondary" to "prepare_promotion": Stop traffic to primary, wait for it to finish draining.
18:49:12 INFO Transition complete: current state is now "prepare_promotion"
18:49:13 INFO Calling node_active for node default/11/0 with current state: prepare_promotion, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:13 INFO FSM transition from "prepare_promotion" to "stop_replication": Prevent against split-brain situations.
18:49:13 INFO Prevent writes to the promoted standby while the primary is not demoted yet, by making the service incompatible with target_session_attrs = read-write
18:49:13 INFO Setting default_transaction_read_only to on
18:49:13 INFO Promoting postgres
18:49:13 INFO Other node in the HA group is node2:5432
18:49:13 INFO Create replication slot "pgautofailover_standby"
18:49:13 INFO Disabling synchronous replication
18:49:13 INFO Transition complete: current state is now "stop_replication"
18:49:13 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:18 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:23 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:28 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:33 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:38 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:43 INFO Calling node_active for node default/11/0 with current state: stop_replication, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:43 INFO FSM transition from "stop_replication" to "wait_primary": Confirmed promotion with the monitor
18:49:43 INFO Setting default_transaction_read_only to off
18:49:43 INFO Transition complete: current state is now "wait_primary"
18:49:43 INFO Calling node_active for node default/11/0 with current state: wait_primary, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:48 INFO Calling node_active for node default/11/0 with current state: wait_primary, PostgreSQL is running, sync_state is "", WAL delta is -1.
18:49:53 INFO Calling node_active for node default/11/0 with current state: wait_primary, PostgreSQL is running, sync_state is "", WAL delta is -1.
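Two things stand out in the keeper output above: the sequence of FSM transitions on node3, and the window during which default_transaction_read_only was on so that the promoted standby refused writes. The sketch below extracts both from the log text. It assumes the line layout shown here; fsm_path and read_only_seconds are hypothetical helper names, and the sample is a subset of the lines above.

```python
import re
from datetime import datetime

FSM_LINE = re.compile(r'FSM transition from "(\w+)" to "(\w+)"')
READ_ONLY_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) INFO Setting default_transaction_read_only to (on|off)$"
)


def fsm_path(log_text):
    """Return the ordered list of states the keeper moved through."""
    path = []
    for line in log_text.splitlines():
        match = FSM_LINE.search(line)
        if match:
            if not path:
                path.append(match.group(1))  # starting state of the first transition
            path.append(match.group(2))
    return path


def read_only_seconds(log_text):
    """Seconds between switching default_transaction_read_only on and back off."""
    on_at = None
    for line in log_text.splitlines():
        match = READ_ONLY_LINE.match(line.strip())
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%H:%M:%S")
        if match.group(2) == "on":
            on_at = ts
        elif on_at is not None:
            return int((ts - on_at).total_seconds())
    return None


sample = """\
18:49:12 INFO FSM transition from "secondary" to "prepare_promotion": Stop traffic to primary, wait for it to finish draining.
18:49:13 INFO FSM transition from "prepare_promotion" to "stop_replication": Prevent against split-brain situations.
18:49:13 INFO Setting default_transaction_read_only to on
18:49:43 INFO FSM transition from "stop_replication" to "wait_primary": Confirmed promotion with the monitor
18:49:43 INFO Setting default_transaction_read_only to off
"""

print(fsm_path(sample))
print(read_only_seconds(sample))  # prints 30
```

For this run the path is secondary, prepare_promotion, stop_replication, wait_primary, and node3 rejected writes for about 30 seconds before the monitor confirmed the promotion.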
Power node2, the former PostgreSQL primary, back on
Start the keeper on node2
$ pg_autoctl run --pgdata /data/pg10/main
18:57:29 INFO Managing PostgreSQL installation at "/data/pg10/main"
18:57:29 ERROR Failed to signal pid 2351, read from Postgres pid file.
18:57:29 INFO Is PostgreSQL at "/data/pg10/main" up and running?
18:57:29 ERROR Given --pgport 5432 doesn't match PostgreSQL port 0 from "/data/pg10/main/postmaster.pid"
18:57:29 FATAL Failed to discover PostgreSQL setup, please fix previous errors.
18:57:29 INFO pg_autoctl service is starting
18:57:29 ERROR Failed to signal pid 2351, read from Postgres pid file.
18:57:29 INFO Is PostgreSQL at "/data/pg10/main" up and running?
18:57:29 INFO Calling node_active for node default/10/0 with current state: primary, PostgreSQL is not running, sync_state is "", WAL delta is -1.
18:57:29 INFO Postgres is not running, starting postgres
18:57:29 INFO /usr/bin/pg_ctl --pgdata /data/pg10/main --options "-p 5432" --options "-h *" --wait start
18:57:30 WARN PostgreSQL was not running, restarted with pid 1961
18:57:30 INFO FSM transition from "primary" to "demoted": A failover occurred, no longer primary
18:57:31 INFO Transition complete: current state is now "demoted"
18:57:31 INFO Calling node_active for node default/10/0 with current state: demoted, PostgreSQL is not running, sync_state is "", WAL delta is -1.
18:57:31 INFO FSM transition from "demoted" to "catchingup": A new primary is available. First, try to rewind. If that fails, do a pg_basebackup.
18:57:31 INFO The primary node returned by the monitor is node3:5432
18:57:31 INFO Rewinding PostgreSQL to follow new primary node3:5432
18:57:31 INFO pg_ctl: no server running
18:57:31 INFO pg_ctl stop failed, but PostgreSQL is not running anyway
18:57:31 INFO Running /usr/bin/pg_rewind --target-pgdata "/data/pg10/main" --source-server " host='node3' port=5432 user='pgautofailover_replicator' dbname='postgres'" --progress ...
Observe the output on the monitor node, node1
18:57:31 INFO Node node2:5432 reported new state demoted
18:57:31 INFO New state for node2:5432 in formation "default": demoted/demoted
18:57:31 INFO Setting goal state of node2:5432 to catchingup after it converged to demotion and node3:5432 converged to wait_primary.
18:57:31 INFO New state for node2:5432 in formation "default": demoted/catchingup
18:57:34 INFO Node node2:5432 reported new state catchingup
18:57:34 INFO New state for node2:5432 in formation "default": catchingup/catchingup
18:57:39 INFO Setting goal state of node3:5432 to primary and node2:5432 to secondary after node2:5432 caught up.
18:57:39 INFO New state for node2:5432 in formation "default": catchingup/secondary
18:57:39 INFO New state for node3:5432 in formation "default": wait_primary/primary
18:57:40 INFO Node node2:5432 reported new state secondary
18:57:40 INFO New state for node2:5432 in formation "default": secondary/secondary
18:57:41 INFO Node node3:5432 reported new state primary
18:57:41 INFO New state for node3:5432 in formation "default": primary/primary
$ pg_autoctl show state --pgdata /data/pg10/main/
Name | Port | Group | Node | Current State | Assigned State
------+--------+-------+-------+-------------------+------------------
node2 | 5432 | 0 | 10 | secondary | secondary
node3 | 5432 | 0 | 11 | primary | primary
The old primary that crashed was turned into a standby of the new primary via pg_rewind.
The new primary's state also changed from wait_primary to primary.
This matches expectations.
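The rejoin above depends on pg_rewind succeeding. According to the PostgreSQL documentation, pg_rewind requires the target cluster to have either wal_log_hints enabled or data checksums enabled, and full_page_writes must be on. The sketch below checks those conditions against a plain dictionary of settings. The dictionary and the function name rewind_prerequisites_met are hypothetical; in practice the values would have to be collected from the server, for example with SHOW, which this post does not do.

```python
def rewind_prerequisites_met(settings):
    """Check the documented pg_rewind prerequisites against a dict of settings.

    `settings` maps setting names to "on"/"off" strings (hypothetical input).
    """
    def is_on(name):
        return str(settings.get(name, "off")).lower() == "on"

    # full_page_writes is required, plus either hint-bit logging or checksums.
    return is_on("full_page_writes") and (
        is_on("wal_log_hints") or is_on("data_checksums")
    )


example = {"full_page_writes": "on", "wal_log_hints": "on", "data_checksums": "off"}
print(rewind_prerequisites_met(example))
```

If the rewind cannot be used, the keeper log line quoted earlier says the fallback is a pg_basebackup.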
References:
https://github.com/citusdata/pg_auto_failover
https://pg-auto-failover.readthedocs.io/en/latest/quickstart.html
https://www.citusdata.com/blog/2019/05/30/introducing-pg-auto-failover/
https://cloudblogs.microsoft.com/opensource/2019/05/06/introducing-pg_auto_failover-postgresql-open-source-extension-automated-failover-high-availability/