Checking Group Replication Recovery on MySQL 5.7.23
Last time, we set up Group Replication.
This time, we'll look at what happens when the primary node dies and how to recover failed nodes.
The short version: as long as the settings have been persisted to my.cnf, simply restarting the mysql process is enough to bring a node back to the ONLINE state.
Environment
The environment is the same as last time:
| OS | Software | IP | Hostname |
|---|---|---|---|
| CentOS 7 | MySQL 5.7.23, MySQL Shell 8.0.16 | 192.168.10.101 | node01 |
| CentOS 7 | MySQL 5.7.23, MySQL Shell 8.0.16 | 192.168.10.102 | node02 |
| CentOS 7 | MySQL 5.7.23, MySQL Shell 8.0.16 | 192.168.10.103 | node03 |
Death of the Primary Node
Let's check what happens when the primary node goes down.
Run mysqlsh on node2 and look at the cluster status.
All nodes are healthy, and node1 is the primary.
Note that even though we are connected to node2, the cluster information is being fetched from node1 (see groupInformationSourceMember below).
```
[node2] $ mysqlsh --log-level=DEBUG3 --uri repl@node2:3306

MySQL node2:3306 JS > var cluster = dba.getCluster()
MySQL node2:3306 JS > cluster.status()
{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "node1:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "node1:3306": {
                "address": "node1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node2:3306": {
                "address": "node2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "address": "node3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "node1:3306"
}
```
Next, stop mysql on node1.
```
[node1] $ sudo systemctl stop mysqld
```
Check the status from node2.
At this point, running cluster.status() in the mysqlsh session that is still open on node2 fails with Cluster.status: MySQL server has gone away (MySQL Error 2006): the cluster object is still bound to node1's mysql, so no information can be retrieved.
Run dba.getCluster() again and check.
node1 is now unreachable, and node2 has been promoted to primary.
```
[node2] $ mysqlsh --log-level=DEBUG3 --uri repl@node2:3306

MySQL node2:3306 JS > var cluster = dba.getCluster()
MySQL node2:3306 JS > cluster.status()
{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "node2:3306",
        "ssl": "DISABLED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "node1:3306": {
                "address": "node1:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "node2:3306": {
                "address": "node2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "address": "node3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "node2:3306"
}
```
Recovering a Node
Let's bring node1 back.
Restart mysql.
```
[node1] $ sudo systemctl start mysqld
```
Check cluster.status() from node2 again.
node1 is ONLINE, confirming the recovery.
```
[node2] $ mysqlsh --log-level=DEBUG3 --uri repl@node2:3306

MySQL node2:3306 JS > var cluster = dba.getCluster()
MySQL node2:3306 JS > cluster.status()
{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "node2:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "node1:3306": {
                "address": "node1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node2:3306": {
                "address": "node2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "address": "node3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "node2:3306"
}
```
Next, let's stop and then restart node3.
```
[node3] $ sudo systemctl stop mysqld
[node3] $ sudo systemctl start mysqld
```
Checking the status, node3 stays stuck at (MISSING).
```
[node2] $ mysqlsh --log-level=DEBUG3 --uri repl@node2:3306

MySQL node2:3306 JS > var cluster = dba.getCluster()
MySQL node2:3306 JS > cluster.status()
{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "node2:3306",
        "ssl": "DISABLED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "node1:3306": {
                "address": "node1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node2:3306": {
                "address": "node2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "address": "node3:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "node2:3306"
}
```
Let's check the journal on node3.
```
[node3] $ sudo journalctl -xe
...
...[ERROR] Plugin group_replication reported: 'The group name '' is not a valid UUID'
...[ERROR] Plugin group_replication reported: 'Unable to start Group Replication on boot'
...
```
So Group Replication can't start because the group name is empty.
The settings should have been written out to my.cnf after addInstance, but inspecting my.cnf reveals group_replication_group_name = with an empty value.
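For reference, this is roughly what the persisted section of my.cnf should look like once the settings have been written out properly. This is a hypothetical excerpt: the UUID below is a placeholder, not this cluster's real group name; read the actual value on a healthy node with `SELECT @@group_replication_group_name;`.

```ini
# Hypothetical my.cnf excerpt -- the group name must be the cluster's UUID,
# never empty. The UUID shown here is a placeholder.
[mysqld]
group_replication_group_name = 550e8400-e29b-41d4-a716-446655440000
group_replication_start_on_boot = ON
```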
Let's configure it again.
Because group replication is still running on node3, addInstance fails as-is; we have to stop it first and then run addInstance.
After the instance is added, run dba.configureLocalInstance() to persist the settings.
```
[node3] $ sudo mysql -u root
mysql > STOP GROUP_REPLICATION;

[node2] $ mysqlsh --log-level=DEBUG3 --uri repl@node2:3306
MySQL node2:3306 JS > var cluster = dba.getCluster()
MySQL node2:3306 JS > cluster.addInstance('repl@node3:3306')

[node3] $ mysqlsh --log-level=DEBUG3 --uri repl@node3:3306
MySQL node3:3306 JS > dba.configureLocalInstance()
```
Now that the instance has been re-added, let's check the status.
node3 is stuck in RECOVERING, so it hasn't actually rejoined yet.
It feels like a trap that the cluster status still reports OK even though node3 hasn't come back.
```
[node2] $ mysqlsh --log-level=DEBUG3 --uri repl@node2:3306

MySQL node2:3306 JS > var cluster = dba.getCluster()
MySQL node2:3306 JS > cluster.status()
{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "node2:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "node1:3306": {
                "address": "node1:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node2:3306": {
                "address": "node2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "address": "node3:3306",
                "mode": "n/a",
                "readReplicas": {},
                "role": "HA",
                "status": "RECOVERING"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "node2:3306"
}
```
Let's check node3's journal.
There is an ERROR; it reads more like a warning, and there are no other notable log entries.
```
[node3] $ sudo journalctl -xe
...
...[ERROR] Plugin group_replication reported: 'Group contains 3 members which is greater than group_replication_auto_increment_increment value of 1. This can lead to an higher rate of transactional aborts.'
...
```
The error complains that group_replication_auto_increment_increment is 1 while the group has 3 members, yet the config file sets group_replication_auto_increment_increment = 7, and querying mysql confirms the value is 7.
第49回 MySQLのAUTO_INCREMENTについて:MySQL道普請便り|gihyo.jp … 技術評論社
Judging from that article's explanation, this doesn't look like the reason node3 can't rejoin.
MySQL Group Replicationのモニタリング – variable.jp [データベース,パフォーマンス,運用]
Let's investigate the situation using this article as a guide.
It turns out a transaction is stuck in the queue.
```
[node3] $ sudo mysql -u root
mysql > select * from performance_schema.replication_group_member_stats \G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 15628341822720480:10
                         MEMBER_ID: d46d27c7-a24e-11e9-a4e6-fa163e4b9377
       COUNT_TRANSACTIONS_IN_QUEUE: 1
        COUNT_TRANSACTIONS_CHECKED: 0
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS:
    LAST_CONFLICT_FREE_TRANSACTION:
1 row in set (0.00 sec)
```
Let's check the other nodes too.
Both have a queue of 0. Their COUNT_TRANSACTIONS_CHECKED values differ, which is a little curious, but let's set that aside.
According to https://mysqlhighavailability.com/mysql-group-replication-monitoring/, COUNT_TRANSACTIONS_IN_QUEUE means:

> Count_Transactions_in_queue – Number of transactions in queue pending certification and apply. For the simple run that we had this field shows value 0 but if the load is high we will have transactions in queue.

So node3 has a transaction sitting there unprocessed.
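Since this counter has to be polled repeatedly while watching a stuck node, a small helper to pull it out of the `\G`-formatted output is handy. This is a hypothetical sketch (the `queue_count` name is mine); in practice you would pipe `mysql -e "... \G"` into it, but here a captured sample line stands in for the live query:

```shell
# Extract COUNT_TRANSACTIONS_IN_QUEUE from the \G-formatted output of
# performance_schema.replication_group_member_stats.
queue_count() {
  awk -F': *' '/COUNT_TRANSACTIONS_IN_QUEUE/ { print $2 }'
}

# Normally:
#   mysql -u root -e "select * from performance_schema.replication_group_member_stats \G" | queue_count
# Sample input instead of the live query:
printf 'COUNT_TRANSACTIONS_IN_QUEUE: 1\n' | queue_count   # prints "1"
```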
```
[node1] $ sudo mysql -u root
mysql> select * from performance_schema.replication_group_member_stats \G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 15628341822720480:10
                         MEMBER_ID: 440dd2c5-a22b-11e9-b851-fa163e47fb1b
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 4
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 10d5fb2f-a3b1-11e9-b146-fa163e47fb1b:1-15,
15f39657-a3b2-11e9-b146-fa163e47fb1b:1-15,
19a85aa3-a3b0-11e9-b146-fa163e47fb1b:1-15,
34f9b48b-a3ac-11e9-b146-fa163e47fb1b:1-15,
440dd2c5-a22b-11e9-b851-fa163e47fb1b:1-156,
7ea76381-a3ab-11e9-b146-fa163e47fb1b:1-3,
9c72025d-a3b5-11e9-b146-fa163e47fb1b:1-25,
a70e6e35-a3b0-11e9-b146-fa163e47fb1b:1-15,
b4c0063d-a3ad-11e9-b146-fa163e47fb1b:1-15,
ea69819e-a3ac-11e9-b146-fa163e47fb1b:1-15,
f1813b82-a242-11e9-bb2d-fa163e47fb1b:1-2,
f5a6d972-a3b6-11e9-b146-fa163e47fb1b:1-29,
f9d4b90c-a246-11e9-b146-fa163e47fb1b:1-268:1000193-1000256:2000233-2000244
    LAST_CONFLICT_FREE_TRANSACTION: f5a6d972-a3b6-11e9-b146-fa163e47fb1b:29
1 row in set (0.00 sec)

[node2] $ sudo mysql -u root
mysql> select * from performance_schema.replication_group_member_stats \G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 15628341822720480:10
                         MEMBER_ID: 34ce3aed-a249-11e9-b3e8-fa163e4531c6
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 8
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 10d5fb2f-a3b1-11e9-b146-fa163e47fb1b:1-15,
15f39657-a3b2-11e9-b146-fa163e47fb1b:1-15,
19a85aa3-a3b0-11e9-b146-fa163e47fb1b:1-15,
34f9b48b-a3ac-11e9-b146-fa163e47fb1b:1-15,
440dd2c5-a22b-11e9-b851-fa163e47fb1b:1-156,
7ea76381-a3ab-11e9-b146-fa163e47fb1b:1-3,
9c72025d-a3b5-11e9-b146-fa163e47fb1b:1-25,
a70e6e35-a3b0-11e9-b146-fa163e47fb1b:1-15,
b4c0063d-a3ad-11e9-b146-fa163e47fb1b:1-15,
ea69819e-a3ac-11e9-b146-fa163e47fb1b:1-15,
f1813b82-a242-11e9-bb2d-fa163e47fb1b:1-2,
f5a6d972-a3b6-11e9-b146-fa163e47fb1b:1-29,
f9d4b90c-a246-11e9-b146-fa163e47fb1b:1-268:1000193-1000256:2000233-2000244
    LAST_CONFLICT_FREE_TRANSACTION: f5a6d972-a3b6-11e9-b146-fa163e47fb1b:29
1 row in set (0.00 sec)
```
Let's look for a way to get that transaction processed.
MySQL 5.7と8.0でロック状態を確認する(sys.innodb_lock_waitsビュー) - Qiita
Following that article, let's check for lock waits.
Nothing shows up.
```
[node3] $ sudo mysql -u root
mysql> select * from sys.innodb_lock_waits \G
Empty set, 3 warnings (0.01 sec)

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                       |
+---------+------+-----------------------------------------------------------------------------------------------+
| Warning | 1681 | 'INFORMATION_SCHEMA.INNODB_LOCK_WAITS' is deprecated and will be removed in a future release. |
| Warning | 1681 | 'INFORMATION_SCHEMA.INNODB_LOCKS' is deprecated and will be removed in a future release.      |
| Warning | 1681 | 'INFORMATION_SCHEMA.INNODB_LOCKS' is deprecated and will be removed in a future release.      |
+---------+------+-----------------------------------------------------------------------------------------------+
3 rows in set (0.01 sec)
```
なぜあなたは SHOW ENGINE INNODB STATUS を読まないのか - そーだいなるらくがき帳
Let's look at the InnoDB status.
Nothing in it suggests a problem either.
```
[node3] $ sudo mysql -u root
mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2019-07-11 19:57:31 0x7fb7dcfb7700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 5 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 4 srv_active, 0 srv_shutdown, 4770 srv_idle
srv_master_thread log flush and writes: 4774
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 5
OS WAIT ARRAY INFO: signal count 5
RW-shared spins 0, rounds 10, OS waits 5
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 10.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 6690
Purge done for trx's n:o < 6688 undo n:o < 0 state: running but idle
History list length 12
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421911789999952, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421911790001776, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421911790000864, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
406 OS file reads, 111 OS file writes, 40 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
Hash table size 2365399, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 7554795
Log flushed up to   7554795
Pages flushed up to 7554795
Last checkpoint at  7554786
0 pending log flushes, 0 pending chkp writes
27 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 8795455488
Dictionary memory allocated 133496
Buffer pool size   524256
Free buffers       523847
Database pages     409
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 373, created 36, written 71
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 409, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size   65536
Free buffers       65499
Database pages     37
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 37, created 0, written 16
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 37, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 1
Buffer pool size   65528
Free buffers       65517
Database pages     11
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 11, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 11, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 2
Buffer pool size   65536
Free buffers       65521
Database pages     15
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 15, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 15, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 3
Buffer pool size   65528
Free buffers       65452
Database pages     76
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 76, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 76, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 4
Buffer pool size   65536
Free buffers       65465
Database pages     71
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 71, created 0, written 1
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 71, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 5
Buffer pool size   65528
Free buffers       65460
Database pages     68
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 68, created 0, written 3
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 68, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 6
Buffer pool size   65536
Free buffers       65458
Database pages     78
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 42, created 36, written 41
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 78, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 7
Buffer pool size   65528
Free buffers       65475
Database pages     53
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 53, created 0, written 10
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 53, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=15367, Main thread ID=140427713836800, state: sleeping
Number of rows inserted 90, updated 12, deleted 0, read 143
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
1 row in set (0.01 sec)
```
Running show master status; on each node shows that only node3 is out of sync.
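Eyeballing long Executed_Gtid_Set strings across three terminals is error-prone, so a small comparison helper can flag the drifting node. This is a hypothetical sketch (`gtid_equal` is my own name); you would feed it the Executed_Gtid_Set values taken from SHOW MASTER STATUS on two nodes:

```shell
# Report whether two Executed_Gtid_Set values are identical, ignoring the
# whitespace/newlines that mysql inserts when wrapping long GTID sets.
gtid_equal() {
  a=$(printf '%s' "$1" | tr -d ' \n')
  b=$(printf '%s' "$2" | tr -d ' \n')
  if [ "$a" = "$b" ]; then echo same; else echo differs; fi
}

gtid_equal 'f9d4b90c:1-268' 'f9d4b90c:1-268'   # prints "same"
gtid_equal 'f9d4b90c:1-268' ''                 # prints "differs"
```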
Let's try a RESET MASTER; on node3.
MySQL :: MySQL 5.6 リファレンスマニュアル :: 13.4.1.2 RESET MASTER 構文
```
[node3] $ sudo mysql -u root
mysql> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000001 |      150 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (10.26 sec)

mysql> RESET MASTER;
mysql> START GROUP_REPLICATION;
mysql> select * from performance_schema.replication_group_member_stats \G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 15628341822720480:12
                         MEMBER_ID: d46d27c7-a24e-11e9-a4e6-fa163e4b9377
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 0
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS:
    LAST_CONFLICT_FREE_TRANSACTION:
1 row in set (0.00 sec)
```
COUNT_TRANSACTIONS_IN_QUEUE is now 0, and cluster.status() on node2 shows node3 as ONLINE.
The likely cause: when node3 was first added with addInstance, START GROUP_REPLICATION; ran while the group name was still empty, so node3 began operating as the primary of its own bogus cluster. When the group name was fixed and node3 was re-added to the real cluster, the binary logs left over from that bogus cluster produced the inconsistency.
I would have liked to pin down the exact cause, but node3 held no logs worth preserving, so RESET MASTER; settled it.
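After RESET MASTER; and START GROUP_REPLICATION;, it is worth confirming that the member actually reaches ONLINE rather than sticking in RECOVERING again. A hypothetical helper (the `member_state` name is mine) that pulls MEMBER_STATE out of the `\G`-formatted output of performance_schema.replication_group_members:

```shell
# Print the first MEMBER_STATE value found in \G-formatted output of
# performance_schema.replication_group_members.
member_state() {
  awk -F': *' '/MEMBER_STATE/ { print $2; exit }'
}

# Normally:
#   mysql -u root -e "select * from performance_schema.replication_group_members \G" | member_state
# Sample input instead of the live query:
printf 'MEMBER_STATE: ONLINE\n' | member_state   # prints "ONLINE"
```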
At any rate, we've confirmed how to recover both the primary node and a secondary node.
Wrapping Up
As noted at the start, as long as the settings have been persisted to my.cnf, restarting the mysql process is all it takes to get back to the ONLINE state.
On MySQL 5.7, MySQL Shell cannot persist the configuration remotely, so you have to ssh into each node and run dba.configureLocalInstance() in MySQL Shell there.
I also wanted to push some data through and observe synchronization and inconsistencies, but this post has grown long enough, so I'll stop here.
Next time, let's look at the data side.