12 When using statement-based replication, do not use the LOAD DATA INFILE command
What MHA does on monitoring and failover
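Much of this section repeats what was covered above, so I only translate it briefly. During monitoring and failover, MHA mainly does the following: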
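Verifying replication settings and identifying the current master
Monitoring the master server
Detecting the master server failure
Verifying slave configurations again
Shutting down failed master server (optional)
Recovering a new master
Activating the new master
Recovering the rest slaves
Notifications (optional)
Verify the replication configuration and identify the current master.
Monitor the master server until it crashes. During this step the manager no longer monitors the state of the slaves, so if you need to add or remove a slave node, it is best to update the manager configuration file and restart MHA.
Detect the master failure.
Rescan the configuration file, retry the connections, and verify that the master has really crashed. If the most recent error is the same as the current one and the two occurred very close together, MHA stops reporting the error and moves on to the next step.
Shut down the crashed host (optional) to keep the failure from spreading.
Elect a new master. If the crashed host is still reachable over SSH, copy its binlog to the latest slave, pointing at that slave's end_log_pos. The choice of new master follows the manager's configuration file: set candidate_master=1 for a slave that may become master, and no_master=1 for a slave that must never become master. Identify the latest slave, that is, the slave that has received the most recent relay log, and elect it as the new master.
Activate the new master.
Reconfigure the remaining slaves so that they point to the newly elected master.
Send notifications (optional), for example sending an email or disabling backup jobs on the new master. This can be configured through the report_script script.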
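The election rule described above can be illustrated with a small Python sketch. It is a simplified model, not MHA's actual implementation: the `Slave` class, the `relay_log_pos` field (a single number standing in for "how much relay log this slave has received"), and the choice to rank candidate_master=1 ahead of relay log position are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    hostname: str
    relay_log_pos: int         # hypothetical: larger means more relay log received
    candidate_master: int = 0  # mirrors candidate_master in the manager config
    no_master: int = 0         # mirrors no_master in the manager config


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Return the slave to promote, or None if every slave has no_master=1."""
    # Slaves marked no_master=1 are never eligible.
    eligible = [s for s in slaves if s.no_master != 1]
    if not eligible:
        return None
    # Assumption: prefer candidate_master=1 first, then the most up-to-date slave.
    return max(eligible, key=lambda s: (s.candidate_master, s.relay_log_pos))
```

With slaves `s1` (position 100), `s2` (position 300, no_master=1) and `s3` (position 200, candidate_master=1), this sketch picks `s3`: `s2` is excluded outright, and `s3` outranks `s1` because of its candidate flag.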
What MHA does on online (fast) master switch
A brief translation: during an online master switch, MHA mainly does the following:
Verifying replication settings and identifying the current master
Identifying the new master
Rejecting writes on the current master
Waiting for all slaves to catch up replication
Granting writes on the new master
Switching replication on all the rest slaves
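Verify the replication configuration and identify the current master. This step also checks that the following conditions hold:
The IO thread is running on each slave
The SQL thread is running on each slave
Replication delay on every slave is less than 2 seconds
No update operation on the master has been running for more than 2 seconds
Identify the new master.
Run FLUSH TABLES WITH READ LOCK on the current master to block writes and prevent data consistency problems.
Wait for replication on all slaves to catch up with the master.
Run SHOW MASTER STATUS on the new master, record the binlog file name and position, then run SET GLOBAL read_only=0 to allow writes on it.
Run CHANGE MASTER and START SLAVE in parallel on the other slaves so that they point to the new master and start replicating.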
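The four preconditions checked in the first step can be sketched as a plain Python function. This is only an illustration of the checks listed above, not MHA code: the function name `switch_blockers` and the dictionary keys (`host`, `io_running`, `sql_running`, `delay_seconds`) are hypothetical, and the values are assumed to have been collected beforehand, for example from SHOW SLAVE STATUS and SHOW PROCESSLIST.

```python
from typing import Dict, List

MAX_SECONDS = 2  # threshold taken from the checks listed above


def switch_blockers(slaves: List[Dict], longest_master_update_seconds: float) -> List[str]:
    """Return the reasons why an online switch should not start (empty list = OK)."""
    problems = []
    for s in slaves:
        if not s["io_running"]:
            problems.append(f"{s['host']}: IO thread is not running")
        if not s["sql_running"]:
            problems.append(f"{s['host']}: SQL thread is not running")
        # Delay must be strictly less than 2 seconds.
        if s["delay_seconds"] >= MAX_SECONDS:
            problems.append(f"{s['host']}: replication delay is {s['delay_seconds']}s")
    # No update on the master may have been running for more than 2 seconds.
    if longest_master_update_seconds > MAX_SECONDS:
        problems.append("master: an update has been running for more than 2 seconds")
    return problems
```

Returning a list of reasons rather than a single boolean makes it easier to see which slave is holding up the switch.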
Parameters
The MHA manager configuration parameters are listed below
| Parameter Name | Required? | Parameter Scope | Default Value | Example |
| --- | --- | --- | --- | --- |
| hostname | Yes | Local Only | - | hostname=mysql_server1, hostname=192.168.0.1, etc |
| ip | No | Local Only | gethostbyname($hostname) | ip=192.168.1.3 |
| port | No | Local/App/Global | 3306 | port=3306 |
| ssh_host | No | Local Only | same as hostname | ssh_host=mysql_server1, ssh_host=192.168.0.1, etc |
| ssh_ip | No | Local Only | gethostbyname($ssh_host) | ssh_ip=192.168.1.3 |
| ssh_port | No | Local/App/Global | 22 | ssh_port=22 |
| ssh_connection_timeout | No | Local/App/Global | 5 | ssh_connection_timeout=20 |
| ssh_options | No | Local/App/Global | "" (empty string) | ssh_options="-i /root/.ssh/id_dsa2" |
| candidate_master | No | Local Only | 0 | candidate_master=1 |
| no_master | No | Local Only | 0 | no_master=1 |
| ignore_fail | No | Local Only | 0 | ignore_fail=1 |
| skip_init_ssh_check | No | Local Only | 0 | skip_init_ssh_check=1 |
| skip_reset_slave | No | Local/App/Global | 0 | skip_reset_slave=1 |
| user | No | Local/App/Global | root | user=mysql_root |
| password | No | Local/App/Global | "" (empty string) | password=rootpass |
| repl_user | No | Local/App/Global | Master_User value from SHOW SLAVE STATUS | repl_user=repl |
| repl_password | No | Local/App/Global | - (current replication password) | repl_password=replpass |
| disable_log_bin | No | Local/App/Global | 0 | disable_log_bin=1 |
| master_pid_file | No | Local/App/Global | "" (empty string) | master_pid_file=/var/lib/mysql/master1.pid |
| ssh_user | No | Local/App/Global | current OS user | ss