webgame中Mysql Deadlock ERROR 1213 (40001)错误的排查历程(二)
log entries 1
MySQL thread id 122562823, OS thread handle 0x7fa5c4fbe700, query id 7457663621 10.1.1.8 s001_gamedb Updating
UPDATE `role_state` SET `state` = 1
WHERE `role_id` = '53016'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 477 page no 1386 n bits 128 index `PRIMARY` of table `xxx_roles_xxx`.`role_state` trx id AC512256 lock_mode X locks rec but not gap waiting
Record lock, heap no 17 PHYSICAL RECORD: n_fields 80; compact format; info bits 0
0: len 3; hex 00cf18; asc ;;
......
......
*** (2) TRANSACTION:
TRANSACTION AC512250, ACTIVE 0 sec inserting, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
6 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 122679850, OS thread handle 0x7fac007ff700, query id 7457663711 10.1.1.8 s001_gamedb update
REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES ('53016', 4967, 3291, 350, 174, ***)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 477 page no 1386 n bits 128 index `PRIMARY` of table `xxx_roles_xxx`.`role_state` trx id AC512250 lock_mode X locks rec but not gap
Record lock, heap no 17 PHYSICAL RECORD: n_fields 80; compact format; info bits 0
......
......
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 427 page no 488 n bits 192 index `PRIMARY` of table `xxx_roles_xxx`.`role_fight` trx id AC512250 lock_mode X locks rec but not gap waiting
Record lock, heap no 64 PHYSICAL RECORD: n_fields 51; compact format; info bits 0
......
*** WE ROLL BACK TRANSACTION (1)
......
这是我精简之后的信息,我抓去了LATEST DETECTED DEADLOCK部分的数据,这部分的数据是INNODB的最后一次发生死锁的信息,更详细的说明见MYSQL官方手册对Standard Monitor and Lock Monitor Output返回结果的解释。
OK,发现一场案例,保存这个INNODB的状态数据备用。迅速到程序异常日志中查看相同时间点是否有死锁发生。果然,我们程序异常日志中记录了这起案例(文章开头的日志)。
再到binlog中抓去这个时间段前后10分钟(大约范围)的mysql sql语句执行日志。
案情分析:
engine status中,大约看出MYSQL记录了两个事务之间发生锁竞争时,遗留的数据,
事务1“执行”(注意,这里加了双引号)
UPDATE `role_state` SET `state` = 1 WHERE `role_id` = '53016'
发现被修改资源已经被lock_mode X locks了(详情见: INNODB锁模式),准备等待该资源锁被释放
事务2执行
REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES ('53016', 4967, 3291, 350, 174, ***)
也发现该资源被lock_mode X locks了。
最后部分,mysql给了很重要的一个数据“WE ROLL BACK TRANSACTION (1)” MYSQL回滚了事物1。既然mysql回滚了1,那么肯定是事务1的语句触发了死锁,被mysql回滚了,也就是应该为程序中的异常日志所记录的那部分。同时,MYSQL执行了事务2,那么事务2的SQL语句肯定被记录在BINLOG中了。
抽丝剥茧:
如何确定事务1、事务2执行了哪些SQL语句呢?
根据show engine innodb status的结果,确定事务2被执行的
- SQL语句(业务逻辑的role_id唯一标识): REPLACE INTO `role_fight` (`role_id`, `life_max`, `mana_max`, `attack_physical`, `attack_internal`,****) VALUES (’53016′, 4967, 3291, 350, 174, ***)
- 线程ID(mysql的唯一标识): MySQL thread id 122679850
- 执行时间(时间线):121128 15:59:46
根据这三个标识,以及BINLOG的起始表示“BEGIN、COMMIT”,几乎可以100%确定该事务所包含的SQL语句。
BINLOG信息大约如下:
# at 511750764
#121128 15:59:46 server id 1 end_log_pos 511750843 Query thread_id=122679850 exec_time=0 error_code=0
SET TIMESTAMP=1354089586/*!*/;
BEGIN
/*!*/;
# at 511750843
#121128 15:59:46 server id 1 end_log_pos 511751090 Query thread_id=122679850 exec_time=0 error_code=0
use xxx_roles_xxx/*!*/;
SET TIMESTAMP=1354089586/*!*/;
UPDATE `role_pet` SET `in_supporting` = 0, `levelup_pause_time` = 1354089587, `auto_feed` = 0, `supporting_pause_time` = 1354089587
WHERE `role_id`