前言
死锁是MySQL并发场景下的“隐形杀手”,轻则导致个别事务失败,重则引发系统雪崩。本文将从死锁监控、日志解读、原因分析、解决方案四个维度,提供一套完整的死锁处理流程,助你快速定位并彻底解决死锁问题。
一、死锁现场:如何发现与捕获
- 实时监控死锁发生
sql
– 查看当前是否存在锁等待
SHOW ENGINE INNODB STATUS\G
– 监控锁信息表(MySQL 8.0+)
SELECT * FROM performance_schema.data_locks;
SELECT * FROM performance_schema.data_lock_waits;
– 查看当前运行的事务
SELECT * FROM information_schema.INNODB_TRX;
– 查看锁等待关系
SELECT
r.trx_id AS waiting_trx_id,
r.trx_mysql_thread_id AS waiting_thread,
r.trx_query AS waiting_query,
b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_thread,
b.trx_query AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
2. 开启死锁日志记录
sql
– 查看死锁日志配置
SHOW VARIABLES LIKE ‘%deadlock%’;
– 开启死锁日志记录(临时)
SET GLOBAL innodb_print_all_deadlocks = ON;
– 永久配置(修改my.cnf)
[mysqld]
innodb_print_all_deadlocks = 1
innodb_lock_wait_timeout = 50 – 锁等待超时时间(秒)
log_error = /var/log/mysql/error.log – 错误日志路径
3. 自动死锁监控脚本
bash
#!/bin/bash
deadlock_monitor.sh
LOG_FILE=“/var/log/mysql/deadlock_monitor.log”
ERROR_LOG=“/var/log/mysql/error.log”
监控错误日志中的死锁信息
tail -fERRORLOG∣grep−−line−buffered−i"deadlock"∣whilereadlinedoecho"ERROR_LOG | grep --line-buffered -i "deadlock" | while read line do echo "ERRORLOG∣grep−−line−buffered−i"deadlock"∣whilereadlinedoecho"(date): 检测到死锁事件" >>LOGFILEecho"LOG_FILE echo "LOGFILEecho"line" >> $LOG_FILE
# 立即捕获现场信息 mysql -e "SHOW ENGINE INNODB STATUS\G" >> $LOG_FILE mysql -e "SELECT * FROM information_schema.INNODB_TRX\G" >> $LOG_FILEdone
二、死锁日志深度解读(实战案例)
- 典型死锁日志示例
text
LATEST DETECTED DEADLOCK
2023-10-01 10:00:00 0x7f8e5c0b9700
*** (1) TRANSACTION:
TRANSACTION 123456, ACTIVE 10 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 100, OS thread handle 12345, query id 1000 localhost root updating
UPDATE orders SET status = ‘paid’ WHERE order_id = 100
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 10 page no 5 n bits 72 index PRIMARY of tabletest.orderstrx id 123456 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10 page no 6 n bits 72 index idx_user_id of tabletest.orderstrx id 123456 lock_mode X locks rec but not gap waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
*** (2) TRANSACTION:
TRANSACTION 123457, ACTIVE 8 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 101, OS thread handle 12346, query id 1001 localhost root updating
UPDATE orders SET amount = 200 WHERE user_id = 50
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 10 page no 6 n bits 72 index idx_user_id of tabletest.orderstrx id 123457 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10 page no 5 n bits 72 index PRIMARY of tabletest.orderstrx id 123457 lock_mode X locks rec but not gap waiting
2. 日志关键信息解读
text
关键字段解析:
- TRANSACTION 123456 – 事务ID
- ACTIVE 10 sec – 事务活动时间
- mysql tables in use 1 – 涉及表数量
- locked 1 – 锁定表数量
- LOCK WAIT – 正在等待锁
- lock_mode X – 排他锁(S为共享锁)
- locks rec but not gap – 记录锁,非间隙锁
- index PRIMARY/idx_user_id – 锁所在的索引
- UPDATE orders … – 正在执行的SQL
三、死锁原因深度分析
- 四大死锁成因及特征
死锁类型 产生原因 典型特征
顺序不一致 事务访问资源的顺序不同 互相持有对方需要的锁
间隙锁冲突 范围查询产生间隙锁重叠 涉及GAP锁、Next-Key锁
唯一键冲突 插入操作导致唯一键冲突 插入操作回滚时产生死锁
锁升级冲突 共享锁升级为排他锁 多个事务持有共享锁后同时尝试升级 - 常见死锁场景分析
场景1:交叉更新顺序
sql
– 事务A
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; – 锁住id=1
UPDATE accounts SET balance = balance + 100 WHERE id = 2; – 尝试锁id=2
– 事务B(同时执行,顺序相反)
BEGIN;
UPDATE accounts SET balance = balance + 50 WHERE id = 2; – 锁住id=2
UPDATE accounts SET balance = balance - 50 WHERE id = 1; – 尝试锁id=1
解决方法: 统一资源访问顺序,按固定顺序(如ID升序)更新。
场景2:间隙锁冲突
sql
– 表结构:id为主键,age有普通索引
– 事务A
BEGIN;
SELECT * FROM users WHERE age = 20 FOR UPDATE; – 对age=20加间隙锁
– 事务B
BEGIN;
INSERT INTO users (age) VALUES (20); – 插入被间隙锁阻塞
解决方法: 使用READ COMMITTED隔离级别或使用唯一索引。
场景3:插入唯一键冲突
sql
– 表有唯一索引 uk_email
– 事务A
BEGIN;
INSERT INTO users (email) VALUES (‘a@test.com’); – 成功
– 事务B
BEGIN;
INSERT INTO users (email) VALUES (‘a@test.com’); – 等待唯一键锁
– 事务A回滚
ROLLBACK;
– 事务B获得锁,但检查到重复,也回滚,产生死锁
解决方法: 使用INSERT … ON DUPLICATE KEY UPDATE。
四、解决方案与预防策略
- 应急处理方案
sql
– 方法1:主动杀死死锁事务
SELECT * FROM information_schema.INNODB_TRX; – 查看事务ID
KILL 100; – 杀死对应线程ID
– 方法2:设置锁等待超时(自动回滚)
SET SESSION innodb_lock_wait_timeout = 10; – 10秒超时
– 方法3:使用nowait(立即返回错误)
SELECT * FROM table WHERE … FOR UPDATE NOWAIT;
2. 代码层面优化
java
// 方案1:统一资源访问顺序
public void transfer(Account from, Account to, BigDecimal amount) {
// 按ID固定顺序锁定
Account first = from.getId() < to.getId() ? from : to;
Account second = from.getId() < to.getId() ? to : from;
lock(first); try { lock(second); // 执行转账逻辑 } finally { unlock(second); unlock(first); }}
// 方案2:使用乐观锁
@Transactional
public boolean updateWithOptimisticLock(Order order) {
int rows = orderMapper.update(
"UPDATE orders SET status = ?, version = version + 1 " +
“WHERE id = ? AND version = ?”,
order.getStatus(), order.getId(), order.getVersion());
return rows > 0; // 失败则重试
}
3. 数据库层面优化
sql
– 1. 使用合适的隔离级别
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
– 2. 优化索引设计(减少锁范围)
– 不良设计:没有索引,全表锁
UPDATE users SET status = 1 WHERE phone = ‘13800138000’;
– 优化后:添加索引
ALTER TABLE users ADD INDEX idx_phone(phone);
– 3. 避免长事务
SET SESSION autocommit = 1; – 自动提交
– 或明确控制事务边界
BEGIN;
– 只做必要操作
COMMIT;
– 4. 批量操作分批次
– 不良:一次性更新10万条
UPDATE large_table SET flag = 1 WHERE condition;
– 优化:分批更新
UPDATE large_table SET flag = 1 WHERE condition LIMIT 1000;
– 循环执行直到完成
– 5. 使用SELECT … FOR UPDATE NOWAIT/SKIP LOCKED
– NOWAIT:获取不到锁立即返回错误
SELECT * FROM orders WHERE id = 100 FOR UPDATE NOWAIT;
– SKIP LOCKED:跳过已锁定的行
SELECT * FROM tasks WHERE status = ‘pending’ FOR UPDATE SKIP LOCKED LIMIT 10;
4. 架构层面预防
yaml
应用层配置
重试机制(带退避策略):
- 第一次重试:立即重试
- 第二次重试:等待100ms
- 第三次重试:等待500ms
- 超过3次:报警人工介入
热点数据拆分:
- 用户账户按ID分表
- 库存扣减使用Redis+Lua原子操作
读写分离:
- 写操作走主库
- 读操作走从库
- 避免读操作加锁影响写事务
监控与告警体系
sql
– 创建死锁监控表
CREATE TABLE deadlock_history (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
deadlock_time DATETIME DEFAULT CURRENT_TIMESTAMP,
deadlock_log TEXT,
resolved_time DATETIME,
resolved_action VARCHAR(50)
);
– 死锁自动捕获存储过程
DELIMITER //
CREATE PROCEDURE capture_deadlock()
BEGIN
DECLARE deadlock_text TEXT;
-- 检查错误日志中是否有新死锁 -- (实际应用中需要结合日志解析工具) -- 记录到监控表 INSERT INTO deadlock_history (deadlock_log) VALUES (deadlock_text);END//
DELIMITER ;
bash
#!/bin/bash
死锁告警脚本
THRESHOLD=5 # 1小时内死锁次数阈值
统计最近1小时死锁次数
COUNT=$(grep -c “LATEST DETECTED DEADLOCK” /var/log/mysql/error.log.1)
if [ $COUNT -geKaTeX parse error: Expected 'EOF', got '#' at position 23: …LD ]; then #̲ 发送告警(邮件、钉钉、企业微…{COUNT}次死锁!" |
mail -s “MySQL死锁告警” admin@example.com
# 自动抓取现场信息 mysql -e "SHOW ENGINE INNODB STATUS\G" > /tmp/deadlock_emergency_$(date +%Y%m%d_%H%M%S).logfi
五、死锁排查标准化流程
排查六步法
确认死锁发生:查看错误日志或监控告警
捕获现场信息:立即执行SHOW ENGINE INNODB STATUS
分析死锁日志:确定涉及的事务、SQL、锁类型
定位业务代码:找到对应的应用代码逻辑
制定解决方案:根据死锁类型选择应对策略
验证与预防:修复后监控效果,优化预防措施
预防检查清单
✅ 事务设计
事务尽可能短小
按固定顺序访问资源
避免事务内用户交互
✅ SQL优化
合理使用索引
避免全表扫描更新
批量操作分批次
✅ 架构设计
热点数据拆分
读写分离
缓存层保护数据库
✅ 监控体系
死锁日志监控
长事务监控
锁等待监控
六、进阶工具与技巧
- 使用pt-deadlock-logger分析
bash
Percona Toolkit工具
pt-deadlock-logger --user=root --password=xxx --socket=/tmp/mysql.sock
输出示例:
2023-10-01T10:00:00 server_id=1 thread_id=100 query_id=1000 …
- 可视化分析工具
bash
使用mysqld-deadlock-visualizer
python deadlock_visualizer.py /var/log/mysql/error.log
生成死锁关系图,直观展示资源争用
- 压力测试复现
sql
– 使用sysbench模拟并发
sysbench oltp_read_write
–mysql-host=127.0.0.1
–mysql-port=3306
–mysql-user=root
–mysql-password=xxx
–mysql-db=test
–table-size=1000000
–threads=32
–time=300
–report-interval=10
run
总结:死锁处理黄金法则
预防优于治疗:设计阶段考虑并发访问顺序
快速发现:建立完善的监控告警体系
保留现场:死锁发生时立即捕获完整信息
精准分析:深入理解日志中的锁类型和事务关系
根因解决:从代码、SQL、架构多层面优化
记住:死锁不是洪水猛兽,而是系统并发设计的“体检报告”。通过系统性的分析处理,死锁问题不仅能解决,还能推动系统架构的持续优化。每一次死锁分析,都是提升系统稳定性的宝贵机会。