背景
在日常的使用过程中,时不时会遇到个别,或者大量的连接堆积在 MySQL 中的现象,这时一般会考虑使用 kill 命令强制杀死这些长时间堆积起来的连接,尽快释放连接数和数据库服务器的 CPU 资源。
问题描述
在实际操作 kill 命令的时候,有时候会发现连接并没有第一时间被 kill 掉,仍旧在 processlist 里面能看到,但是显示的 Command 为 Killed,而不是常见的 Query 或者是 Execute 等。例如:
mysql> show processlist; +----+------+--------------------+--------+---------+------+--------------+---------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+--------------------+--------+---------+------+--------------+---------------------------------+ | 31 | root | 192.168.1.10:50410 | sbtest | Query | 0 | starting | show processlist | | 32 | root | 192.168.1.10:50412 | sbtest | Query | 62 | User sleep | select sleep(3600) from sbtest1 | | 35 | root | 192.168.1.10:51252 | sbtest | Killed | 47 | Sending data | select sleep(100) from sbtest1 | | 36 | root | 192.168.1.10:51304 | sbtest | Query | 20 | Sending data | select sleep(3600) from sbtest1 | +----+------+--------------------+--------+---------+------+--------------+---------------------------------+
原因分析
遇事不决先翻官方文档,这里摘取部分官方文档的内容:
When you use KILL, a thread-specific kill flag is set for the thread. In most cases, it might take some time for the thread to die because the kill flag is checked only at specific intervals:During SELECT operations, for ORDER BY and GROUP BY loops, the flag is checked after reading a block of rows. If the kill flag is set, the statement is aborted.
ALTER TABLE operations that make a table copy check the kill flag periodically for each few copied rows read from the original table. If the kill flag was set, the statement is aborted and the temporary table is deleted.
The KILL statement returns本文来源gaodai#ma#com搞@@代~&码网^ without waiting for confirmation, but the kill flag check aborts the operation within a reasonably small amount of time. Aborting the operation to perform any necessary cleanup also takes some time.
During UPDATE or DELETE operations, the kill flag is checked after each block read and after each updated or deleted row. If the kill flag is set, the statement is aborted. If you are not using transactions, the changes are not rolled back.
GET_LOCK() aborts and returns NULL.
If the thread is in the table lock handler (state: Locked), the table lock is quickly aborted.
If the thread is waiting for free disk space in a write call, the write is aborted with a “disk full” error message.
官方文档第一段就很明确的说清楚了 kill 的作用机制:会给连接的线程设置一个线程级别的 kill 标记,等到下一次“标记检测”的时候才会生效。这也意味着如果下一次“标记检测”迟迟没有发生,那么就有可能会出现问题描述中的现象。
官方文档中列举了不少的场景,这里根据官方的描述列举几个比较常见的问题场景:
- select 语句中进行 order by,group by 的时候,如果服务器 CPU 资源比较紧张,那么读取/获取一批数据的时间会变长,从而影响下一次“标记检测”的时间。
- 对大量数据进行 DML 操作的时候,kill 这一类 SQL 语句会触发事务回滚(InnoDB引擎),虽然语句被 kill 掉了,但是回滚操作也会非常久。
- kill alter 操作时,如果服务器的负载比较高,那么操作一批数据的时间会变长,从而影响下一次“标记检测”的时间。
- 其实参考 kill 的作用机制,做一个归纳性的描述的话,那么:任何阻塞/减慢 SQL 语句正常执行的行为,都会导致下一次“标记检测”推迟、无法发生,最终都会导致 kill 操作的失败。