select *
from test.vmark vk
where id in (select v.id
from usr_center.vmark_degree_update_log v, (select min(id) id
from usr_center.vmark_degree_update_log
where degree_update_cause = 0
and degree_update_type = 0
group by user_id) log where v.id = log.id
and v.degree_update_type = 0
and v.degree_update_before between '2015-01-01 00:00:00' and
'2015-01-10 00:00:00');
+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+
| 1 | PRIMARY | vk | ALL | NULL | NULL | NULL | NULL | 66051213 | Using where || 2 | DEPENDENT SUBQUERY | | ALL | NULL | NULL | NULL | NULL | 82947 | Using where |
| 2 | DEPENDENT SUBQUERY | v | eq_ref | PRIMARY,idx_dd | PRIMARY | 4 | func | 1 | Using where || 3 | DERIVED | vmar_degree_update_log | ref | idx_dd | idx_dd | 1 | | 12508106 | Using where; Using temporary; Using filesort |
+----+--------------------+-------------------------+--------+----------------+---------+---------+------+----------+----------------------------------------------+
4 rows in set (8.10 sec)
怎么解读这个执行计划呢?1.首先执行id=3的标识为DERIVED的步骤,这个关键词是衍生,出现在from后的子查询会有这个标识。索引idx_dd(degree_update_type,degree_update_after)将被用于这一步,ref表名这是非唯一索引扫描,预计扫描12287942行。2.执行id=2的第一条计划,也就是将第id=4的结果,group by成的82946行,对这个临时表做一个全表扫描。这里DEPENDENT SUBQUERY的意思是,这里要扫描的行数或执行次数,取决于其他步骤,即依赖于第4步。这里扫描82946行。3.得到log.id信息后,根据v.id=log.id,去驱动v表,走的是primary,eq_ref也反映这是唯一索引扫描,从rows可以看出,每执行一次返回1行,执行多少次呢,执行82946次,因为是"DEPENDENT SUBQUERY",所以执行次数或扫描行数依赖于第2步,也就是第2步每扫出一条,第3步就走一次索引。这里我们也看出来了,MySQL的执行计划并不是反映最终返回几行,不是反映这步骤总共扫描几行,也不告诉你执行多少次,而只是返回执行一次返回多少行。这里扫描82946行。4.全表扫描test.vmar表,66050840行,然后Nest loop join之前步骤返回的结果集(实际有8万多行)。这一步骤最耗时,读66050840*80000行。这里扫描行数成本是:12287942+82946*2+66050840*80000=7263409280000,7千亿行,我假设1亿行读20分钟,也需要2300多个小时,这个执行计划很恐怖。结论:放在from的子查询是非关联子查询,没关系。但是放在where后的,却要紧。
问题出在第4步,test.vmark表实际上id上有主键索引,我们如果能用前面三步的结果集,获得id值,再去驱动test.vmark的id值,那么就很容易得到想要的结果。
更改SQL如下,将in转变成join。select *
from test.vmar vk join (select v.id
from usr_center.vmar_degree_update_log v,
(select min(id) id
from usr_center.vmar_degree_update_log
where degree_update_cause = 0
and degree_update_type = 0
group by user_id) log
where v.id = log.id
and v.degree_update_type = 0
and v.degree_update_before between '2015-01-01 00:00:00' and
'2015-01-10 00:00:00') child
where vk.id = child.id;+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------------+--------+----------------+---------+---------+----------+----------+----------------------------------------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 82371 | |
| 1 | PRIMARY | vk | eq_ref | PRIMARY | PRIMARY | 4 | child.id | 1 | |
| 2 | DERIVED | | ALL | NULL | NULL | NULL | NULL | 82948 | |
| 2 | DERIVED | v | eq_ref | PRIMARY,idx_dd | PRIMARY | 4 | log.id | 1 | Us