🌝

如何设计 MySQL 索引(二):实战

Posted at — Jan 14, 2023

题目一:联合索引

两种索引(status, create_time)、(create_time, status),以下 SQL 语句走哪个索引?[^1]

1
2
3
select  * from trade_info where status = 1 
    and create_time >= '2020-10-01 00:00:00'
    and create_time <= '2020-10-07 23:59:59';

执行 explain 语句,可以看到使用了索引(status, create_time)。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
explain select * from trade_info where status = 1 and create_time >='2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59'\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: trade_info
   partitions: NULL
         type: index
possible_keys: idx_create_time_status,idx_status_create_time
          key: idx_create_time_status
      key_len: 9
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index

进一步通过 optimizer_trace 来跟踪优化器的选择。MySQL 的 CBO(Cost-Based Optimizer 基于成本的优化器)选择的是复合索引 idx_status_create_time,因为该索引中的 status 和 create_time 都能参与了数据过滤,成本较低;而 idx_create_time_status 只有 create_time 参数数据过滤,status 被忽略了。

这是因为,status = 1 是精确匹配,create_time 是范围匹配,根据最左前缀匹配原则用(status,create_time)。

1
2
3
4
5
6
-- 开启optimizer_trace跟踪
set session optimizer_trace="enabled=on",end_markers_in_json=on;
-- 执行SQL语句
select * from trade_info where status = 1 and create_time >='2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';
-- 查看跟踪结果
SELECT trace FROM information_schema.OPTIMIZER_TRACE\G
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
*************************** 1. row ***************************
trace: {
  "steps": [
    // 省略...
    {
      "join_optimization": {
        "steps": [
          {
            "rows_estimation": [
              {
                "table": "`trade_info`",
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_create_time_status",
                      "usable": true,
                      "key_parts": [
                        "create_time",
                        "status",
                        "id"
                      ]
                    },
                    {
                      "index": "idx_status_create_time",
                      "usable": true,
                      "key_parts": [
                        "status",
                        "create_time",
                        "id"
                      ]
                    }
                  ],
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_create_time_status",
                        "ranges": [
                          "0x41cb0f <= create_time <= 0x47cb0f"
                        ],
                        "cost": 0.36,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "idx_status_create_time",
                        "ranges": [
                          "1 <= status <= 1 AND 0x41cb0f <= create_time <= 0x47cb0f"
                        ],
                        "cost": 0.36,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ]
                  }
                }
            ]
          }
        ]
      }
    }
  ]
}

题目二:联合索引

where 的顺序,会对索引选择有影响吗?答案是不会,以下 SQL 仍然选择索引 idx_create_time_status。查询引擎会自动优化为匹配联合索引的顺序。

1
2
3
4
select  * from trade_info
    where create_time >= '2020-10-01 00:00:00'
    and create_time <= '2020-10-07 23:59:59'
    and status = 1;

题目三:联合索引

user表有索引(name, age, birthday)。注意以下例子中第一个,和第三个,等同,只用到了复合索引中的第一列 name。当创建 (a,b,c) 复合索引时,想要索引生效的话,只能使用 a 和 ab、ac 和 abc 三种组合。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
SELECT * from user WHERE name='Tom';
-- 可以命中索引,用到(name)

SELECT * from user WHERE name='Tom' AND age=18;
-- 可以命中索引,用到(name, age)

SELECT * from user WHERE name='Tom' AND birthday='2000-01-01 00:00:00'; 
-- 可以命中索引,等同于第一个,用到(name)

SELECT * from user WHERE name='Tom' AND age=18 AND birthday='2000-01-01 00:00:00'; 
-- 可以命中索引,用到(name, age, birthday)

SELECT * from user WHERE name='Tom' AND birthday='2000-01-01 00:00:00' AND age=18; 
-- 可以命中索引,用到(name, age, birthday)

SELECT * from user WHERE birthday='2000-01-01 00:00:00' AND age=18 AND name='Tom'; 
-- 可以命中索引,用到(name, age, birthday)

SELECT * from user WHERE age=18 AND birthday='2000-01-01 00:00:00'; 
-- 无法命中索引    

关于 EXPLAIN 语句的介绍:http://khaidoan.wikidot.com/mysql-using-explain。注意下面例子中的 key_len、ref 字段。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
mysql> explain SELECT * from user WHERE name='Tom' AND birthday='2000-01-01 00:00:00';
+----+-------------+-------+------------+------+-------------------+-------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys     | key               | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+-------------------+-------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | user  | NULL       | ref  | name_age_birthday | name_age_birthday | 123     | const |    1 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+-------------------+-------------------+---------+-------+------+----------+-----------------------+

mysql> explain SELECT * from user WHERE birthday='2000-01-01 00:00:00' AND age=18 AND name='Tom';
+----+-------------+-------+------------+------+-------------------+-------------------+---------+-------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys     | key               | key_len | ref               | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------+-------------------+---------+-------------------+------+----------+-------+
|  1 | SIMPLE      | user  | NULL       | ref  | name_age_birthday | name_age_birthday | 132     | const,const,const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+-------------------+-------------------+---------+-------------------+------+----------+-------+

mysql> explain SELECT * from user WHERE age=18 AND birthday='2000-01-01 00:00:00';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+

题目四:排序查询

表 test 有联合索引(c1, c2, c3, c4)[^2]。下面这个c3及以后的索引全失效,因此只用到了两个索引,c3、c4要进行排序查找。

变体 1

1
SELECT * FROM test WHERE c1='a1' AND c2='a2' AND c4='a4' ORDER BY c3;
1
2
3
4
5
6
mysql> EXPLAIN SELECT * FROM test WHERE c1='a1' AND c2='a2' AND c4='a4' ORDER BY c3;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref         | rows | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-----------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | idx_c1234     | idx_c1234 | 10      | const,const |    1 |   100.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-----------------------+

Extra 的解释表是怎么被处理的,一些值表明 query 是高效的(using index),一些表示是低效(using filesort、using temporary)。这里 using index condition 表示使用了索引下推(index condition push-down)。

变体 2

1
SELECT * FROM test WHERE c1='a1' AND c2='a2' ORDER BY c4;

用到了 c1,c2 索引,但是 c4 是用于排序,中间的 c3 索引断了,MySQL 会使用文件内排序给出查询结果,这样子就导致了性能下降。

1
2
3
4
5
6
mysql> EXPLAIN SELECT * FROM test WHERE c1='a1' AND c2='a2' ORDER BY c4;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref         | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+----------------+
|  1 | SIMPLE      | test  | NULL       | ref  | idx_c1234     | idx_c1234 | 10      | const,const |    1 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+----------------+

变体 3

1
SELECT * FROM test WHERE c1='a1' AND c2='a2' AND c5='a5' ORDER BY c2, c3;

用到了 c1,c2 索引,但是 c2、c3 是用于排序,没有出现 filesort,性能可以。

1
2
3
4
5
6
mysql> EXPLAIN SELECT * FROM test WHERE c1='a1' AND c2='a2' AND c5='a5' ORDER BY c2, c3;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | idx_c1234     | idx_c1234 | 10      | const,const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-------------+

变体 4

1
SELECT * FROM test WHERE c1='a1' AND c2='a2' AND c5='a5' ORDER BY c3, c2;

将c2、c3的顺序倒置,竟然没有出现filesort,这是因为c2=‘a2’,order by中的 c2 字段已经是一个常量了,所以真正排序的字段就只有c3.

1
2
3
4
5
6
mysql> EXPLAIN SELECT * FROM test WHERE c1='a1' AND c2='a2' AND c5='a5' ORDER BY c3, c2;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | test  | NULL       | ref  | idx_c1234     | idx_c1234 | 10      | const,const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------------+------+----------+-------------+

题目五:分组查询

group by表面上是分组,但实际上分组的前提必须排序。

1
SELECT c1,c2,c3,c4 FROM test WHERE c1='a1' AND c4='a4' GROUP BY c3, c2;
1
2
3
4
5
6
mysql> EXPLAIN SELECT c1,c2,c3,c4 FROM test WHERE c1='a1' AND c4='a4' GROUP BY c3, c2;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref   | rows | filtered | Extra                                     |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | test  | NULL       | ref  | idx_c1234     | idx_c1234 | 5       | const |    1 |   100.00 | Using where; Using index; Using temporary |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------------------------------------------+

题目六:分页查询

1
2
3
4
SELECT  * FROM trade_info WHERE status = 0 
    AND create_time >= '2020-10-01 00:00:00'
    AND create_time <= '2020-10-07 23:59:59'
    ORDER BY id DESC LIMIT 102120, 20;

对于典型的分页 limit m, n 来说,越往后翻页越慢,也就是m越大会越慢,因为要定位m位置需要扫描的数据越来越多,导致IO开销比较大。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
mysql> EXPLAIN SELECT * FROM trade_info WHERE status = 0 AND create_time >= '2020-10-01 00:00:00' AND create_time <= '2020-10-07 23:59:59' ORDER BY id DESC LIMIT 102120, 20\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: trade_info
   partitions: NULL
         type: index
possible_keys: create_time_status,status_create_time
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Backward index scan

这里可以利用辅助索引的【索引覆盖】来进行优化,先获取 id,这一步就是索引覆盖扫描,不需要回表,然后通过 id 跟原表 trade_info 进行关联,改写后的 SQL 如下:

1
2
3
4
5
6
7
SELECT * FROM trade_info AS a,
    (SELECT id FROM trade_info WHERE status = 0
        AND create_time >= '2020-10-01 00:00:00'
        AND create_time <= '2020-10-07 23:59:59'
        ORDER BY id DESC LIMIT 102120, 20
    ) AS b -- 这一步走的是索引覆盖扫描,不需要回表
WHERE a.id = b.id;
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
mysql> explain select * from trade_info as a, (select  id from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20) as b where a.id = b.id\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
   partitions: NULL
         type: index
possible_keys: PRIMARY
          key: idx_status_create_time
      key_len: 9
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 4
          ref: test.a.id
         rows: 2
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: trade_info
   partitions: NULL
         type: index
possible_keys: idx_status_create_time,idx_create_time_status
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Backward index scan

题目七:批量更新大数据

营销系统有一批过期的优惠卷要失效,核心SQL如下:

1
2
-- 需要更新的数据量500w
update coupons set status = 1 where status =0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

在 Oracle 里更新 500w 数据是很快,因为可以利用多个 cpu core 去执行,但是 MySQL 就需要注意了,一个 SQL 只能使用一个 cpu core 去处理,如果SQL很复杂或执行很慢,就会阻塞后面的 SQL 请求,造成活动连接数暴增,MySQL CPU 100%,相应的接口 Timeout,同时对于主从复制架构,而且做了业务读写分离,更新 500w 数据需要 5 分钟,Master 上执行了 5 分钟,binlog 传到了 Slave 也需要执行 5 分钟,那就是 Slave 延迟 5 分钟,在这期间会造成业务脏数据,比如重复下单等[^1]。

优化思路:分而治之。先获取 where 条件中的最小 id 和最大 id,然后分批次去更新,每个批次 1000 条,这样既能快速完成更新,又能保证主从复制不会出现延迟。

  1. 先获取要更新的数据范围内的最小 id 和最大 id(表没有物理 delete,所以 id 是连续的)
1
2
3
4
5
mysql> explain select min(id) min_id, max(id) max_id from coupons where status =0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59'; 
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+---
| id | select_type | table | partitions | type  | possible_keys          | key                    | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+---
|  1 | SIMPLE      | users | NULL       | range | idx_status_create_time | idx_status_create_time | 6       | NULL | 180300 |   100.00 | Using where; Using index |
  1. 以每次 1000 条 commit 一次进行循环 update,主要代码如下:
1
2
3
4
5
6
current_id = min_id;
for  current_id < max_id do
  update coupons set status = 1 where id >=current_id and id <= current_id + 1000;  // 通过主键id更新 1000 条很快
commit;
current_id += 1000;
done

附录

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- 题目一
CREATE TABLE trade_info (
    id INT,
    status INT,
    create_time DATE,
    PRIMARY KEY (id),
    KEY idx_status_create_time (status, create_time),
    KEY idx_create_time_status (create_time, status)
);

-- 删除索引
DROP INDEX idx_status_create_time ON trade_info;

-- 创建索引
CREATE INDEX idx_status_create_time ON trade_info(status, create_time);


-- 题目三
CREATE TABLE user (
    id INT,
    name VARCHAR(30),
    age INT,
    birthday DATE,
    addr VARCHAR(30), -- 必须要多加几个其他字段做测试
    PRIMARY KEY (id),
    KEY name_age_birthday (name, age, birthday)
);

-- 题目四
CREATE TABLE test (
  id INT,
  c1 INT,
  c2 INT,
  c3 INT,
  c4 INT,
  c5 INT,
  PRIMARY KEY (id),
  KEY idx_c1234 (c1, c2, c3, c4)
);

参考资料

[^1] 阿里面试官:MySQL如何设计索引更高效? [^2] MySQL索引面试题分析