Let’s take the employees database and slightly modify the employees table:

mysql> ALTER TABLE employees ADD INDEX idx_first (first_name), ENGINE=InnoDB;

 

SELECT * FROM employees ORDER BY first_name;

This query can of course be executed with a full table scan, but we could also take advantage of the idx_first index, which translates to a full index scan.

Let’s see how the optimizer will execute it:

 


mysql> EXPLAIN SELECT * FROM employees ORDER BY first_name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300363
        Extra: Using filesort

 


mysql> EXPLAIN SELECT * FROM employees FORCE INDEX(idx_first) ORDER BY first_name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: index
possible_keys: NULL
          key: idx_first
      key_len: 16
          ref: NULL
         rows: 300363
        Extra:

At first glance, the second plan looks better: the estimated number of rows is the same, but we get a full index scan instead of a full table scan, and no filesort. However, predicting how a query will perform by looking only at the execution plan is not enough; we must run both queries and compare execution times.

First case: the employees table does not fit in memory
With the full table scan, the query runs in about 4s.
With the full index scan, it runs in about 30s.

So the optimizer was right after all. But why? Simply because not all access patterns are equal. When we do a full table scan, we do sequential reads, which are quite fast even on slow disks. But when we use the index, we first do a full index scan (fast sequential reads, no problem) and then lots of random reads to fetch rows by index value, and random reads are orders of magnitude slower than sequential reads.

The optimizer has this knowledge, so it is able to pick the right execution plan.
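The gap between the two access patterns can be sketched with a toy cost model. All numbers below are assumptions for a slow spinning disk (throughput, IOPS, row size are invented for illustration, not measurements from this benchmark):

```python
# Toy cost model comparing the two plans on a spinning disk.
# All device characteristics are assumptions, not measured values:
#   - sequential throughput: ~100 MB/s
#   - random reads: ~100 IOPS (one row fetch per random read)

ROWS = 300_363          # row estimate taken from the EXPLAIN output
ROW_SIZE = 100          # assumed average row size in bytes
SEQ_BYTES_PER_SEC = 100_000_000   # assumed sequential throughput
RANDOM_IOPS = 100                 # assumed random reads per second

table_bytes = ROWS * ROW_SIZE

# Full table scan: one pass of sequential reads over the whole table.
full_scan_seconds = table_bytes / SEQ_BYTES_PER_SEC

# Full index scan on a non-covering index: the index itself is read
# sequentially (cheap, ignored here), but every row must then be
# fetched with a random read.
index_scan_seconds = ROWS / RANDOM_IOPS

print(f"full table scan      : ~{full_scan_seconds:.1f} s")
print(f"index scan + fetches : ~{index_scan_seconds:.0f} s")
```

The model exaggerates the real 4 s vs 30 s measurement (caching and read-ahead narrow the gap in practice), but it shows why a cost-based optimizer defaults to the sequential plan for a non-covering index.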

Second case: the employees table fits in memory
With the full table scan, the query runs in about 3.3s.
With the full index scan, the query runs in about 2.6s.

We can see here a limitation of the optimizer: it does not know what kind of media the data is stored on. If the data lives on spinning disks, the assumption that random reads are much slower than sequential reads is correct, but it no longer holds once the data is in memory. That is why, when execution plans look similar, you should always execute the query to see which plan should really be chosen. Here, if we know the table will always fit in memory, we should add the FORCE INDEX hint to get the optimal response time.

Now let’s modify the query by selecting only the first_name field instead of selecting all the fields:

 


mysql> EXPLAIN SELECT first_name FROM employees ORDER BY first_name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: index
possible_keys: NULL
          key: idx_first
      key_len: 16
          ref: NULL
         rows: 300584
        Extra: Using index

The optimizer chooses the full index scan. It is a good choice because the index now covers the query, which means that reading the index is enough to get the results.
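The covering-index effect is not specific to MySQL. As a small cross-check, here is a sketch using SQLite (an assumption: this shows SQLite's planner, not MySQL's), whose EXPLAIN QUERY PLAN output labels covering-index scans explicitly. The table and index names mirror the article's example, but the schema itself is invented:

```python
import sqlite3

# In-memory SQLite database with a table and index mirroring the
# article's example (schema is a simplified invention).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_no INTEGER, first_name TEXT, last_name TEXT)"
)
conn.execute("CREATE INDEX idx_first ON employees (first_name)")

# Ask the planner how it would run the covered query: selecting only
# first_name means the index alone can answer it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT first_name FROM employees ORDER BY first_name"
).fetchall()
detail = plan[0][-1]
print(detail)  # e.g. "SCAN employees USING COVERING INDEX idx_first"
```

The exact wording of the plan detail varies between SQLite versions, but the "COVERING INDEX" marker is the analogue of MySQL's "Using index" in the Extra column above.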

Conclusions:

  • For a non-covering index, the difference between a full table scan and an execution plan based on a full index scan is basically the difference between sequential reads and random reads: it can be close if you have fast storage or it can be very different if you have slow storage.
  • A full index scan can become interesting when the index is covering.
  • Don’t forget to measure response time when you are trying different execution plans. It is too easy to get focused on having a good-looking execution plan, but the end user only cares about response time!

 
