MySQL Query Optimization using UNION

Every developers might also understand how important index is when performing query but not of all do really care about how MySQL Query Optimization works. In the most cases, developers will think that index will work as it is and do not check in detail how index really work for a specific cases. The fact is that in many cases, even when your query is using the exact condition as you create index, the actual result is not as fast as your expectation. In those cases, it is recommended to use MySQL Explain and look into the key, key_len, and Extra information to get idea about how to optimize the query. In this short article, I only mention about how to use Union for query optimization in case index is lost when querying data.

The inspired idea of this article comes from the article Using UNION to implement loose index scan in MySQL at MySQL Performance blog, so I will do summarize about it before detailing into our cases.

So if you have table people with KEY(age,zip) and you will run query something like SELECT name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347) do you think it will use index effectively ? In theory it could – it could look at each of the ages from the range and look at all zip codes supplied. In practice – it will not:

mysql> explain SELECT name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347);
 +----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
 +----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
 | 1 | SIMPLE | people | range | age | age | 4 | NULL | 90556 | Using where |
 +----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
 1 row in set (0.01 sec)

As you see instead only first index keypart is used (key_len is 4) and zip part where clause is applied after rows are retrived. Notice Using Where. There are even more bad news. Full rows will need to be read to check if zip is in the list, while it could be done only by reading data from the index. MySQL can ether read index only for all rows, in this case you will see “Using Index” in EXPLAIN output or it will read row data for all rows – it can’t read Index and perform row read only if it needs to be done at this point.

So MySQL Will not use indexes in all cases when it is technically possible. For multiple key part indexes MySQL will only be able to use multiple keyparts if first keyparts matched with “=”. Here is example:

mysql> explain SELECT name FROM people WHERE age=18 AND zip IN (12345,12346, 12347);
 +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
 +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
 | 1 | SIMPLE | people | range | age | age | 4 | NULL | 8 | Using where |
 +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
 1 row in set (0.00 sec)

Note number of rows has decreased from 90556 to 8, whle “key_len” remains the same, and the query time for the first case is 1 min 56.09 sec compared to 0.06 sec in the later case. So, the idea here is instead of use nearly 2 min for the first sql, we separate the query into smaller parts, each costs 0.06 seconds, and then finally combining these results:

mysql> SELECT  name FROM people WHERE age=18 AND zip IN (12345,12346, 12347)
    -> UNION ALL
    -> SELECT  name FROM people WHERE age=19 AND zip IN (12345,12346, 12347)
    -> UNION ALL
    -> SELECT  name FROM people WHERE age=20 AND zip IN (12345,12346, 12347)
    -> UNION ALL
    -> SELECT  name FROM people WHERE age=21 AND zip IN (12345,12346, 12347)
    -> UNION ALL
    -> SELECT  name FROM people WHERE age=22 AND zip IN (12345,12346, 12347);

In our case, we still see the filesort when querying data based on fields that we created index

mysql> EXPLAIN SELECT * FROM fb_stt WHERE (created_date >  '2013-07-20') ORDER BY score DESC LIMIT 10;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows | Extra                       |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | fb_stt    | range | STT_ALL       | STT_ALL | 3       | NULL |   44 | Using where; Using filesort |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-----------------------------+
1 row in set (0.01 sec)

We can see that the result is using file sort, it is really the bad scenario when matched data (in WHERE clause) is grown. So let’s split it into sub-queries:

mysql> EXPLAIN
    -> (SELECT * FROM fb_stt WHERE created_date = '2013-07-20' ORDER BY score DESC LIMIT 10)
    -> UNION ALL
    -> (SELECT * FROM fb_stt WHERE created_date = '2013-07-21' ORDER BY score DESC LIMIT 10)
    -> ORDER BY score DESC LIMIT 10;
+----+--------------+------------+------+---------------+---------+---------+-------+------+----------------+
| id | select_type  | table      | type | possible_keys | key     | key_len | ref   | rows | Extra          |
+----+--------------+------------+------+---------------+---------+---------+-------+------+----------------+
|  1 | PRIMARY      | fb_stt     | ref  | STT_ALL       | STT_ALL | 3       | const |    7 | Using where    |
|  2 | UNION        | fb_stt     | ref  | STT_ALL       | STT_ALL | 3       | const |   15 | Using where    |
| NULL | UNION RESULT | <union1,2> | ALL  | NULL          | NULL    | NULL    | NULL  | NULL | Using filesort |
+----+--------------+------------+------+---------------+---------+---------+-------+------+----------------+
3 rows in set (0.01 sec)

The above one still uses filesort, but on a very small dataset, so it will improve general performance significantly.

Leave a Reply