Optimizing MySQL Queries: A Practical How-To Guide
A practical MySQL query tuning guide using EXPLAIN, indexes, safer rewrites, and slow-query evidence.
Optimizing MySQL Queries: A Practical How-To Guide
Slow MySQL queries are rarely mysterious once you look at the execution plan. The hard part is not knowing that indexes matter. The hard part is proving which query is slow, understanding why MySQL chose a plan, and changing the query or index without making writes, storage, or other queries worse.
Start with evidence. Use the slow query log, Performance Schema, application traces, or a monitoring tool such as PMM to find queries that actually hurt users. Then use EXPLAIN and, when safe, EXPLAIN ANALYZE to see what MySQL is doing.
Understanding Query Performance
Common causes include:
- Missing or Ineffective Indexes: Without appropriate indexes, MySQL has to perform full table scans, which are very inefficient for large tables.
- Poorly Written SQL: Non-sargable filters, unnecessary
SELECT *, accidental cross joins, and inefficient join conditions can all degrade performance. - Large Data Sets: More data means more pages to read, sort, group, and cache.
- Hardware and Configuration: Suboptimal server configuration or insufficient hardware resources can also play a role, though this guide focuses on query-level optimization.
The Power of EXPLAIN
EXPLAIN is the first tool to reach for when you want to understand how MySQL plans a query. For plain EXPLAIN SELECT, MySQL shows the optimizer's chosen plan without returning the result set. EXPLAIN ANALYZE executes the query and reports actual timing, so use it carefully on production systems.
How to Use EXPLAIN
For a read query, prepend EXPLAIN:
EXPLAIN SELECT * FROM users WHERE username = 'john_doe';
Interpreting EXPLAIN Output
The output of EXPLAIN is a table with several important columns:
id: The sequence number of the SELECT within the query. Higher numbers are generally executed first.select_type: The type of SELECT (e.g.,SIMPLE,PRIMARY,SUBQUERY,DERIVED).table: The table being accessed.partitions: The partitions used (if partitioning is enabled).type: The join type. This is one of the most useful columns. Aim forconst,eq_ref,ref, orrangewhen the query shape allows it. Be suspicious ofindexand especiallyALLon large tables.possible_keys: Shows which indexes MySQL could use.key: The index MySQL actually chose to use.key_len: The length of the index portion MySQL expects to use. Shorter is not automatically better; it depends on selectivity and the query.ref: The column or constant compared to the index (key).rows: An estimate of the number of rows MySQL expects to examine.filtered: The percentage of rows filtered by the table condition.Extra: Contains additional information about how MySQL resolves the query. Key values to watch for include:Using where: Indicates MySQL applies a condition while processing rows. It is common and not always bad.Using index: Means the query is covered by an index (all required columns are in the index), which is good.Using temporary: MySQL needs to create a temporary table, often forGROUP BYorORDER BYoperations. This can be slow.Using filesort: MySQL must do an external sort (not using an index for ordering). This is often a sign of an inefficientORDER BYclause.
Identifying Bottlenecks with EXPLAIN
Let's look at some common scenarios and how EXPLAIN helps identify issues:
Scenario 1: Full Table Scan
Consider a query like:
SELECT * FROM orders WHERE order_date = '2023-10-26';
If the order_date column is not indexed, EXPLAIN might show:
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 1000000 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
Problem: type: ALL indicates a full table scan. rows: 1000000 shows that MySQL has to examine every row in the orders table. key: NULL means no index was used.
Solution: Add an index to the order_date column:
CREATE INDEX idx_order_date ON orders (order_date);
After adding the index, re-run EXPLAIN. You should see a more selective access type such as ref or range, and the estimated row count should drop if the date filter is selective.
Scenario 2: Inefficient ORDER BY or GROUP BY
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id ORDER BY customer_id;
If customer_id is not indexed, EXPLAIN may show:
+----+-------------+--------+-------+---------------+------+---------+------+--------+----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+--------+----------------------------------+
| 1 | SIMPLE | orders | index | NULL | NULL | NULL | NULL | 100000 | Using temporary; Using filesort |
+----+-------------+--------+-------+---------------+------+---------+------+--------+----------------------------------+
Problem: Using temporary and Using filesort indicate that MySQL is performing costly operations to sort and group the data. This is often because no index can satisfy both the grouping and ordering requirements efficiently.
Solution: For this specific query, an index on (customer_id) may let MySQL scan rows in grouping order. If the real query filters by date, status, or tenant first, a composite index may be better, such as (tenant_id, status, customer_id).
CREATE INDEX idx_customer_id ON orders (customer_id);
Scenario 3: Using SELECT * Unnecessarily
When you select all columns (*) but only need a few, you transfer more data and may prevent a covering index from being useful. This is especially noticeable on wide tables with JSON columns, text blobs, or many nullable fields.
-- Assume an index on 'status'
SELECT * FROM tasks WHERE status = 'pending';
EXPLAIN might show Using where but if the query requires columns not in the index used for filtering, it will still need to access the table data.
Solution: Specify only the columns you need:
SELECT task_id, description FROM tasks WHERE status = 'pending';
If you frequently query this exact shape, consider a covering index that includes the filter column and returned columns:
CREATE INDEX idx_tasks_status_id_description
ON tasks (status, task_id, description);
Do not create covering indexes for every query. They speed reads at the cost of storage and write overhead.
Rewriting Slow Queries
Beyond indexing, the way you structure SQL can change the amount of work MySQL has to do.
Avoid Correlated Subqueries
Correlated subqueries can execute once for each row processed by the outer query. MySQL can optimize some of them, but if EXPLAIN shows repeated dependent lookups, a join or derived table is often clearer and faster.
Often inefficient:
SELECT o.order_id, o.order_date
FROM orders o
WHERE o.customer_id IN (
SELECT c.customer_id
FROM customers c
WHERE c.country = 'USA'
);
Often better as a join:
SELECT o.order_id, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.country = 'USA';
Use EXPLAIN on both versions. The join is not automatically faster in every schema, but it is easier for many teams to reason about and index.
Optimize LIKE Clauses
Leading wildcards (%) in LIKE clauses usually prevent a normal B-tree index from being used for a range seek.
Inefficient:
SELECT * FROM products WHERE product_name LIKE '%widget';
Better (if possible):
SELECT * FROM products WHERE product_name LIKE 'widget%';
If you need contains-style matching, consider MySQL full-text indexes for suitable text search, n-gram approaches for specific languages, or a search engine when relevance and flexible matching matter.
Use UNION ALL Instead of UNION When Possible
UNION removes duplicate rows, which requires an extra sorting and deduplication step. If you know there are no duplicates or don't need to remove them, UNION ALL is faster.
Slow:
SELECT name FROM table1
UNION
SELECT name FROM table2;
Fast:
SELECT name FROM table1
UNION ALL
SELECT name FROM table2;
Other Optimization Tips
- Keep Statistics Updated: Ensure table statistics are current so the query optimizer can make informed decisions. This is often handled automatically but can be manually updated with
ANALYZE TABLE. - Server Configuration: Query tuning will not compensate for a tiny InnoDB buffer pool or overloaded disks. In MySQL 8.0, the old query cache is removed, so do not plan new tuning around
query_cache_size. - Regular Monitoring: Use tools like MySQL Enterprise Monitor, Percona Monitoring and Management (PMM), or built-in performance schema views to track slow queries and identify trends.
A Practical Tuning Workflow
For production systems, tune from the slow query outward:
- Capture the exact SQL, bind values, row counts, and timing.
- Run
EXPLAIN FORMAT=TREEorEXPLAIN FORMAT=JSONif your MySQL version supports it. - Check whether the selected index matches the filter and join pattern.
- Test a query rewrite or index change on realistic data.
- Compare rows examined, temporary tables, sort behavior, and wall-clock latency.
This keeps you from adding indexes because a query "looks slow." Indexes have a cost. Every insert, update, and delete must maintain them. A table with ten overlapping indexes can become slower overall even if one read query improves.
For a common multi-tenant application query, the index order often matters more than the number of indexed columns:
SELECT id, created_at, total
FROM orders
WHERE tenant_id = 42
AND status = 'paid'
AND created_at >= '2025-01-01'
ORDER BY created_at DESC
LIMIT 50;
A useful index might be:
CREATE INDEX idx_orders_tenant_status_created
ON orders (tenant_id, status, created_at DESC);
That index starts with equality filters, then supports the date range and ordering. If you put created_at first, MySQL may scan many tenants before finding the right one. If you leave out status, the query may still work but examine many extra rows.
Watch for Non-Sargable Filters
A condition is sargable when MySQL can use an index to search for matching rows. Wrapping an indexed column in a function often breaks that:
-- Harder to use an index on created_at
SELECT * FROM orders
WHERE DATE(created_at) = '2025-01-15';
Rewrite it as a range:
SELECT *
FROM orders
WHERE created_at >= '2025-01-15'
AND created_at < '2025-01-16';
The second version lets MySQL seek into an index on created_at. The same idea applies to LOWER(email), math on numeric columns, and implicit type conversions. If the column is indexed, keep the column side of the comparison clean when you can.
Be Careful With Pagination
Offset pagination gets expensive on deep pages:
SELECT id, title
FROM posts
WHERE status = 'published'
ORDER BY published_at DESC
LIMIT 20 OFFSET 200000;
MySQL still has to walk through the earlier rows before returning the page you asked for. For feeds, audit logs, and admin tables, keyset pagination is usually better:
SELECT id, title, published_at
FROM posts
WHERE status = 'published'
AND (published_at, id) < ('2025-05-01 12:00:00', 987654)
ORDER BY published_at DESC, id DESC
LIMIT 20;
Pair it with an index such as (status, published_at, id). This changes the product behavior a little because users move through a cursor rather than jumping to page 10,000, but it can turn a painful query into a predictable one.
Validate With Real Data
Small staging databases lie. A query that is instant on 20,000 rows may be awful on 200 million rows, especially when data distribution is skewed. Test against production-like volume and cardinality when possible. If you cannot copy production data, at least generate data with similar tenant sizes, status distribution, and date ranges.
One last habit helps: keep the old plan and the new plan in the ticket. Future you will want to know why an index exists.
The best MySQL tuning habit is to make every change earn its place. Capture the slow query, inspect the plan, change one query or index, then compare latency and rows examined. A clean EXPLAIN plan is useful, but the real win is lower production latency without creating new write pressure or storage bloat.