MySQL not using indexes

I just enabled the slow-log (+not using indexes) and I'm getting hundreds of entries for the same kind of query (only user changes)
SELECT id
, name
FROM `all`
WHERE id NOT IN(SELECT id
FROM `picks`
WHERE user=999)
ORDER BY name ASC;
EXPLAIN gives:
+----+--------------------+-------------------+-------+------------------+--------+---------+------------+------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------+-------+------------------+--------+---------+------------+------+------------------------------------------+
| 1 | PRIMARY | all | index | NULL | name | 156 | NULL | 209 | Using where; Using index; Using filesort |
| 2 | DEPENDENT SUBQUERY | picks | ref | user,user_2,pick | user_2 | 8 | const,func | 1 | Using where; Using index |
+----+--------------------+-------------------+-------+------------------+--------+---------+------------+------+------------------------------------------+
Any idea about how to optimize this query? I've tried with a bunch of different indexes on different fields but nothing.

I don't necessarily agree that 'not in' and 'exists' are ALWAYS bad choices for performance; however, they could be in this situation.
You might be able to get your results using a much simpler query:
SELECT `all`.id
, `all`.name
FROM `all`
, `picks`
WHERE `all`.id = picks.id
AND picks.user <> 999
ORDER BY `all`.name ASC;

"NOT IN" and "EXISTS" are always bad choices for performance. A LEFT JOIN with a check for NULL may be better; try it.

This is probably the best way to write the query. Select everything from all and try to find matching rows from picks that share the same id and user is 999. If such a row doesn't exist, picks.id will be NULL, because it's using a left outer join. Then you can filter the results to return only those rows.
SELECT `all`.id, `all`.name
FROM
`all`
LEFT JOIN picks ON picks.id=`all`.id AND picks.user=999
WHERE picks.id IS NULL
ORDER BY `all`.name ASC
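To check that the rewrites in this thread return the same rows as the original NOT IN query, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table contents are made up, so it only compares result sets and says nothing about MySQL's execution plans.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are invented test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "all" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE picks (id INTEGER NOT NULL, user INTEGER NOT NULL);
    INSERT INTO "all" VALUES (1, 'alpha'), (2, 'bravo'), (3, 'charlie');
    -- user 999 picked item 2; user 7 picked items 2 and 3
    INSERT INTO picks VALUES (2, 999), (2, 7), (3, 7);
""")

# Original form: items that user 999 has NOT picked.
not_in_rows = conn.execute("""
    SELECT id, name FROM "all"
    WHERE id NOT IN (SELECT id FROM picks WHERE user = 999)
    ORDER BY name
""").fetchall()

# Anti-join form: LEFT JOIN, then keep only the rows with no match.
left_join_rows = conn.execute("""
    SELECT a.id, a.name FROM "all" AS a
    LEFT JOIN picks AS p ON p.id = a.id AND p.user = 999
    WHERE p.id IS NULL
    ORDER BY a.name
""").fetchall()

# Inner join with user <> 999: items picked by somebody other than 999.
inner_rows = conn.execute("""
    SELECT a.id, a.name FROM "all" AS a
    JOIN picks AS p ON p.id = a.id
    WHERE p.user <> 999
    ORDER BY a.name
""").fetchall()

print(not_in_rows)     # [(1, 'alpha'), (3, 'charlie')]
print(left_join_rows)  # [(1, 'alpha'), (3, 'charlie')]
print(inner_rows)      # [(2, 'bravo'), (3, 'charlie')]
```

On this data the LEFT JOIN ... IS NULL form matches NOT IN, while the inner join with user <> 999 answers a different question: it lists items picked by other users, and it drops items nobody has picked.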

Related

How select data from two column in sql?

I have a table in PostgreSQL as follows:
id | name | parent_id |
1 | morteza | null |
2 | ali | null |
3 | morteza2 | 1 |
4 | morteza3 | 1 |
The unique records are those with id=1 and id=2; the record with id=1 has been modified twice. Now I want to select the most recently modified version of each record. For the data above, the query result should be:
id | name |
1 | morteza3 |
2 | ali |
What's the suitable query?
If I am following correctly, you can use distinct on and coalesce():
select distinct on (coalesce(parent_id, id)) coalesce(parent_id, id) as new_id, name
from mytable
order by coalesce(parent_id, id), id desc
Demo on DB Fiddle:
new_id | name
-----: | :-------
1 | morteza3
2 | ali
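For anyone who wants to try the idea without a PostgreSQL instance, here is a rough sketch using Python's built-in sqlite3. SQLite has no DISTINCT ON, so this swaps in a correlated MAX(id) subquery over the same coalesce(parent_id, id) grouping key. It assumes, like the query above, that a higher id means a later version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO mytable VALUES
        (1, 'morteza',  NULL),
        (2, 'ali',      NULL),
        (3, 'morteza2', 1),
        (4, 'morteza3', 1);
""")

# COALESCE(parent_id, id) identifies the logical record; within each logical
# record, keep the row with the highest id (assumed to be the newest version).
latest = conn.execute("""
    SELECT COALESCE(t.parent_id, t.id) AS new_id, t.name
    FROM mytable AS t
    WHERE t.id = (SELECT MAX(v.id)
                  FROM mytable AS v
                  WHERE COALESCE(v.parent_id, v.id) = COALESCE(t.parent_id, t.id))
    ORDER BY new_id
""").fetchall()

print(latest)  # [(1, 'morteza3'), (2, 'ali')]
```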
From your description it would seem that the latest version of each row has parent_id IS NULL. (And obsoleted row versions have parent_id IS NOT NULL.)
The query is simple then:
SELECT id, name
FROM tbl
WHERE parent_id IS NULL;
If you have many updates (hence, many obsoleted row versions), a partial index will help performance a lot:
CREATE INDEX ON tbl(id) WHERE parent_id IS NULL;
The actual index column is mostly irrelevant (unless there are additional requirements). The WHERE clause is the point here, to exclude the many obsoleted rows from the index. See:
Postgres partial index on IS NULL not working
Slow PostgreSQL query in production - help me understand this explain analyze output

group by date of timestamp

How can you group by the date portion of a timestamp column in MySQL and still take advantage of an index on that column?
Of course you can use solutions like
select date(thetime) from mytable group by date(thetime)
however this query will not be able to use an index on thetime but will instead require a temporary table as you are transforming the column using a function before grouping by it.
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | mytable | index | NULL | thetime | 4 | NULL | 48183 | Using index; Using temporary; Using filesort |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+----------------------------------------------+
Theoretically there's no reason why it shouldn't be able to use a range scan on an index on that column and not need a temporary table. Is there any syntax that can persuade the query optimizer to execute the query to do that?
From what I can understand, the database already does what you want. The index is being scanned (rather than the table itself), as is evident in the EXPLAIN output.
But you cannot get around the fact that an intermediate table is needed to hold the dates during the grouping (distinct) operation.

Why does select statement influence query execution and performance in MySQL?

I'm encountering a strange behavior of MySQL.
Query execution (i.e. the usage of indexes as shown by explain [QUERY]) and the time needed for execution depend on the elements of the select clause.
Here is a query where the problem occurs:
select distinct
e1.idx, el1.idx, r1.fk_cat, r2.fk_cat
from ent e1, ent_leng el1, rel_c r1, _tax_c t1, rel_c r2, _tax_c t2
where el1.fk_ent=e1.idx
and r1.fk_ent=e1.idx and ((r1.fk_cat=43) or (r1.fk_cat=t1.fk_cat1 and t1.fk_cat2=43))
and r2.fk_ent=e1.idx and ((r2.fk_cat=10) or (r2.fk_cat=t2.fk_cat1 and t2.fk_cat2=10))
The corresponding explain output is:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+-------------+-------+--------+-------------------------+---------+---------+---------------+-------+------------------------------------
| 1 | SIMPLE | el1 | index | fk_ent | fk_ent | 4 | NULL | 15002 | Using index; Using temporary
| 1 | SIMPLE | e1 | eq_ref | PRIMARY | PRIMARY | 4 | DB.el1.fk_ent | 1 | Using index
| 1 | SIMPLE | r1 | ref | fk_ent,fk_cat,fks | fks | 4 | DB.e1.idx | 1 | Using where; Using index
| 1 | SIMPLE | r2 | ref | fk_ent,fk_cat,fks | fks | 4 | DB.el1.fk_ent | 1 | Using index
| 1 | SIMPLE | t1 | index | fk_cat1,fk_cat2,fk_cats | fk_cats | 8 | NULL | 69 | Using where; Using index; Distinct; Using join buffer
| 1 | SIMPLE | t2 | index | fk_cat1,fk_cat2,fk_cats | fk_cats | 8 | NULL | 69 | Using where; Using index; Distinct; Using join buffer
As you can see a one-column index has the same name as the column it belongs to. I also added some useless indexes along with the used ones, just to see if they change the execution (which they don't).
The execution takes ~4.5 seconds.
When I add the column el1.name to the select part (nothing else changed), the index fk_ent in el1 cannot be used any more:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+-------------+-------+--------+-------------------------+---------+---------+---------------+-------+------------------------------------
| 1 | SIMPLE | el1 | ALL | fk_ent | NULL | NULL | NULL | 15002 | Using temporary
The execution now takes ~8.5 seconds.
I always thought that the select part of a query does not influence the usage of indexes by the engine and doesn't affect performance in such a way.
Leaving out the attribute isn't a solution, and there are even more attributes that I have to select.
Even worse, the query I actually use is somewhat more complex, which makes the performance issue a big problem.
So my questions are:
1) What is the reason for this strange behavior?
2) How can I solve the performance problem?
Thanks for your help!
Gred
It's the DISTINCT restriction. You can think of that as another WHERE restriction. When you change the select list, you are really changing the WHERE clause for the DISTINCT restriction, and now the optimizer decides that it has to do a table scan anyway, so it might as well not use your index.
EDIT:
Not sure if this helps, but if I am understanding your data correctly, I think you can get rid of the DISTINCT restriction like this:
select
e1.idx, el1.idx, r1.fk_cat, r2.fk_cat
from ent e1
Inner Join ent_leng el1 ON el1.fk_ent=e1.idx
Inner Join rel_c r1 ON r1.fk_ent=e1.idx
Inner Join rel_c r2 ON r2.fk_ent=e1.idx
where
((r1.fk_cat=43) or Exists(Select 1 From _tax_c t1 Where r1.fk_cat=t1.fk_cat1 and t1.fk_cat2=43))
and
((r2.fk_cat=10) or Exists(Select 1 From _tax_c t2 Where r2.fk_cat=t2.fk_cat1 and t2.fk_cat2=10))
MySQL will return data from an index if possible, saving the entire row from being loaded. In this way, the selected columns can influence the index selection.
With this in mind, it can be much more efficient to add all required columns to an index, especially when selecting only a small subset of columns.
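The covering-index effect is easy to see in a small experiment. The sketch below uses Python's built-in sqlite3 rather than MySQL, so the plan wording differs (SQLite says "COVERING INDEX" where MySQL's EXPLAIN says "Using index"). The column types and the index names ix_fk_ent and ix_fk_ent_name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ent_leng (idx INTEGER PRIMARY KEY, fk_ent INTEGER, name TEXT);
    CREATE INDEX ix_fk_ent ON ent_leng (fk_ent);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Only fk_ent is selected: the index alone can answer the query.
narrow = plan("SELECT fk_ent FROM ent_leng WHERE fk_ent = 1")

# name is not in the index: every match also needs a lookup in the table.
wide = plan("SELECT fk_ent, name FROM ent_leng WHERE fk_ent = 1")

# Widen the index so that it contains every selected column.
conn.executescript("""
    DROP INDEX ix_fk_ent;
    CREATE INDEX ix_fk_ent_name ON ent_leng (fk_ent, name);
""")
wide_covered = plan("SELECT fk_ent, name FROM ent_leng WHERE fk_ent = 1")

for label, text in (("narrow", narrow), ("wide", wide), ("wide_covered", wide_covered)):
    print(label, "->", text)
```

The exact plan text varies between SQLite versions, but the first and third plans should mention a covering index and the second should not. A wider index costs more space and slows writes, so this is a trade-off rather than a free win.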

A subquery that should be independent is not. Why?

I have a table files with files and a table reades with read accesses to these files. In the table reades there is a column file_id which refers to the respective column in files.
Now I would like to list all files which have not been accessed and tried this:
SELECT * FROM files WHERE file_id NOT IN (SELECT file_id FROM reades)
This is terribly slow. The reason is that MySQL thinks that the subquery is dependent on the outer query:
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | files | ALL | NULL | NULL | NULL | NULL | 1053 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | reades | ALL | NULL | NULL | NULL | NULL | 3242 | 100.00 | Using where |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
But why? The subquery is completely independent and more or less just meant to return a list of ids.
(To be precise: Each file_id can appear multiple times in reades, of course, as there can be arbitrarily many read operations for each file.)
Try replacing the subquery with a join:
SELECT *
FROM files f
LEFT OUTER JOIN reades r on r.file_id = f.file_id
WHERE r.file_id IS NULL
Here's a link to an article about this problem. The writer of that article wrote a stored procedure to force MySQL to evaluate subqueries as independent. I doubt that's necessary in this case though.
I've seen this before. It's a bug in MySQL. Try this (the derived table needs an alias):
SELECT * FROM files WHERE file_id NOT IN (SELECT * FROM (SELECT file_id FROM reades) AS t)
The bug report is here: http://bugs.mysql.com/bug.php?id=25926
Try:
SELECT * FROM files WHERE file_id NOT IN (SELECT reades.file_id FROM reades)
That is: if it's coming up as dependent, perhaps that's because of ambiguity in what file_id refers to, so let's try fully qualifying it.
If that doesn't work, just do:
SELECT files.*
FROM files
LEFT JOIN reades
USING (file_id)
WHERE reades.file_id IS NULL
Does MySQL support EXISTS in the same way that MSSQL does?
If so, you could rewrite the query as
SELECT * FROM files AS f WHERE NOT EXISTS (SELECT 1 FROM reades r WHERE r.file_id = f.file_id)
Using IN can be horribly inefficient when the subquery is run once for each row in the parent query.
Looking at this page I found two possible solutions which both work. Just for completeness I add one of those, similar to the answers with JOINs shown above, but it is fast even without using foreign keys:
SELECT * FROM files AS f
INNER JOIN (SELECT DISTINCT file_id FROM reades) AS r
ON f.file_id = r.file_id
This solves the problem, but still this does not answer my question :)
EDIT: If I interpret the EXPLAIN output correctly, this is fast, because the interpreter generates a temporary index:
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 843 | |
| 1 | PRIMARY | f | eq_ref | PRIMARY | PRIMARY | 4 | r.file_id | 1 | |
| 2 | DERIVED | reades | range | NULL | file_id | 5 | NULL | 811 | Using index for group-by |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
In MySQL 5.5 and earlier, IN subqueries are converted to EXISTS subqueries. The given query will be converted to the following query:
SELECT * FROM files
WHERE NOT EXISTS (SELECT 1 FROM reades WHERE reades.file_id = files.file_id)
As you see, the subquery is actually dependent.
MySQL 5.6 may choose to materialize the subquery. That is, first, run the inner query and store the result in a temporary table (removing duplicates). Then, it can use a join-like operation between the outer table (i.e., files) and the temporary table to find the rows with no match. This way of executing the query will probably be more optimal if reades.file_id is not indexed.
However, if reades.file_id is indexed, the traditional IN-to-EXISTS execution strategy is actually pretty efficient. In that case, I would not expect any significant performance improvement from converting the query into a join as suggested in other answers. MySQL 5.6 optimizer makes a cost-based choice between materialization and IN-to-EXISTS execution.
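The materialization strategy can be imitated by hand, which may help to picture what the optimizer does. This is only a sketch, run in Python's built-in sqlite3 instead of MySQL, with made-up rows. The temporary table read_ids and the columns path and read_id are hypothetical names, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (file_id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE reades (read_id INTEGER PRIMARY KEY, file_id INTEGER NOT NULL);
    INSERT INTO files VALUES (1, 'a.txt'), (2, 'b.txt'), (3, 'c.txt');
    -- files 1 and 3 were read several times, file 2 never
    INSERT INTO reades (file_id) VALUES (1), (1), (3), (3), (3);
""")

# The query as written in the question.
not_in = conn.execute("""
    SELECT file_id FROM files
    WHERE file_id NOT IN (SELECT file_id FROM reades)
    ORDER BY file_id
""").fetchall()

# The dependent form that the IN-to-EXISTS strategy produces.
not_exists = conn.execute("""
    SELECT f.file_id FROM files AS f
    WHERE NOT EXISTS (SELECT 1 FROM reades AS r WHERE r.file_id = f.file_id)
    ORDER BY f.file_id
""").fetchall()

# Hand-made materialization: run the inner query once, store the
# de-duplicated ids in a keyed temporary table, then anti-join against it.
conn.executescript("""
    CREATE TEMP TABLE read_ids (file_id INTEGER PRIMARY KEY);
    INSERT INTO read_ids SELECT DISTINCT file_id FROM reades;
""")
materialized = conn.execute("""
    SELECT f.file_id FROM files AS f
    LEFT JOIN read_ids AS m ON m.file_id = f.file_id
    WHERE m.file_id IS NULL
    ORDER BY f.file_id
""").fetchall()

print(not_in, not_exists, materialized)  # [(2,)] [(2,)] [(2,)]
```

All three forms agree here because reades.file_id contains no NULLs. If the subquery returns a NULL, NOT IN yields no rows at all, while NOT EXISTS and the anti-join still return the unmatched files.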

Mysql Index Being Ignored

EXPLAIN SELECT
*
FROM
content_link link
STRAIGHT_JOIN
content
ON
link.content_id = content.id
WHERE
link.content_id = 1
LIMIT 10;
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
| 1 | SIMPLE | link | ref | content_id | content_id | 4 | const | 1 | |
| 1 | SIMPLE | content | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
However, when I remove the WHERE, the query stops using the key (even when I explicitly force it to):
EXPLAIN SELECT
*
FROM
content_link link FORCE KEY (content_id)
STRAIGHT_JOIN
content
ON
link.content_id = content.id
LIMIT 10;
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
| 1 | SIMPLE | link | index | content_id | PRIMARY | 7 | NULL | 4555299 | Using index |
| 1 | SIMPLE | content | eq_ref | PRIMARY | PRIMARY | 4 | ft_dir.link.content_id | 1 | |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
Are there any work-arounds to this?
I realize I'm selecting the entire table in the second example, but why does mysql suddenly decide that it's going to ignore my FORCE anyway and not use the key? Without the key the query takes like 10 minutes.. ugh.
FORCE is a bit of a misnomer. Here's what the MySQL docs say (emphasis mine):
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Since you aren't actually "finding" any rows (you are selecting them all), a table scan is always going to be fastest, and the optimizer is smart enough to know that in spite of what you are telling it.
ETA:
Try adding an ORDER BY on the primary key once and I bet it'll use the index.
An index helps search quickly inside a table, but it just slows things down if you select the entire table. So MySQL is correct in ignoring the index.
In your case, maybe the index has a hidden side effect that's not known to MySQL. For example, if the inner join holds only for a few rows, an index would speed things up. But MySQL can't know that without an explicit hint.
There is an exception: when every column you select is inside the index, the index is still useful if you select every row. For example, if you have an index on LastName, the following query still benefits from the index:
select LastName from orders
But this one won't:
select * from orders
Your content_id seems to accept NULL values.
The MySQL optimizer thinks there is no guarantee that your query will return all values by using only the index (though there actually is a guarantee, since you use the column in a JOIN).
That's why it reverts to full table scan.
Either add a NOT NULL condition:
SELECT *
FROM content_link link FORCE KEY (content_id)
STRAIGHT_JOIN
content
ON content.id = link.content_id
WHERE link.content_id IS NOT NULL
LIMIT 10;
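or mark your column as NOT NULL (MODIFY needs the full column definition; INT is an assumption here, so use the column's actual type):
ALTER TABLE content_link MODIFY content_id INT NOT NULL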
Update:
This is verified bug 45314 in MySQL.