Strange use of the index in Mysql - sql

Explain
SELECT `feed_objects`.*
FROM `feed_objects`
WHERE (`feed_objects`.feed_id IN
( 165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,126,125,124,122,
121, 120,119,118,117,116,115,114,113,111,110)) ;
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | feed_objects | ALL | by_feed_id | NULL | NULL | NULL | 188 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+------+-------------+
Not used index by_feed_id
But when I point less than the values in the WHERE - everything is working right
Explain
SELECT `feed_objects`.*
FROM `feed_objects`
WHERE (`feed_objects`.feed_id IN
(165,160,159,158,157,153,152,151,150,149,148,147,129,128,127,125,124)) ;
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | feed_objects | range | by_feed_id | by_feed_id | 9 | NULL | 18 | Using where |
+----+-------------+--------------+-------+---------------+------------+---------+------+------+-------------+
Used index by_feed_id
What is the problem?

The MySQL optimizer makes a lot of decisions that sometimes look strange. In this particular case, I believe that you have a very small table (188 rows total from the looks of the first EXPLAIN), and that is affecting the optimizer's decision.
The "How MySQL Uses Indexes" manual pages offers this snippet of info:
Sometimes MySQL does not use an index,
even if one is available. One
circumstance under which this occurs
is when the optimizer estimates that
using the index would require MySQL to
access a very large percentage of the
rows in the table. (In this case, a
table scan is likely to be much faster
because it requires fewer seeks.)
Because the number of ids in your first WHERE clause is relatively large compared to the table size, MySQL has determined that it is faster to scan the data than to consult the index, as the data scan is likely to result in less disk access time.
You could test this by adding rows to the table and re-running the first EXPLAIN query. At a certain point, MySQL will start using the index for it as well as the second query.

Related

How to make postgres search fast

I have a Postgresql-database with more than 100 billion rows in one table.
The table schema is as follow:
id_1 | integer | | not null |
id_2 | bigint | | not null |
created_at | timestamp without time zone | | not null |
id_3 | bigint | | |
char1 | character varying(20) | | not null |
lang | character(6) | | not null |
gps | point | | |
some_dat | character varying(140)[] | | |
JSON | jsonb | | not null |
I'm trying to search inside the JSON object and sort the data by the JSON object but the problem is that it takes too much time for sorting and returning the data.
Also when sorting the data by created_at for example, it takes also time for the result.
I'm trying to make my application as a real-time as I can.
I have 2 indexing for id_1 and id_2
Also, I tried to use materialized view for each (id) but the problem is updating the materialized view takes much time also.
Any suggestions please?
I'm running PostgreSQL 10.3, on a Linux server with SSD and 128 GB of ram.
Thanks,
If you want to sort a query result with an expression like this:
ORDER BY expr1, expr2, ...
You need the following index to speed up the sorting:
CREATE INDEX ON atable ((expr1), (expr2), ...);
If that does not work because the expressions contain functions that are not IMMUTABLE, you cannot speed up the sort with an index. In that case, consider rewriting your query with IMMUTABLE expressions.

group by date of timestamp

How can you group by a the date portion of a timestamp column in mysql and still take advantage of an index on that ?
Of course you can use solutions like
select date(thetime) from mytable group by date(thetime)
however this query will not be able to use an index on thetime but will instead require a temporary table as you are transforming the column using a function before grouping by it.
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | mytable | index | NULL | thetime | 4 | NULL | 48183 | Using index; Using temporary; Using filesort |
+----+-------------+-------------+-------+---------------+-------------+---------+------+-------+----------------------------------------------+
Theoretically there's no reason why it shouldn't be able to use a range scan on an index on that column and not need a temporary table. Is there any syntax that can persuade the query optimizer to execute the query to do that?
From what I can understand the database already does what you want? The index is being scanned (rather than the table itself) as evident in xplan.
But you cannot get around the fact that an intermediate table is needed to hold the dates during the grouping (distinct) operation.

Why does select statement influence query execution and performance in MySQL?

I'm encountering a strange behavior of MySQL.
Query execution (i.e. the usage of indexes as shown by explain [QUERY]) and time needed for execution are dependent on the elements of the where clause.
Here is a query where the problem occurs:
select distinct
e1.idx, el1.idx, r1.fk_cat, r2.fk_cat
from ent e1, ent_leng el1, rel_c r1, _tax_c t1, rel_c r2, _tax_c t2
where el1.fk_ent=e1.idx
and r1.fk_ent=e1.idx and ((r1.fk_cat=43) or (r1.fk_cat=t1.fk_cat1 and t1.fk_cat2=43))
and r2.fk_ent=e1.idx and ((r2.fk_cat=10) or (r2.fk_cat=t2.fk_cat1 and t2.fk_cat2=10))
The corresponding explain output is:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+-------------+-------+--------+-------------------------+---------+---------+---------------+-------+------------------------------------
| 1 | SIMPLE | el1 | index | fk_ent | fk_ent | 4 | NULL | 15002 | Using index; Using temporary
| 1 | SIMPLE | e1 | eq_ref | PRIMARY | PRIMARY | 4 | DB.el1.fk_ent | 1 | Using index
| 1 | SIMPLE | r1 | ref | fk_ent,fk_cat,fks | fks | 4 | DB.e1.idx | 1 | Using where; Using index
| 1 | SIMPLE | r2 | ref | fk_ent,fk_cat,fks | fks | 4 | DB.el1.fk_ent | 1 | Using index
| 1 | SIMPLE | t1 | index | fk_cat1,fk_cat2,fk_cats | fk_cats | 8 | NULL | 69 | Using where; Using index; Distinct;
| | | | | | | | | | Using join buffer
| 1 | SIMPLE | t2 | index | fk_cat1,fk_cat2,fk_cats | fk_cats | 8 | NULL | 69 | Using where; Using index; Distinct;
| Using join buffer
As you can see a one-column index has the same name as the column it belongs to. I also added some useless indexes along with the used ones, just to see if they change the execution (which they don't).
The execution takes ~4.5 seconds.
When I add the column entl1.name to the select part (nothing else changed), the index fk_ent in el1 cannot be used any more:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
+----+-------------+-------+--------+-------------------------+---------+---------+---------------+-------+------------------------------------
| 1 | SIMPLE | el1 | ALL | fk_ent | NULL | NULL | NULL | 15002 | Using temporary
The execution now takes ~8.5 seconds.
I always thought that the select part of a query does not influence the usage of indexes by the engine and doesn't affect performance in such a way.
Leaving out the attribute isn't a solution, and there are even more attributes that i have to select.
Even worse, the query in the used form is even a bit more complex and that makes the performance issue a big problem.
So my questions are:
1) What is the reason for this strange behavior?
2) How can I solve the performance problem?
Thanks for your help!
Gred
It's the DISTINCT restriction. You can think of that as another WHERE restriction. When you change the select list, you are really changing the WHERE clause for the DISTINCT restriction, and now the optimizer decides that it has to do a table scan anyway, so it might as well not use your index.
EDIT:
Not sure if this helps, but if I am understanding your data correctly, I think you can get rid of the DISTINCT restriction like this:
select
e1.idx, el1.idx, r1.fk_cat, r2.fk_cat
from ent e1
Inner Join ent_leng el1 ON el1.fk_ent=e1.idx
Inner Join rel_c r1 ON r1.fk_ent=e1.idx
Inner Join rel_c r2 ON r2.fk_ent=e1.idx
where
((r1.fk_cat=43) or Exists(Select 1 From _tax_c t1 Where r1.fk_cat=t1.fk_cat1 and t1.fk_cat2=43))
and
((r2.fk_cat=10) or Exists(Select 1 From _tax_c t2 Where r2.fk_cat=t2.fk_cat1 and t2.fk_cat2=10))
MySQL will return data from an index if possible, saving the entire row from being loaded. In this way, the selected columns can influence the index selection.
With this in mind, it can much more efficient to add all required columns to an index, especially in the case of only selecting a small subset of columns.

MySQL update taking(too) long time

After some expected growth on our service all of the sudden some updates are taking extremely long time, these used to be pretty fast until the table reached about 2MM records, now they take about 40-60 seconds each.
update table1 set field1=field1+1 where id=2229230;
Query OK, 0 rows affected (42.31 sec)
Rows matched: 1 Changed: 0 Warnings: 0
Here are the field types:
`id` bigint(20) NOT NULL auto_increment,
`field1` int(11) default '0',
Result from the profiling, for context switches which is the only one that seems to have high numbers on the results:
mysql> show profile context switches
-> ;
+----------------------+-----------+-------------------+---------------------+
| Status | Duration | Context_voluntary | Context_involuntary |
+----------------------+-----------+-------------------+---------------------+
| (initialization) | 0.000007 | 0 | 0 |
| checking permissions | 0.000009 | 0 | 0 |
| Opening tables | 0.000045 | 0 | 0 |
| System lock | 0.000009 | 0 | 0 |
| Table lock | 0.000008 | 0 | 0 |
| init | 0.000056 | 0 | 0 |
| Updating | 46.063662 | 75487 | 14658 |
| end | 2.803943 | 5851 | 857 |
| query end | 0.000054 | 0 | 0 |
| freeing items | 0.000011 | 0 | 0 |
| closing tables | 0.000008 | 0 | 0 |
| logging slow query | 0.000265 | 2 | 1 |
+----------------------+-----------+-------------------+---------------------+
12 rows in set (0.00 sec)
The table is about 2.5 million records, the id is the primary key, and it has one unique index on another field (not included in the update).
It's a innodb table.
any pointers on what could be the cause ?
Any particular variables that could help track the issue down ?
Is there a "explain" for updates ?
EDIT: Also I just noticed that the table also has a :
`modDate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
Explain:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | table1 | const | PRIMARY | PRIMARY | 8 | const | 1 | |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.02 sec)
There's no way that query should take a long time, if id is really the primary key (unless you have lots and lots of ids equal to 2229230?). Please run the following two sqls and post the results:
show create table table1;
explain select * from table1 where id = 2229230;
Update: just to be complete, also do a
select count(*) from table1 where id = 2229230;
Ok after a few hours of tracking down this one, it seems the cause was a "duplicate index", the create table that I was doing to answer Keith, had this strange combo:
a unique key on fieldx
a key on fieldx
the second one is obviously redundant and useless, but after I dropped the key all updates went back to < 1 sec.
+1 to Keith and Ivan as their comments help me finally track down the issue.
May I recommend as well:
OPTIMIZE TABLE table1;
Sometimes your tables just need a little love, if they've grown rapidly and you haven't optimized them in a while the indices may be a little crazy. In fact, if you've got the time (and you do), it never hurts to throw in
REPAIR TABLE table1;
What helped for me was changing the engine from 'innodb' to 'myisam'.
My update query on a similar size dataset went from 100 ms to 0.1 ms.
Be aware that changing the engine type for an existing application might have consequences as InnoDB has better data integrity and some functions your application might be depending on.
For me it was worth losing InnoDB for a 1000 times speed increase on big datasets.
I'd come across this same issue, which turned out to be due to a trigger on the table. Any time an update was done on my table, the trigger would update another table which is were the delay was actually caused. Adding an index on that table fixed the issue. So try checking for triggers on the table.
Another thing to take into consideration when debugging update statements taking way too much time compared to the select statement version:
are the update statements executed within a transaction?
In my case a transaction turned the update statement(s) from a blazingly fast execution time to a extremely slow one. The more rows affected the heavier the loss of performance. My statements work with user defined variables, but don't know if that part of the 'problem'.

Mysql Index Being Ignored

EXPLAIN SELECT
*
FROM
content_link link
STRAIGHT_JOIN
content
ON
link.content_id = content.id
WHERE
link.content_id = 1
LIMIT 10;
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
| 1 | SIMPLE | link | ref | content_id | content_id | 4 | const | 1 | |
| 1 | SIMPLE | content | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+---------+-------+---------------+------------+---------+-------+------+-------+
However, when I remove the WHERE, the query stops using the key (even when i explicitly force it to)
EXPLAIN SELECT
*
FROM
content_link link FORCE KEY (content_id)
STRAIGHT_JOIN
content
ON
link.content_id = content.id
LIMIT 10;
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
| 1 | SIMPLE | link | index | content_id | PRIMARY | 7 | NULL | 4555299 | Using index |
| 1 | SIMPLE | content | eq_ref | PRIMARY | PRIMARY | 4 | ft_dir.link.content_id | 1 | |
+----+-------------+---------+--------+---------------+---------+---------+------------------------+---------+-------------+
Are there any work-arounds to this?
I realize I'm selecting the entire table in the second example, but why does mysql suddenly decide that it's going to ignore my FORCE anyway and not use the key? Without the key the query takes like 10 minutes.. ugh.
FORCE is a bit of a misnomer. Here's what the MySQL docs say (emphasis mine):
You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.
Since you aren't actually "finding" any rows (you are selecting them all), a table scan is always going to be fastest, and the optimizer is smart enough to know that in spite of what you are telling them.
ETA:
Try adding an ORDER BY on the primary key once and I bet it'll use the index.
An index helps search quickly inside a table, but it just slows things down if you select the entire table. So MySQL is correct in ignoring the index.
In your case, maybe the index has a hidden side effect that's not known to MySQL. For example, if the inner join holds only for a few rows, an index would speed things up. But MySQL can't know that without an explicit hint.
There is an exception: when every column you select is inside the index, the index is still useful if you select every row. For example, if you have an index on LastName, the following query still benefits from the index:
select LastName from orders
But this one won't:
select * from Orders
Your content_id seems to accept NULL values.
MySQL optimizer thinks there is no guarantee that your query will return all values only by using the index (though actually there is guarantee, since you use the column in a JOIN)
That's why it reverts to full table scan.
Either add a NOT NULL condition:
SELECT *
FROM content_link link FORCE KEY (content_id)
STRAIGHT_JOIN
content
ON content.id = link.content_id
WHERE link.content_id IS NOT NULL
LIMIT 10;
or mark your column as NOT NULL:
ALTER TABLE content_link MODIFY content_id NOT NULL
Update:
This is verified bug 45314 in MySQL.