I had all my tables in MyISAM, but the table-level locking was starting to kill me when I had long-running update jobs. I converted my primary tables over to InnoDB, and now many of my queries are taking over 1 minute to complete where they were nearly instantaneous on MyISAM. They are usually stuck in the "Sorting result" step. Did I do something wrong?
For example:
SELECT * FROM `metaward_achiever`
INNER JOIN `metaward_alias` ON (`metaward_achiever`.`alias_id` = `metaward_alias`.`id`)
WHERE `metaward_achiever`.`award_id` = 1507
ORDER BY `metaward_achiever`.`modified` DESC
LIMIT 100
Takes about 90 seconds now. Here is the describe:
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+-------+-----------------------------+
| 1 | SIMPLE | metaward_achiever | ref | metaward_achiever_award_id,metaward_achiever_alias_id | metaward_achiever_award_id | 4 | const | 66424 | Using where; Using filesort |
| 1 | SIMPLE | metaward_alias | eq_ref | PRIMARY | PRIMARY | 4 | paul.metaward_achiever.alias_id | 1 | |
+----+-------------+-------------------+--------+-------------------------------------------------------+----------------------------+---------+---------------------------------+-------+-----------------------------+
It seems that now TONS of my queries get stuck in the "Sorting result" step:
mysql> show processlist;
+--------+------+-----------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+--------+------+-----------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| 460568 | paul | localhost | paul | Query | 0 | NULL | show processlist |
| 460638 | paul | localhost | paul | Query | 0 | Sorting result | SELECT `metaward_achiever`.`id`, `metaward_achiever`.`modified`, `metaward_achiever`.`created`, `met |
| 460710 | paul | localhost | paul | Query | 79 | Sending data | SELECT `metaward_achiever`.`id`, `metaward_achiever`.`modified`, `metaward_achiever`.`created`, `met |
| 460722 | paul | localhost | paul | Query | 49 | Updating | UPDATE `metaward_alias` SET `modified` = '2009-09-15 12:43:50', `created` = '2009-08-24 11:55:24', ` |
| 460732 | paul | localhost | paul | Query | 25 | Sorting result | SELECT `metaward_achiever`.`id`, `metaward_achiever`.`modified`, `metaward_achiever`.`created`, `met |
+--------+------+-----------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
And why is that simple update stuck for 49 seconds?
If it helps, here are the schemas:
| metaward_alias | CREATE TABLE `metaward_alias` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`modified` datetime NOT NULL,
`created` datetime NOT NULL,
`string_id` varchar(255) DEFAULT NULL,
`shortname` varchar(100) NOT NULL,
`remote_image` varchar(500) DEFAULT NULL,
`image` varchar(100) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`type_id` int(11) NOT NULL,
`md5` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `string_id` (`string_id`),
KEY `metaward_alias_user_id` (`user_id`),
KEY `metaward_alias_type_id` (`type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=858381 DEFAULT CHARSET=utf8 |
| metaward_award | CREATE TABLE `metaward_award` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`modified` datetime NOT NULL,
`created` datetime NOT NULL,
`string_id` varchar(20) NOT NULL,
`owner_id` int(11) NOT NULL,
`name` varchar(100) NOT NULL,
`description` longtext NOT NULL,
`owner_points` int(11) NOT NULL,
`url` varchar(500) NOT NULL,
`remote_image` varchar(500) DEFAULT NULL,
`image` varchar(100) NOT NULL,
`parent_award_id` int(11) DEFAULT NULL,
`slug` varchar(110) NOT NULL,
`true_points` double DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `string_id` (`string_id`),
KEY `metaward_award_owner_id` (`owner_id`),
KEY `metaward_award_parent_award_id` (`parent_award_id`),
KEY `metaward_award_slug` (`slug`),
KEY `metaward_award_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=122176 DEFAULT CHARSET=utf8 |
| metaward_achiever | CREATE TABLE `metaward_achiever` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`modified` datetime NOT NULL,
`created` datetime NOT NULL,
`award_id` int(11) NOT NULL,
`alias_id` int(11) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `metaward_achiever_award_id` (`award_id`),
KEY `metaward_achiever_alias_id` (`alias_id`)
) ENGINE=InnoDB AUTO_INCREMENT=77175366 DEFAULT CHARSET=utf8 |
And these are in my my.cnf:
innodb_file_per_table
innodb_buffer_pool_size = 2048M
innodb_additional_mem_pool_size = 16M
innodb_flush_method=O_DIRECT
As written in Should you move from MyISAM to Innodb? (which is pretty recent):
Innodb Needs Tuning. As a final note about MyISAM to Innodb migration, I should mention Innodb tuning. Innodb needs tuning. Really. MyISAM can work well with defaults for many applications; I've seen hundreds-of-GB databases run with MyISAM with default settings and it worked reasonably. Innodb needs resources and will often not work well with defaults. Tuning MyISAM from defaults rarely gives more than a 2-3 times gain, while it can be as much as 10-50 times for Innodb tables, in particular for write-intensive workloads. Check here for details.
So, about MySQL Innodb Settings, the author wrote in Innodb Performance Optimization Basics:
The most important ones are:
innodb_buffer_pool_size – 70-80% of memory is a safe bet. I set it to 12G on a 16GB box.
UPDATE: If you're looking for more details, check out the detailed guide on tuning the innodb buffer pool.
innodb_log_file_size – This depends on your recovery speed needs, but 256M seems to be a good balance between reasonable recovery time and good performance.
innodb_log_buffer_size=4M – 4M is good for most cases, unless you're piping large blobs to Innodb; in that case increase it a bit.
innodb_flush_log_at_trx_commit=2 – If you're not concerned about ACID and can lose the transactions of the last second or two in case of a full OS crash, set this value. It can have a dramatic effect, especially on a lot of short write transactions.
innodb_thread_concurrency=8 – Even with the current Innodb Scalability Fixes, having limited concurrency helps. The actual number may be higher or lower depending on your application, but the default of 8 is a decent start.
innodb_flush_method=O_DIRECT – Avoids double buffering and reduces swap pressure; in most cases this setting improves performance. Be careful, though, if you do not have a battery-backed-up RAID cache, as write IO may suffer.
innodb_file_per_table – If you do not have too many tables, use this option so you will not have uncontrolled growth of the main Innodb tablespace, which you can't reclaim. This option was added in MySQL 4.1 and is now stable enough to use.
Also check if your application can run in READ-COMMITTED isolation mode – if it does, set it as the default with transaction-isolation=READ-COMMITTED. This option has some performance benefits, especially for locking in 5.0, and even more to come with MySQL 5.1 and row-level replication.
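If you want to experiment with that last suggestion before changing my.cnf, the isolation level can also be set for a single session first (a quick, reversible test):
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;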
Just for the record, the people behind mysqlperformanceblog.com ran a benchmark comparing Falcon, MyISAM and InnoDB. The benchmark was really supposed to be highlighting Falcon, except it was InnoDB that won the day, topping both Falcon and MyISAM in queries per second for almost every test: InnoDB vs MyISAM vs Falcon benchmarks – part 1.
That is a large result set (66,424 rows) that MySQL must manually sort. Try adding an index to metaward_achiever.modified.
There is a limitation with MySQL 4.x that only allows MySQL to use one index per table. Since it is using the index on metaward_achiever.award_id column for the WHERE selection, it cannot also use the index on metaward_achiever.modified for the sort. I hope you're using MySQL 5.x, which may have improved this.
You can see this by doing explain on this simplified query:
SELECT * FROM `metaward_achiever`
WHERE `metaward_achiever`.`award_id` = 1507
ORDER BY `metaward_achiever`.`modified` DESC
LIMIT 100
If you can get this using the indexes for both the WHERE selection and sorting, then you're set.
You could also create a compound index on both metaward_achiever.award_id and metaward_achiever.modified. If MySQL doesn't use it, then you can hint at it or remove the index on just award_id.
Alternatively, if you can get rid of metaward_achiever.id and make metaward_achiever.award_id your primary key and add a key on metaward_achiever.modified, or better yet make metaward_achiever.award_id combined with metaward_achiever.modified your primary key, then you'll be really good.
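For reference, a minimal sketch of the compound index idea (the index name here is just illustrative):
ALTER TABLE metaward_achiever
  ADD INDEX metaward_achiever_award_modified (award_id, modified);
With award_id first, the index satisfies the WHERE clause, and because modified is the next column the matching rows come back already sorted, so the filesort should disappear from EXPLAIN.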
You can try to optimize the file sorting by modifying settings. Unfortunately, I'm not experienced with this, as our DBA handles the configuration, but you might want to check out this great blog:
http://www.mysqlperformanceblog.com/
Here's an article about filesort in particular:
http://s.petrunia.net/blog/?p=24
My guess is that you probably haven't configured your InnoDB settings beyond the defaults. You should do a quick google for setting up your InnoDB options.
The one that caused me the most noticeable performance issues out of the box was innodb_buffer_pool_size. This should be set to 50-80% of your machine's memory. By default it's often only a few MB. Crank it way up, and you should see a noticeable performance increase.
Also take a look at innodb_additional_mem_pool_size.
Start here, but also google around for "innodb performance tuning".
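If you want to confirm what the server is actually running with before changing anything, a quick check from the mysql client (values are reported in bytes):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size';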
MySQL's query optimizer is not good, from my memory. Try a subselect instead of a straight join.
SELECT * FROM (SELECT * FROM `metaward_achiever`
WHERE `metaward_achiever`.`award_id` = 1507) a
INNER JOIN `metaward_alias` ON (a.`alias_id` = `metaward_alias`.`id`)
ORDER BY a.`modified` DESC
LIMIT 100
Or something like that (untested syntax above).
Sorting is something done by the database server, not the storage engine, in MySQL.
If in both cases, the engine was not able to provide the results in already-sorted form (it depends on the index used), then the server needs to sort them.
The only reason that MyISAM / InnoDB might be different is that the order the rows come back could affect how sorted the data are already - MyISAM could give the data back in "more sorted" order in some cases (and vice versa).
Still, sorting 60k rows is not going to take long as it's a very small data set. Are you sure you've got your sort buffer set big enough?
Using an on-disc filesort() instead of an in-memory one is much slower. The engine should, however, not make any difference to this. filesort is not an engine function, but a MySQL core function. filesort does, in fact, suck in quite a lot of ways, but it's not normally that slow.
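One quick way to test the sort-buffer theory from the mysql client (the 4M figure is only an example, not a recommendation):
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 4 * 1024 * 1024; -- raise it for this session only, then re-run the query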
Try adding a key on the fields metaward_achiever.alias_id, metaward_achiever.award_id, and metaward_achiever.modified; this will help a lot. And do not use keys on varchar fields; it will increase the time for inserts and updates. Also, since it seems you have 77M records in the achiever table, you may want to care about InnoDB optimizations. There are lots of good tutorials around on how to set memory limits for it.
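A sketch of that suggested key (the key name is illustrative; note that the column order determines which queries can actually use it):
ALTER TABLE metaward_achiever
  ADD KEY metaward_achiever_alias_award_modified (alias_id, award_id, modified);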
Related
I have the following table on a server:
CREATE TABLE routes (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
from_lat double NOT NULL,
from_lng double NOT NULL,
to_lat double NOT NULL,
to_lng double NOT NULL,
distance int(11) unsigned NOT NULL,
drive_time int(11) unsigned NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY route (from_lat,from_lng,to_lat,to_lng)
) ENGINE=InnoDB;
We are saving some routing information from point A (from_lat, from_lng) to point B (to_lat, to_lng). There is a unique index on the coordinates.
However, there are two entries in the database that confuse me:
+----+----------+----------+---------+---------+----------+------------+
| id | from_lat | from_lng | to_lat | to_lng | distance | drive_time |
+----+----------+----------+---------+---------+----------+------------+
| 27 | 52.5333 | 13.1667 | 52.5833 | 13.2833 | 13647 | 1125 |
| 28 | 52.5333 | 13.1667 | 52.5833 | 13.2833 | 13647 | 1125 |
+----+----------+----------+---------+---------+----------+------------+
They are exactly the same.
When I now try to export the database using mysqldump and reimport it, I get an error:
ERROR 1062 (23000): Duplicate entry '52.5333-13.1667-52.5833-13.2833' for key 'route'
How can it be that this is in the database, when there is an unique key on them? Shouldn't MySQL reject them?
Is it possible that the double values are slightly different, but only after the 4th digit?
If you export and import them, they would be the same, and that would give a unique constraint violation.
Quoting from this MySQL bug report:
When mysqldump dumps a DOUBLE value, it uses insufficient precision to
distinguish between some close values (and, presumably, insufficient
precision to recreate the exact values from the original database). If
the DOUBLE value is a primary key or part of a unique index, restoring
the database from this output fails with a duplicate key error.
Try to display them with more digits after the decimal point (how to do that depends on your client).
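For example, from the mysql client you could cast the doubles to a wider decimal to compare the stored values (the precision chosen here is just illustrative):
SELECT id,
       CAST(from_lat AS DECIMAL(20,15)) AS from_lat_exact,
       CAST(from_lng AS DECIMAL(20,15)) AS from_lng_exact,
       CAST(to_lat AS DECIMAL(20,15)) AS to_lat_exact,
       CAST(to_lng AS DECIMAL(20,15)) AS to_lng_exact
FROM routes
WHERE id IN (27, 28);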
I have around 5 million rows in a postgres table. I'd like to know how many rows match start_time >= NOW(), but despite having an index on start_time the query is extremely slow (in the order of several hours).
EXPLAIN SELECT COUNT(*) FROM core_event WHERE start_time >= NOW();
Aggregate (cost=449217.81..449217.82 rows=1 width=0)
-> Index Scan using core_event_start_time on core_event (cost=0.00..447750.83 rows=586791 width=0)
Index Cond: (start_time >= now())
Here's the schema information for the table:
id | integer | not null default nextval('core_event_id_seq'::regclass)
source | character varying(100) | not null
external_id | character varying(100) |
title | character varying(250) | not null
location | geometry | not null
start_time | timestamp with time zone |
stop_time | timestamp with time zone |
thumb | character varying(300) |
image | character varying(100) |
image_thumb | character varying(100) |
address | character varying(300) |
description | text |
venue_name | character varying(100) |
website | character varying(300) |
city_id | integer |
category | character varying(100) |
phone | character varying(50) |
place_id | integer |
image_url | character varying(300) |
event_type | character varying(200) |
hidden | boolean | not null
views | integer | not null
added | timestamp with time zone |
I have indexes on the following fields:
city_id
external_id (unique)
location
location_id
place_id
start_time
Is there any easy way for me to speed up the query (eg. a partial index), or am I going to have to resort to partitioning the data by date?
Try adding a partial index like the following:
CREATE INDEX core_event_start_time_recent_idx ON core_event (start_time)
WHERE start_time >= '2011-01-12 0:0'::timestamptz
This will create a comparatively small index. Index creation will take some time, but queries like this one will be much faster thereafter.
SELECT count(*) FROM core_event WHERE start_time >= now();
The effectiveness of this index for queries against now() will degrade slowly over the course of time, depending on how many new rows are coming in. Update (= drop & create) the index with a more recent timestamp occasionally at off hours.
You could automate this with a plpgsql function that you call from a cron job or pgAgent.
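A rough sketch of such a helper, reusing the index name from above (the date-based cutoff and the function name are illustrative):
CREATE OR REPLACE FUNCTION refresh_start_time_partial_index() RETURNS void AS $$
BEGIN
  DROP INDEX IF EXISTS core_event_start_time_recent_idx;
  EXECUTE 'CREATE INDEX core_event_start_time_recent_idx ON core_event (start_time) '
       || 'WHERE start_time >= ' || quote_literal(now()::date::text) || '::timestamptz';
END;
$$ LANGUAGE plpgsql;
A nightly SELECT refresh_start_time_partial_index(); from cron (or a pgAgent job) would then keep the cutoff reasonably fresh.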
You might try and see if running CLUSTER on the table improves things (if it doesn't go against other requirements in your db):
CLUSTER core_event USING core_event_start_time;
Yes, cluster on the full index, not the partial one. This will take a while and needs an exclusive lock, because it effectively rewrites the table. It also effectively vacuums the table fully. Read about it in the manual.
You also may want to increase the statistics target for core_event.start_time:
ALTER TABLE core_event ALTER COLUMN start_time SET STATISTICS 1000; -- example value
The default is just 100. Then:
ANALYZE core_event;
Of course, all the usual performance stuff applies, too.
Do most of these columns get populated for each row? If so, the amount of disk that postgresql has to look at to test rows for liveness even after checking the index will be fairly large. Try for example creating a separate table that only has id and start_time:
create table core_event_start_time as select id, start_time from core_event;
alter table core_event_start_time add primary key(id);
alter table core_event_start_time add foreign key(id) references core_event(id);
create index on core_event_start_time(start_time);
Now see how long it takes to count IDs in core_event_start_time only. Of course, this approach will take up more buffer cache at the expense of space for your actual core_event table...
If it helps, you can add a trigger onto core_event to keep the auxiliary table updated.
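A minimal sketch of such a trigger, covering inserts and start_time updates (the function and trigger names are illustrative; deletes could instead be handled by declaring the foreign key with ON DELETE CASCADE):
CREATE OR REPLACE FUNCTION sync_core_event_start_time() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO core_event_start_time (id, start_time) VALUES (NEW.id, NEW.start_time);
  ELSE -- UPDATE
    UPDATE core_event_start_time SET start_time = NEW.start_time WHERE id = NEW.id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER core_event_start_time_sync
AFTER INSERT OR UPDATE OF start_time ON core_event
FOR EACH ROW EXECUTE PROCEDURE sync_core_event_start_time();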
(postgresql 9.2 will introduce "index only scans" which may help with this sort of situation, but that's for the future)
I have a large MySQL table (about 5M rows) into which I frequently insert data.
This table is the same one I have to read data from, and sometimes the entire database gets slow because of selects running while there are many pending inserts.
I put indexes on each field I use in the WHERE statement, so I really don't know why the select gets so slow.
Could anyone provide me a hint to solve this problem?
Here is the SQL of the table and the query:
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) unsigned NOT NULL default '0',
`dest` varchar(20) character set latin1 default NULL,
`body` text character set latin1,
`sent_on` timestamp NOT NULL default CURRENT_TIMESTAMP,
`md5` varchar(32) character set latin1 NOT NULL default '',
`interface` enum('mobile','desktop') default NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `md5` (`md5`),
FULLTEXT KEY `dest` (`dest`,`body`),
FULLTEXT KEY `body` (`body`)
) ENGINE=MyISAM AUTO_INCREMENT=7074256 DEFAULT CHARSET=utf8
and here the query:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS id, sent_on, dest AS who, body,interface FROM messages WHERE user_id = 2 ORDER BY sent_on DESC LIMIT 0,50 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: messages
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 13997
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Note the following in your EXPLAIN output:
Extra: Using where; Using filesort
The Using filesort means that MySQL is dumping the query results to a file to sort them, then reading the results back in to get the top 50 rows.
While I'm no expert, I think that you could optimize this process by providing an index which can satisfy both the selection criteria and the sort order in one go; then the selection and ordering can be determined by an index scan only, without having to sort the result set every time.
In this case, your WHERE is on user_id, and your ORDER BY is on sent_on. So, in theory, if you provide a single index on those two columns (in that order), then the engine will be able to use the first half of the index to filter the results, and because the second half of the index is on the sent_on column, the index results will already be in order according to that column, allowing MySQL to simply retrieve the first 50 results from that index. No additional sorting required.
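A sketch of that composite index (the index name is illustrative):
ALTER TABLE messages ADD INDEX user_id_sent_on (user_id, sent_on);
If MySQL picks it up, the Using filesort note should disappear from the EXPLAIN output.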
Disclaimer: I'm not a DBA. I may be completely wrong.
See Also: Mysql.com: Multiple Column Indexes
Maybe you have disabled Concurrent Inserts?
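You can check the current setting from the mysql client (concurrent_insert is a MyISAM-related server variable; 0 means disabled, 1 is the default):
SHOW VARIABLES LIKE 'concurrent_insert';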
Could the ORDER BY be slowing you down? I don't know if it's a good idea to index sent_on; it would depend on the SELECT vs. INSERT frequency.
Is it possible to optimize this query?
SELECT count(locId) AS antal , locId
FROM `geolitecity_block`
WHERE (1835880985>= startIpNum AND 1835880985 <= endIpNum)
OR (1836875969>= startIpNum AND 1836875969 <= endIpNum)
OR (1836878754>= startIpNum AND 1836878754 <= endIpNum)
...
...
OR (1843488110>= startIpNum AND 1843488110 <= endIpNum)
GROUP BY locId ORDER BY antal DESC LIMIT 100
The table looks like this
CREATE TABLE IF NOT EXISTS `geolitecity_block` (
`startIpNum` int(11) unsigned NOT NULL,
`endIpNum` int(11) unsigned NOT NULL,
`locId` int(11) unsigned NOT NULL,
PRIMARY KEY (`startIpNum`),
KEY `locId` (`locId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
UPDATE
and the explain-query looks like this
+----+-------------+-------------------+-------+---------------+-------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+-------+---------+------+------+----------------------------------------------+
| 1 | SIMPLE | geolitecity_block | index | PRIMARY | locId | 4 | NULL | 108 | Using where; Using temporary; Using filesort |
+----+-------------+-------------------+-------+---------------+-------+---------+------+------+----------------------------------------------+
To optimize performance, create an index on startIpNum and endIpNum.
CREATE INDEX index_startIpNum ON geolitecity_block (startIpNum);
CREATE INDEX index_endIpNum ON geolitecity_block (endIpNum);
Indexing columns that are being grouped or sorted on will almost always improve performance. I would suggest plugging this query into the DTA (Database Tuning Advisor) to see if SQL can make any suggestions, this might include the creation of one or more indexes in addition to statistics.
If it is possible in your use case, create a temporary table TMP_RESULT (without the ORDER BY) and then submit a second query that orders the results by antal. Filesort is extremely slow and, in your case, you cannot avoid the operation, because you do not sort by any of the keys/indexes. To perform the count operation, you have to scan the complete table. A temporary table is a much faster solution.
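A rough sketch of that two-step approach (the temporary table name is illustrative, and the elided OR ranges stay as in the original query):
CREATE TEMPORARY TABLE tmp_result AS
  SELECT COUNT(locId) AS antal, locId
  FROM geolitecity_block
  WHERE (1835880985 >= startIpNum AND 1835880985 <= endIpNum)
     -- ... the remaining OR ranges ...
     OR (1843488110 >= startIpNum AND 1843488110 <= endIpNum)
  GROUP BY locId;

SELECT antal, locId FROM tmp_result ORDER BY antal DESC LIMIT 100;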
ps. Adding an index on (startIpNum, endIpNum) definitely will help you to get better performance but -- if you have a lot of rows -- it will not be a huge improvement.
I have a table with ~30 million rows (and growing!) and currently I have some problems with a simple range select.
The query, looks like this one:
SELECT SUM( CEIL( dlvSize / 100 ) ) as numItems
FROM log
WHERE timeLogged BETWEEN 1000000 AND 2000000
AND user = 'example'
It takes minutes to finish, and I think that the solution would be in the indexes I'm using. Here is the result of EXPLAIN:
+----+-------------+-------+-------+---------------------------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------------+---------+---------+------+----------+-------------+
| 1 | SIMPLE | log | range | PRIMARY,timeLogged | PRIMARY | 4 | NULL | 11839754 | Using where |
+----+-------------+-------+-------+---------------------------------+---------+---------+------+----------+-------------+
My table structure is this one ( reduced to make it fit better on the problem ):
CREATE TABLE IF NOT EXISTS `log` (
`origDomain` varchar(64) NOT NULL default '0',
`timeLogged` int(11) NOT NULL default '0',
`orig` varchar(128) NOT NULL default '',
`rcpt` varchar(128) NOT NULL default '',
`dlvSize` varchar(255) default NULL,
`user` varchar(255) default NULL,
PRIMARY KEY (`timeLogged`,`orig`,`rcpt`),
KEY `timeLogged` (`timeLogged`),
KEY `orig` (`orig`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Any ideas of what can I do to optimize this query or indexes on my table?
You may want to try adding a composite index on (user, timeLogged):
CREATE TABLE IF NOT EXISTS `log` (
...
KEY `user_timeLogged` (user, timeLogged),
...
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Related Stack Overflow post:
Database: When should I use a composite index?
In addition to the suggestions made by the other answers, I note that you have a column user in the table which is a varchar(255). If this refers to a column in a table of users, then 1) it would most likely be far more efficient to add an integer ID column to that table and use it as the primary key and as the referencing column in other tables; 2) you are using InnoDB, so why not take advantage of the foreign key capabilities it offers?
Consider that if you index by a varchar(n) column, it is treated like a char(n) in the index, so each row of your current primary key takes up 4 + 128 + 128 = 260 bytes in the index.
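For what it's worth, a hypothetical sketch of that normalization (the users table and every name in it are illustrative, not part of the original schema):
CREATE TABLE users (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY name (name)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE log
  ADD COLUMN user_id INT UNSIGNED NULL,
  ADD KEY user_id_timeLogged (user_id, timeLogged),
  ADD CONSTRAINT fk_log_user FOREIGN KEY (user_id) REFERENCES users (id);
After backfilling user_id from the existing user strings, the range query can filter on a 4-byte integer instead of a 255-byte varchar.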
Add an index on user.