MySQL 1 millon row query speed - sql

I'm having trouble getting a decent query time out of a large MySQL table, currently its taking over 20 seconds. The problem lies in the GROUP BY as MySQL needs to run a filesort but I don't see how I can get around this
QUERY:
SELECT play_date, COUNT(DISTINCT(email)) AS count
FROM log
WHERE type = 'play'
AND play_date BETWEEN '2009-02-23'
AND '2009-02-24'
GROUP BY play_date
ORDER BY play_date desc
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL type,type_2 NULL NULL NULL 530892 Using where; Using filesort
TABLE STRUCTURE
CREATE TABLE IF NOT EXISTS `log` (
`id` int(11) NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
`type` enum('played','reg','friend') NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
`play_date` date NOT NULL,
`email_refer` varchar(255) NOT NULL,
`remote_addr` varchar(15) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `type` (`type`),
KEY `email_refer` (`email_refer`),
KEY `type_2` (`type`,`timestamp`,`play_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=707859 ;
If anyone knows how I could improve the speed I would be very greatful
Tom
EDIT
I've added the new index with just play_date and type but MySQL refuses to use it
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL play_date NULL NULL NULL 801647 Using where; Using filesort
This index was created using ALTER TABLE log ADD INDEX (type, play_date);

You need to create index on fields type AND play_date.
Like this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`);
Or, alternately, you can rearrange your last key like this:
KEY `type_2` (`type`,`play_date`,`timestamp`)
so MySQL can use its left part as a key.

You should add an index on the fields that you base your search on.
In your case it play_date and type

You're not taking advantage of the key named type_2. It is a composite key for type, timestamp and play_date, but you're filtering by type and play_date, ignoring timestamp. Because of this, the engine can't make use of that key.
You should create an index on the fields type and play_date, or remove timestamp from the key type_2.
Or you could try to incorporate timestamp into your current query as a filter. But judging from your current query I don't think that is logical.

Does there need to be an index on play_date, or move the position in the composite index to second place?

The fastest options would be this
ALTER TABLE `log` ADD INDEX (`type`, `play_date`, 'email');
It would turn this index into a "covering index", which would mean that the query would only access the index stored in memory and not even goto the hard disk.

The DESC parameter is causing MySQL not to use the index for the ORDER BY. You can leave it ASC and iterate the resultset in reverse on the client side (?).

Related

SQLite speed up select with collate nocase

I have SQLite db:
CREATE TABLE IF NOT EXISTS Commits
(
GlobalVer INTEGER PRIMARY KEY,
Data blob NOT NULL
) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
I want to make 1 select:
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ?
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
after that i want to make another select:
SELECT Commits.Data,Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
The problem is that 2nd select works super slow. I think this is because COLLATE NOCASE. I want to speed up it. I tried to add index but it doesn't help (may be i did sometinhg wrong?). How to execute 2nd query with speed approximately equals to 1st query's speed?
An index can be used to speed up a search only if it uses the same collation as the query.
By default, an index takes the collation from the table column, so you could change the table definition:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL COLLATE NOCASE,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
However, this would make the first query slower.
To speed up both queries, you need two indexes, one for each collation. So to use the default collation for the implicit index, and NOCASE for the explicit index:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL COLLATE NOCASE,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
CREATE INDEX IF NOT EXISTS Streams_nocase_idx ON Streams(Name COLLATE NOCASE, GlobalVar);
(Adding the second column to the index speeds up the ORDER BY in this query.)

In H2 Database, add index while table creation in single query

I am trying to create table having different indexes with single query but H2 gives Error for example:
create table tbl_Cust
(
id int primary key auto_increment not null,
fid int,
c_name varchar(50),
INDEX (fid)
);
but this gives error as
Unknown data type: "("; SQL statement:
[Error Code: 50004]
[SQL State: HY004]
Due to this I have to run 2 different queries to create table with Index. First query to create table and then second query to add index with
create INDEX c_fid on tbl_Cust(fid);
Is there something wrong in my query or H2 simply does not support this creation of table with index in single query?
Interesting question. The solution is even more interesting, as it involves MySQL compatibility mode.
It's actually possible to perform the exact same command you wrote without any modification, provided you just add to your jdbc url the MySQL mode.
Example URL like this: jdbc:h2:mem:;mode=mysql
SQL remains:
create table tbl_Cust
(
id int primary key auto_increment not null,
fid int,
c_name varchar(50),
INDEX (fid)
);
Update count: 0
(15 ms)
Too bad I did not see this question earlier... Hopefully the solution might become handy one day to someone :-)
I could resolve the problem. According to
http://www.h2database.com/html/grammar.html#create_index
I modified the query. It works fine with my H2 server.
CREATE TABLE subscription_validator (
application_id int(11) NOT NULL,
api_id int(11) NOT NULL,
validator_id int(11) NOT NULL,
PRIMARY KEY (application_id,api_id),
CONSTRAINT subscription_validator_ibfk_1 FOREIGN KEY (validator_id) REFERENCES validator (id) ON UPDATE CASCADE
);
CREATE INDEX validator_id ON subscription_validator(validator_id);

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in tbe base table are not in the same physical order as the rows in the index, and so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.
Additionally to Tim's answer. An index in Mysql can only be used left-to-right. Which means it can use columns of your index in your WHERE clause only up to the point you use them.
Currently, your UNIQUE index is keyword_id,search_engine_id,rank,indexed_at. This will be able to filter the columns keyword_id and search_engine_id, still needing to scan over the remaining rows to filter for indexed_at
But if you change it to: keyword_id,search_engine_id,indexed_at,rank (just the order). This will be able to filter the columns keyword_id,search_engine_id and indexed_at
I believe it will be able to fully use that index to read the appropriate part of your table.
I know it's an old post but I was experiencing the same situation and I didn't found an answer.
This really happens in MySQL, when you have varchar columns it takes a lot of time processing. My query took about 20 sec to process 1.7M rows and now is about 1.9 sec.
Ok first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TLDR: I solved this by running optimize on the table.
I experienced the same just now. Even lookups on primary key and selecting just some few rows was slow. Testing a bit, I found it not to be limited to the varchar column, selecting an int also took considerable amounts of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The approved answer here is to just make an index spanning someint also, and it will be fast, as mysql can fetch all information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong, it should not take three seconds to fetch three rows from a table! Besides, most applications just does a "select * from mytable", and doing changes at the application side is not always trivial.
After optimize table, both queries takes <10ms.

Mysql concurrent select and insert slow the database

I have a large mysql table (about 5M rows) on which i frequently insert data.
This table is the same i have to read data from and sometimes the entire database gets slow because of selecting data while there are many pending inserts.
I put indexes on each field i use in the WHERE statment, so i really don't know why select gets so slow.
Could anyone provide me a hint to solve this problem ?
here is the sql of table and query
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) unsigned NOT NULL default '0',
`dest` varchar(20) character set latin1 default NULL,
`body` text character set latin1,
`sent_on` timestamp NOT NULL default CURRENT_TIMESTAMP,
`md5` varchar(32) character set latin1 NOT NULL default '',
`interface` enum('mobile','desktop') default NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `md5` (`md5`),
FULLTEXT KEY `dest` (`dest`,`body`),
FULLTEXT KEY `body` (`body`)
) ENGINE=MyISAM AUTO_INCREMENT=7074256 DEFAULT CHARSET=utf8
and here the query:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS id, sent_on, dest AS who, body,interface FROM messages WHERE user_id = 2 ORDER BY sent_on DESC LIMIT 0,50 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: messages
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 13997
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Note the following in your EXPLAIN output:
Extra: Using where; Using filesort
The Using filesort means that MySQL is dumping the query results to a file to sort it, then reading the results back in to get the top 50 rows.
While I'm no expert, I think that you could optimize this process by providing an index which can both satisfy the selection criteria and sort order all in one go; then the selection and ordering can be determiend by an index scan only, without having to sort the result set every time.
In this case, your WHERE is on user_id, and your ORDER BY is on sent_on. So, in theory, if you provide a single index on those two columns (in that order), then the engine will be able to use the first half of the index to filter the results, and because the second half of the index is on the sent_on column, the index results will already be in order according to that column, allowing MySQL to simply retrieve the first 50 results from that index. No additional sorting required.
Disclaimer: I'm not a DBA. I may be completely wrong.
See Also: Mysql.com: Multiple Column Indexes
Maybe you have disabled Concurrent Inserts?
Could the ORDER BY be slowing you down? I don't know if its a good idea to index sent_on, it would depend on SELECT vs INSERT frequency

Why my mysql doesn't use index?

I'm tuning my query for mysql.
the schema has index of user_id (following..)
but the index is not used. why?
Env:
MySQL4.0.27,MyISAM
SQL is the following :
SELECT type,SUM(value_a) A, SUM(value_b) B, SUM(value_c) C
FROM big_record_table
WHERE user_id='<user_id>'
GROUP BY type
Explain:
|table |type |possible_keys |key |key_len |ref |rows |Extra|
|big_record_table| ALL| user_id_key|||| 1059756 |Using where; Using temporary; Using filesort|
could you describe detail?
scheme is following:
CREATE TABLE `big_record_table` (
`user_id` int(11) NOT NULL default '0',
`type` enum('type_a','type_b','type_c') NOT NULL default 'type_a',
`value_a` bigint(20) NOT NULL default '0',
`value_b` bigint(20) default NULL,
`value_c` bigint(20) NOT NULL default '0',
KEY `user_id_key` (`user_id`)
) TYPE=MyISAM
My guess is that type and user_id are not indexed.
Just a will run. You're not giving much to play with.
First, we don't see how your indexes are declared. Can you get a dump of the tables? In PostgreSQL you'd use pg_dump but I don't know how in MySQL. Have you done an ANALYZE on the table?
It could be that implicit type conversion is preventing your index from being used. You have defined user_id as an int, but specified a string in the query. This gives MySQL the option of either converting the string in the query into an int (which might not be accurate) - or convert every user_id in the database into a string to compare against the string in the query.
Short answer: try removing the quotes in the query
SELECT type,SUM(value_a) A, SUM(value_b) B, SUM(value_c) C
FROM big_record_table
WHERE user_id=123
GROUP BY type
(where 123 is replaced with the correct user-id).