I have a large MySQL table (about 5M rows) into which I frequently insert data.
I also have to read data from this same table, and sometimes the whole database gets slow because selects run while there are many pending inserts.
I put indexes on every field I use in the WHERE statement, so I really don't know why the select gets so slow.
Could anyone provide me a hint to solve this problem?
Here is the SQL of the table and the query.
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) unsigned NOT NULL default '0',
`dest` varchar(20) character set latin1 default NULL,
`body` text character set latin1,
`sent_on` timestamp NOT NULL default CURRENT_TIMESTAMP,
`md5` varchar(32) character set latin1 NOT NULL default '',
`interface` enum('mobile','desktop') default NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `md5` (`md5`),
FULLTEXT KEY `dest` (`dest`,`body`),
FULLTEXT KEY `body` (`body`)
) ENGINE=MyISAM AUTO_INCREMENT=7074256 DEFAULT CHARSET=utf8
And here is the query:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS id, sent_on, dest AS who, body,interface FROM messages WHERE user_id = 2 ORDER BY sent_on DESC LIMIT 0,50 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: messages
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 13997
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Note the following in your EXPLAIN output:
Extra: Using where; Using filesort
The Using filesort means that MySQL has to sort the matching rows itself (spilling to a temporary file if the sort doesn't fit in memory), then read the sorted result back to get the top 50 rows.
While I'm no expert, I think you could optimize this process by providing an index which can satisfy both the selection criteria and the sort order in one go; then the selection and ordering can be determined by an index scan only, without having to sort the result set every time.
In this case, your WHERE is on user_id, and your ORDER BY is on sent_on. So, in theory, if you provide a single index on those two columns (in that order), then the engine will be able to use the first half of the index to filter the results, and because the second half of the index is on the sent_on column, the index results will already be in order according to that column, allowing MySQL to simply retrieve the first 50 results from that index. No additional sorting required.
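For example, something along these lines should do it (the index name is just illustrative):
ALTER TABLE messages ADD INDEX user_id_sent_on (user_id, sent_on);
-- user_id satisfies the WHERE, and within each user_id the entries are already ordered by sent_on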
Disclaimer: I'm not a DBA. I may be completely wrong.
See Also: Mysql.com: Multiple Column Indexes
Maybe you have disabled Concurrent Inserts?
Could the ORDER BY be slowing you down? I don't know if it's a good idea to index sent_on; it would depend on SELECT vs INSERT frequency.
I have an SQLite db:
CREATE TABLE IF NOT EXISTS Commits
(
GlobalVer INTEGER PRIMARY KEY,
Data blob NOT NULL
) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
I want to make one select:
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ?
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
After that I want to make another select:
SELECT Commits.Data,Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
The problem is that the 2nd select is very slow. I think this is because of COLLATE NOCASE. I want to speed it up. I tried to add an index, but it doesn't help (maybe I did something wrong?). How can I execute the 2nd query at approximately the same speed as the 1st query?
An index can be used to speed up a search only if it uses the same collation as the query.
By default, an index takes the collation from the table column, so you could change the table definition:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL COLLATE NOCASE,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
However, this would make the first query slower.
To speed up both queries, you need two indexes, one for each collation: keep the default collation for the implicit index (the PRIMARY KEY), and use NOCASE for an explicit index:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
CREATE INDEX IF NOT EXISTS Streams_nocase_idx ON Streams(Name COLLATE NOCASE, GlobalVer);
(Adding the second column to the index speeds up the ORDER BY in this query.)
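If you want to double-check that the new index is actually picked, EXPLAIN QUERY PLAN should report a search using Streams_nocase_idx rather than a full scan (the literal 'somestream' and the LIMIT/OFFSET values below are just placeholders; exact output wording varies by SQLite version):
EXPLAIN QUERY PLAN
SELECT Commits.Data, Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer = Commits.GlobalVer
WHERE Streams.Name = 'somestream' COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT 10 OFFSET 0;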
We are using MySQL as a message queue, and now we want to change from MySQL to Redis.
We are facing a few difficulties implementing the same MySQL logic in Redis.
In MySQL the process is as follows:
Bulk insert into the MySQL database using LOAD DATA INFILE.
In another PHP script, select the records ordered by priority with some conditions, delete those records from the database, and process them.
How can we achieve the same in Redis?
In Redis we are able to insert bulk data using a pipe with LPUSH, as a key plus JSON-encoded data.
How can we get the data from a Redis key ordered by priority with some conditions, and delete those records from Redis?
Our table structure in MySQL is as below:
CREATE TABLE `message_queue` (
`sql_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`msgdata` text,
`pid` tinyint(4) DEFAULT NULL,
`receiver` varchar(20) DEFAULT NULL,
`time` bigint(20) DEFAULT '0',
`udhdata` varchar(100) DEFAULT NULL,
PRIMARY KEY (`sql_id`),
KEY `pid` (`pid`),
KEY `time` (`time`)
) ENGINE=MyISAM
SELECT and DELETE queries:
SELECT * FROM message_queue WHERE (time = 0 OR time <= UNIX_TIMESTAMP()) ORDER BY pid DESC, sql_id ASC limit 500;
DELETE FROM message_queue WHERE sql_id in(ids_list)
To satisfy most of the requirements you'll need to decompose your data into several data structures.
Insert:
Every record from your message_queue table should be stored in a Hash - sql_id looks like a good key name candidate.
Keep a Sorted Set, e.g. message_by_time where each member is the sql_id and the score is time.
Query:
Use ZRANGEBYSCORE message_by_time 0 0 and ZRANGEBYSCORE message_by_time -inf <replace-with-timestamp> to get the initial range.
Delete:
Call DEL and ZREM for each element.
Your requirements also specify the need to sort and limit - these can also be done in Redis, but I recommend handling them in the code instead.
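A rough sketch of that layout in plain Redis commands (the key names msg:<sql_id> and message_by_time are just illustrative; the first two lines are the insert, the next two the query, the last two the delete; HSET with multiple field/value pairs needs Redis 4.0+):
HSET msg:42 msgdata "{...}" pid 3 receiver "12345" time 1609459200 udhdata ""
ZADD message_by_time 1609459200 42
ZRANGEBYSCORE message_by_time 0 0
ZRANGEBYSCORE message_by_time -inf 1609459200
DEL msg:42
ZREM message_by_time 42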
I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.
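If that second query matters enough, one option (a sketch, assuming the extra index size and write cost are acceptable; the index name is illustrative) is to extend a secondary index so it also covers url:
ALTER TABLE search_engine_rankings
  ADD INDEX covering_idx (keyword_id, search_engine_id, indexed_at, rank, url);
-- covers the WHERE columns plus rank and url; id comes along implicitly in InnoDB secondary indexes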
In addition to Tim's answer: an index in MySQL can only be used left-to-right, which means it can use the columns of your index in your WHERE clause only up to the first one you skip.
Currently, your UNIQUE index is keyword_id, search_engine_id, rank, indexed_at. It can filter on keyword_id and search_engine_id, but then still has to scan the remaining matching rows to filter on indexed_at.
But if you change it to keyword_id, search_engine_id, indexed_at, rank (just the order), it will be able to filter on keyword_id, search_engine_id and indexed_at.
I believe it will then be able to fully use that index to read the appropriate part of your table.
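A sketch of that reordering (same columns, so the uniqueness guarantee is unchanged; only the column order differs):
ALTER TABLE search_engine_rankings
  DROP INDEX unique_ranking,
  ADD UNIQUE KEY unique_ranking (keyword_id, search_engine_id, indexed_at, rank);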
I know it's an old post, but I was experiencing the same situation and I didn't find an answer.
This really happens in MySQL: when you have varchar columns, processing them takes a lot of time. My query took about 20 sec to process 1.7M rows, and now it takes about 1.9 sec.
OK, first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TLDR: I solved this by running optimize on the table.
I experienced the same just now. Even lookups on the primary key selecting just a few rows were slow. Testing a bit, I found it was not limited to the varchar column; selecting an int also took a considerable amount of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The approved answer here is to just make an index spanning someint also, and it will be fast, as MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong; it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After OPTIMIZE TABLE, both queries take <10ms.
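For reference, the statement is simply the following (shown here against the table from the question; on InnoDB it maps to a full table rebuild, so on 250M rows expect it to take a while):
OPTIMIZE TABLE search_engine_rankings;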
I have a database table that stores imported information. For simplicity, it's something like:
CREATE TABLE `data_import` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`amount` DECIMAL(12,2) NULL DEFAULT NULL,
`payee` VARCHAR(50) NULL DEFAULT NULL,
`posted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `payee` (`payee`)
)
I also have a table that stores import rules:
CREATE TABLE `import_rules` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`search` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `search` (`search`)
)
The idea is that for each imported transaction, the query needs to try to find a single matching rule - this match is done on the data_import.payee and import_rules.search fields. Because these are both varchar fields, I have indexed them in the hope of making the query faster.
This is what I have come up with so far, which seems to work fine, albeit slower than I hoped:
SELECT i.id, i.payee, i.amount, i.posted, r.id, r.search
FROM data_import i
LEFT JOIN import_rules r ON REPLACE(i.payee, ' ', '') = REPLACE(r.search, ' ', '')
One thing that the above query does not cater for is that if data_import.posted = 1, then I don't need to find a rule for that line - is it possible to stop the query joining on that particular row? Similarly, if the payee is null, then it shouldn't try to join either.
Are there any other ways that I can optimise this? I realise that doing text joins is not ideal...not sure if there are any better methods.
I highly recommend doing anything you can to get rid of the REPLACEs in that JOIN. Using REPLACE on both sides of the join totally eliminates the ability to use an index on either table.
Assuming you can get rid of the REPLACEs (by cleansing the existing data and/or new data as it comes in; see the sketch after the list below):
If you need to join on text columns, use a single-byte-per-character charset if your application allows for it (for a smaller/faster index).
Make the N in VARCHAR(N) as small as you can, as it will affect the size of the index (or, arguably, use index prefixes).
I imagine you want to make the search index on import_rules UNIQUE -- then you're sure to only get one row returned per row of import_data.
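As a sketch of that cleansing step (assuming you actually want the stored values to be space-free; back up first, and strip spaces from new rows on the way in as well):
UPDATE data_import  SET payee  = REPLACE(payee,  ' ', '') WHERE payee  IS NOT NULL;
UPDATE import_rules SET search = REPLACE(search, ' ', '') WHERE search IS NOT NULL;
-- the join can then compare the raw, indexed values directly:
SELECT i.id, i.payee, i.amount, i.posted, r.id, r.search
FROM data_import i
LEFT JOIN import_rules r ON i.payee = r.search;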
You can add an AND to your join condition if you'd like to enforce your 'don't join in this case' rule:
LEFT JOIN import_rules r ON i.payee = r.search AND i.posted != 1
The use of REPLACE() on the join is probably breaking the indexing, as it has an index of the values in the field, not the amended values after REPLACE().
As for not joining, you are already using a LEFT JOIN, so non-matching joins will result in NULLs for the import_rules fields; you should be able to add WHERE clauses to force that.
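For instance (a sketch; these conditions filter the left table only, so they are safe in the WHERE of a LEFT JOIN):
SELECT i.id, i.payee, i.amount, i.posted, r.id, r.search
FROM data_import i
LEFT JOIN import_rules r ON i.payee = r.search
WHERE i.posted != 1
  AND i.payee IS NOT NULL;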
I'm having trouble getting a decent query time out of a large MySQL table; currently it's taking over 20 seconds. The problem lies in the GROUP BY, as MySQL needs to run a filesort, but I don't see how I can get around this.
QUERY:
SELECT play_date, COUNT(DISTINCT(email)) AS count
FROM log
WHERE type = 'play'
AND play_date BETWEEN '2009-02-23'
AND '2009-02-24'
GROUP BY play_date
ORDER BY play_date desc
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL type,type_2 NULL NULL NULL 530892 Using where; Using filesort
TABLE STRUCTURE
CREATE TABLE IF NOT EXISTS `log` (
`id` int(11) NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
`type` enum('played','reg','friend') NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
`play_date` date NOT NULL,
`email_refer` varchar(255) NOT NULL,
`remote_addr` varchar(15) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `type` (`type`),
KEY `email_refer` (`email_refer`),
KEY `type_2` (`type`,`timestamp`,`play_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=707859 ;
If anyone knows how I could improve the speed I would be very grateful.
Tom
EDIT
I've added the new index with just play_date and type but MySQL refuses to use it
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL play_date NULL NULL NULL 801647 Using where; Using filesort
This index was created using ALTER TABLE log ADD INDEX (type, play_date);
You need to create an index on the fields type AND play_date.
Like this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`);
Or, alternatively, you can rearrange your last key like this:
KEY `type_2` (`type`,`play_date`,`timestamp`)
so MySQL can use its left part as a key.
You should add an index on the fields that you base your search on.
In your case those are play_date and type.
You're not taking advantage of the key named type_2. It is a composite key for type, timestamp and play_date, but you're filtering by type and play_date, ignoring timestamp. Because of this, the engine can't make full use of that key (it stops at the type prefix).
You should create an index on the fields type and play_date, or remove timestamp from the key type_2.
Or you could try to incorporate timestamp into your current query as a filter. But judging from your current query I don't think that is logical.
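A sketch of the second option, dropping timestamp so the key's leading columns match the query's filter:
ALTER TABLE `log`
  DROP INDEX `type_2`,
  ADD INDEX `type_2` (`type`, `play_date`);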
Does there need to be an index on play_date, or should it be moved to second place in the composite index?
The fastest option would be this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`, `email`);
It would turn this index into a "covering index", which means the query would only access the index stored in memory and not even go to the hard disk.
The DESC parameter is causing MySQL not to use the index for the ORDER BY. You can leave it ASC and iterate the resultset in reverse on the client side (?).