MySQL query slow when selecting VARCHAR - sql

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?

The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the Select clause. The second statement requires reads of the base table because the URL column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in tbe base table are not in the same physical order as the rows in the index, and so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the select clause; it will already have read that index into RAM for the where clause, so it takes advantage of that fact.

Additionally to Tim's answer. An index in Mysql can only be used left-to-right. Which means it can use columns of your index in your WHERE clause only up to the point you use them.
Currently, your UNIQUE index is keyword_id,search_engine_id,rank,indexed_at. This will be able to filter the columns keyword_id and search_engine_id, still needing to scan over the remaining rows to filter for indexed_at
But if you change it to: keyword_id,search_engine_id,indexed_at,rank (just the order). This will be able to filter the columns keyword_id,search_engine_id and indexed_at
I believe it will be able to fully use that index to read the appropriate part of your table.

I know it's an old post but I was experiencing the same situation and I didn't found an answer.
This really happens in MySQL, when you have varchar columns it takes a lot of time processing. My query took about 20 sec to process 1.7M rows and now is about 1.9 sec.
Ok first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;

TLDR: I solved this by running optimize on the table.
I experienced the same just now. Even lookups on primary key and selecting just some few rows was slow. Testing a bit, I found it not to be limited to the varchar column, selecting an int also took considerable amounts of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The approved answer here is to just make an index spanning someint also, and it will be fast, as mysql can fetch all information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround - something is clearly wrong, it should not take three seconds to fetch three rows from a table! Besides, most applications just does a "select * from mytable", and doing changes at the application side is not always trivial.
After optimize table, both queries takes <10ms.

Related

MS Access last() results are sometimes wrong

I have an MS Access query that uses last() but sometimes it doesn't work as expected--which I know is what's expected lol. But I need to find a solution, either in Access or by converting the below query to MySQL. Any suggestions?
SELECT maindata.TrendShort, Last(maindata.Resistance) AS LastOfResistance, Last(maindata.Support) AS LastOfSupport, Count(maindata.ID) AS Days, Max(maindata.Datestamp) AS Datestamp, maindata.ProductID
FROM market_opinion AS maindata
WHERE (((Exists (select * from market_opinion action_count where maindata.ProductID = action_count.ProductID and maindata.Datestamp < action_count.Datestamp and maindata.TrendShort<> action_count.TrendShort))=False))
GROUP BY maindata.TrendShort, maindata.ProductID
ORDER BY Count(maindata.ID) DESC;
Only the LastOfResistence and LastOfSupport are occasionally wrong, the other fields are always correct.
CREATE TABLE `market_opinion` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`ProductID` int(11) DEFAULT NULL,
`Trend` varchar(11) DEFAULT NULL,
`TrendShort` varchar(7) DEFAULT NULL,
`Resistance` decimal(9,2) unsigned DEFAULT NULL,
`Support` decimal(9,2) unsigned DEFAULT NULL,
`Username` varchar(12) DEFAULT NULL,
`Datestamp` date DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `ProductID` (`ProductID`),
KEY `Datestamp` (`Datestamp`),
KEY `TrendShort` (`TrendShort`)
) ENGINE=InnoDB AUTO_INCREMENT=9536 DEFAULT CHARSET=utf8;
Without some feel for the data, this becomes something of a guess, but what I'm thinking is that the Last(Resistance) and the Last(Support) aren't necessarily pulling from the same record as Max(DateStamp). You might try breaking your query out into a 2 part query, such as:
Select maindata.TrendShort, Resistance, Support, COUNT(ID) as Days, ProductId
FROM market_opinion maindata
INNER JOIN (SELECT mo.TrendShort, MAX(mo.DateStamp) AS MaxDate
FROM market_opinion mo
WHERE (((EXISTS(SELECT ...))=FALSE))
GROUP BY mo.TrendShort) inner
WHERE maindata.TrendShort=inner.TrendSort AND maindata.DateStamp = inner.MaxDate
ORDER BY Days DESC;
I've left out the bulk of your query where the ellipsis (...) is, I wouldn't expect any changes there. You might consider looking at http://www.access-programmers.co.uk/forums/showthread.php?t=42291 for a discussion of first/last vs min/max. Let me know if this gets you any closer. If not, maybe some sample data where it's not working out. It might give some insight into what's going on.
First or Last just returns some (arbitrary) record that is not the last or first respectively.
In most cases, simply use Min for the first and Max for the last.

How to make the Primary Key have X digits in PostgreSQL?

I am fairly new to SQL but have been working hard to learn. I am currently stuck on an issue with setting a primary key to have 8 digits no matter what.
I tried using INT(8) but that didn't work. Also AUTO_INCREMENT doesn't work in PostgreSQL but I saw there were a couple of data types that auto increment but I still have the issue of the keys not being long enough.
Basically I want to have numbers represent User IDs, starting at 10000000 and moving up. 00000001 and up would work too, it doesn't matter to me.
I saw an answer that was close to this, but it didn't apply to PostgreSQL unfortunately.
Hopefully my question makes sense, if not I'll try to clarify.
My code (which I am using from a website to try and make my own forum for a practice project) is:
CREATE Table users (
user_id INT(8) NOT NULL AUTO_INCREMENT,
user_name VARCHAR(30) NOT NULL,
user_pass VARCHAR(255) NOT NULL,
user_email VARCHAR(255) NOT NULL,
user_date DATETIME NOT NULL,
user_level INT(8) NOT NULL,
UNIQUE INDEX user_name_unique (user_name),
PRIMARY KEY (user_id)
) TYPE=INNODB;
It doesn't work in PostgreSQL (9.4 Windows x64 version). What do I do?
You are mixing two aspects:
the data type allowing certain values for your PK column
the format you chose for display
AUTO_INCREMENT is a non-standard concept of MySQL, SQL Server uses IDENTITY(1,1), etc.
Use a serial column in Postgres:
CREATE TABLE users (
user_id serial PRIMARY KEY
, ...
)
That's a pseudo-type implemented as integer data type with a column default drawing from an attached SEQUENCE. integer is easily big enough for your case (-2147483648 to +2147483647).
If you really need to enforce numbers with a maximum of 8 decimal digits, add a CHECK constraint:
CONSTRAINT id_max_8_digits CHECK (user_id BETWEEN 0 AND < 99999999)
To display the number in any fashion you desire - 0-padded to 8 digits, for your case, use to_char():
SELECT to_char(user_id, '00000000') AS user_id_8digit
FROM users;
That's very fast. Note that the output is text now, not integer.
SQL Fiddle.
A couple of other things are MySQL-specific in your code:
int(8): use int.
datetime: use timestamp.
TYPE=INNODB: just drop that.
You could make user_id a serial type column and set the seed of this sequence to 10000000.
Why?
int(8) in mysql doesn't actually only store 8 digits, it only displays 8 digits
Postgres supports check constraints. You could use something like this:
create table foo (
bar_id int primary key check ( 9999999 < bar_id and bar_id < 100000000 )
);
If this is for numbering important documents like invoices that shouldn't have gaps, then you shouldn't be using sequences / auto_increment

Mysql concurrent select and insert slow the database

I have a large mysql table (about 5M rows) on which i frequently insert data.
This table is the same i have to read data from and sometimes the entire database gets slow because of selecting data while there are many pending inserts.
I put indexes on each field i use in the WHERE statment, so i really don't know why select gets so slow.
Could anyone provide me a hint to solve this problem ?
here is the sql of table and query
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) unsigned NOT NULL default '0',
`dest` varchar(20) character set latin1 default NULL,
`body` text character set latin1,
`sent_on` timestamp NOT NULL default CURRENT_TIMESTAMP,
`md5` varchar(32) character set latin1 NOT NULL default '',
`interface` enum('mobile','desktop') default NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `md5` (`md5`),
FULLTEXT KEY `dest` (`dest`,`body`),
FULLTEXT KEY `body` (`body`)
) ENGINE=MyISAM AUTO_INCREMENT=7074256 DEFAULT CHARSET=utf8
and here the query:
EXPLAIN SELECT SQL_CALC_FOUND_ROWS id, sent_on, dest AS who, body,interface FROM messages WHERE user_id = 2 ORDER BY sent_on DESC LIMIT 0,50 \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: messages
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 13997
Extra: Using where; Using filesort
1 row in set (0.00 sec)
Note the following in your EXPLAIN output:
Extra: Using where; Using filesort
The Using filesort means that MySQL is dumping the query results to a file to sort it, then reading the results back in to get the top 50 rows.
While I'm no expert, I think that you could optimize this process by providing an index which can both satisfy the selection criteria and sort order all in one go; then the selection and ordering can be determiend by an index scan only, without having to sort the result set every time.
In this case, your WHERE is on user_id, and your ORDER BY is on sent_on. So, in theory, if you provide a single index on those two columns (in that order), then the engine will be able to use the first half of the index to filter the results, and because the second half of the index is on the sent_on column, the index results will already be in order according to that column, allowing MySQL to simply retrieve the first 50 results from that index. No additional sorting required.
Disclaimer: I'm not a DBA. I may be completely wrong.
See Also: Mysql.com: Multiple Column Indexes
Maybe you have disabled Concurrent Inserts?
Could the ORDER BY be slowing you down? I don't know if its a good idea to index sent_on, it would depend on SELECT vs INSERT frequency

MySQL query optimisation

I have a database table that stores imported information. For simplicity, its something like:
CREATE TABLE `data_import` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`amount` DECIMAL(12,2) NULL DEFAULT NULL,
`payee` VARCHAR(50) NULL DEFAULT NULL,
`posted` TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
INDEX `payee` (`payee`)
)
I also have a table that stores import rules:
CREATE TABLE `import_rules` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`search` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `search` (`search`)
)
The idea is that for each imported transaction, the query needs to try find a single matching rule - this match is done on the data_import.payee and import_rules.seach fields. Because these are both varchar fields, I have indexed them in the hope of making the query faster.
This is what I have come up with so far, which seems to work fine. Albeit slower than I hoped.
SELECT i.id, i.payee, i.amount, i.posted r.id, r.search
FROM import_data id
LEFT JOIN import_rules ir on REPLACE(i.payee, ' ', '') = REPLACE(ir.search, ' ', '')
One thing that the above query does not cater for, is that if import_data.posted = 1, then I dont need to find a rule for that line - is it possible to stop the query joining on that particular row? Similarly, if the payee is null, then it shouldn't try join either.
Are there any other ways that I can optimise this? I realise that doing text joins is not ideal...not sure if there are any better methods.
I highly recommend doing anything you can to get rid of the REPLACEs in that JOIN. Using REPLACE on both sides of the join totally eliminate the ability to use an index on either table.
Assuming you can get rid of the REPLACEs (by cleansing the existing data and/or new data):
If you need to join on text
columns, use a single byte per
character charset if you application
allows for it (for a smaller/faster index).
Make the N in VARCHAR(N) as small
as you can as it will affect the side
of the index (or arguably, use index
prefixes).
I imagine you want to make the
search index on import_rules
UNIQUE -- then you're sure to only
going to get 1 row result returned per row of
import_data
You can throw an AND into your WHERE clause if you'd like to enforce your 'don't join in this case' rule.
LEFT JOIN import_rules ir ON id.payee=ir.search AND id.posted != 1
The use of REPLACE() on the join is probably breaking the indexing, as it has an index of the values in the field, not the amended values after REPLACE().
As for not joining, you are already using a LEFT JOIN, so non-matching joins will result in NULLs for the import_rules fields; you should be able to add WHERE clauses to force that.

MySQL 1 millon row query speed

I'm having trouble getting a decent query time out of a large MySQL table, currently its taking over 20 seconds. The problem lies in the GROUP BY as MySQL needs to run a filesort but I don't see how I can get around this
QUERY:
SELECT play_date, COUNT(DISTINCT(email)) AS count
FROM log
WHERE type = 'play'
AND play_date BETWEEN '2009-02-23'
AND '2009-02-24'
GROUP BY play_date
ORDER BY play_date desc
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL type,type_2 NULL NULL NULL 530892 Using where; Using filesort
TABLE STRUCTURE
CREATE TABLE IF NOT EXISTS `log` (
`id` int(11) NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
`type` enum('played','reg','friend') NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP,
`play_date` date NOT NULL,
`email_refer` varchar(255) NOT NULL,
`remote_addr` varchar(15) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `type` (`type`),
KEY `email_refer` (`email_refer`),
KEY `type_2` (`type`,`timestamp`,`play_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=707859 ;
If anyone knows how I could improve the speed I would be very greatful
Tom
EDIT
I've added the new index with just play_date and type but MySQL refuses to use it
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE log ALL play_date NULL NULL NULL 801647 Using where; Using filesort
This index was created using ALTER TABLE log ADD INDEX (type, play_date);
You need to create index on fields type AND play_date.
Like this:
ALTER TABLE `log` ADD INDEX (`type`, `play_date`);
Or, alternately, you can rearrange your last key like this:
KEY `type_2` (`type`,`play_date`,`timestamp`)
so MySQL can use its left part as a key.
You should add an index on the fields that you base your search on.
In your case it play_date and type
You're not taking advantage of the key named type_2. It is a composite key for type, timestamp and play_date, but you're filtering by type and play_date, ignoring timestamp. Because of this, the engine can't make use of that key.
You should create an index on the fields type and play_date, or remove timestamp from the key type_2.
Or you could try to incorporate timestamp into your current query as a filter. But judging from your current query I don't think that is logical.
Does there need to be an index on play_date, or move the position in the composite index to second place?
The fastest options would be this
ALTER TABLE `log` ADD INDEX (`type`, `play_date`, 'email');
It would turn this index into a "covering index", which would mean that the query would only access the index stored in memory and not even goto the hard disk.
The DESC parameter is causing MySQL not to use the index for the ORDER BY. You can leave it ASC and iterate the resultset in reverse on the client side (?).