Why is my query not using this index? - optimization

I have a query that I turned into a view, and it works OK. But the `site_phrase` (`sp`) table does not seem to be using an index and goes through all the records in the table. Why is that? Here is the query:
EXPLAIN SELECT
`p`.`id` AS `id`,
`p`.`text` AS `phrase`,
`p`.`ignored` AS `ignored_phrase`,
`p`.`client_id` AS `client_id`,
`s`.`id` AS `site_id`,
`s`.`sub_domain` AS `sub_domain`,
`s`.`competitor` AS `competitor`,
`s`.`ignored` AS `ignored_site`,
`pc`.`id` AS `pc_id`,
`pc`.`name` AS `pc_name`,
`psc`.`id` AS `psc_id`,
`psc`.`name` AS `psc_name`,
`p`.`volume` AS `volume`,
MIN(`pos`.`position`) AS `position`,
`pos`.`id` AS `pos_id`
FROM `client` c
JOIN client_site cs ON cs.client_id = c.id
JOIN site s ON s.id = cs.site_id
JOIN site_phrase sp ON sp.site_id = s.id
JOIN phrase p ON p.id = sp.phrase_id
JOIN `position` pos ON pos.phrase_id = sp.phrase_id
AND pos.site_id = sp.site_id
LEFT JOIN `phrase_sub_category` `psc`
ON `psc`.`id` = `p`.`phrase_sub_category_id`
LEFT JOIN `phrase_category` `pc`
ON `pc`.`id` = `psc`.`phrase_category_id`
GROUP BY `p`.`id`,`s`.`id`,`pos`.`id`
ORDER BY `p`.`id`,`pos`.`position`
And here is a screenshot of the EXPLAIN / DESCRIBE output for the query above:
http://img827.imageshack.us/img827/3336/indexsql.png
No matter how I reorder the tables above or how they are joined, the first or second table always seems to be doing some sort of table scan. In the screenshot, the problem table is `sp`. These tables are InnoDB, and there are appropriate indexes and foreign keys on all the columns I JOIN on. Any insights would be helpful.

MySQL will use a full table scan if it determines that it is faster than using an index. In the case of your `sp` table, with only 1300 records a table scan may be just as fast as an index lookup.
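You can see the scan-versus-seek distinction directly in a plan. Here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN as a stand-in (the table and index names are made up; MySQL's EXPLAIN reports the same distinction in its `type` column as `ALL` versus `ref`/`eq_ref`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_phrase (site_id INTEGER, phrase_id INTEGER)")
conn.execute("CREATE INDEX idx_sp_site ON site_phrase(site_id)")
conn.executemany("INSERT INTO site_phrase VALUES (?, ?)", [(1, 10), (2, 20)])

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

seek = plan("SELECT * FROM site_phrase WHERE site_id = 1")
scan = plan("SELECT * FROM site_phrase")
print(seek)  # SEARCH ... USING ... INDEX idx_sp_site ...
print(scan)  # SCAN ...
```

A query that can use the index shows up as a SEARCH step; one that cannot (or where the planner decides not to) shows up as a SCAN, which is what the asker is seeing for `sp`.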

Related

Query performance of non clustered index during inner join

We have two tables in an inner join query. Which of the options below is recommended for high performance when there are millions of records in both tables?
NOTE: We have a non-clustered index on the foreign key column. We don't have enough data to verify the performance in the development environment. Also, more tables might come into this join later, with INNER or LEFT joins.
Tables:
Subscriber(SID(PK), Name)
Account(AID(PK),SID(FK), AName)
Query:
SELECT *
FROM Account A
INNER JOIN Subscriber S ON S.SubscriberID= A.SubscriberID
WHERE
S.SubscriberID = #subID -- option 1
A.SubscriberID = #subID -- option 2
It should not make any difference which column you put the predicate on.
SQL Server can see the S.SubscriberID = A.SubscriberID condition and create an "implied predicate" for the other column anyway.
The join then effectively becomes a cross join between the filtered rows from both sides, as discussed here.
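Whichever side carries the predicate, the result set is the same. A quick sketch (run in SQLite, with the column spelled SID as in the schema above, and made-up sample rows) verifying both forms return identical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Subscriber (SID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Account (AID INTEGER PRIMARY KEY, SID INTEGER, AName TEXT)")
conn.execute("INSERT INTO Subscriber VALUES (1, 'ann'), (2, 'bob')")
conn.execute("INSERT INTO Account VALUES (10, 1, 'a1'), (11, 1, 'a2'), (12, 2, 'b1')")

q = """SELECT * FROM Account A
       JOIN Subscriber S ON S.SID = A.SID
       WHERE {} = ?"""
opt1 = conn.execute(q.format("S.SID"), (1,)).fetchall()  # predicate on Subscriber
opt2 = conn.execute(q.format("A.SID"), (1,)).fetchall()  # predicate on Account
print(opt1 == opt2)  # True
```

The equality in the ON clause makes the two predicates interchangeable; which index actually gets used is the optimizer's choice, not yours.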

sqlite query does not complete - bad index, or just too much data?

I have two tables in SQLite: from_original and from_binary. I want to LEFT OUTER JOIN to pull the records in from_original that are not present in from_binary. The problem is that the query I have written does not complete (I terminated it after about an hour). Can anyone help me understand why?
I have an index defined on each of the fields in my query, but EXPLAIN QUERY PLAN mentions that only one of the indices will be referenced. I am not sure if this is the problem, or if it is just a matter of too much data.
Here is the query I am trying to run:
select * from from_original o
left outer join from_binary b
on o.id = b.id
and o.timestamp = b.timestamp
where b.id is null
The tables each have about 4 million records.
I have an index defined on all the id and timestamp fields (see schema at end of post), but explain query plan only indicates that one of the id indices will be used.
Here is the table schema, including index definitions:
This is your query:
select o.*
from from_original o left outer join
from_binary b
on o.id = b.id and o.timestamp = b.timestamp
where b.id is null;
(Note: you do not need the columns from b because they should all be NULL.)
The best index for this query is a composite index on from_binary(id, timestamp). You do have a lot of data for SQLite, but this might finish in a finite amount of time.
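A sketch of the suggested fix, with tiny made-up tables, showing the anti-join working and the plan picking up the composite index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE from_original (id INTEGER, timestamp TEXT)")
conn.execute("CREATE TABLE from_binary (id INTEGER, timestamp TEXT)")
# composite index covering both join columns, as recommended above
conn.execute("CREATE INDEX idx_binary_id_ts ON from_binary(id, timestamp)")
conn.executemany("INSERT INTO from_original VALUES (?, ?)",
                 [(1, "t1"), (2, "t2"), (3, "t3")])
conn.executemany("INSERT INTO from_binary VALUES (?, ?)",
                 [(1, "t1"), (3, "t3")])

query = """SELECT o.* FROM from_original o
           LEFT JOIN from_binary b
             ON o.id = b.id AND o.timestamp = b.timestamp
           WHERE b.id IS NULL"""
plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query))
rows = conn.execute(query).fetchall()
print(plan)  # SEARCH b USING COVERING INDEX idx_binary_id_ts ...
print(rows)  # only the row missing from from_binary
```

With the composite index, each probe into from_binary is a single index lookup on (id, timestamp) instead of scanning millions of rows per outer row, which is why the original query never finished.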

Most efficient way for a nested sql server update query

I want to update a table using the sum of a specific value from three different tables. For this purpose I wrote the query below, but it takes too much time. What is the most efficient query for this purpose?
UPDATE dbo.dumpfile_doroud
SET dumpfile_doroud.sms_count_on_net = (SELECT sms_count_on_net
FROM dbo.dumpfile139201
WHERE
dbo.dumpfile_doroud.msisdn = dbo.dumpfile139201.msisdn)
+ (SELECT sms_count_on_net
FROM dbo.dumpfile139202
WHERE
dbo.dumpfile_doroud.msisdn = dbo.dumpfile139202.msisdn)
+ (SELECT sms_count_on_net
FROM dbo.dumpfile139203
WHERE
dbo.dumpfile_doroud.msisdn = dbo.dumpfile139203.msisdn)
P.S.: dumpfile_doroud is a small table, but the other three tables are really big.
Try this:
UPDATE t1
SET t1.sms_count_on_net=isnull(t2.sms_count_on_net,0) +
isnull(t3.sms_count_on_net,0) +
isnull(t4.sms_count_on_net,0)
FROM dbo.dumpfile_doroud t1
LEFT JOIN dbo.dumpfile139201 t2
ON t2.msisdn = t1.msisdn
LEFT JOIN dumpfile139202 t3
ON t3.msisdn = t1.msisdn
LEFT JOIN dumpfile139203 t4
ON t4.msisdn = t1.msisdn
I don't think it is possible to make the query itself much faster, so you can try adding indexes. I think you can create a nonclustered index on the column msisdn on all the tables. Syntax:
CREATE NONCLUSTERED INDEX IX_doroud_dumpfile139201
ON dbo.dumpfile139201(msisdn);
You can run SQL Server Management Studio and turn on "Display Estimated Execution Plan"; this sometimes gives good advice on creating indexes.
Create a subquery to calculate the totals, then join the table to it:
UPDATE o
SET o.sms_count_on_net = n.sms_count_on_net
FROM
dbo.dumpfile_doroud o
JOIN
(SELECT
d.msisdn, sms_count_on_net = (d1.sms_count_on_net+d2.sms_count_on_net+d3.sms_count_on_net)
FROM
dbo.dumpfile_doroud d
LEFT JOIN dbo.dumpfile139201 d1 ON d1.msisdn = d.msisdn
LEFT JOIN dbo.dumpfile139202 d2 ON d2.msisdn = d.msisdn
LEFT JOIN dbo.dumpfile139203 d3 ON d3.msisdn = d.msisdn) n
ON o.msisdn = n.msisdn
Note that if the value is missing from any of those tables, the total will be NULL. That may or may not be what you want.

Optimizing "IS NULL"

I have a query that is very slow, apparently due to an IS NULL check in the WHERE clause. The query needs over a minute to complete.
Simplified query:
SELECT DISTINCT TOP 100 R.TermID2, NP.Title, NP.JID
FROM Titles NP
INNER JOIN Term P
ON NP.TermID = P.ID
INNER JOIN Relation R
ON P.ID = R.TermID2
WHERE R.TermID1 IS NULL -- the culprit?
AND NP.JID = 3
I have non-unique non-clustered and unique clustered indices on all of the mentioned fields, as well as an extra index that covers R.TermID1 and has the filter TermID1 IS NULL.
Term has 2,835,302 records. Relation has 25,446,678 records, of which about 10% have TermID1 = NULL.
The SQL plan in XML form is here: http://pastebin.com/raw.php?i=xcDs0VD0
So I was messing around with the indexes on the largest table: adding filtered indexes, covering columns, changing around the clauses, etc.
At one point I simply deleted the index and created a new index with the old configuration, and it worked!
You could remove the WHERE clause and put the conditions into the JOIN clauses.
SELECT DISTINCT TOP 100 R.TermID2, NP.Title, NP.JID
FROM Titles NP
INNER JOIN Term P
ON NP.TermID = P.ID AND NP.JID = 3
INNER JOIN Relation R
ON P.ID = R.TermID2 AND R.TermID1 IS NULL
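The filtered index the question mentions has a direct analogue in SQLite (a partial index), which makes the idea easy to sketch with made-up data: the index only contains the NULL rows, so the IS NULL predicate can be answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Relation (TermID1 INTEGER, TermID2 INTEGER)")
# partial (filtered) index restricted to the rows where TermID1 IS NULL
conn.execute("CREATE INDEX ix_rel_null ON Relation(TermID2) WHERE TermID1 IS NULL")
conn.executemany("INSERT INTO Relation VALUES (?, ?)",
                 [(None, 1), (5, 2), (None, 3)])

q = "SELECT TermID2 FROM Relation WHERE TermID1 IS NULL ORDER BY TermID2"
plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + q))
rows = [r[0] for r in conn.execute(q)]
print(plan)  # ... USING COVERING INDEX ix_rel_null
print(rows)  # [1, 3]
```

With ~10% of 25 million rows being NULL, such a filtered index is much smaller than a full index on the column, which is why it can make the IS NULL lookup cheap when the optimizer actually picks it.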

SQL: Chaining Joins Efficiency

I have a query in my WordPress plugin like this:
SELECT users.*, U.`meta_value` AS first_name,M.`meta_value` AS last_name
FROM `nwp_users` AS users
LEFT JOIN `nwp_usermeta` U
ON users.`ID`=U.`user_id`
LEFT JOIN `nwp_usermeta` M
ON users.`ID`=M.`user_id`
LEFT JOIN `nwp_usermeta` C
ON users.`ID`=C.`user_id`
WHERE U.meta_key = 'first_name'
AND M.meta_key = 'last_name'
AND C.meta_key = 'nwp_capabilities'
ORDER BY users.`user_login` ASC
LIMIT 0,10
I'm new to using JOIN and I'm wondering how efficient it is to use so many JOINs in one query. Is it better to split it up into multiple queries?
The database schema can be found here.
JOIN usually isn't so bad if the keys are indexed. LEFT JOIN can be a performance hit compared to an inner join, and you should avoid it where you don't need it: a LEFT JOIN returns every row from the left table even when there is no match in the joined table (filling the right side with NULLs), while a regular (inner) JOIN returns only the rows that match.
Post your table structure and we can give you a better query.
See this comment:
http://forums.mysql.com/read.php?24,205080,205274#msg-205274
For what it's worth, to find out what MySQL is doing and to see whether you have indexed properly, always check the EXPLAIN plan. You do this by putting EXPLAIN before your query (literally add the word EXPLAIN before the query), then running it.
In your query, you have the filter AND C.meta_key = 'nwp_capabilities', which means that all the LEFT JOINs above it can equally be written as INNER JOINs: whenever a LEFT JOIN produces no match (LEFT OUTER is intended to preserve the rows from the left side), the NULL-extended row is guaranteed to be filtered out by the WHERE clause.
So a more optimal query would be
SELECT users.*, U.`meta_value` AS first_name,M.`meta_value` AS last_name
FROM `nwp_users` AS users
JOIN `nwp_usermeta` U
ON users.`ID`=U.`user_id`
JOIN `nwp_usermeta` M
ON users.`ID`=M.`user_id`
JOIN `nwp_usermeta` C
ON users.`ID`=C.`user_id`
WHERE U.meta_key = 'first_name'
AND M.meta_key = 'last_name'
AND C.meta_key = 'nwp_capabilities'
ORDER BY users.`user_login` ASC
LIMIT 0,10
(note: "JOIN" (alone) = "INNER JOIN")
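The equivalence is easy to check empirically. A sketch with a made-up two-table version of the pattern, showing that the LEFT JOIN plus the WHERE filter on the right table returns exactly the INNER JOIN result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (ID INTEGER PRIMARY KEY, login TEXT)")
conn.execute("CREATE TABLE usermeta (user_id INTEGER, meta_key TEXT, meta_value TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann'), (2, 'bob')")
# bob has no 'first_name' row, so the LEFT JOIN NULL-extends him
conn.execute("INSERT INTO usermeta VALUES (1, 'first_name', 'Ann')")

left_q = """SELECT u.login, m.meta_value FROM users u
            LEFT JOIN usermeta m ON u.ID = m.user_id
            WHERE m.meta_key = 'first_name'"""
inner_q = left_q.replace("LEFT JOIN", "JOIN")
left_rows = conn.execute(left_q).fetchall()
inner_rows = conn.execute(inner_q).fetchall()
print(left_rows == inner_rows)  # True
print(left_rows)                # [('ann', 'Ann')]
```

The NULL-extended row for bob fails the `m.meta_key = 'first_name'` comparison (NULL never equals anything), so the WHERE clause discards it either way; writing INNER JOIN just tells the optimizer up front.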
Try EXPLAINing the query to see what is going on and whether your SELECT is optimized. If you haven't used EXPLAIN before, read some tutorials:
http://www.learn-mysql-tutorial.com/OptimizeQueries.cfm
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm