Optimizing "IS NULL" - optimization

I have a query that is very slow, apparently because of an IS NULL check in the WHERE clause. The query needs over a minute to complete.
Simplified query:
SELECT DISTINCT TOP 100 R.TermID2, NP.Title, NP.JID
FROM Titles NP
INNER JOIN Term P
ON NP.TermID = P.ID
INNER JOIN Relation R
ON P.ID = R.TermID2
WHERE R.TermID1 IS NULL -- the culprit?
AND NP.JID = 3
I have non-unique non-clustered and unique clustered indexes on all of the mentioned fields, as well as an extra index that covers R.TermID1 and has the filter TermID1 IS NULL.
Term has 2835302 records. Relation has 25446678 records, about 10% of which have a NULL TermID1.
The SQL plan in XML form is here: http://pastebin.com/raw.php?i=xcDs0VD0

So, I was experimenting with the indexes on the largest table: adding filtered indexes, covering columns, changing the clauses around, etc.
At one point I simply dropped the index and created a new one with the same configuration as the old one, and it worked!

You could remove the WHERE clause and put the conditions into the JOIN clauses.
SELECT DISTINCT TOP 100 R.TermID2, NP.Title, NP.JID
FROM Titles NP
INNER JOIN Term P
ON NP.TermID = P.ID AND NP.JID = 3
INNER JOIN Relation R
ON P.ID = R.TermID2 AND R.TermID1 IS NULL
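For inner joins, moving a predicate from WHERE into ON does not change the result set, so this rewrite is safe to try. The sketch below checks that on a toy data set. It is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the sample rows are invented, LIMIT 100 stands in for TOP 100, and the partial index (named IX_Relation_NullTermID1 here, a made-up name) is SQLite's counterpart of a filtered index.

```python
import sqlite3

# Toy stand-in for the schema in the question; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Term     (ID INTEGER PRIMARY KEY);
    CREATE TABLE Titles   (TermID INTEGER, Title TEXT, JID INTEGER);
    CREATE TABLE Relation (TermID1 INTEGER, TermID2 INTEGER);

    -- Partial index: SQLite's counterpart of a SQL Server filtered index.
    CREATE INDEX IX_Relation_NullTermID1
        ON Relation (TermID2) WHERE TermID1 IS NULL;

    INSERT INTO Term VALUES (1), (2), (3);
    INSERT INTO Titles VALUES (1, 'alpha', 3), (2, 'beta', 3), (3, 'gamma', 4);
    INSERT INTO Relation VALUES (NULL, 1), (7, 2), (NULL, 3), (NULL, 1);
""")

# Predicates in the WHERE clause (the original form).
where_form = """
    SELECT DISTINCT R.TermID2, NP.Title, NP.JID
    FROM Titles NP
    INNER JOIN Term P ON NP.TermID = P.ID
    INNER JOIN Relation R ON P.ID = R.TermID2
    WHERE R.TermID1 IS NULL AND NP.JID = 3
    ORDER BY R.TermID2 LIMIT 100
"""
# Same predicates moved into the ON clauses (the suggested form).
on_form = """
    SELECT DISTINCT R.TermID2, NP.Title, NP.JID
    FROM Titles NP
    INNER JOIN Term P ON NP.TermID = P.ID AND NP.JID = 3
    INNER JOIN Relation R ON P.ID = R.TermID2 AND R.TermID1 IS NULL
    ORDER BY R.TermID2 LIMIT 100
"""
rows_where = conn.execute(where_form).fetchall()
rows_on = conn.execute(on_form).fetchall()
print(rows_where)
print(rows_on)
```

Identical results say nothing about timing on SQL Server. Because the two forms are logically equivalent for inner joins, the optimizer may well produce the same plan for both, so compare the actual execution plans before and after the change.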

Related

PostgreSql query becomes slow when adding where condition

This query has many joins, and I have to apply conditions on columns of the joined tables to get the desired results, but the query becomes slow when I apply a condition on a datetime column.
Here is the query
select
distinct v0_.id as id_0,
MIN(v4_.price) as sclr_4
from
venue v0_
left join facility f5_ on
v0_.id = f5_.venue_id
and (f5_.deleted_at is null)
left join sport_facility_types s6_ on
f5_.id = s6_.facility_id
left join taxonomy_term t7_ on
s6_.sport_id = t7_.id
and (t7_.deleted_at is null)
left join term t8_ on
t7_.term_id = t8_.id
left join sport_facility_types_taxonomy_term s10_ on
s6_.id = s10_.sport_facility_types_id
left join taxonomy_term t9_ on
t9_.id = s10_.taxonomy_term_id
and (t9_.deleted_at is null)
left join term t11_ on
t9_.term_id = t11_.id
left join facility_venue_item_price f12_ on
f5_.id = f12_.facility_id
left join venue_item_price v4_ on
f12_.venue_item_price_id = v4_.id
left join calendar_entry c13_ on
v4_.calendar_entry_id = c13_.id
where
(v0_.status = 'active'
and f5_.status = 'active')
and (v0_.deleted_at is null)
and c13_.start_at >= '2022-10-21 19:00:00' --- this slows down the query
group by
v0_.id
And here is the query plan https://explain.dalibo.com/plan/46h0fb3343e246a5.
The query plan is so big that I cannot paste it here.
Plain query plan https://explain.depesz.com/s/7qnD
Plain query plan without where condition https://explain.depesz.com/s/3sK3
The query shouldn't take much time, as there are not many rows in the tables:
The calendar_entry table has ~350000 rows.
The venue_item_price table has ~320000 rows.
Your WHERE condition turns all the outer joins into inner joins (because c13_.start_at cannot be NULL any more), which changes the whole game. Usually that is an advantage, because it gives the optimizer more freedom, but that seems to have worked in the wrong direction in your case. One reason for that may be that you join more than 8 tables, and the default value for join_collapse_limit is 8.
That's pretty much all I can say without a readable execution plan.
When you don't reference c13_ anywhere in the output or WHERE clause, it can just skip doing that join altogether, because it can't change anything. That is, the join is (presumably) on a unique key, so it can't cause multiple rows to be returned; and it is a left join, so it can't cause zero rows to be returned.
Once you reference it in the WHERE, however, the join has to be executed. That join only contributes about half the time, so it isn't like the query was blazingly fast to start with.
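Both answers rest on the same semantic point: a WHERE condition on a column from the nullable side of a LEFT JOIN rejects the NULL-extended rows, so the query behaves like an inner join. Here is a small sketch of that, assuming a heavily simplified, hypothetical schema (a venue_id column placed directly on calendar_entry, which the real schema does not have) and SQLite through Python's sqlite3 instead of PostgreSQL.

```python
import sqlite3

# Hypothetical two-table reduction of the schema; the real query reaches
# calendar_entry through several intermediate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue (id INTEGER PRIMARY KEY);
    CREATE TABLE calendar_entry (
        id INTEGER PRIMARY KEY, venue_id INTEGER, start_at TEXT);
    INSERT INTO venue VALUES (1), (2), (3);
    INSERT INTO calendar_entry VALUES
        (10, 1, '2022-10-22 09:00:00'),
        (11, 2, '2022-10-01 09:00:00');   -- venue 3 has no entry at all
""")

# Plain LEFT JOIN: every venue survives, matched or not.
left_only = conn.execute("""
    SELECT v.id FROM venue v
    LEFT JOIN calendar_entry c ON c.venue_id = v.id
    ORDER BY v.id
""").fetchall()

# LEFT JOIN plus a WHERE on the joined table: NULL-extended rows are dropped.
left_filtered = conn.execute("""
    SELECT v.id FROM venue v
    LEFT JOIN calendar_entry c ON c.venue_id = v.id
    WHERE c.start_at >= '2022-10-21 19:00:00'
    ORDER BY v.id
""").fetchall()

# The same query written as an INNER JOIN returns the same rows.
inner_filtered = conn.execute("""
    SELECT v.id FROM venue v
    INNER JOIN calendar_entry c ON c.venue_id = v.id
    WHERE c.start_at >= '2022-10-21 19:00:00'
    ORDER BY v.id
""").fetchall()

print(left_only, left_filtered, inner_filtered)
```

If the unmatched venues should be kept, the date condition would have to move into the ON clause of the join to calendar_entry instead, which gives a different result from the query in the question.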

SELECT FROM inner query slowdown

We have two very similar queries, one takes 22 seconds the other takes 6 seconds. Both use an inner select, have the exact same outer columns and outer joins. The only difference is the inner select that the outer query is using to join in on.
The inner query when run alone executes in 100ms or less in both cases and returns the EXACT SAME data.
Both queries as a whole have a lot of room for improvement, but this particular oddity is really puzzling to us and we just want to understand why. To me it would seem the inner query should be executed once in 100ms then the outer stuff happens. I have a feeling the inner select may be executed multiple times.
Query that takes 6 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
ORDER BY projectItemsID ASC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
Query that takes 22 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
For every row in your projectItems table, the second query checks two columns instead of one. If projectItemsID isn't the primary key, or isn't indexed, evaluating the extra column takes longer.
If you look at the sizes of the tables and the number of rows each query returns, you can calculate how many comparisons need to be made for each of the queries.
I believe that you're right that the inner query is being run for every single row that is being left joined with categories.
I can't find a proper source on it right now, but you can easily test this by doing something like this and comparing the run times. Here, we can at least be sure that the inner query is only running one time. (sorry if any syntax is incorrect, but you'll get the general idea):
-- Materialize the inner query once in a table variable
DECLARE @innerQuery TABLE ( [all inner query columns here] )
INSERT INTO @innerQuery
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
-- Join against the stored rows instead of the derived table
SELECT {whole bunch of field names}
FROM @innerQuery AS IQ
LEFT JOIN categories
ON IQ.fk_category = categories.categoryID
...{more joins}
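The same experiment can be sketched end to end on a toy data set. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 with a temporary table in place of a SQL Server table variable, and the categories.name column and all rows are invented. It confirms that materializing the inner query first returns the same rows as the derived-table form, so any timing difference between the two comes from the plan, not from the data.

```python
import sqlite3

# Invented sample data; only the column names used in the question are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (categoryID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projectItems (
        projectItemsID INTEGER PRIMARY KEY, isActive INTEGER, fk_category INTEGER);
    INSERT INTO categories VALUES (1, 'hardware'), (2, 'software');
    INSERT INTO projectItems VALUES (6538, 1, 1), (6539, 1, 2), (6540, 0, 2);
""")

# Form 1: inner query as a derived table, as in the question.
derived = conn.execute("""
    SELECT pi.projectItemsID, c.name
    FROM (SELECT * FROM projectItems
          WHERE isActive = 1 AND projectItemsID = 6539) AS pi
    LEFT JOIN categories c ON pi.fk_category = c.categoryID
""").fetchall()

# Form 2: run the inner query exactly once and store its rows.
conn.execute("""
    CREATE TEMP TABLE innerQuery AS
    SELECT * FROM projectItems
    WHERE isActive = 1 AND projectItemsID = 6539
""")
materialized = conn.execute("""
    SELECT iq.projectItemsID, c.name
    FROM innerQuery AS iq
    LEFT JOIN categories c ON iq.fk_category = c.categoryID
""").fetchall()

print(derived, materialized)
```

On the real database, timing the two forms separately (and comparing their execution plans) is what would show whether the derived table is being re-evaluated.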

Why is my query not using this index?

I have a query I turned into a view that works OK. But the site_phrase sp table seems not to be using an index and goes through all the records in the table. Why is that? Here is the query:
EXPLAIN SELECT
`p`.`id` AS `id`,
`p`.`text` AS `phrase`,
`p`.`ignored` AS `ignored_phrase`,
`p`.`client_id` AS `client_id`,
`s`.`id` AS `site_id`,
`s`.`sub_domain` AS `sub_domain`,
`s`.`competitor` AS `competitor`,
`s`.`ignored` AS `ignored_site`,
`pc`.`id` AS `pc_id`,
`pc`.`name` AS `pc_name`,
`psc`.`id` AS `psc_id`,
`psc`.`name` AS `psc_name`,
`p`.`volume` AS `volume`,
MIN(`pos`.`position`) AS `position`,
`pos`.`id` AS `pos_id`
FROM `client` c
JOIN client_site cs ON cs.client_id = c.id
JOIN site s ON s.id = cs.site_id
JOIN site_phrase sp ON sp.site_id = s.id
JOIN phrase p ON p.id = sp.phrase_id
JOIN `position` pos ON pos.phrase_id = sp.phrase_id
AND pos.site_id = sp.site_id
LEFT JOIN `phrase_sub_category` `psc`
ON `psc`.`id` = `p`.`phrase_sub_category_id`
LEFT JOIN `phrase_category` `pc`
ON `pc`.`id` = `psc`.`phrase_category_id`
GROUP BY `p`.`id`,`s`.`id`,`pos`.`id`
ORDER BY `p`.`id`,`pos`.`position`
And here is a screenshot of the output the above query gets when I EXPLAIN / DESCRIBE it
http://img827.imageshack.us/img827/3336/indexsql.png
No matter how I alter the order of the tables above and how they are joined, the 1st or 2nd table always seems to be doing some sort of table scan. In the example of the screenshot, the problem table is sp. These tables are InnoDB, and there are appropriate indexes and foreign keys on all the columns I JOIN on. Any insights would be helpful.
MySQL will use a full table scan if it determines that this is faster than using an index. In the case of your sp table, with only 1300 records, the table scan may be just as fast as an index lookup.

SQL Joins on varchar fields timing out

I have a join which deletes rows that match another table but the joining fields have to be a large varchar (250 chars). I know this isn't ideal but I can't think of a better way. Here's my query:
DELETE P
FROM dbo.FeedPhotos AS P
INNER JOIN dbo.ListingPhotos AS P1 ON P.photo = P1.feedImage
INNER JOIN dbo.Listings AS L ON P.accountID = L.accountID
WHERE P.feedID = @feedID
This query is constantly timing out even though there are fewer than 1000 rows in the ListingPhotos table.
Any help would be appreciated.
I'd probably start by removing this line, as it doesn't seem to be doing anything:
INNER JOIN dbo.Listings AS L ON P.accountID = L.accountID
There might not be a lot of rows in ListingPhotos, but if there are a lot of rows in Listings then the join won't be optimized out.
Also check your indexing, as any join is bound to be slow without the appropriate indexes. You should generally try to avoid joining on character fields anyway; it's usually a sign that the data is not normalized properly.
I would consider:
Rewriting to use EXISTS. This stops processing as soon as one matching row is found, which is more reliable than a JOIN that may produce many more intermediate rows (which is what Aaronaught said).
Ensuring all datatypes match exactly. Differences in length or type can prevent indexes from being used.
Speaking of which, do you have an index (rough guess) on feedID, photo and accountID?
Something like:
DELETE P
FROM dbo.FeedPhotos AS P
WHERE P.feedID = @feedID
AND EXISTS (SELECT * FROM dbo.ListingPhotos P1
WHERE P.photo = P1.feedImage)
AND EXISTS (SELECT * FROM dbo.Listings L
WHERE P.accountID = L.accountID)
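To see that the EXISTS form deletes the intended rows even when the other tables contain duplicate matches, here is a small sketch. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 (which has no DELETE ... FROM join syntax, so the subqueries reference the target table by name), a bound parameter stands in for the feed ID variable, and all rows are invented.

```python
import sqlite3

# Invented sample rows; ListingPhotos and Listings deliberately contain
# duplicates, which would multiply intermediate rows in a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FeedPhotos (feedID INTEGER, accountID INTEGER, photo TEXT);
    CREATE TABLE ListingPhotos (feedImage TEXT);
    CREATE TABLE Listings (accountID INTEGER);
    CREATE INDEX idx_feedPhotos_feedid ON FeedPhotos (feedID);

    INSERT INTO FeedPhotos VALUES
        (1, 100, 'a.jpg'), (1, 100, 'b.jpg'), (1, 999, 'a.jpg'), (2, 100, 'a.jpg');
    INSERT INTO ListingPhotos VALUES ('a.jpg'), ('a.jpg');
    INSERT INTO Listings VALUES (100), (100);
""")

feed_id = 1  # stands in for the feed ID parameter of the original query
cur = conn.execute("""
    DELETE FROM FeedPhotos
    WHERE feedID = ?
      AND EXISTS (SELECT 1 FROM ListingPhotos P1
                  WHERE P1.feedImage = FeedPhotos.photo)
      AND EXISTS (SELECT 1 FROM Listings L
                  WHERE L.accountID = FeedPhotos.accountID)
""", (feed_id,))
deleted = cur.rowcount

remaining = conn.execute("""
    SELECT feedID, accountID, photo FROM FeedPhotos
    ORDER BY feedID, accountID, photo
""").fetchall()
print(deleted, remaining)
```

Only the row that belongs to the feed, has a matching photo, and has a matching account is removed; the duplicates in the other two tables do not matter because each EXISTS only asks whether at least one match is present.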
Simply add an index.
CREATE INDEX idx_feedPhotos_feedid
ON dbo.FeedPhotos (feedId)

Optimizing for an OR in a Join in MySQL

I've got a pretty complex query in MySQL that slows down drastically when one of the joins is done using an OR. How can I speed this up? The relevant join is:
LEFT OUTER JOIN publications p ON p.id = virtual_performances.publication_id
OR p.shoot_id = shoots.id
Removing either condition in the OR decreases the query time from 1.5s to 0.1s. There are already indexes on all the relevant columns I can think of. Any ideas? The columns in use all have indexes on them. Using EXPLAIN I've discovered that once the OR comes into play MySQL ends up not using any of the indexes. Is there a special kind of index I can make that it will use?
This is a common difficulty with MySQL. Using OR baffles the optimizer because it doesn't know how to use an index to find a row where either condition is true.
I'll try to explain: Suppose I ask you to search a telephone book and find every person whose last name is 'Thomas' OR whose first name is 'Thomas'. Even though the telephone book is essentially an index, you don't benefit from it -- you have to search through page by page because it's not sorted by first name.
Keep in mind that in MySQL, any instance of a table in a given query can generally make use of only one index, even if you have defined multiple indexes on that table (the index merge optimization is a limited exception). A different query on that same table may use another index if the optimizer reasons that it's more helpful.
One technique people have used to help in situations like yours is to do a UNION of two simpler queries that each make use of a separate index:
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON (p.id = v.publication_id)
UNION ALL
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON p.shoot_id = s.id;
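One caveat with this kind of rewrite is duplicates: a row that satisfies both conditions comes back from both halves. The sketch below checks the equivalence on a toy data set. It deliberately swaps in INNER JOIN and plain UNION instead of the LEFT OUTER JOIN and UNION ALL above, because that combination is the one that matches the OR join row for row. Other assumptions: SQLite through Python's sqlite3 rather than MySQL, a hypothetical virtual_performances.shoot_id column for the elided join to shoots, and invented rows.

```python
import sqlite3

# Invented data; virtual_performances.shoot_id is a hypothetical column used
# only because the join condition to shoots is elided in the original.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shoots (id INTEGER PRIMARY KEY);
    CREATE TABLE publications (id INTEGER PRIMARY KEY, shoot_id INTEGER);
    CREATE TABLE virtual_performances (
        id INTEGER PRIMARY KEY, shoot_id INTEGER, publication_id INTEGER);
    CREATE INDEX idx_publications_shoot ON publications (shoot_id);

    INSERT INTO shoots VALUES (1), (2);
    INSERT INTO publications VALUES (10, 1), (11, 2), (12, NULL);
    INSERT INTO virtual_performances VALUES
        (100, 1, 10), (101, 1, 12), (102, 2, NULL);
""")

base = """
    SELECT v.id, p.id
    FROM virtual_performances v
    JOIN shoots s ON s.id = v.shoot_id
    JOIN publications p ON {condition}
"""
by_id = base.format(condition="p.id = v.publication_id")
by_shoot = base.format(condition="p.shoot_id = s.id")

# Single join with OR (the slow form in the question).
or_rows = conn.execute(
    base.format(condition="p.id = v.publication_id OR p.shoot_id = s.id")
    + " ORDER BY 1, 2"
).fetchall()

# Two simpler joins combined; UNION removes rows matched by both conditions.
union_rows = conn.execute(
    by_id + " UNION " + by_shoot + " ORDER BY 1, 2"
).fetchall()

# UNION ALL keeps those doubly matched rows, so it returns one row too many.
union_all_rows = conn.execute(
    by_id + " UNION ALL " + by_shoot + " ORDER BY 1, 2"
).fetchall()

print(or_rows)
print(union_rows)
print(len(union_all_rows))
```

With outer joins the gap is larger still, since each half also returns its own unmatched rows, so the result of a UNION ALL rewrite should be checked against the original query before relying on it.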
Make two joins on the same table (adding aliases to separate them) for the two conditions, and see if that is faster.
select ..., coalesce(p1.field, p2.field) as field
from ...
left join publications p1 on p1.id = virtual_performances.publication_id
left join publications p2 on p2.shoot_id = shoots.id
You can also try something like this on for size:
SELECT * FROM tablename WHERE id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.id = virtual_performances.publication_id)
OR
id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.shoot_id = shoots.id);
It's a bit messier, and won't be faster in every case, but MySQL is good at selecting from straight data sets, so repeating yourself isn't so bad.