SQL Joins on varchar fields timing out - sql

I have a join which deletes rows that match another table but the joining fields have to be a large varchar (250 chars). I know this isn't ideal but I can't think of a better way. Here's my query:
DELETE P
FROM dbo.FeedPhotos AS P
INNER JOIN dbo.ListingPhotos AS P1 ON P.photo = P1.feedImage
INNER JOIN dbo.Listings AS L ON P.accountID = L.accountID
WHERE P.feedID = #feedID
This query is constantly timing out even though there are less than 1000 rows in the ListingPhotos table.
Any help would be appreciated.

I'd probably start by removing this line, as it doesn't seem to be doing anything:
INNER JOIN dbo.Listings AS L ON P.accountID = L.accountID
There might not be a lot of rows in ListingPhotos, but if there are a lot of rows in Listings then the join won't be optimized out.
Also check your indexing, as any join is bound to be slow without the appropriate indexes. Although you should generally try to avoid joining on character fields anyway, it's usually a sign that the data is not normalized properly.

I would consider:
rewriting to use EXISTS. This will stop processing if one row is found more reliably then relying on JOIN which may have many more intermediate rows (which is what Aaronaught said)
ensure all datatypes match exactly. All differences in length or type will mean no indexes will be used
speaking of which, do you have an index (rough guess) on feedid, photo and accountid?
Something like:
DELETE
P
FROM
dbo.FeedPhotos AS P
WHERE
P.feedID = #feedID
AND
EXISTS (SELECT * FROM
dbo.ListingPhotos P1
WHERE P.photo = P1.feedImage)
AND
EXISTS (SELECT * FROM
dbo.Listings L
WHERE P.accountID = L.accountID)

Simply add an index.
CREATE INDEX idx_feedPhotos_feedid
ON dbo.FeedPhotos (feedId)

Related

Excluding rows from result set, LEFT JOIN and EXCEPT

When you have two tables, and want to exclude rows from the second one, there are a multitude of options including EXISTS, NOT IN, LEFT JOIN and EXCEPT.
I've always used left join:
select N.ProductID from NewProducts N
left join Products P on P.ProductID = N.ProductID
where P.ProductID is null
Now I'm thinking it's cleaner to to use EXCEPT:
select ProductID from NewProducts
except
select ProductID from Products
Are there performance issues of using EXCEPT?
You can check execution plan and SQL profiler to choose the suitable query.
But, for me, NOT EXISTS is good. Reference here
The answer to your question is all up to you, depending on how large the data.
You can use any of that (EXISTS, NOT IN, LEFT JOIN and EXCEPT.) depending on your requirement.
you said that you always use LEFT JOIN , and that is good.. because joining the two tables will minimize the execution time of the query, especially when you are holding large amount of data.
JOIN is advisable but it is always depends on you.
You can see the difference of execution time using the execution plan of sql.

Optimizing "IS NULL"

I have a query that is very slow due to a IS NULL check in the where clause. At least, that's what it looks like. The query needs over a minute to complete.
Simplified query:
SELECT DISTINCT TOP 100 R.TermID2, NP.Title, NP.JID
FROM Titles NP
INNER JOIN Term P
ON NP.TermID = P.ID
INNER JOIN Relation R
ON P.ID = R.TermID2
WHERE R.TermID1 IS NULL -- the culprit?
AND NP.JID = 3
I have non-unique, non-clusterd and unique, clustered indices on all of the mentioned fields as well as an extra index that covers R.TermID1 and has a filter TermID1 IS NULL.
Term has 2835302 records. Relation has 25446678 records, where 10% of them has TermID1 = NULL.
The SQL plan in XML form is here: http://pastebin.com/raw.php?i=xcDs0VD0
So, I was messing around the index of the largest table, adding filtered indexes, covering columns, changin around the clauses, etc.
At one point I simply deleted the index and created a new index that had the old configuration and it worked!
You could remove the WHERE clause and put the conditions into the JOIN clauses.
SELECT DISTINCT TOP 100 R.TermID2, NP.Title, NP.JID
FROM Titles NP
INNER JOIN Term P
ON NP.TermID = P.ID AND NP.JID = 3
INNER JOIN Relation R
ON P.ID = R.TermID2 AND R.TermID1 IS NULL

multiple sql joins not producing desired results

I'm new to sql and trying to tweak someone else's huge stored procedure to get a subset of the results. The code below is maybe 10% of the whole procedure. I added the lp.posting_date, last left join, and the where clause. Trying to get records where the posting date is between the start date and the end date. Am I doing this right? Apparently not because the results are unaffected by the change. UPDATE: I CHANGED THE LAST JOIN. The results are correct if there's only one area allocation term. If there is more than one area allocation term, the results are duplicated for each term.
SELECT Distinct
l.lease_id ,
l.property_id as property_id,
l.lease_number as LeaseNumber,
l.name as LeaseName,
lty.name as LeaseType,
lst.name as LeaseStatus,
l.possession_date as PossessionDate,
l.rent as RentCommencementDate,
l.store_open_date as StoreOpenDate,
msr.description as MeasureUnit,
l.comments as Comments ,
lat.start_date as atStartDate,
lat.end_date as atEndDate,
lat.rentable_area as Rentable,
lat.usable_area as Usable,
laat.start_date as aatStartDate,
laat.end_date as aatEndDate,
MK.Path as OrgPath,
CAST(laa.percentage as numeric(9,2)) as Percentage,
laa.rentable_area as aaRentable,
laa.usable_area as aaUsable,
laa.headcounts as Headcount,
laa.area_allocation_term_id,
lat.area_term_id,
laa.area_allocation_id,
lp.posting_date
INTO #LEASES FROM la_tbl_lease l
INNER JOIN #LEASEID on l.lease_id=#LEASEID.lease_id
INNER JOIN la_tbl_lease_term lt on lt.lease_id=l.lease_id and lt.IsDeleted=0
LEFT JOIN la_tlu_lease_type lty on lty.lease_type_id=l.lease_type_id and lty.IsDeleted=0
LEFT JOIN la_tlu_lease_status lst on lst.status_id= l.status_id
LEFT JOIN la_tbl_area_group lag on lag.lease_id=l.lease_id
LEFT JOIN fnd_tlu_unit_measure msr on msr.unit_measure_key=lag.unit_measure_key
LEFT JOIN la_tbl_area_term lat on lat.lease_id=l.lease_id and lat.isDeleted=0
LEFT JOIN la_tbl_area_allocat_term laat on laat.area_term_id=lat.area_term_id and laat.isDeleted=0
LEFT JOIN dbo.la_tbl_area_allocation laa on laa.area_allocation_term_id=laat.area_allocation_term_id and laa.isDeleted=0
LEFT JOIN vw_FND_TLU_Menu_Key MK on menu_type_id_key=2 and isActive=1 and id=laa.menu_id_key
INNER JOIN la_tbl_lease_projection lp on lp.lease_projection_id = #LEASEID.lease_projection_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
As may have already been hinted at you should be careful when using the WHERE clause with an OUTER JOIN.
The idea of the OUTER JOIN is to optionally join that table and provide access to the columns.
The JOINS will generate your set and then the WHERE clause will run to restrict your set. If you are using a condition in the WHERE clause that says one of the columns in your outer joined table must exist / equal a value then by the nature of your query you are no longer doing a LEFT JOIN since you are only retrieving rows where that join occurs.
Shorten it and copy it out as a new query in ssms or whatever you are using for testing. Use an inner join unless you want to preserve the left side set even when there is no matching lp.lease_id. Try something like
if object_id('tempdb..#leases) is not null
drop table #leases;
select distinct
l.lease_id
,l.property_id as property_id
,lp.posting_date
into #leases
from la_tbl_lease as l
inner join la_tbl_lease_projection as lp on lp.lease_id = l.lease_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
select * from #leases
drop table #leases
If this gets what you want then you can work from there and add the other left joins to the query (getting rid of the select * and 'drop table' if you copy it back into your proc). If it doesn't then look at your Boolean date logic or provide more detail for us. If you are new to sql and its procedural extensions, try using the object explorer to examine the properties of the columns you are querying, and try selecting the top 1000 * from the tables you are using to get a feel for what the data looks like when building queries. -Mike
You can try the BETWEEN operator as well
Where lp.posting_date BETWEEN laat.start_date AND laat.end_date
Reasoning: You can have issues wheres there is no matching values in a table. In that instance on a left join the table will populate with null. Using the 'BETWEEN' operator insures that all returns have a value that is between the range and no nulls can slip in.
As it turns out, the problem was easier to solve and it was in a different place in the stored procedure. All I had to do was add one line to one of the cursors to include area term allocations by date.

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?
ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.
This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return less rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you have't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).
Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected
4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);
For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.

Optimizing for an OR in a Join in MySQL

I've got a pretty complex query in MySQL that slows down drastically when one of the joins is done using an OR. How can I speed this up? the relevant join is:
LEFT OUTER JOIN publications p ON p.id = virtual_performances.publication_id
OR p.shoot_id = shoots.id
Removing either condition in the OR decreases the query time from 1.5s to 0.1s. There are already indexes on all the relevant columns I can think of. Any ideas? The columns in use all have indexes on them. Using EXPLAIN I've discovered that once the OR comes into play MySQL ends up not using any of the indexes. Is there a special kind of index I can make that it will use?
This is a common difficulty with MySQL. Using OR baffles the optimizer because it doesn't know how to use an index to find a row where either condition is true.
I'll try to explain: Suppose I ask you to search a telephone book and find every person whose last name is 'Thomas' OR whose first name is 'Thomas'. Even though the telephone book is essentially an index, you don't benefit from it -- you have to search through page by page because it's not sorted by first name.
Keep in mind that in MySQL, any instance of a table in a given query can make use of only one index, even if you have defined multiple indexes in that table. A different query on that same table may use another index if the optimizer reasons that it's more helpful.
One technique people have used to help in situations like your is to do a UNION of two simpler queries that each make use of separate indexes:
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON (p.id = v.publication_id)
UNION ALL
SELECT ...
FROM virtual_performances v
JOIN shoots s ON (...)
LEFT OUTER JOIN publications p ON p.shoot_id = s.id;
Make two joins on the same table (adding aliases to separate them) for the two conditions, and see if that is faster.
select ..., coalesce(p1.field, p2.field) as field
from ...
left join publications p1 on p1.id = virtual_performances.publication_id
left join publications p2 on p2.shoot_id = shoots.id
You can also try something like this on for size:
SELECT * FROM tablename WHERE id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.id IN virtual_performances.publication_id)
OR
p.id IN
(SELECT p.id FROM tablename LEFT OUTER JOIN publications p ON p.shoot_id = shoots.id);
It's a bit messier, and won't be faster in every case, but MySQL is good at selecting from straight data sets, so repeating yourself isn't so bad.