Changing SQL NOT IN to JOINS - sql

Hello guys,
Our aim is to get a script that inserts the missing product - TaxCategory pairs into the intermediate table (ProductTaxCategory).
The following script works correctly, but we are trying to find a way to optimize it:
INSERT ProductTaxCategory
(ProductTaxCategory_TaxCategoryId,ProductTaxCategory_ProductId)
SELECT
TaxCategoryId
,ProductId
FROM Product pr
CROSS JOIN TaxCategory tx
WHERE pr.ProductId NOT IN
(
SELECT ProductTaxCategory_ProductId
FROM ProductTaxCategory
)
OR
pr.ProductId IN
(
SELECT ProductTaxCategory_ProductId
FROM ProductTaxCategory
)
AND
tx.TaxCategoryId NOT IN
(
SELECT ProductTaxCategory_TaxCategoryId
FROM ProductTaxCategory
WHERE ProductTaxCategory_ProductId = pr.ProductId
)
How can we optimize this query?

Try something like (full statement now):
INSERT INTO ProductTaxCategory
(ProductTaxCategory_TaxCategoryId,ProductTaxCategory_ProductId)
SELECT TaxCategoryId, ProductId
FROM Product pr CROSS JOIN TaxCategory tx
WHERE NOT EXISTS
(SELECT 1 FROM ProductTaxCategory
WHERE ProductTaxCategory_ProductId = pr.ProductId
AND ProductTaxCategory_TaxCategoryId = tx.TaxCategoryId)
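EXISTS with (SELECT 1 ... WHERE ID=...) is often a better alternative to IN (SELECT ID FROM ... ) constructs. NOT EXISTS also behaves predictably when the subquery column can contain NULLs, whereas NOT IN returns no rows as soon as the subquery yields a single NULL.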
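For anyone who wants to try the NOT EXISTS version without touching a real database, here is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the question; the column types and the sample rows are assumptions, and SQLite stands in for whatever engine is actually in use.

```python
import sqlite3

# Hypothetical miniature schema: names follow the question, types and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductId INTEGER PRIMARY KEY);
    CREATE TABLE TaxCategory (TaxCategoryId INTEGER PRIMARY KEY);
    CREATE TABLE ProductTaxCategory (
        ProductTaxCategory_TaxCategoryId INTEGER,
        ProductTaxCategory_ProductId INTEGER
    );
    INSERT INTO Product VALUES (1), (2);
    INSERT INTO TaxCategory VALUES (10), (20);
    -- Product 1 already has category 10; product 2 has no pairs yet.
    INSERT INTO ProductTaxCategory VALUES (10, 1);
""")

# Insert every (category, product) pair that is not in the link table yet.
conn.execute("""
    INSERT INTO ProductTaxCategory
        (ProductTaxCategory_TaxCategoryId, ProductTaxCategory_ProductId)
    SELECT tx.TaxCategoryId, pr.ProductId
    FROM Product pr CROSS JOIN TaxCategory tx
    WHERE NOT EXISTS
        (SELECT 1 FROM ProductTaxCategory ptc
         WHERE ptc.ProductTaxCategory_ProductId = pr.ProductId
           AND ptc.ProductTaxCategory_TaxCategoryId = tx.TaxCategoryId)
""")

pairs = sorted(conn.execute(
    "SELECT ProductTaxCategory_ProductId, ProductTaxCategory_TaxCategoryId "
    "FROM ProductTaxCategory"
).fetchall())
print(pairs)
```

After the insert, every product is paired with every tax category exactly once, and running the same statement a second time adds nothing.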

You can do a LEFT JOIN with ProductTaxCategory and check for NULLs.
Something like this.
INSERT ProductTaxCategory
(
ProductTaxCategory_TaxCategoryId,
ProductTaxCategory_ProductId
)
SELECT p.TaxCategoryId, p.ProductId
FROM
(
SELECT TaxCategoryId, ProductId
FROM Product pr
CROSS JOIN TaxCategory tx
) p
LEFT JOIN ProductTaxCategory ptx
ON p.TaxCategoryId = ptx.ProductTaxCategory_TaxCategoryId
AND p.ProductId = ptx.ProductTaxCategory_ProductId
WHERE ptx.ProductTaxCategory_ProductId IS NULL

Use CROSS JOIN and EXCEPT
INSERT ProductTaxCategory(ProductTaxCategory_ProductId, ProductTaxCategory_TaxCategoryId)
SELECT p.ProductID, tc.TaxCategoryId FROM Product p CROSS JOIN TaxCategory tc
EXCEPT
SELECT ProductTaxCategory_ProductId, ProductTaxCategory_TaxCategoryId FROM ProductTaxCategory
CROSS JOIN generates all the possible pairs. EXCEPT removes the pairs that already exist, which leaves what's missing. Finally you can INSERT them into the table.
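The EXCEPT form and the LEFT JOIN ... IS NULL form should find the same missing pairs. A small sketch to check that, again with Python's sqlite3 and made-up sample rows (the data and column types are assumptions, only the names come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductId INTEGER PRIMARY KEY);
    CREATE TABLE TaxCategory (TaxCategoryId INTEGER PRIMARY KEY);
    CREATE TABLE ProductTaxCategory (
        ProductTaxCategory_TaxCategoryId INTEGER,
        ProductTaxCategory_ProductId INTEGER
    );
    INSERT INTO Product VALUES (1), (2);
    INSERT INTO TaxCategory VALUES (10), (20);
    -- Existing pairs: (product 1, category 10) and (product 2, category 20).
    INSERT INTO ProductTaxCategory VALUES (10, 1), (20, 2);
""")

# All possible pairs minus the pairs that already exist.
missing_except = sorted(conn.execute("""
    SELECT p.ProductId, tc.TaxCategoryId
    FROM Product p CROSS JOIN TaxCategory tc
    EXCEPT
    SELECT ProductTaxCategory_ProductId, ProductTaxCategory_TaxCategoryId
    FROM ProductTaxCategory
""").fetchall())

# Anti-join: keep only the pairs with no matching row in the link table.
missing_left_join = sorted(conn.execute("""
    SELECT p.ProductId, tc.TaxCategoryId
    FROM Product p CROSS JOIN TaxCategory tc
    LEFT JOIN ProductTaxCategory ptc
      ON ptc.ProductTaxCategory_ProductId = p.ProductId
     AND ptc.ProductTaxCategory_TaxCategoryId = tc.TaxCategoryId
    WHERE ptc.ProductTaxCategory_ProductId IS NULL
""").fetchall())

print(missing_except, missing_left_join)
```

Which of the two is faster depends on the engine and the indexes, so it is worth comparing the execution plans on the real tables.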

Related

SQL LEFT JOIN empties out the left table columns

I came across some weird behavior today with postgresql.
WITH actual_prices AS (
-- Looking for prices from now to the given number of days back
SELECT *
FROM prices
WHERE price_date >= now()::date - 93
)
, distinct_products_sold AS (
SELECT distinct(id_product) as pid FROM products_sold
)
, first_prices AS (
SELECT s.pid, p.product_id, p.price_date, p.price
FROM distinct_products_sold s
LEFT JOIN actual_prices p ON p.product_id = s.pid
)
select * from first_prices;
This code outputs something of this kind:
129 | | |
195 | | |
251 | | |
...
In other words, columns of table actual_prices are empty. I tried messing around with JOIN just to see what's going on: if I do RIGHT JOIN instead of LEFT JOIN, it empties the column of distinct_products_sold but the columns of actual_prices are displayed correctly. What can cause this?
You have it the wrong way around: the outer join does not cause data to be lost from one table. Rather, it keeps the unmatched rows and pads the missing columns with nulls, which is equivalent to a union, e.g.
WITH P ( PID ) AS
(
SELECT *
FROM (
VALUES ( 1 ), ( 2 ), ( 3 )
) AS T ( C )
),
Q ( QID ) AS
(
SELECT *
FROM (
VALUES ( 4 ), ( 5 ), ( 6 )
) AS T ( C )
)
SELECT p.PID, q.QID
FROM P p, Q q
WHERE p.PID = q.QID
UNION
SELECT p.PID, NULL
FROM P p
WHERE p.PID NOT IN ( SELECT QID FROM Q );
Forgive me for my brainfart. It turns out the query was outputting unmatched rows (how surprising). LEFT/RIGHT joins also output the unmatched rows of the left or right table.
P.S. Have lunch before posting a question.
No need for a WITH clause here, try this:
SELECT t.pid , p.product_id, p.price_date, p.price
FROM (SELECT distinct id_product as pid FROM products_sold) t
LEFT JOIN prices p
ON(t.pid = p.product_id AND p.price_date >= now()::date - 93)
If all the columns from table prices are still NULL, then there are just no matches.
A left join keeps all the records from the leading table (the left table) and only the matched data from the right table.
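To see why the date filter belongs in the ON clause here, the sketch below runs the query twice in SQLite through Python's sqlite3: once with the filter in ON and once with it in WHERE. The sample rows are invented, and a fixed cutoff date replaces now()::date - 93 so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products_sold (id_product INTEGER);
    CREATE TABLE prices (product_id INTEGER, price_date TEXT, price REAL);
    INSERT INTO products_sold VALUES (129), (195), (195);
    -- Product 129 has a recent price; product 195 only has an old one.
    INSERT INTO prices VALUES (129, '2024-03-01', 9.5), (195, '2023-01-01', 4.0);
""")

# Filter in ON: unmatched products are kept and padded with NULLs.
on_rows = conn.execute("""
    SELECT t.pid, p.product_id, p.price_date, p.price
    FROM (SELECT DISTINCT id_product AS pid FROM products_sold) t
    LEFT JOIN prices p
      ON t.pid = p.product_id AND p.price_date >= '2024-01-01'
    ORDER BY t.pid
""").fetchall()

# Filter in WHERE: the NULL-padded rows fail the test and disappear.
where_rows = conn.execute("""
    SELECT t.pid, p.product_id, p.price_date, p.price
    FROM (SELECT DISTINCT id_product AS pid FROM products_sold) t
    LEFT JOIN prices p ON t.pid = p.product_id
    WHERE p.price_date >= '2024-01-01'
    ORDER BY t.pid
""").fetchall()

print(on_rows)
print(where_rows)
```

The first query returns a NULL-padded row for product 195, which is the "empty columns" effect from the question; the second behaves like an inner join.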

SQL: Cross join query

SELECT * FROM training.dbo.[PERSON] P
LEFT JOIN training.dbo.PERSON_CAREER_HISTORY PC ON (P.PERSON_ID=PC.PERSON_ID)
CROSS JOIN (SELECT DISTINCT PC2.POSITION training.dbo.PERSON_CAREER_HISTORY) PC2
WHERE PC.POSITION IS NULL
The cross join part is not working; it gives the error
"Incorrect syntax near '.'."
I can't fix it, and I have been trying for about an hour. Please tell me what my error is.
You missed the FROM in the CROSS JOIN.
SELECT * FROM training.dbo.[PERSON] P
LEFT JOIN training.dbo.PERSON_CAREER_HISTORY PC ON (P.PERSON_ID=PC.PERSON_ID)
CROSS JOIN (SELECT DISTINCT POSITION FROM training.dbo.PERSON_CAREER_HISTORY) PC2
WHERE PC.POSITION IS NULL
You have missed out the FROM keyword from the sub query. Try this:
SELECT
*
FROM
training.dbo.[PERSON] P
LEFT JOIN training.dbo.PERSON_CAREER_HISTORY PC
ON (P.PERSON_ID=PC.PERSON_ID)
CROSS JOIN (
SELECT DISTINCT
POSITION
FROM
training.dbo.PERSON_CAREER_HISTORY) PC2
WHERE
PC.POSITION IS NULL
If you are looking for the positions that people do not have, then this is the query that you want:
SELECT *
FROM training.dbo.[PERSON] P CROSS JOIN
(SELECT DISTINCT POSITION FROM training.dbo.PERSON_CAREER_HISTORY
) pp LEFT JOIN
training.dbo.PERSON_CAREER_HISTORY PCH
ON P.PERSON_ID = PCH.PERSON_ID AND pp.POSITION = PCH.POSITION
WHERE PCH.POSITION IS NULL;
The joins are not correct in your version (as well as the problem with the subquery). Otherwise, I cannot figure out the purpose of your original query.
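A rough way to check the "positions that people do not have" pattern is to run it on a few invented rows. The sketch below uses Python's sqlite3 with the table and column names from the question (without the training.dbo prefix); the people and positions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON (PERSON_ID INTEGER PRIMARY KEY);
    CREATE TABLE PERSON_CAREER_HISTORY (PERSON_ID INTEGER, POSITION TEXT);
    INSERT INTO PERSON VALUES (1), (2), (3);
    -- Person 3 has no career history at all.
    INSERT INTO PERSON_CAREER_HISTORY VALUES (1, 'Dev'), (2, 'QA');
""")

# Every person x every known position, minus the positions actually held.
missing = conn.execute("""
    SELECT P.PERSON_ID, pp.POSITION
    FROM PERSON P
    CROSS JOIN (SELECT DISTINCT POSITION FROM PERSON_CAREER_HISTORY) pp
    LEFT JOIN PERSON_CAREER_HISTORY PCH
      ON PCH.PERSON_ID = P.PERSON_ID AND PCH.POSITION = pp.POSITION
    WHERE PCH.PERSON_ID IS NULL
    ORDER BY P.PERSON_ID, pp.POSITION
""").fetchall()

print(missing)
```

Note that the CROSS JOIN has to come before the LEFT JOIN, so that the LEFT JOIN can match on both the person and the position.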

JOIN / LEFT JOIN conflict in SQL Server

I have a tricky query. I need to select all recent versions of 2 types of members of administrator groups. Here is the query:
SELECT refGroup.*
FROM tblSystemAdministratorGroups refGroup
JOIN tblGroup refMem ON refGroup.AttributeValue = refMem.ObjectUID
This query will return all the administrator groups. The next step will be getting the members of these groups. Since I have 2 types of memberships (Explicit, Computed), I will have to use a LEFT JOIN to make sure that I am not excluding any rows.
SELECT refGroup.*
FROM tblSystemAdministratorGroups refGroup
-- The JOIN below can be excluded but it is here just to clarify the architecture
JOIN tblGroup refMem ON refGroup.AttributeValue = refMem.ObjectUID
LEFT JOIN tblGroup_ComputedMember cm ON refMem.ObjectUID = cm.GroupObjectID
LEFT JOIN tblGroup_ExplicitMember em ON refMem.ObjectUID = em.GroupObjectID
The last piece in the puzzle is to get the latest version of each member. For that I will have to use JOIN to exclude older versions:
JOIN (
SELECT MAX([ID]) MaxId
FROM [OmadaReporting].[dbo].tblGroup_ComputedMember
GROUP BY ObjectID
) MostRecentCM ON MostRecentCM.MaxId = cm.Id
and
JOIN (
SELECT MAX([ID]) MaxId
FROM [OmadaReporting].[dbo].tblGroup_ExplicitMember
GROUP BY ObjectID
) MostRecentEM ON MostRecentEM.MaxId = em.Id
The full query will be:
SELECT refGroup.*
FROM tblSystemAdministratorGroups refGroup
JOIN tblGroup refMem ON refGroup.AttributeValue = refMem.ObjectUID
LEFT JOIN tblGroup_ComputedMember cm ON refMem.ObjectUID = cm.GroupObjectID
JOIN (
SELECT MAX([ID]) MaxId
FROM [OmadaReporting].[dbo].tblGroup_ComputedMember
GROUP BY ObjectID
) MostRecentCM ON MostRecentCM.MaxId = cm.Id
LEFT JOIN tblGroup_ExplicitMember em ON refMem.ObjectUID = em.GroupObjectID
JOIN (
SELECT MAX([ID]) MaxId
FROM [OmadaReporting].[dbo].tblGroup_ExplicitMember
GROUP BY ObjectID
) MostRecentEM ON MostRecentEM.MaxId = em.Id
The issue is clear: the 2 JOINs that exclude old versions are applied to the whole select statement, so no rows are returned. What would be the best way to avoid this and return the intended values?
SELECT refGroup.*
FROM tblSystemAdministratorGroups refGroup
JOIN tblGroup refMem ON refGroup.AttributeValue = refMem.ObjectUID
LEFT JOIN (
select GroupObjectID, ID, max(ID) over (partition by ObjectID) as maxID
from tblGroup_ComputedMember
) cm ON refMem.ObjectUID = cm.GroupObjectID and cm.ID = cm.maxID
LEFT JOIN (
select GroupObjectID, ID, max(ID) over (partition by ObjectID) as maxID
from tblGroup_ExplicitMember
) em ON refMem.ObjectUID = em.GroupObjectID and em.ID = em.maxID
-- No extra WHERE on cm.ID here: it would discard groups that only have explicit members
What about using LEFT join in your last two joins?
LEFT JOIN (
SELECT MAX([ID]) MaxId
FROM [OmadaReporting].[dbo].tblGroup_ComputedMember
GROUP BY ObjectID
) MostRecentCM ON MostRecentCM.MaxId = cm.Id
And then filter the values in the WHERE clause:
WHERE MostRecentCM.MaxId IS NOT NULL
OR
MostRecentEM.MaxId IS NOT NULL
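The underlying problem in this thread is that an inner JOIN placed after a LEFT JOIN throws the NULL-padded rows away again. One way around it is to reduce the member table to its latest versions inside a derived table first, and LEFT JOIN that. A minimal sketch in SQLite via Python's sqlite3; the table names come from the question, while the reduced column set and the rows are assumptions, and only the computed members are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblGroup (ObjectUID TEXT);
    CREATE TABLE tblGroup_ComputedMember (ID INTEGER, ObjectID TEXT, GroupObjectID TEXT);
    INSERT INTO tblGroup VALUES ('g1'), ('g2');
    -- Member A of group g1 has two versions; group g2 has no computed members.
    INSERT INTO tblGroup_ComputedMember VALUES (1, 'A', 'g1'), (2, 'A', 'g1');
""")

# Inner JOIN after the LEFT JOIN: g2 is lost because cm.ID is NULL for it.
lost_rows = conn.execute("""
    SELECT g.ObjectUID, cm.ID
    FROM tblGroup g
    LEFT JOIN tblGroup_ComputedMember cm ON cm.GroupObjectID = g.ObjectUID
    JOIN (SELECT MAX(ID) AS MaxId FROM tblGroup_ComputedMember GROUP BY ObjectID) latest
      ON latest.MaxId = cm.ID
    ORDER BY g.ObjectUID
""").fetchall()

# Latest versions picked inside the derived table, then LEFT JOINed: g2 survives.
kept_rows = conn.execute("""
    SELECT g.ObjectUID, cm.ID
    FROM tblGroup g
    LEFT JOIN (
        SELECT m.ID, m.GroupObjectID
        FROM tblGroup_ComputedMember m
        JOIN (SELECT MAX(ID) AS MaxId FROM tblGroup_ComputedMember GROUP BY ObjectID) latest
          ON latest.MaxId = m.ID
    ) cm ON cm.GroupObjectID = g.ObjectUID
    ORDER BY g.ObjectUID
""").fetchall()

print(lost_rows)
print(kept_rows)
```

The same derived-table shape would be repeated for tblGroup_ExplicitMember.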

use field in sql join where clause

I am trying to write a Crystal Report using a SQL statement because it runs much, much faster. But I am having trouble with some of the links. I need to use the result of one link as criteria in subsequent links.
Ok, here is a sample of what my statement looks like:
(The lines marked with ** are the lines in question)
SELECT
Part.PartNum,
Cust.CustNum,
Cust.CustID,
YTD.Qty
FROM
(
SELECT
Pub.Part.PartNum,
Pub.Part.UserChar1 AS CustID
FROM
Pub.Part
) AS Part
LEFT OUTER JOIN (
SELECT
Pub.Customer.CustID,
Pub.Customer.CustNum,
Pub.Customer.Name
FROM
Pub.Customer
WHERE
Pub.Customer.CustID = '1038'
) AS Cust
ON Part.CustID = Cust.CustID
LEFT OUTER JOIN (
SELECT
Pub.OrderDtl.PartNum,
Sum(Pub.OrderDtl.OrderQty) AS Qty
FROM
Pub.OrderHed JOIN Pub.OrderDtl ON
Pub.OrderHed.OrderNum = Pub.OrderDtl.OrderNum
WHERE
**Pub.OrderHed.CustNum = Cust.CustNum AND**
**Pub.OrderDtl.PartNum = Part.PartNum AND**
YEAR(Pub.OrderHed.OrderDate)=YEAR(CURDATE())
GROUP BY
Pub.OrderDtl.PartNum
) AS YTD ON Part.PartNum = YTD.PartNum
Now, I get an error that says:
Part.PartNum cannot be found or is not specified for the query.
I get the same error for Cust.CustNum. Will you help me figure out what I am doing wrong? Thanks!
The problem is that you are referencing the Part and Cust aliases inside the YTD sub-query, which you cannot do: a derived table cannot see the other tables in the same FROM clause. You will have to do something similar to this:
SELECT Part.PartNum,
Cust.CustNum,
Cust.CustID,
YTD.Qty
FROM
(
SELECT Pub.Part.PartNum,
Pub.Part.UserChar1 AS CustID
FROM Pub.Part
) AS Part
LEFT OUTER JOIN
(
SELECT Pub.Customer.CustID,
Pub.Customer.CustNum,
Pub.Customer.Name
FROM Pub.Customer
WHERE Pub.Customer.CustID = '1038'
) AS Cust
ON Part.CustID = Cust.CustID
LEFT OUTER JOIN
(
SELECT Pub.OrderDtl.PartNum,
Sum(Pub.OrderDtl.OrderQty) AS Qty,
Pub.OrderHed.CustNum
FROM Pub.OrderHed
JOIN Pub.OrderDtl
ON Pub.OrderHed.OrderNum = Pub.OrderDtl.OrderNum
WHERE YEAR(Pub.OrderHed.OrderDate)=YEAR(CURDATE())
GROUP BY Pub.OrderDtl.PartNum, Pub.OrderHed.CustNum
) AS YTD
ON Part.PartNum = YTD.PartNum
AND Cust.CustNum = YTD.CustNum
Looking at your query more, you can actually get rid of two of the subqueries:
SELECT Part.PartNum,
Cust.CustNum,
Cust.CustID,
YTD.Qty
FROM Pub.Part Part
LEFT OUTER JOIN Pub.Customer Cust
ON Part.UserChar1 = Cust.CustID
AND Cust.CustID = '1038'
LEFT OUTER JOIN
(
SELECT d.PartNum,
Sum(d.OrderQty) AS Qty,
h.CustNum
FROM Pub.OrderHed h
JOIN Pub.OrderDtl d
ON h.OrderNum = d.OrderNum
WHERE YEAR(h.OrderDate)=YEAR(CURDATE())
GROUP BY d.PartNum, h.CustNum
) AS YTD
ON Part.PartNum = YTD.PartNum
AND Cust.CustNum = YTD.CustNum
This is because you can't access a sibling sub-query (Cust, Part) from within another sub-query (YTD).
However, the solution is easy in your case: filter in the ON clause instead:
SELECT
Part.PartNum,
Cust.CustNum,
Cust.CustID,
YTD.Qty
FROM
(
SELECT
Pub.Part.PartNum,
Pub.Part.UserChar1 AS CustID
FROM
Pub.Part
) AS Part
LEFT OUTER JOIN (
SELECT
Pub.Customer.CustID,
Pub.Customer.CustNum,
Pub.Customer.Name
FROM
Pub.Customer
WHERE
Pub.Customer.CustID = '1038'
) AS Cust
ON Part.CustID = Cust.CustID
LEFT OUTER JOIN (
SELECT
Pub.OrderDtl.PartNum,
Sum(Pub.OrderDtl.OrderQty) AS Qty,
Pub.OrderHed.CustNum
FROM
Pub.OrderHed JOIN Pub.OrderDtl ON
Pub.OrderHed.OrderNum = Pub.OrderDtl.OrderNum
WHERE
YEAR(Pub.OrderHed.OrderDate)=YEAR(CURDATE())
GROUP BY
Pub.OrderDtl.PartNum,
Pub.OrderHed.CustNum
) AS YTD ON Part.PartNum = YTD.PartNum AND Cust.CustNum = YTD.CustNum
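The answers above all move the correlation out of the sub-query: aggregate by both PartNum and CustNum, then match on both in the ON clause. Here is a small sketch of that in SQLite through Python's sqlite3. The table names drop the Pub. prefix, the rows are invented, and the YEAR(...) = YEAR(CURDATE()) filter is left out to keep the result independent of the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part (PartNum TEXT, UserChar1 TEXT);
    CREATE TABLE Customer (CustID TEXT, CustNum INTEGER);
    CREATE TABLE OrderHed (OrderNum INTEGER, CustNum INTEGER);
    CREATE TABLE OrderDtl (OrderNum INTEGER, PartNum TEXT, OrderQty INTEGER);
    INSERT INTO Part VALUES ('P1', '1038'), ('P2', '2000');
    INSERT INTO Customer VALUES ('1038', 7), ('2000', 8);
    INSERT INTO OrderHed VALUES (1, 7), (2, 7), (3, 8);
    -- Order 3 is for part P1 too, but it belongs to customer 8.
    INSERT INTO OrderDtl VALUES (1, 'P1', 5), (2, 'P1', 3), (3, 'P1', 100);
""")

rows = conn.execute("""
    SELECT Part.PartNum, Cust.CustNum, Cust.CustID, YTD.Qty
    FROM Part
    LEFT OUTER JOIN Customer Cust
      ON Part.UserChar1 = Cust.CustID AND Cust.CustID = '1038'
    LEFT OUTER JOIN (
        -- Group by both keys so the outer ON clause can match on both.
        SELECT d.PartNum, h.CustNum, SUM(d.OrderQty) AS Qty
        FROM OrderHed h
        JOIN OrderDtl d ON h.OrderNum = d.OrderNum
        GROUP BY d.PartNum, h.CustNum
    ) AS YTD
      ON Part.PartNum = YTD.PartNum AND Cust.CustNum = YTD.CustNum
    ORDER BY Part.PartNum
""").fetchall()

print(rows)
```

Part P1 gets a quantity of 8 (orders 1 and 2 only), because the 100 units ordered by the other customer fall into a different (PartNum, CustNum) group.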

Problem joining tables in SQL

SELECT MID, FAD.FirstOpenedDate ,LCD.LastCloseDate
FROM mwMaster.dbo.Merchant M
JOIN (
SELECT MerchID, MIN(moddate) AS FirstOpenedDate
FROM mwMaster.dbo.MerchantStatusHistory
GROUP BY MerchID
) FAD ON FAD.MerchID = M.MerchID
LEFT JOIN (
SELECT MerchID, MAX(moddate) AS LastCloseDate
FROM mwMaster.dbo.MerchantStatusHistory
GROUP BY MerchID
) LCD ON LCD.MerchID = M.MerchID
JOIN (
SELECT merchid ,avg(Transactions) ,avg(Profit)
FROM mwMaster.dbo.ResidualSummary RS
WHERE RS.Date_Processed < LCD.LastCloseDate
GROUP BY Merchid
) R ON R.MerchID = M.MerchID
I am having trouble performing the join above. I have run into this problem before and used temp tables, but I would like to find out what I am doing wrong. Basically, the line that is not working is the third to last. The error says that "LCD.LastCloseDate" cannot be bound. Is it possible to use the value from LCD, which I created in a nested query above (in that query I used the M table in a similar way but I didn't run into any issue)? I am thinking that because the LCD table is dynamically created here it cannot be used in the nested query, but this is just my guess.
Any ideas?
On a side note, I have also seen people using CROSS and OVER. I am not too familiar with how this works, but it may be applicable here?
I think (though I haven't tested it) that you can just change your JOIN to a CROSS APPLY in SQL Server 2005+:
SELECT MID, FAD.FirstOpenedDate ,LCD.LastCloseDate
FROM mwMaster.dbo.Merchant M
JOIN (
SELECT MerchID, MIN(moddate) AS FirstOpenedDate
FROM mwMaster.dbo.MerchantStatusHistory
GROUP BY MerchID
) FAD ON FAD.MerchID = M.MerchID
LEFT JOIN (
SELECT MerchID, MAX(moddate) AS LastCloseDate
FROM mwMaster.dbo.MerchantStatusHistory
GROUP BY MerchID
) LCD ON LCD.MerchID = M.MerchID
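CROSS APPLY (
-- CROSS APPLY takes no ON clause, so the MerchID match moves into the WHERE
SELECT RS.MerchID, AVG(RS.Transactions) AS AvgTransactions, AVG(RS.Profit) AS AvgProfit
FROM mwMaster.dbo.ResidualSummary RS
WHERE RS.MerchID = M.MerchID
AND RS.Date_Processed < LCD.LastCloseDate
GROUP BY RS.MerchID
) R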
But it might be easier to use CTEs
WITH LCD AS (SELECT MerchID, MAX(moddate) AS LastCloseDate
FROM mwMaster.dbo.MerchantStatusHistory
GROUP BY MerchID),
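R AS (
-- aggregate columns need names inside a CTE (the alias names here are illustrative)
SELECT RS.MerchID, AVG(RS.Transactions) AS AvgTransactions, AVG(RS.Profit) AS AvgProfit
FROM mwMaster.dbo.ResidualSummary RS
INNER JOIN LCD on
LCD.MERCHID = RS.MERCHID
WHERE RS.Date_Processed < LCD.LastCloseDate
GROUP BY RS.MerchID
)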
SELECT MID, FAD.FirstOpenedDate ,LCD.LastCloseDate
FROM mwMaster.dbo.Merchant M
JOIN (
SELECT MerchID, MIN(moddate) AS FirstOpenedDate
FROM mwMaster.dbo.MerchantStatusHistory
GROUP BY MerchID
) FAD ON FAD.MerchID = M.MerchID
LEFT JOIN LCD ON LCD.MerchID = M.MerchID
LEFT JOIN R ON R.MerchID = M.MerchID
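The CTE approach can be tried out on a few invented rows. The sketch below runs it in SQLite through Python's sqlite3 (SQLite has no CROSS APPLY, so only the CTE variant is shown). Table names drop the mwMaster.dbo prefix, dates are stored as ISO text, and the AvgTransactions / AvgProfit aliases and all sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Merchant (MerchID INTEGER, MID TEXT);
    CREATE TABLE MerchantStatusHistory (MerchID INTEGER, moddate TEXT);
    CREATE TABLE ResidualSummary (
        MerchID INTEGER, Date_Processed TEXT, Transactions INTEGER, Profit REAL
    );
    INSERT INTO Merchant VALUES (1, 'M-1'), (2, 'M-2');
    INSERT INTO MerchantStatusHistory VALUES
        (1, '2024-01-01'), (1, '2024-06-01'), (2, '2024-02-01');
    -- The last row is processed after merchant 1's last close date and must be ignored.
    INSERT INTO ResidualSummary VALUES
        (1, '2024-03-01', 10, 100.0),
        (1, '2024-05-01', 20, 200.0),
        (1, '2024-07-01', 1000, 9999.0);
""")

rows = conn.execute("""
    WITH LCD AS (
        SELECT MerchID, MAX(moddate) AS LastCloseDate
        FROM MerchantStatusHistory
        GROUP BY MerchID
    ),
    R AS (
        -- LCD is joined inside the CTE, so LastCloseDate is in scope here.
        SELECT RS.MerchID,
               AVG(RS.Transactions) AS AvgTransactions,
               AVG(RS.Profit) AS AvgProfit
        FROM ResidualSummary RS
        JOIN LCD ON LCD.MerchID = RS.MerchID
        WHERE RS.Date_Processed < LCD.LastCloseDate
        GROUP BY RS.MerchID
    )
    SELECT M.MID, LCD.LastCloseDate, R.AvgTransactions, R.AvgProfit
    FROM Merchant M
    LEFT JOIN LCD ON LCD.MerchID = M.MerchID
    LEFT JOIN R ON R.MerchID = M.MerchID
    ORDER BY M.MID
""").fetchall()

print(rows)
```

Merchant M-2 has no residual rows, so its averages come back as NULL; with an inner JOIN to R (as in the original query) that merchant would be dropped instead.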
I can't really test this without your data, but here's one way you could do it:
SELECT MID,
MIN(moddate) OVER (PARTITION BY MerchID) as FirstOpenedDate,
MAX(moddate) OVER (PARTITION BY MerchID) as LastCloseDate
FROM mwMaster.dbo.Merchant
HAVING DateProcessed < LastCloseDate