I'm stuck on a query with a join. The client's site is running MySQL 4, so a subquery isn't an option. My attempts to rewrite it using a join aren't going too well.
I need to select all of the contractors listed in the contractors table who are not in the contractors2label table with a given label ID and county ID. However, they might be listed in contractors2label with other label and county IDs.
Table: contractors
cID (primary, autonumber)
company (varchar)
...etc...
Table: contractors2label
cID
labelID
countyID
psID
This query with a subquery works:
SELECT company, contractors.cID
FROM contractors
WHERE contractors.complete = 1
AND contractors.archived = 0
AND contractors.cID NOT IN (
SELECT contractors2label.cID FROM contractors2label
WHERE labelID <> 1 AND countyID <> 1
)
I thought this query with a join would be the equivalent, but it returns no results. A manual scan of the data shows I should get 34 rows, which is what the subquery above returns.
SELECT company, contractors.cID
FROM contractors
LEFT OUTER JOIN contractors2label ON contractors.cID = contractors2label.cID
WHERE contractors.complete = 1
AND contractors.archived = 0
AND contractors2label.labelID <> 1
AND contractors2label.countyID <> 1
AND contractors2label.cID IS NULL
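When doing a LEFT JOIN, you need to put all conditions on the joined table into the ON clause.
In your example you get NULL in the left-joined columns for contractors that have no matching row, but you then compare those columns to values again (<> 1). A comparison with NULL is never true, so the WHERE clause discards exactly the rows that the IS NULL test was meant to find.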
SELECT c.company, c.cID
FROM contractors c
LEFT JOIN contractors2label c2
ON ( c2.cID = c.cID AND c2.labelID <> 1 AND c2.countyID <> 1 )
WHERE c.complete = 1
AND c.archived = 0
AND c2.cID IS NULL
BTW: Using aliases (like c in my example) makes reading and writing your queries easier.
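To see the difference between filtering in WHERE and filtering in ON, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for MySQL 4. The three contractors and their label rows are hypothetical data chosen to show the behaviour; the queries mirror the ones above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contractors (cID INTEGER PRIMARY KEY, company TEXT,
                          complete INTEGER, archived INTEGER);
CREATE TABLE contractors2label (cID INTEGER, labelID INTEGER, countyID INTEGER);
-- Hypothetical data: Acme has only a (1, 1) label, Bolt a (2, 2) label, Crane none.
INSERT INTO contractors VALUES (1, 'Acme', 1, 0), (2, 'Bolt', 1, 0), (3, 'Crane', 1, 0);
INSERT INTO contractors2label VALUES (1, 1, 1), (2, 2, 2);
""")

# Conditions in WHERE: NULL <> 1 is not true, so unmatched rows are dropped
# and "c2.cID IS NULL" can never hold for the rows that survive.
where_version = conn.execute("""
SELECT c.company FROM contractors c
LEFT JOIN contractors2label c2 ON c2.cID = c.cID
WHERE c.complete = 1 AND c.archived = 0
  AND c2.labelID <> 1 AND c2.countyID <> 1
  AND c2.cID IS NULL
ORDER BY c.cID
""").fetchall()

# Conditions in ON: only qualifying label rows are joined, so contractors
# without such a row come through with NULLs and the IS NULL test keeps them.
on_version = conn.execute("""
SELECT c.company FROM contractors c
LEFT JOIN contractors2label c2
  ON c2.cID = c.cID AND c2.labelID <> 1 AND c2.countyID <> 1
WHERE c.complete = 1 AND c.archived = 0
  AND c2.cID IS NULL
ORDER BY c.cID
""").fetchall()

# The question's NOT IN subquery, for comparison.
not_in_version = conn.execute("""
SELECT company FROM contractors
WHERE complete = 1 AND archived = 0
  AND cID NOT IN (SELECT cID FROM contractors2label
                  WHERE labelID <> 1 AND countyID <> 1)
ORDER BY cID
""").fetchall()

print(where_version)  # []
print(on_version)     # [('Acme',), ('Crane',)]
```

The ON version returns the same rows as the NOT IN subquery, while the WHERE version returns nothing, matching what the question describes.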
When you restrict in the WHERE clause on columns of a table that's LEFT joined, you effectively remove the LEFT OUTER part of the join, because you're filtering on columns that have to be there (non-NULL). Try this instead:
SELECT company, contractors.cID
FROM contractors
LEFT OUTER JOIN contractors2label
ON (contractors.cID = contractors2label.cID
AND contractors2label.labelID <> 1
AND contractors2label.countyID <> 1)
WHERE contractors.complete = 1
AND contractors.archived = 0
AND contractors2label.cID IS NULL
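This does the restriction as part of the join, so contractors with no qualifying contractors2label row still come through with NULLs, which the IS NULL test in the WHERE clause can then pick out.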
I have a query I have been working on, and only one section has caused me fits. I am trying to create a column within the query based on the values of two tables. I have tried CASE WHEN and it works, but because of the non-unique values involved, the query returns more rows than the original query without it. For example, this is the CASE WHEN I have written:
Select r.Id,
r.RequiredOn AT TIME ZONE 'UTC' AT TIME ZONE 'Central Standard Time'
as RequiredDate,
Concat(vs.Salutation, ' ',vs.FirstName, ' ', vs.LastName) as Name,
oo.Name as RequestingOrganization,
o.Name as Location,
Case
When r.IntendedOutcome = '1' Then 'T'
When r.IntendedOutcome = '2' Then 'R'
End as RequestType,
etr.TypeRequested,
Case
When etr.Identifier is not null then etr.Identifier
When etr.Identifier is null then ' '
End as Identifier,
f.OfferedOn,
f.OfferResponse,
r.DestinationCountryCodes,
o.Id,
CASE
WHEN o.Id = oir.OrganizationId AND oir.OrganizationRoleId =
'de51c814-f86d-49c9-941b-999a98be4894'
THEN 1
ELSE NULL
END AS Bk1
From [Request] r
Left Join Recovered etr
on etr.DistributionRequestId = r.Id
Left Join [Offer] f
on f.Id = etr.Id
Left Join [dbo].Contact vs
on vs.Id = r.SId
Left Join [dbo].Organization o
on o.Id = r.SLocationId or o.Id = r.RLocationId
Left Join [dbo].Organization oo
on oo.Id = r.RequestingOrganizationId
Left Join dbo.OrganizationInRole oir
on oir.OrganizationId = o.Id
Where f.Response = 'Accepted' or f.Response is NULL
The picture shows that OrganizationId is not unique in this table. Therefore, when an OrganizationId matches and the OrganizationRoleId is found, the join brings all of that organization's OrganizationRoleIds into the query and adds rows, rather than just checking that it has the particular role ID and flagging the one row I need.
The OrganizationRole column is non-unique, and every organization can have multiple roles (sometimes 4-5). What I need is: if the OrganizationId is A and the matching OrganizationId in Table 2 has the identifier in the OrganizationRole column, then add a 1.
The Organization table (Table 2) has an OrganizationId column and an OrganizationRole column. OrganizationId is non-unique: the same OrganizationId can appear in 5 consecutive rows if that organization has 5 roles.
The result I am getting is that the query pulls in all of the roles for organizations that match that table. It adds about 33% more rows to the query compared with the original.
When you say
... if the OrganizationId is A and the matching OrganizationId in Table 2 has the identifier in the OrganizationRole column, then add a 1.
Do you want to count the number of times this condition is true? If so, you need to wrap your CASE in an aggregate function and GROUP BY the other columns.
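A minimal sketch of the aggregate approach, using Python's sqlite3 module as a stand-in for SQL Server. The schema is cut down to the columns that matter (Request.LocationId replaces the two location columns), and the organizations and the "other-role-*" IDs are hypothetical; only the target role ID comes from the question.

```python
import sqlite3

TARGET_ROLE = "de51c814-f86d-49c9-941b-999a98be4894"  # role ID from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organization (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE OrganizationInRole (OrganizationId INTEGER, OrganizationRoleId TEXT);
CREATE TABLE Request (Id INTEGER PRIMARY KEY, LocationId INTEGER);
INSERT INTO Organization VALUES (1, 'North'), (2, 'South');
INSERT INTO Request VALUES (10, 1), (11, 2);
""")
# Hypothetical roles: org 1 has three roles including the target; org 2 has two others.
conn.executemany("INSERT INTO OrganizationInRole VALUES (?, ?)", [
    (1, TARGET_ROLE), (1, "other-role-a"), (1, "other-role-b"),
    (2, "other-role-a"), (2, "other-role-c"),
])

# Plain join plus CASE: one output row per role row, so requests are duplicated.
multiplied = conn.execute("""
SELECT r.Id,
       CASE WHEN oir.OrganizationRoleId = ? THEN 1 END AS Bk1
FROM Request r
LEFT JOIN Organization o ON o.Id = r.LocationId
LEFT JOIN OrganizationInRole oir ON oir.OrganizationId = o.Id
""", (TARGET_ROLE,)).fetchall()

# CASE wrapped in SUM and grouped by the other selected columns: one row per request.
aggregated = conn.execute("""
SELECT r.Id,
       SUM(CASE WHEN oir.OrganizationRoleId = ? THEN 1 ELSE 0 END) AS Bk1
FROM Request r
LEFT JOIN Organization o ON o.Id = r.LocationId
LEFT JOIN OrganizationInRole oir ON oir.OrganizationId = o.Id
GROUP BY r.Id
ORDER BY r.Id
""", (TARGET_ROLE,)).fetchall()

print(len(multiplied))  # 5 rows for 2 requests
print(aggregated)       # [(10, 1), (11, 0)]
```

In the real query every non-aggregated column in the SELECT list would need to appear in the GROUP BY, which is why the pre-aggregation alternative can be more convenient.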
Alternatively, as Stu suggests in the comments, you could pre-aggregate the OrganizationInRole table, filtering for the role you are actually interested in; something like
SELECT r.Id,
...
oir.RoleCount AS Bk1
FROM [Request] r
...
LEFT JOIN [dbo].Organization o
ON o.Id = r.SLocationId or o.Id = r.RLocationId
...
LEFT JOIN (
SELECT OrganizationId, COUNT(*) AS RoleCount
FROM dbo.OrganizationInRole
WHERE OrganizationRoleId = 'de51c814-f86d-49c9-941b-999a98be4894'
GROUP BY OrganizationId) AS oir ON oir.OrganizationId = o.Id
...
You can do this for any other table which has multiple related rows, reducing them to a single row to join to and removing the need for aggregation and grouping in the main query.
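The pre-aggregation idea can be checked the same way. This sketch again uses sqlite3 with a reduced, hypothetical schema and data; the derived table collapses OrganizationInRole to at most one row per organization before the join, so the request count is unchanged.

```python
import sqlite3

TARGET_ROLE = "de51c814-f86d-49c9-941b-999a98be4894"  # role ID from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organization (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE OrganizationInRole (OrganizationId INTEGER, OrganizationRoleId TEXT);
CREATE TABLE Request (Id INTEGER PRIMARY KEY, LocationId INTEGER);
INSERT INTO Organization VALUES (1, 'North'), (2, 'South');
INSERT INTO Request VALUES (10, 1), (11, 2);
""")
# Hypothetical roles: only organization 1 holds the target role.
conn.executemany("INSERT INTO OrganizationInRole VALUES (?, ?)", [
    (1, TARGET_ROLE), (1, "other-role-a"), (1, "other-role-b"),
    (2, "other-role-a"), (2, "other-role-c"),
])

pre_aggregated = conn.execute("""
SELECT r.Id, oir.RoleCount AS Bk1
FROM Request r
LEFT JOIN Organization o ON o.Id = r.LocationId
LEFT JOIN (
    -- At most one row per organization, filtered to the role of interest.
    SELECT OrganizationId, COUNT(*) AS RoleCount
    FROM OrganizationInRole
    WHERE OrganizationRoleId = ?
    GROUP BY OrganizationId
) AS oir ON oir.OrganizationId = o.Id
ORDER BY r.Id
""", (TARGET_ROLE,)).fetchall()

print(pre_aggregated)  # [(10, 1), (11, None)]
```

Organizations without the role get NULL rather than 0; wrapping the column in COALESCE(oir.RoleCount, 0) would turn that into a zero if preferred.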
I have a legacy system with a sales table and a customer table, CMS and CUST respectively. I need to query for the ship-to address based on different criteria. The customer table treats each address as its own customer, so if I have a billing address and a shipping address, they will have different CUSTNUMs. The CMS table has columns CUSTNUM and SHIPNUM. If the sales order uses the billing address as the shipping address, SHIPNUM = 0; if the two addresses differ, SHIPNUM is a different customer number than CUSTNUM. I'm trying to write a query that joins CUST to CMS depending on whether SHIPNUM is > 0. My original query just used CUSTNUM and ignored SHIPNUM. My new query is syntactically correct and executes, but it returns 2860 rows versus 3590 for the old query. The old join condition is just the commented-out line: ON CMS.CUSTNUM = CUST.CUSTNUM.
from
KGI_LOTNOS as LOT
INNER JOIN CMS
ON LOT.ORDERNO = CMS.ORDERNO
JOIN CUST
ON CUST.CUSTNUM =
CASE
WHEN CMS.SHIPNUM > 0
THEN CMS.SHIPNUM
Else CMS.CUSTNUM
END
-- ON CMS.CUSTNUM = CUST.CUSTNUM
INNER JOIN COUNTRY as C
ON CUST.COUNTRY = C.COUNTRY
Here is an example from the CMS table:
CUSTNUM SHIPNUM ORDERNO
41863 77394 828509 <--Different billing and shipping address
43242 69291 776888 <--Different billing and shipping address
2356 0 765022 <--Same billing and shipping address
Any thoughts on how to make this work?
PS Here is the original query in its entirety.
select
CUST.CUSTNUM as Customer,
CMS.CUSTNUM,
CMS.SHIPNUM,
CUST.CTYPE,
CMS.ORDERNO,
CMS.ODR_DATE,
LTRIM(RTRIM(CUST.FIRSTNAME)) as First,
LTRIM(RTRIM(CUST.LASTNAME)) as Last,
LTRIM(RTRIM(CUST.COMPANY)) as Company,
LTRIM(RTRIM(CUST.PHONE)) as Phone,
LTRIM(RTRIM(CUST.EMAIL)) as Email,
LTRIM(RTRIM(CUST.ADDR)) as ADDR1,
LTRIM(RTRIM(CUST.ADDR2)) as ADDR2,
LTRIM(RTRIM(CUST.ADDR3)) as ADDR3,
LTRIM(RTRIM(CUST.CITY)) as City,
LTRIM(RTRIM(CUST.State)) as State,
LTRIM(RTRIM(CUST.ZIPCODE)) as Zip,
LTRIM(RTRIM(C.NAME)) as Country,
LOT.ITEMNO,
LOT.LOTNO,
COUNT(LOT.ITEMNO) as Quantity
from
KGI_LOTNOS as LOT
INNER JOIN CMS
ON LOT.ORDERNO = CMS.ORDERNO
LEFT JOIN CUST
ON CMS.CUSTNUM = CUST.CUSTNUM
INNER JOIN COUNTRY as C
ON CUST.COUNTRY = C.COUNTRY
where
(
CUST.CTYPE IN ('P','W','Z')
)
AND
(
LOT.LOTNO IN ('1000001','20001','300001')
)
GROUP BY
CMS.ORDERNO,
CUST.CUSTNUM,
CMS.CUSTNUM,
CMS.SHIPNUM,
CUST.CTYPE,
CUST.FIRSTNAME,
CMS.ODR_DATE,
CUST.LASTNAME,
CUST.COMPANY,
CUST.PHONE,
CUST.EMAIL,
CUST.ADDR,
CUST.ADDR2,
CUST.ADDR3,
LOT.ITEMNO,
CUST.CITY,
CUST.STATE,
CUST.ZIPCODE,
C.NAME,
LOT.LOTNO
ORDER BY
Customer,
CMS.ORDERNO,
LOT.ITEMNO,
LOT.LOTNO
If you use INNER JOIN, you risk excluding rows that have no matching row in the other table. This could be caused by either of the other two joins in your query; comment them out and try again. If you still get fewer records, check the consistency of your data: one table has values that don't correspond to values in the other.
BTW, I don't like CASE in a JOIN expression simply because it looks ugly. What do you think about this expression, which seems to do the job too:
LEFT JOIN CUST
ON CUST.CUSTNUM = COALESCE(NULLIF(CMS.SHIPNUM, 0), CMS.CUSTNUM)
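The COALESCE(NULLIF(...)) form and the CASE form agree when SHIPNUM is positive, 0, or NULL, but not when SHIPNUM is negative. This sketch evaluates both in SQLite: the first three rows are the sample CMS rows from the question, and the NULL and negative rows are hypothetical edge cases. Whether a negative SHIPNUM can actually occur in the legacy data is an assumption you would need to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CMS (CUSTNUM INTEGER, SHIPNUM INTEGER, ORDERNO INTEGER)")
conn.executemany("INSERT INTO CMS VALUES (?, ?, ?)", [
    (41863, 77394, 828509),  # from the question
    (43242, 69291, 776888),  # from the question
    (2356, 0, 765022),       # from the question
    (5000, None, 900001),    # hypothetical: SHIPNUM is NULL
    (6000, -1, 900002),      # hypothetical: SHIPNUM is negative
])

rows = conn.execute("""
SELECT ORDERNO,
       CASE WHEN SHIPNUM > 0 THEN SHIPNUM ELSE CUSTNUM END AS via_case,
       COALESCE(NULLIF(SHIPNUM, 0), CUSTNUM) AS via_coalesce
FROM CMS
ORDER BY ORDERNO
""").fetchall()

for orderno, via_case, via_coalesce in rows:
    # The two columns differ only for the negative-SHIPNUM row.
    print(orderno, via_case, via_coalesce)
```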
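You could use a CTE like this:
WITH cte (ORDERNO, SHIPNUM) AS
(
SELECT ORDERNO,
CASE WHEN SHIPNUM > 0 THEN SHIPNUM ELSE CUSTNUM END
FROM CMS
)
SELECT ... -- add any other CMS columns you need to the CTE
FROM KGI_LOTNOS AS LOT
INNER JOIN cte ON LOT.ORDERNO = cte.ORDERNO
LEFT JOIN CUST ON CUST.CUSTNUM = cte.SHIPNUM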
Fewer records join with your altered criteria because some CMS.SHIPNUM values don't have a matching CUSTNUM in the CUST table.
To find the problematic entries, change the INNER join to an OUTER (LEFT) join and add WHERE criteria, something like:
LEFT JOIN CUST
ON CUST.CUSTNUM = CASE WHEN CMS.SHIPNUM > 0 THEN CMS.SHIPNUM
ELSE CMS.CUSTNUM
END
WHERE CUST.CUSTNUM IS NULL
AND CMS.SHIPNUM > 0
Edit: You'll have to remove the INNER JOIN to COUNTRY to see the unmatched rows from your updated JOIN, since that join is on a column from the customer table and unmatched rows have a NULL CUST.COUNTRY. Also make sure to include the SHIPNUM column in your SELECT.
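Here is a small sqlite3 reproduction of the row loss and the diagnostic query. The CMS rows are the sample rows from the question; the CUST rows are hypothetical, with shipping customer 69291 deliberately left out so that one order cannot be matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUST (CUSTNUM INTEGER PRIMARY KEY, COUNTRY TEXT);
CREATE TABLE CMS (CUSTNUM INTEGER, SHIPNUM INTEGER, ORDERNO INTEGER);
INSERT INTO CMS VALUES (41863, 77394, 828509), (43242, 69291, 776888), (2356, 0, 765022);
-- Hypothetical customers; 69291 is missing on purpose.
INSERT INTO CUST VALUES (41863, 'US'), (77394, 'US'), (43242, 'CA'), (2356, 'US');
""")

# Constant SQL fragment for the ship-to customer number (no user input involved).
ship_to = "CASE WHEN CMS.SHIPNUM > 0 THEN CMS.SHIPNUM ELSE CMS.CUSTNUM END"

old_count = conn.execute(
    "SELECT COUNT(*) FROM CMS JOIN CUST ON CMS.CUSTNUM = CUST.CUSTNUM").fetchone()[0]
new_count = conn.execute(
    f"SELECT COUNT(*) FROM CMS JOIN CUST ON CUST.CUSTNUM = {ship_to}").fetchone()[0]

# Diagnostic: outer join, then keep only the orders whose ship-to customer is missing.
unmatched = conn.execute(f"""
SELECT CMS.ORDERNO, CMS.SHIPNUM
FROM CMS
LEFT JOIN CUST ON CUST.CUSTNUM = {ship_to}
WHERE CUST.CUSTNUM IS NULL AND CMS.SHIPNUM > 0
""").fetchall()

print(old_count, new_count)  # 3 2
print(unmatched)             # [(776888, 69291)]
```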
Your query looks correct, and I'm not sure why it isn't working. Try LEFT JOINing the CUST table twice, once on the shipping number and once on the billing number, and then write a CASE expression for each customer column.
select
LTRIM(RTRIM(case when CMS.SHIPNUM > 0 THEN CUST.FIRSTNAME else CUST_BILL.FIRSTNAME end)) as First
-- ...same CASE pattern for the other CUST columns
from
KGI_LOTNOS as LOT
INNER JOIN CMS
ON LOT.ORDERNO = CMS.ORDERNO
left JOIN CUST CUST
ON CUST.CUSTNUM = CMS.SHIPNUM
left JOIN CUST CUST_BILL
ON CMS.CUSTNUM = CUST_BILL.CUSTNUM
INNER JOIN COUNTRY as C
ON C.COUNTRY = case when CMS.SHIPNUM > 0 THEN CUST.COUNTRY else CUST_BILL.COUNTRY end
If it still returns fewer rows, then something else is wrong.
I want to pull back results from one table that match ALL specified values where the specified values are in another table. I can do it like this:
SELECT * FROM Contacts
WHERE
EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = '8C62E5DE-00FC-4994-8127-000B02E10DA5')
AND EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = 'D2E90AA0-AC93-4406-AF93-0020009A34BA')
AND EXISTS etc...
However that falls over when I get up to about 40 EXISTS clauses. The error message is "The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query."
The gist of this is to
Select all contacts with any GUID from the IN list
Use COUNT(DISTINCT ...) to count the matching GUIDs for each ContactID
Use HAVING to keep only those contacts whose count equals the number of GUIDs you've put in the IN list
SQL Statement
SELECT *
FROM dbo.Contacts c
INNER JOIN (
SELECT c.ID
FROM dbo.Contacts c
INNER JOIN dbo.ContactClassifications cc ON c.ID = cc.ContactID
WHERE cc.ClassificationID IN ('..', '..', 38 other GUIDS)
GROUP BY
c.ID
HAVING COUNT(DISTINCT cc.ClassificationID) = 40
) cc ON cc.ID = c.ID
Test script at data.stackexchange
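A minimal sqlite3 sketch of the GROUP BY/HAVING approach. Short hypothetical IDs ('A', 'B', 'C') stand in for the GUIDs, and contact 3 has a duplicated classification row to show why the count must be DISTINCT. Unlike the answer, the derived table reads ContactClassifications directly instead of re-joining Contacts; the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ContactClassifications (ContactID INTEGER, ClassificationID TEXT);
INSERT INTO Contacts VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cai');
""")
conn.executemany("INSERT INTO ContactClassifications VALUES (?, ?)", [
    (1, "A"), (1, "B"), (1, "C"),   # Ann: both required IDs plus an extra
    (2, "A"),                       # Ben: only one required ID
    (3, "A"), (3, "B"), (3, "B"),   # Cai: both, with a duplicate row
])

required = ["A", "B"]  # hypothetical stand-ins for the 40 GUIDs
placeholders = ", ".join("?" for _ in required)

matches = conn.execute(f"""
SELECT c.ID, c.Name
FROM Contacts c
INNER JOIN (
    SELECT cc.ContactID
    FROM ContactClassifications cc
    WHERE cc.ClassificationID IN ({placeholders})
    GROUP BY cc.ContactID
    HAVING COUNT(DISTINCT cc.ClassificationID) = ?
) m ON m.ContactID = c.ID
ORDER BY c.ID
""", (*required, len(set(required)))).fetchall()

print(matches)  # [(1, 'Ann'), (3, 'Cai')]
```

With a plain COUNT(*) instead, Cai would count 3 and be wrongly excluded.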
One solution is to require that none of the specified classifications is missing for the contact. That's a double negation:
select *
from contacts c
where not exists
(
select *
from ContactClassifications cc
where cc.ClassificationID in ('..', '..') -- the classifications you require
and not exists
(
select *
from ContactClassifications cc2
where cc2.ContactID = c.ID
and cc2.ClassificationID = cc.ClassificationID
)
)
This type of problem is known as relational division.
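Here is the double-negation form of relational division run in SQLite. As an assumption for the sketch, the specified IDs live in a hypothetical helper table named Required rather than in an IN list; that is the textbook shape of the query, and it also means a required ID that no contact has makes every contact fail, as it should.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contacts (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ContactClassifications (ContactID INTEGER, ClassificationID TEXT);
-- Hypothetical helper table holding the specified classification IDs.
CREATE TABLE Required (ClassificationID TEXT PRIMARY KEY);
INSERT INTO Contacts VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cai');
INSERT INTO ContactClassifications VALUES
    (1, 'A'), (1, 'B'), (1, 'C'),
    (2, 'A'),
    (3, 'A'), (3, 'B');
INSERT INTO Required VALUES ('A'), ('B');
""")

# Contacts for which no required classification is missing.
matches = conn.execute("""
SELECT c.ID, c.Name
FROM Contacts c
WHERE NOT EXISTS (
    SELECT 1 FROM Required r
    WHERE NOT EXISTS (
        SELECT 1 FROM ContactClassifications cc2
        WHERE cc2.ContactID = c.ID
          AND cc2.ClassificationID = r.ClassificationID
    )
)
ORDER BY c.ID
""").fetchall()

print(matches)  # [(1, 'Ann'), (3, 'Cai')]
```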
SELECT c.*
FROM Contacts c
INNER JOIN
(SELECT cc.ContactID, COUNT(DISTINCT cc.ClassificationID) AS num_class
FROM ContactClassifications cc
WHERE cc.ClassificationID IN (....)
GROUP BY cc.ContactID
) b ON c.ID = b.ContactID
WHERE b.num_class = [number of distinct values - how many different values you put in "IN"]
If you run SQL Server 2005 or later, you can do pretty much the same with CROSS APPLY, supposedly more efficiently.
Please go through the attached image, where I described my scenario:
I want an SQL JOIN query.
Have a look at something like
SELECT *
FROM Orders o
WHERE EXISTS (
SELECT 1
FROM OrderBooks ob INNER JOIN
Books b ON ob.BookID = b.BookID
WHERE o.OrderID = ob.OrderID
AND b.IsBook = @IsBook
)
The query will return all orders based on the given criteria.
So when @IsBook = 1, it returns all orders that have one or more linked entries that are books, and when @IsBook = 0, it returns all orders that have one or more linked entries that are not books.
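A small sqlite3 sketch of the EXISTS query with hypothetical orders and books. SQLite's named parameter :IsBook stands in for the answer's IsBook parameter. Order 3 links the same book twice to show that EXISTS returns each order once, whereas a plain join would repeat it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY);
CREATE TABLE Books (BookID INTEGER PRIMARY KEY, IsBook INTEGER);
CREATE TABLE OrderBooks (OrderID INTEGER, BookID INTEGER);
INSERT INTO Orders VALUES (1), (2), (3);
INSERT INTO Books VALUES (100, 1), (200, 0);
-- Order 1: a book and a non-book; order 2: non-book only; order 3: the same book twice.
INSERT INTO OrderBooks VALUES (1, 100), (1, 200), (2, 200), (3, 100), (3, 100);
""")

def orders_with(is_book):
    """Return IDs of orders with at least one linked entry whose IsBook matches."""
    return [row[0] for row in conn.execute("""
        SELECT o.OrderID FROM Orders o
        WHERE EXISTS (
            SELECT 1 FROM OrderBooks ob
            INNER JOIN Books b ON ob.BookID = b.BookID
            WHERE o.OrderID = ob.OrderID AND b.IsBook = :IsBook
        )
        ORDER BY o.OrderID
    """, {"IsBook": is_book})]

print(orders_with(1))  # [1, 3]
print(orders_with(0))  # [1, 2]
```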
An inner join combines two or more tables based on a common field in both tables. The key columns should be of the same type and length, regardless of their names.
Here is an example:
Table1
id Name Sex
1 Akash Male
2 Kedar Male
And a similar second table:
Table2
id Address Number
1 Nadipur 18281794
2 Pokhara 54689712
Now we can perform an inner join with the following SQL statement:
select A.id, A.Name, B.Address, B.Number from Table1 A
INNER JOIN Table2 B
ON A.id = B.id
The query above returns the combined details for each matching id (a one-to-one relationship in this example).
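The example can be run as-is in SQLite via Python's sqlite3 module. The rows for ids 1 and 2 are the ones from the tables above; the row with id 3 is a hypothetical addition with no partner in Table2, included to show that an inner join leaves unmatched rows out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER PRIMARY KEY, Name TEXT, Sex TEXT);
CREATE TABLE Table2 (id INTEGER PRIMARY KEY, Address TEXT, Number TEXT);
INSERT INTO Table1 VALUES (1, 'Akash', 'Male'), (2, 'Kedar', 'Male');
INSERT INTO Table2 VALUES (1, 'Nadipur', '18281794'), (2, 'Pokhara', '54689712');
-- Hypothetical extra row with no matching id in Table2.
INSERT INTO Table1 VALUES (3, 'Test', 'Male');
""")

rows = conn.execute("""
SELECT A.id, A.Name, B.Address, B.Number
FROM Table1 A
INNER JOIN Table2 B ON A.id = B.id
ORDER BY A.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Akash', 'Nadipur', '18281794')
# (2, 'Kedar', 'Pokhara', '54689712')
```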
I want to select any database entry that has an invalid Country, Region, or Area ID. By invalid, I mean an ID for a country, region, or area that no longer exists in my tables. I have four tables: Properties, Countries, Regions, Areas.
I was thinking of doing it like this:
SELECT * FROM Properties WHERE
Country_ID NOT IN
(
SELECT CountryID FROM Countries
)
OR
RegionID NOT IN
(
SELECT RegionID FROM Regions
)
OR
AreaID NOT IN
(
SELECT AreaID FROM Areas
)
Is my query right? And what can I do to achieve the same result with better performance?
Your query in fact is optimal.
The LEFT JOINs proposed by others are worse, as they select ALL values and then filter them out.
Most probably your subquery will be optimized to this:
SELECT *
FROM Properties p
WHERE NOT EXISTS
(
SELECT 1
FROM Countries i
WHERE i.CountryID = p.CountryID
)
OR
NOT EXISTS
(
SELECT 1
FROM Regions i
WHERE i.RegionID = p.RegionID
)
OR
NOT EXISTS
(
SELECT 1
FROM Areas i
WHERE i.AreaID = p.AreaID
)
, which you should use.
This query selects at most 1 row from each table and moves on to the next iteration as soon as it finds that row (i.e., if it does not find a Country for a given Property, it will not even bother checking for a Region).
Again, SQL Server is smart enough to build the same plan for this query and your original one.
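The NOT EXISTS form can be checked for correctness (not performance) in SQLite. This sketch uses hypothetical IDs, one property per kind of broken reference, and compares the result with the question's NOT IN version. It follows the NOT EXISTS query in naming the Properties column CountryID. The timing figures quoted in this answer are for SQL Server and are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY);
CREATE TABLE Regions (RegionID INTEGER PRIMARY KEY);
CREATE TABLE Areas (AreaID INTEGER PRIMARY KEY);
CREATE TABLE Properties (PropertyID INTEGER PRIMARY KEY,
                         CountryID INTEGER, RegionID INTEGER, AreaID INTEGER);
INSERT INTO Countries VALUES (1), (2);
INSERT INTO Regions VALUES (10), (20);
INSERT INTO Areas VALUES (100), (200);
INSERT INTO Properties VALUES
    (1, 1, 10, 100),   -- all references valid
    (2, 9, 10, 100),   -- country 9 no longer exists
    (3, 2, 99, 200),   -- region 99 no longer exists
    (4, 2, 20, 999);   -- area 999 no longer exists
""")

orphans = conn.execute("""
SELECT p.PropertyID
FROM Properties p
WHERE NOT EXISTS (SELECT 1 FROM Countries i WHERE i.CountryID = p.CountryID)
   OR NOT EXISTS (SELECT 1 FROM Regions i WHERE i.RegionID = p.RegionID)
   OR NOT EXISTS (SELECT 1 FROM Areas i WHERE i.AreaID = p.AreaID)
ORDER BY p.PropertyID
""").fetchall()

not_in = conn.execute("""
SELECT PropertyID FROM Properties
WHERE CountryID NOT IN (SELECT CountryID FROM Countries)
   OR RegionID NOT IN (SELECT RegionID FROM Regions)
   OR AreaID NOT IN (SELECT AreaID FROM Areas)
ORDER BY PropertyID
""").fetchall()

print(orphans)  # [(2,), (3,), (4,)]
```

The two forms agree here because none of the ID columns contain NULLs; with NULLs, NOT IN and NOT EXISTS can return different rows.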
Update:
Tested on 512K rows in each table.
All corresponding IDs in the dimension tables are CLUSTERED PRIMARY KEYs, and all measure fields in Properties are indexed.
For each row in Properties, PropertyID = CountryID = RegionID = AreaID, so there are no actually missing rows (the worst case in terms of execution time).
NOT EXISTS 00:11 (11 seconds)
LEFT JOIN 01:08 (68 seconds)
You could rewrite it differently as follows:
SELECT p.*
FROM Properties p
LEFT JOIN Countries c ON p.Country_ID = c.CountryID
LEFT JOIN Regions r on p.RegionID = r.RegionID
LEFT JOIN Areas a on p.AreaID = a.AreaID
WHERE c.CountryID IS NULL
OR r.RegionID IS NULL
OR a.AreaID IS NULL
Test the performance difference (if there is any - there should be as NOT IN is a nasty search, especially over a lot of items as it HAS to test every single one).
You can also make this faster by indexing the IDs being searched; in each master table (Country, Region, Area) they should be clustered primary keys.
Since this seems to be cleanup sql, this should be ok. But how about using foreign keys so that it does not bother you next time around?
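As an illustration of the foreign-key suggestion, this sketch declares a REFERENCES constraint in SQLite (which only enforces foreign keys after PRAGMA foreign_keys = ON). The two-table schema is a hypothetical reduction of the question's tables; the same idea applies to Regions and Areas, and SQL Server's syntax for adding a constraint to an existing table differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY);
CREATE TABLE Properties (
    PropertyID INTEGER PRIMARY KEY,
    CountryID INTEGER REFERENCES Countries (CountryID)
);
INSERT INTO Countries VALUES (1);
INSERT INTO Properties VALUES (1, 1);
""")

def try_sql(sql):
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_sql("INSERT INTO Properties VALUES (2, 9)"))       # rejected: country 9 missing
print(try_sql("DELETE FROM Countries WHERE CountryID = 1"))  # rejected: still referenced
```

With the constraint in place, orphaned IDs cannot be created in the first place, so the cleanup query only has to be run once.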
Well, you could try things like UNION (instead of OR) - but I expect that the optimizer is already doing the best it can given the information available:
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
UNION
SELECT * FROM Properties
WHERE NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
Subqueries in the conditions can be quite inefficient. Instead, you can do left joins against the related tables. Where there is no matching record, you get a null value. You can use this in the condition to select only the records where a matching record is missing:
select p.*
from Properties p
left join Countries c on c.CountryID = p.Country_ID
left join Regions r on r.RegionID = p.RegionID
left join Areas a on a.AreaID = p.AreaID
where c.CountryID is null or r.RegionID is null or a.AreaID is null
If you're not grabbing the row data from countries/regions/areas you can try using "exists":
SELECT Properties.*
FROM Properties
WHERE Properties.CountryID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Countries WHERE Countries.CountryID = Properties.CountryID)
OR Properties.RegionID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Regions WHERE Regions.RegionID = Properties.RegionID)
OR Properties.AreaID IS NOT NULL AND NOT EXISTS (SELECT 1 FROM Areas WHERE Areas.AreaID = Properties.AreaID)
This will typically hint the optimizer to use the primary-key indexes of Countries et al. for the existence check, but whether that is an improvement depends on your data statistics; you simply have to run it in Query Analyzer and compare.
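The IS NOT NULL guards in the query above change which rows count as invalid when an ID is NULL. This sketch, using SQLite and hypothetical data for the Regions check only, shows that an unguarded NOT EXISTS flags a NULL RegionID, the guarded version does not, and NOT IN also skips it (NULL NOT IN (...) is unknown, not true). The guarded query relies on AND binding tighter than OR, so each guard pairs with its own NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Regions (RegionID INTEGER PRIMARY KEY);
CREATE TABLE Properties (PropertyID INTEGER PRIMARY KEY, RegionID INTEGER);
INSERT INTO Regions VALUES (10);
-- Hypothetical rows: valid region, missing region, no region at all.
INSERT INTO Properties VALUES (1, 10), (2, 99), (3, NULL);
""")

def ids(sql):
    """Return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]

unguarded = ids("""SELECT PropertyID FROM Properties p
    WHERE NOT EXISTS (SELECT 1 FROM Regions r WHERE r.RegionID = p.RegionID)
    ORDER BY PropertyID""")
guarded = ids("""SELECT PropertyID FROM Properties p
    WHERE p.RegionID IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM Regions r WHERE r.RegionID = p.RegionID)
    ORDER BY PropertyID""")
not_in = ids("""SELECT PropertyID FROM Properties
    WHERE RegionID NOT IN (SELECT RegionID FROM Regions)
    ORDER BY PropertyID""")

print(unguarded)  # [2, 3]
print(guarded)    # [2]
print(not_in)     # [2]
```

Whether a NULL ID should count as "invalid" is a data-model decision; the guard simply makes that choice explicit.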