"Where IN" with multiple columns (SQL Server) - sql

I'm using SQL Server 2005.
The query would look like this
Select col1, col2, col3 from <subquery> where (col1,col2) in <subquery>
SQL Server doesn't seem to like that. Any way of implementing that that anyone knows of that doesn't involve converting to varchars or anything else messy?
This is the actual query.
SELECT *
FROM
(
SELECT NEWID() AS guid, UserID, RoleId, ClubId, 0 AS GroupRole
FROM dbo.Portal_UserRoles
UNION
SELECT NEWID() AS guid, dbo.Portal_UserGroups.UserId,
dbo.Portal_GroupRoles.RoleId, dbo.Portal_UserGroups.ClubId,
dbo.Portal_GroupRoles.GroupId AS GroupRole
FROM dbo.Portal_GroupRoles
INNER JOIN dbo.Portal_UserGroups
ON dbo.Portal_GroupRoles.GroupId = dbo.Portal_UserGroups.GroupId
) AS derivedtbl_1
WHERE (derivedtbl_1.RoleId,derivedtbl_1.ClubId) IN
(
SELECT Portal_GroupRoles.RoleId, Portal_ClubGroups.ClubId
FROM Portal_GroupRoles
INNER JOIN Portal_ClubGroups
ON Portal_GroupRoles.GroupId = Portal_ClubGroups.GroupId
)

The standard, classic way to do what you seek is an EXISTS clause:
SELECT *
FROM
(
SELECT NEWID() AS guid, UserID, RoleId, ClubId, 0 AS GroupRole
FROM dbo.Portal_UserRoles
UNION
SELECT NEWID() AS guid, dbo.Portal_UserGroups.UserId,
dbo.Portal_GroupRoles.RoleId, dbo.Portal_UserGroups.ClubId,
dbo.Portal_GroupRoles.GroupId AS GroupRole
FROM dbo.Portal_GroupRoles
INNER JOIN dbo.Portal_UserGroups
ON dbo.Portal_GroupRoles.GroupId = dbo.Portal_UserGroups.GroupId
) AS derivedtbl_1
WHERE EXISTS
(
SELECT Portal_GroupRoles.RoleId, Portal_ClubGroups.ClubId
FROM (Portal_GroupRoles
INNER JOIN Portal_ClubGroups
ON Portal_GroupRoles.GroupId = Portal_ClubGroups.GroupId) AS cgr
WHERE derivedtbl_1.RoleID = cgr.RoleId
AND derivedtbl_1.ClubId = cgr.ClubId
)
Be wary of splitting the two-column condition into two separate IN clauses; it does not give you the same answer (in general) as the applying the two-column condition in one EXISTS clause.

Do a join on the derived table instead of using the in
SELECT *
FROM
(
SELECT NEWID() AS guid, UserID, RoleId, ClubId, 0 AS GroupRole
FROM dbo.Portal_UserRoles
UNION
SELECT NEWID() AS guid, dbo.Portal_UserGroups.UserId,
dbo.Portal_GroupRoles.RoleId, dbo.Portal_UserGroups.ClubId,
dbo.Portal_GroupRoles.GroupId AS GroupRole
FROM dbo.Portal_GroupRoles
INNER JOIN dbo.Portal_UserGroups
ON dbo.Portal_GroupRoles.GroupId = dbo.Portal_UserGroups.GroupId
) AS derivedtbl_1
INNER JOIN
(
SELECT Portal_GroupRoles.RoleId, Portal_ClubGroups.ClubId
FROM Portal_GroupRoles
INNER JOIN Portal_ClubGroups
ON Portal_GroupRoles.GroupId = Portal_ClubGroups.GroupId
) derivedtbl_2
ON derivedtbl_1.RoleID = derivedtbl_2.RoleID
AND derivedtbl_1.ClubId = derivedtbl_2.ClubId

SELECT
/*
your selected fields, joins here
*/
WHERE -- (derivedtbl_1.RoleId,derivedtbl_1.ClubId) IN
EXISTS
(
-- actually you can change these two fields to * (asterisk ) or 1, whatever, even your name, what matters only is the testing of existence(see below)
SELECT Portal_GroupRoles.RoleId, Portal_ClubGroups.ClubId
FROM Portal_GroupRoles
INNER JOIN Portal_ClubGroups
ON Portal_GroupRoles.GroupId = Portal_ClubGroups.GroupId
-- here is your IN (the testing of existence):
WHERE Portal_GroupRoles.RoleId = derivedtbl_1.RoleId AND
AND derivedtbl_1.ClubId = derivedtbl_1.ClubId
)
alternatively:
SELECT
/*
your selected fields, joins here
*/
JOIN
(
SELECT Portal_GroupRoles.RoleId, Portal_ClubGroups.ClubId
FROM Portal_GroupRoles
INNER JOIN Portal_ClubGroups
ON Portal_GroupRoles.GroupId = Portal_ClubGroups.GroupId
) X
-- here is your IN:
ON X.RoleId = derivedtbl_1.RoleId
AND X.ClubId = derivedtbl_1.ClubId

Yours is useful syntax and is supported in standard SQL. It is not yet implemented in SQL Server but you may vote for its inclusion here.

You have to separate in two clauses
where col1 in (...) AND col2 in (...)
or you could refactor it a little bit
select * FROM (
SELECT NEWID() AS guid, UserID, RoleId, ClubId, 0 AS GroupRole FROM dbo.Portal_UserRoles
UNION
SELECT NEWID() AS guid, dbo.Portal_UserGroups.UserId, dbo.Portal_GroupRoles.RoleId, dbo.Portal_UserGroups.ClubId, dbo.Portal_GroupRoles.GroupId AS GroupRoles FROM dbo.Portal_GroupRoles INNER JOIN dbo.Portal_UserGroups ON dbo.Portal_GroupRoles.GroupId = dbo.Portal_UserGroups.GroupId)
AS derivedtbl_1, Portal_GroupRoles, Portal_ClubGroup
where derivedtbl_1.RoleId = Portal_GroupRoles.RoleId
and derivedtbl_1.ClubId = Portal_ClubGroups.ClubId
and Portal_GroupRoles.GroupId = Portal_ClubGroups.GroupId

Related

SQL split repeating rows caused by UNION

I am writing a query to look through and get two seperate averages based on where conditions.
I tried two select statetments but ended up with lots of duplicates.
Now I have a union which works pretty well, although I have my two fields in alternating rows instead of seperate columns.
Can anyone suggest a fix, sorry for the dodgy code!
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `cohortPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
teacherGroup = '9JS2/Cp'
AND tblTestScores.testDetailsID = 1
GROUP BY
skillName
UNION ALL
SELECT
tblSkillName.skillName,
tblTestScores.skillUID,
AVG(tblTestScores.percentage) AS `groupPercentage`
FROM
(
(
(
tblTestScores
INNER JOIN tblUsers ON tblUsers.email = tblTestScores.email
)
INNER JOIN tblTestDetails ON tblTestScores.testDetailsID = tblTestDetails.testDetailsID
)
INNER JOIN tblSkillName ON tblSkillName.skillUID = tblTestScores.skillUID
)
WHERE
tblTestScores.testDetailsID = 1
GROUP BY
skillName
ORDER BY
skillUID ASC

How to improve sql script performance

The following script is very slow when its run.
I have no idea how to improve the performance of the script.
Even with a view takes more than quite a lot minutes.
Any idea please share to me.
SELECT DISTINCT
( id )
FROM ( SELECT DISTINCT
ct.id AS id
FROM [Customer].[dbo].[Contact] ct
LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
WHERE hnci.customer_id IN (
SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218' )
UNION
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
UNION
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
UNION
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
UNION
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
) AS tbl
Wow, where to start...
--this distinct does nothing. Union is already distinct
--SELECT DISTINCT
-- ( id )
--FROM (
SELECT DISTINCT [Customer_ID] as ID
FROM [Transactions].[dbo].[Transaction_Header]
where actual_transaction_date > '20120218' )
UNION
SELECT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
-- not sure that you are getting the date range you want. Should these be >=
-- if you want everything that occurred on the 18th or after you want >= '2012-02-18 00:00:00.000'
-- if you want everything that occurred on the 19th or after you want >= '2012-02-19 00:00:00.000'
-- the way you have it now, you will get everything on the 18th unless it happened exactly at midnight
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
-- all of this does nothing because we already have every id in the contact table from the first query
-- UNION
-- SELECT
-- ( ct.id )
-- FROM [Customer].[dbo].[Contact] ct
-- INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
-- AND startconnection > '2012-02-17'
UNION
-- cleaned this up with isnull function and coalesce
SELECT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( isnull(opt_out_Mail,0) <> 1
OR isnull(opt_out_email,0) <> 1
OR isnull(opt_out_SMS,0) <> 1
)
AND coalesce(address_1 , email, mobile) IS NOT NULL
UNION
SELECT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
-- ) AS tbl
Where exists is generally faster than in as well.
Or conditions are generally slower as well, use more union statements instead.
And learn to use left joins correctly. If you have a where condition (other than where id is null) on the table on teh right side of a left join, it will convert to an inner join. If this is not what you want, then your code is currently giving you an incorrect result set.
See http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN for an explanation of how to fix.
As stated in a comment optimize one at a time. See which one takes the longest and focus on that one.
union will remove duplicates so you don't need the distinct on the individual queries
On you first I would try this:
The left join is killed by the WHERE hnci.customer_id IN so you might as well have a join.
The sub-query is not efficient as cannot use an index on the IN.
The query optimizer does not know what in ( select .. ) will return so it cannot optimize use of indexes.
SELECT ct.id AS id
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci
ON ct.id = hnci.contact_id
JOIN [Transactions].[dbo].[Transaction_Header] th
on hnci.customer_id = th.[Customer_ID]
and th.actual_transaction_date > '20120218'
On that second join the query optimizer has the opportunity of which condition to apply first. Let say [Customer].[dbo].[Customer_ids].[customer_id] and [Transactions].[dbo].[Transaction_Header] each have indexes. The query optimizer has the option to apply that before [Transactions].[dbo].[Transaction_Header].[actual_transaction_date].
If [actual_transaction_date] is not indexed then for sure it would do the other ID join first.
With your in ( select ... ) the query optimizer has no option but to apply the actual_transaction_date > '20120218' first. OK some times query optimizer is smart enough to use an index inside the in outside the in but why make it hard for the query optimizer. I have found the query optimizer make better decisions if you make the decisions easier.
A join on a sub-query has the same problem. You take options away from the query optimizer. Give the query optimizer room to breathe.
try this, temptable should help you:
IF OBJECT_ID('Tempdb..#Temp1') IS NOT NULL
DROP TABLE #Temp1
--Low perfomance because of using "WHERE hnci.customer_id IN ( .... ) " - loop join must be
--and this "where" condition will apply to two tables after left join,
--so result will be same as with two inner joints but with bad perfomance
--SELECT DISTINCT
-- ct.id AS id
--INTO #temp1
--FROM [Customer].[dbo].[Contact] ct
-- LEFT JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
--WHERE hnci.customer_id IN (
-- SELECT DISTINCT
-- ( [Customer_ID] )
-- FROM [Transactions].[dbo].[Transaction_Header]
-- WHERE actual_transaction_date > '20120218' )
--------------------------------------------------------------------------------
--this will give the same result but with better perfomance then previouse one
--------------------------------------------------------------------------------
SELECT DISTINCT
ct.id AS id
INTO #temp1
FROM [Customer].[dbo].[Contact] ct
JOIN [Customer].[dbo].[Customer_ids] hnci ON ct.id = hnci.contact_id
JOIN ( SELECT DISTINCT
( [Customer_ID] )
FROM [Transactions].[dbo].[Transaction_Header]
WHERE actual_transaction_date > '20120218'
) T ON hnci.customer_id = T.[Customer_ID]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
INSERT INTO #temp1
( id
)
SELECT DISTINCT
contact_id AS id
FROM [Customer].[dbo].[Restaurant_Attendance]
WHERE ( created > '2012-02-18 00:00:00.000'
OR modified > '2012-02-18 00:00:00.000'
)
AND ( [Fifth_Floor_London] = 1
OR [Fourth_Floor_Leeds] = 1
OR [Second_Floor_Bristol] = 1
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( ct.id )
FROM [Customer].[dbo].[Contact] ct
INNER JOIN [Customer].[dbo].[Wifinity_Devices] wfd ON ct.wifinity_uniqueID = wfd.[CustomerUniqueID]
AND startconnection > '2012-02-17'
INSERT INTO #temp1
( id
)
SELECT DISTINCT
comdt.id AS id
FROM [Customer].[dbo].[Complete_dataset] comdt
LEFT JOIN [Customer].[dbo].[Aggregate_Spend_Counts] agsc ON comdt.id = agsc.contact_id
WHERE agsc.contact_id IS NULL
AND ( opt_out_Mail <> 1
OR opt_out_email <> 1
OR opt_out_SMS <> 1
OR opt_out_Mail IS NULL
OR opt_out_email IS NULL
OR opt_out_SMS IS NULL
)
AND ( address_1 IS NOT NULL
OR email IS NOT NULL
OR mobile IS NOT NULL
)
INSERT INTO #temp1
( id
)
SELECT DISTINCT
( contact_id ) AS id
FROM [Customer].[dbo].[VIP_Card_Holders]
WHERE VIP_Card_number IS NOT NULL
SELECT DISTINCT
id
FROM #temp1 AS T

SQL WHERE In a many-to-many or many-to-many empty

Does anyone know a way to simplify this WHERE expression?
WHERE (
(#UserSpecialtyID in
(
SELECT CharacteristicSpecialties_Id
FROM ModalityVariantSpecialty
WHERE ModalityVariants_Id = ModalityVariants.Id
)
)
OR
NOT EXISTS
(
SELECT CharacteristicSpecialties_Id
FROM ModalityVariantSpecialty
WHERE ModalityVariants_Id = ModalityVariants.Id
)
)
Something like this should probably work but Im not exactly clear on the relationships for your tables. I could probably give a better example if you could explain the relationships.
SELECT
*
FROM MadalityVariants mv
LEFT JOIN ModalityVariantSpecialty mvs on mvs.ModalityVariants_ID = mv.ID
WHERE
#UserSpecialtyID = mvs.CharacteristicSpecialties_ID
OR
mvs.CharacteristicSpecialties_ID is null
WHERE (
#UserSpecialtyID in
(
SELECT COALESCE(CharacteristicSpecialties_Id, A.A)
FROM (SELECT #UserSpecialtyID A) A LEFT JOIN ModalityVariantSpecialty
ON ModalityVariants_Id = ModalityVariants.Id
)
)
this works well if CharacteristicSpecialties_Id is a NON NULLABLE field.
I am assuming that this is a WHERE clause of a SELECT on the table ModalityVariants
Would this work (The SQL is not tested)?
SELECT *
FROM ModalityVariants
LEFT OUTER JOIN ModalityVariantSpeciality
ON ModalityVariants.Id = ModalityVariants_ID
WHERE CharacteristicSpecialities_Id = #UserSpecialityID or
CharacteristicSpecialities_Id is NULL
Here's my attempt:
WHERE #UserSpecialtyID = COALESCE
(
SELECT TOP 1 CharacteristicSpecialties_Id
FROM ModalityVariantSpecialty
WHERE ModalityVariants_Id = ModalityVariants.Id
ORDER BY
CASE WHEN CharacteristicSpecialties_Id = UserSpecialtyID THEN 1
ELSE 2 END ASC
), #UserSpecialtyID)
If both ModalityVariants_Id and UserSpecialtyID match, the subquery returns CharacteristicSpecialties_Id, and the where succeeds
If only ModalityVariants_Id matches, the subquery returns a different ID, and the where fails
If neither matches, the subquery returns NULL, the COALESCE returns #UserSpecialtyID, and the where succeeds
Probably clearest is a variety of John Hartsock's answer, with a subquery to ensure the left join doesn't add any rows.
select *
from ModalityVariants mv
left join
(
select distinct ModalityVariants_ID
, CharacteristicSpecialties_ID
from ModalityVariantSpecialty
) as mvs
on mvs.ModalityVariants_ID = mv.ID
where #UserSpecialtyID = mvs.CharacteristicSpecialties_ID
OR
mvs.CharacteristicSpecialties_ID is null
I'll vote for John's answer :)

SQL show records that don't exist in my table variable

I have a table variable that holds orderID, UnitID and OrderServiceId (it is already populated via a query with insert statement).
I then have a query under this that returns 15 columns which also include the OrderId, UnitId, OrderServiceId
I need to only return the rows from this query where the same combination of OrderId, UnitId, and OrderServiceId are not in the table variable.
You can use NOT EXISTS. e.g.
FROM YourQuery q
WHERE NOT EXISTS
(
SELECT * FROM #TableVar t
WHERE t.OrderId = q.OrderId
and t.UnitId = q.UnitId
and t.OrderServiceId=q.OrderServiceId
)
select q.*
from (
MyQuery
) q
left outer join MyTableVariable t on q.ORDERID = t.ORDERID
and q.UNITID= t.UNITID
and q.ORDERSERVICESID = t.ORDERSERVICESID
where t.ORDERID is null
You can use EXCEPT | INTERSECT operators for this (link).
Example:
(select 3,4,1
union all
select 2,4,1)
intersect
(select 1,2,9
union all
select 3,4,1)

MySQL/SQL - When are the results of a sub-query avaliable?

Suppose I have this query
SELECT * FROM (
SELECT * FROM table_a
WHERE id > 10 )
AS a_results LEFT JOIN
(SELECT * from table_b
WHERE id IN
(SElECT id FROM a_results)
ON (a_results.id = b_results.id)
I would get the error "a_results is not a table". Anywhere I could use the re-use the results of the subquery?
Edit: It has been noted that this query doesn't make sense...it doesn't, yes. This is just to illustrate the question which I am asking; the 'real' query actually looks something like this:
SELECT SQL_CALC_FOUND_ROWS * FROM
( SELECT wp_pod_tbl_hotel . *
FROM wp_pod_tbl_hotel, wp_pod_rel, wp_pod
WHERE wp_pod_rel.field_id =12
AND wp_pod_rel.tbl_row_id =1
AND wp_pod.tbl_row_id = wp_pod_tbl_hotel.id
AND wp_pod_rel.pod_id = wp_pod.id
) as
found_hotel LEFT JOIN (
SELECT COUNT(*) as review_count, avg( (
location_rating + staff_performance_rating + condition_rating + room_comfort_rating + food_rating + value_rating
) /6 ) AS average_score, hotelid
FROM (
SELECT r. * , wp_pod_rel.tbl_row_id AS hotelid
FROM wp_pod_tbl_review r, wp_pod_rel, wp_pod
WHERE wp_pod_rel.field_id =11
AND wp_pod_rel.pod_id = wp_pod.id
AND r.id = wp_pod.tbl_row_id
AND wp_pod_rel.tbl_row_id
IN (
SELECT wp_pod_tbl_hotel .id
FROM wp_pod_tbl_hotel, wp_pod_rel, wp_pod
WHERE wp_pod_rel.field_id =12
AND wp_pod_rel.tbl_row_id =1
AND wp_pod.tbl_row_id = wp_pod_tbl_hotel.id
AND wp_pod_rel.pod_id = wp_pod.id
)
) AS hotel_reviews
GROUP BY hotel_reviews.hotelid
ORDER BY average_score DESC
AS sorted_hotel ON (id = sorted_hotel.hotelid)
As you can see, the sub-query which makes up the found_query table is repeated elsewhere downward as another sub-query, so I was hoping to re-use the results
You can not use a sub-query like this.
I'm not sure I understand your query, but wouldn't that be sufficient?
SELECT * FROM table_a a
LEFT JOIN table_b b ON ( b.id = a.id )
WHERE a.id > 10
It would return all rows from table_a where id > 10 and LEFT JOIN rows from table_b where id matches.