Equality of "select ... where in" and joins - sql

Suppose I have a table1 like this:
id | itemcode
-------------
1 | c1
2 | c2
...
And a table2 like this:
item | name
-----------
c1 | acme
c2 | foo
...
Would the following two queries return the same result set under every condition?
SELECT id, itemcode
FROM table1
WHERE itemcode IN (SELECT DISTINCT item
FROM table2
WHERE name [some arbitrary test])
SELECT id, itemcode
FROM table1
JOIN (SELECT DISTINCT item
FROM table2
WHERE name [some arbitrary test]) items
ON table1.itemcode = items.item
Unless I'm really missing something stupid, I'd say yes. But I've done two queries which boil down to this form and I am getting different results. There are some nested queries using WHERE IN, but for the last step I've noticed a JOIN is much faster. The nested queries are all entirely isolated so I don't believe they are the problem, so I just want to eliminate the possibility that I've got a misconception regarding the above.
Thanks for any insights.
EDIT
The two original queries:
SELECT imitm, imlitm, imglpt
FROM jdedata.F4101
WHERE imitm IN
(SELECT DISTINCT ivitm AS itemno
FROM jdedata.F4104
WHERE ivcitm IN
(SELECT DISTINCT ivcitm AS legacycode
FROM jdedata.F4104
WHERE ivitm IN
(SELECT DISTINCT tritm
FROM trigdata.F4101_TRIG)
)
)
SELECT orig.imitm, orig.imlitm, orig.imglpt
FROM jdedata.F4101 orig
JOIN
(SELECT DISTINCT ivitm AS itemno
FROM jdedata.F4104
WHERE ivcitm IN
(SELECT DISTINCT ivcitm AS legacycode
FROM jdedata.F4104
WHERE ivitm IN
(SELECT DISTINCT tritm
FROM trigdata.F4101_TRIG))) itemns
ON orig.imitm = itemns.itemno
EDIT 2
Although I still don't understand why the queries returned different results, it would seem our logic was flawed from the beginning since we were using the wrong columns in some parts. Mind that I'm not saying I made a mistake interpreting the queries as written above or had some typo, we just needed to select on some different stuff.
Normally I don't rest until I get to the bottom of things like these, but I'm very tired and am entering my first vacation since January that spans more than one day, so I can't really be bothered searching further right now. I'm sure the tips given here will come in handy later. Upvotes have been distributed for all the help and I've accepted Ypercube's answer, mostly because his comments have led me the furthest. But thanks all round! If I do find out more later, I'll try to remember pinging back in.

Since table2.item is not nullable, the 2 versions are equivalent. You can remove the distinct from the IN version, it's not needed. You can check these 3 versions and their execution plans:
SELECT id, itemcode FROM table1 WHERE itemcode IN
( SELECT item FROM table2 WHERE name [some arbitrary test] )
SELECT id, itemcode FROM table1 JOIN
( SELECT DISTINCT item FROM table2 WHERE name [some arbitrary test] )
items ON table1.itemcode = items.item
SELECT id, itemcode FROM table1 WHERE EXISTS
( SELECT * FROM table2 WHERE table1.itemcode = table2.item
AND (name [some arbitrary test]) )

Ideally I would want to see the differences between the result sets.
- Are you getting duplication of records
- Is one set always a sub-set of the other
- Does one set have both 'additional' and 'missing' records in comparison to the other?
That said, the logic should be equivilent. My best guess would be that you have some empty string entries in there; because Oracle's version of a NULL CHAR/VARCHAR is just an empty string. This can give very funky results if you're not prepared for it.

Both queries perform a semijoin i.e. no attributes from table2 appear in the topmost SELECT (the resultset).
To my eye, your first query is easiest to identify as a semijoin, EXISTS even more so. On the other hand, an optimizer would no doubt see it differently ;)

You can also try to do a direct join to the second table
SELECT DISTINCT id, itemcode
FROM table1
INNER JOIN table2 ON table1.itemcode = table2.item
WHERE name [some arbitrary test] )
You don't need distinct if item is primary key or unique
Exists and Inner Join should have the same execution speed, while IN is more expensive.

I'd look for some data type conversion in there.
create table t_vc (val varchar2(6));
create table t_c (val char(6));
insert into t_vc values ('12345');
insert into t_vc values ('12345 ');
insert into t_c values ('12345');
insert into t_c values ('12345');
select t_c.val||':'
from t_c
where val in (select distinct val from t_vc);
select c.val||':'
from t_vc v join (select distinct val from t_c) c on v.val=c.val;

Related

Comparing two sum function in where clause

I want to check that an amount of likes the users received in all their personal pictures is at least twice as large as the number of likes received in the group pictures in which they are tagged.
In case the user is not tagged in any group photo but is tagged in a personal picture that has received at least one like, it will be returned.
My Question is:
How can I make a comparison between 2 sum functions
Where one result of the sum is returned in the nested query and compared with the external query.
Can I set an auxiliary variable to enter the sum value in it and compare it?
Thanks for the helpers:)
Select distinct UIP.userID
From tblUserInPersonalPic UIP
where **sum(UIP.numOfLikes) over (Partition by UIP.userID)*0.5** >
(Select distinct U.userID, sum(P.numOfLikes) over (Partition by U.userID)
From tblgroupPictures P left outer join
tblUserInGroupPic U On P.picNum=U.picNum
group by U.userID,P.numOfLikes,P.picNum)
It's kinda hard to know for sure, and of course I can't test my answer,
but I think you can do it with a couple of left joins, group by and having:
SELECT Personal.UserId
FROM tblUserInPersonalPic Personal
LEFT JOIN tblUserInGroupPic UserInGroup ON Personal.userID = UserInGroup.UesrId
LEFT JOIM tblgroupPictures GroupPictures ON UserInGroup.picNum = GroupPictures.picNum
GROUP BY Personal.userID
HAVING SUM(GroupPictures.numOfLikes) * 2 < SUM(Personal.numOfLikes)
Please note: When posting sql questions it's always best to provide sample data as DDL + DML (Create table + insert into statements) and desired results, so that who ever answers you can test the answer before posting it.
Try using two ctes..pseudo code.Also note distinct in second query will not even work,since you are returning two columns,so i changed it it below,so that you can get that column as well
;with tbl1
as
(
select a,sum(col1) as summ
from
tbl1
)
,tbl2
as
(
select userid,sum(Anothersmcol) as sum2
from tbl2
)
select tbl1.columns,tbl2.columns
from
tbl1 t1
join
tbl2 t2
on t1.sumcol>t2.sumcol
You can't use window functions in a where clause. Define it in a subquery:
select *
from (
select sum(...) over (...) as Sum1
, OtherColumn
from YourTable
) sub
where Sum1 < (...your subquery...)

Select Combination of columns from Table A not in Table B

I have two sets of table with all the contacts on an Account their titles and etc. For data migration purposes I need to select All ContactsIds with their AccountID from Table A that do not exist in TableB. Its the combination of both the ContactId and the AccountID. I have tried the following:
Select * from Final_Combined_Result wfcr
WHERE NOT EXISTS (Select Contact_ID, Account_ID from Temp_WFCR)
I know this is completely off, but I have looked at a couple of other questions on here but was unable to find an appropriate solution.
I have also tried this:
Select * from Final_Combined_Result wfcr
WHERE NOT EXISTS
(Select Contact_ID, Account_ID from Temp_WFCR as tc
where tc.Account_ID=wfcr.Account_InternalID
AND tc.Account_ID=wfcr.Contact_InternalID)
This seems to be correct but I would like to make sure.
Select wfcr.ContactsId, wfcr.AccountID
from Final_Combined_Result wfcr
left join Temp_WFCR t_wfcr ON t_wfcr.ContactsIds = wfcr.ContactsId
AND t_wfcr.AccountID = wfcr.AccountID
WHERE t_wfcr.AccountID is null
See this great explanation of joins
#juergend's answer shows the left join.
Using a not exists you join in the subselect, it would look like this:
Select wfcr.*
from
Final_Combined_Result wfcr
WHERE NOT EXISTS
(Select 1 --select values dont matter here, only the join restricts.
from
Temp_WFCR t
where t.Contact_ID = wfcr.Contact_InternalID
and t.account_id = wfcr.Account_InternalID
)

SQL, only if matching all foreign key values to return the record?

I have two tables
Table A
type_uid, allowed_type_uid
9,1
9,2
9,4
1,1
1,2
24,1
25,3
Table B
type_uid
1
2
From table A I need to return
9
1
Using a WHERE IN clause I can return
9
1
24
SELECT
TableA.type_uid
FROM
TableA
INNER JOIN
TableB
ON TableA.allowed_type_uid = TableB.type_uid
GROUP BY
TableA.type_uid
HAVING
COUNT(distinct TableB.type_uid) = (SELECT COUNT(distinct type_uid) FROM TableB)
Join the two tables togeter, so that you only have the records matching the types you are interested in.
Group the result set by TableA.type_uid.
Check that each group has the same number of allowed_type_uid values as exist in TableB.type_uid.
distinct is required only if there can be duplicate records in either table. If both tables are know to only have unique values, the distinct can be removed.
It should also be noted that as TableA grows in size, this type of query will quickly degrade in performance. This is because indexes are not actually much help here.
It can still be a useful structure, but not one where I'd recommend running the queries in real-time. Rather use it to create another persisted/cached result set, and use this only to refresh those results as/when needed.
Or a slightly cheaper version (resource wise):
SELECT
Data.type_uid
FROM
A AS Data
CROSS JOIN
B
LEFT JOIN
A
ON Data.type_uid = A.type_uid AND B.type_uid = A.allowed_type_uid
GROUP BY
Data.type_uid
HAVING
MIN(ISNULL(A.allowed_type_uid,-999)) != -999
Your explanation is not very clear. I think you want to get those type_uid's from table A where for all records in table B there is a matching A.Allowed_type_uid.
SELECT T2.type_uid
FROM (SELECT COUNT(*) as AllAllowedTypes FROM #B) as T1,
(SELECT #A.type_uid, COUNT(*) as AllowedTypes
FROM #A
INNER JOIN #B ON
#A.allowed_type_uid = #B.type_uid
GROUP BY #A.type_uid
) as T2
WHERE T1.AllAllowedTypes = T2.AllowedTypes
(Dems, you were faster than me :) )

Selecting only duplicate rows, kinda

I'm really not sure on the best way to explain this, but we have a database which has a unique ID for each employee, a description, and a flag for current.
SELECT COUNT("Current_Flag"),
"Employee_Number"
FROM "Employment_History"
WHERE "Current_Flag" = 'Y'
GROUP BY "Employee_Number" ;
I'm trying to return the unique ID for every case where they have two current flags set, but I don't even know where to begin. I've tried a subselect which didn't work, I'm sure the answer is quite simple but I've just been told I only have 7 minutes to do it, so panicked and thought I'd ask here. Sorry all :/
Add a HAVING clause to your current query - like so:
select count("Current_Flag"),
"Employee_Number"
from "Employment_History"
where "Current_Flag" = 'Y'
group by "Employee_Number"
having count("Current_Flag") >= 2
(Change the condition to =2 if you only want exactly 2 matches.)
Try something like this (untested!). You might have to play around with HAVING/COUNT to get it to work
SELECT unique_id, description, flag FROM table GROUP BY unique_id HAVING COUNT(unique_id) > 1
select employee_name, count(employee_name) as cnt
group by employee_name
where cnt>1
I guess this will do the trick. Adapt it to the fields on your DB.
Self Join table return only rows where 2 "Flags" are alike exclude rows that match on UniqueID returning al instances of ID having matches on flags.
Select A.ID
from table A
INNER JOIN table B
on A.Flag1=B.Flag1 and A.Flag2=B.Flag2
Where A.ID <> B.ID
(UNTESTED STATEMENT)
SELECT Id FROM TABLE1 T1 WHERE T1.Id NOT IN(
SELECT Id FROM TABLE2 T2 WHERE (T1.Flag1 = T2.Flag1) AND (T1.Id <> T2.Id))

Converting a nested sql where-in pattern to joins

I have a query that is returning the correct data to me, but being a developer rather than a DBA I'm wondering if there is any reason to convert it to joins rather than nested selects and if so, what it would look like.
My code currently is
select * from adjustments where store_id in (
select id from stores where original_id = (
select original_id from stores where name ='abcd'))
Any references to the better use of joins would be appreciated too.
Besides any likely performance improvements, I find following much easier to read.
SELECT *
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
INNER JOIN stores s2 ON s2.original_id = s.original_id
WHERE s.name = 'abcd'
Test script showing my original fault in ommitting original_id
DECLARE #Adjustments TABLE (store_id INTEGER)
DECLARE #Stores TABLE (id INTEGER, name VARCHAR(32), original_id INTEGER)
INSERT INTO #Adjustments VALUES (1), (2), (3)
INSERT INTO #Stores VALUES (1, 'abcd', 1), (2, '2', 1), (3, '3', 1)
/*
OP's Original statement returns store_id's 1, 2 & 3
due to original_id being all the same
*/
SELECT * FROM #Adjustments WHERE store_id IN (
SELECT id FROM #Stores WHERE original_id = (
SELECT original_id FROM #Stores WHERE name ='abcd'))
/*
Faulty first attempt with removing original_id from the equation
only returns store_id 1
*/
SELECT a.store_id
FROM #Adjustments a
INNER JOIN #Stores s ON s.id = a.store_id
WHERE s.name = 'abcd'
If you would use joins, it would look like this:
select *
from adjustments
inner join stores on stores.id = adjustments.store_id
inner join stores as stores2 on stores2.original_id = stores.original_id
where stores2.name = 'abcd'
(Apparently you can omit the second SELECT on the stores table (I left it out of my query) because if I'm interpreting your table structure correctly,
select id from stores where original_id = (select original_id from stores where name ='abcd')
is the same as
select * from stores where name ='abcd'.)
--> edited my query back to the original form, thanks to Lieven for pointing out my mistake in his answer!
I prefer using joins, but for simple queries like that, there is normally no performance difference. SQL Server treats both queries the same internally.
If you want to be sure, you can look at the execution plan.
If you run both queries together, SQL Server will also tell you which query took more resources than the other (in percent).
A slightly different approach:
select * from adjustments a where exists
(select null from stores s1, stores s2
where a.store_id = s1.id and s1.original_id = s2.original_id and s2.name ='abcd')
As say Microsoft here:
Many Transact-SQL statements that include subqueries can be
alternatively formulated as joins. Other questions can be posed only
with subqueries. In Transact-SQL, there is usually no performance
difference between a statement that includes a subquery and a
semantically equivalent version that does not. However, in some cases
where existence must be checked, a join yields better performance.
Otherwise, the nested query must be processed for each result of the
outer query to ensure elimination of duplicates. In such cases, a join
approach would yield better results.
Your case is exactly when Join and subquery gives the same performance.
Example when subquery can not be converted to "simple" JOIN:
select Country,TR_Country.Name as Country_Translated_Name,TR_Country.Language_Code
from Country
JOIN TR_Country ON Country.Country=Tr_Country.Country
where country =
(select top 1 country
from Northwind.dbo.Customers C
join
Northwind.dbo.Orders O
on C.CustomerId = O.CustomerID
group by country
order by count(*))
As you can see, every country can have different name translations so we can not just join and count records (in that case, countries with larger quantities of translations will have more record counts)
Of cource, you can can transform this example to:
JOIN with derived table
CTE
but it is an other tale-)