INNER JOIN vs IN - sql

SELECT C.* FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = #StockID
VS
SELECT * FROM Category
WHERE CategoryID IN
(SELECT CategoryID FROM StockToCategory WHERE StockID = #StockID)
Which is considered the correct (syntactically) and most performant approach, and why?
The syntax in the latter example seems more logical to me, but my assumption is that the JOIN will be faster.
I have looked at the query plans but haven't been able to decipher anything from them.
Query Plan 1
Query Plan 2

The two syntaxes serve different purposes. Using the JOIN syntax presumes you want something from both the StockToCategory and Category tables. If there are multiple entries in the StockToCategory table for a category, the Category table values will be repeated.
Using IN presumes that you want only items from Category whose ID meets some criteria. If a given CategoryID (assuming it is the PK of the Category table) exists multiple times in the StockToCategory table, it will only be returned once.
In your exact example they will produce the same output; however, IMO the latter syntax makes your intent (only wanting categories) clearer.
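For instance, a minimal sketch of that difference (hypothetical data; it assumes the mapping table could hold the same pair twice):
-- suppose StockToCategory contains (StockID, CategoryID) = (1, 10) twice
-- the JOIN form then returns category 10 twice; DISTINCT (or IN/EXISTS) returns it once
SELECT DISTINCT C.*
FROM StockToCategory STC
INNER JOIN Category C ON STC.CategoryID = C.CategoryID
WHERE STC.StockID = #StockID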
By the way, here is yet a third syntax, similar to using IN:
Select ...
From Category
Where Exists (
    Select 1
    From StockToCategory
    Where StockToCategory.CategoryId = Category.CategoryId
        And StockToCategory.StockID = #StockID
)

Syntactically (and semantically, too) these are both correct. In terms of performance they are effectively equivalent; in fact, I would expect SQL Server to generate the exact same physical plans for these two queries.

I think these are just two ways to specify the same desired result.

For SQLite:
table device_group_folders contains 10 records
table device_groups contains ~100000 records
INNER JOIN: 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs INNER JOIN device_groups ON device_groups.parent = select_childs.uuid;
WHERE (implicit join): 31 ms
WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT device_groups.uuid FROM select_childs, device_groups WHERE device_groups.parent = select_childs.uuid;
IN: <1 ms
SELECT device_groups.uuid FROM device_groups WHERE device_groups.parent IN (WITH RECURSIVE select_childs(uuid) AS (
SELECT uuid FROM device_group_folders WHERE uuid = '000B:653D1D5D:00000003'
UNION ALL
SELECT device_group_folders.uuid FROM device_group_folders INNER JOIN select_childs ON parent = select_childs.uuid
) SELECT * FROM select_childs);

Related

How to return rows matched in a table without multiple EXISTS clauses?

I want to pull back results from one table that match ALL specified values where the specified values are in another table. I can do it like this:
SELECT * FROM Contacts
WHERE
EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = '8C62E5DE-00FC-4994-8127-000B02E10DA5')
AND EXISTS (SELECT 1 FROM dbo.ContactClassifications WHERE ContactID = Contacts.ID AND ClassificationID = 'D2E90AA0-AC93-4406-AF93-0020009A34BA')
AND EXISTS etc...
However that falls over when I get up to about 40 EXISTS clauses. The error message is "The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query."
The gist of this is to:
Select all contacts with any GUID from the IN statement
Use a DISTINCT COUNT to get a count of matching GUIDs for each ContactID
Use HAVING to retain only those contacts whose count equals the number of GUIDs you put into the IN statement
SQL Statement
SELECT *
FROM dbo.Contacts c
INNER JOIN (
SELECT c.ID
FROM dbo.Contacts c
INNER JOIN dbo.ContactClassifications cc ON c.ID = cc.ContactID
WHERE cc.ClassificationID IN ('..', '..', 38 other GUIDS)
GROUP BY
c.ID
HAVING COUNT(DISTINCT cc.ClassificationID) = 40
) cc ON cc.ID = c.ID
Test script at data.stackexchange
One solution is to demand that no classification exists without a matching contact. That's a double negation:
select *
from contacts c
where not exists
(
select *
from ContactClassifications cc
where not exists
(
select *
from ContactClassifications cc2
where cc2.ContactID = c.ID
and cc2.ClassificationID = cc.ClassificationID
)
)
This type of problem is known as relational division.
SELECT c.*
FROM Contacts c
INNER JOIN (
    SELECT cc.ContactID, COUNT(DISTINCT cc.ClassificationID) AS num_class
    FROM ContactClassifications cc
    WHERE cc.ClassificationID IN (....)
    GROUP BY cc.ContactID
) b ON c.ID = b.ContactID
WHERE b.num_class = [number of distinct values - how many different values you put in "IN"]
If you run SQL Server 2005 or higher, you can do pretty much the same thing with CROSS APPLY, supposedly more efficiently.
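For reference, a rough sketch of what that CROSS APPLY variant could look like (not from the original answer; table and column names taken from the question, GUID list abbreviated):
SELECT c.*
FROM dbo.Contacts c
CROSS APPLY (
    SELECT COUNT(DISTINCT cc.ClassificationID) AS num_class
    FROM dbo.ContactClassifications cc
    WHERE cc.ContactID = c.ID
      AND cc.ClassificationID IN ('..', '..', 38 other GUIDS)
) x
WHERE x.num_class = 40  -- keep only contacts matching all 40 classifications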

Converting a nested sql where-in pattern to joins

I have a query that is returning the correct data to me, but being a developer rather than a DBA I'm wondering if there is any reason to convert it to joins rather than nested selects and if so, what it would look like.
My code currently is
select * from adjustments where store_id in (
select id from stores where original_id = (
select original_id from stores where name ='abcd'))
Any references to the better use of joins would be appreciated too.
Besides any likely performance improvements, I find the following much easier to read.
SELECT *
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
INNER JOIN stores s2 ON s2.original_id = s.original_id
WHERE s2.name = 'abcd'
Test script showing my original fault in omitting original_id:
DECLARE @Adjustments TABLE (store_id INTEGER)
DECLARE @Stores TABLE (id INTEGER, name VARCHAR(32), original_id INTEGER)
INSERT INTO @Adjustments VALUES (1), (2), (3)
INSERT INTO @Stores VALUES (1, 'abcd', 1), (2, '2', 1), (3, '3', 1)
/*
OP's Original statement returns store_id's 1, 2 & 3
due to original_id being all the same
*/
SELECT * FROM @Adjustments WHERE store_id IN (
    SELECT id FROM @Stores WHERE original_id = (
        SELECT original_id FROM @Stores WHERE name = 'abcd'))
/*
Faulty first attempt with removing original_id from the equation
only returns store_id 1
*/
SELECT a.store_id
FROM @Adjustments a
INNER JOIN @Stores s ON s.id = a.store_id
WHERE s.name = 'abcd'
If you would use joins, it would look like this:
select *
from adjustments
inner join stores on stores.id = adjustments.store_id
inner join stores as stores2 on stores2.original_id = stores.original_id
where stores2.name = 'abcd'
(Apparently you can omit the second SELECT on the stores table (I left it out of my query) because if I'm interpreting your table structure correctly,
select id from stores where original_id = (select original_id from stores where name ='abcd')
is the same as
select * from stores where name ='abcd'.)
--> edited my query back to the original form, thanks to Lieven for pointing out my mistake in his answer!
I prefer using joins, but for simple queries like that, there is normally no performance difference. SQL Server treats both queries the same internally.
If you want to be sure, you can look at the execution plan.
If you run both queries together, SQL Server will also tell you which query took more resources than the other (in percent).
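For example, both forms from this question can be run as one batch (with "Include Actual Execution Plan" enabled in SSMS) to compare their relative cost; a sketch:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- nested IN version
select * from adjustments where store_id in (
    select id from stores where original_id = (
        select original_id from stores where name = 'abcd'));
-- join version
SELECT *
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
INNER JOIN stores s2 ON s2.original_id = s.original_id
WHERE s2.name = 'abcd';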
A slightly different approach:
select * from adjustments a where exists
(select null from stores s1, stores s2
where a.store_id = s1.id and s1.original_id = s2.original_id and s2.name ='abcd')
As Microsoft says here:
Many Transact-SQL statements that include subqueries can be alternatively formulated as joins. Other questions can be posed only with subqueries. In Transact-SQL, there is usually no performance difference between a statement that includes a subquery and a semantically equivalent version that does not. However, in some cases where existence must be checked, a join yields better performance. Otherwise, the nested query must be processed for each result of the outer query to ensure elimination of duplicates. In such cases, a join approach would yield better results.
Your case is exactly the one where a JOIN and a subquery give the same performance.
An example where the subquery cannot be converted to a "simple" JOIN:
select Country.Country, TR_Country.Name as Country_Translated_Name, TR_Country.Language_Code
from Country
JOIN TR_Country ON Country.Country = TR_Country.Country
where Country.Country =
    (select top 1 country
     from Northwind.dbo.Customers C
     join Northwind.dbo.Orders O
         on C.CustomerId = O.CustomerID
     group by country
     order by count(*) desc)
As you can see, every country can have several name translations, so we cannot just join and count records (in that case, countries with more translations would have higher counts).
Of course, you can transform this example into:
a JOIN with a derived table
a CTE
but that is another tale :-)
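For illustration only, the derived-table variant might look roughly like this (same tables as above; a sketch, not a definitive rewrite):
SELECT Country.Country, TR_Country.Name AS Country_Translated_Name, TR_Country.Language_Code
FROM Country
JOIN TR_Country ON Country.Country = TR_Country.Country
JOIN (
    SELECT TOP 1 C.Country
    FROM Northwind.dbo.Customers C
    JOIN Northwind.dbo.Orders O ON C.CustomerId = O.CustomerID
    GROUP BY C.Country
    ORDER BY COUNT(*) DESC
) top_country ON Country.Country = top_country.Country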

Filter a SQL Server table dynamically using multiple joins

I am trying to filter a single table (master) by the values in multiple other tables (filter1, filter2, filter3 ... filterN) using only joins.
I want the following rules to apply:
(A) If one or more rows exist in a filter table, then include only those rows from the master that match the values in the filter table.
(B) If no rows exist in a filter table, then ignore it and return all the rows from the master table.
(C) This solution should work for N filter tables in combination.
(D) Static SQL using JOIN syntax only, no Dynamic SQL.
I'm really trying to get rid of dynamic SQL wherever possible, and this is one of those places I truly think it's possible, but just can't quite figure it out. Note: I have solved this using Dynamic SQL already, and it was fairly easy, but not particularly efficient or elegant.
What I have tried:
Various INNER JOINS between master and filter tables - works for (A) but fails on (B) because the join removes all records from the master (left) side when the filter (right) side has no rows.
LEFT JOINS - Always returns all records from the master (left) side. This fails (A) when some filter tables have records and some do not.
What I really need:
It seems like what I need is to be able to INNER JOIN on each filter table that has 1 or more rows and LEFT JOIN (or not JOIN at all) on each filter table that is empty.
My question: How would I accomplish this without resorting to Dynamic SQL?
In SQL Server 2005+ you could try this:
WITH
filter1 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable1 f ON join_condition
),
filter2 AS (
SELECT DISTINCT
m.ID,
HasMatched = CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END,
AllHasMatched = MAX(CASE WHEN f.ID IS NULL THEN 0 ELSE 1 END) OVER ()
FROM masterdata m
LEFT JOIN filtertable2 f ON join_condition
),
…
SELECT m.*
FROM masterdata m
INNER JOIN filter1 f1 ON m.ID = f1.ID AND f1.HasMatched = f1.AllHasMatched
INNER JOIN filter2 f2 ON m.ID = f2.ID AND f2.HasMatched = f2.AllHasMatched
…
My understanding is that filter tables without any matches simply must not affect the resulting set. The output should consist only of those masterdata rows that have matched all the filters where matches have taken place.
SELECT *
FROM master_table mt
WHERE (0 = (select count(*) from filter_table_1)
       OR mt.id IN (select id from filter_table_1))
  AND (0 = (select count(*) from filter_table_2)
       OR mt.id IN (select id from filter_table_2))
  AND (0 = (select count(*) from filter_table_3)
       OR mt.id IN (select id from filter_table_3))
Be warned that this could be inefficient in practice. Unless you have a specific reason to kill your existing, working solution, I would keep it.
Do an inner join to get results for (A) only, and a left join to get results for (B) only (you will have to put something like filterN.column IS NULL in the WHERE clause), then combine the results from the inner join and the left join with UNION, as sketched below.
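A rough sketch of that idea for a single filter table (names assumed from the question; the empty-table case is expressed here with NOT EXISTS rather than an IS NULL check):
-- case (A): filter_table_1 has rows, keep only matching master rows
SELECT m.*
FROM master_table m
INNER JOIN filter_table_1 f1 ON f1.id = m.id
UNION
-- case (B): filter_table_1 is empty, keep all master rows
SELECT m.*
FROM master_table m
WHERE NOT EXISTS (SELECT 1 FROM filter_table_1);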
A LEFT OUTER JOIN gives you the missing entries in the master table:
SELECT * FROM MASTER M
INNER JOIN APPRENTICE A ON A.PK = M.PK
LEFT OUTER JOIN FOREIGN F ON F.FK = M.PK
If FOREIGN has keys that are not part of MASTER, you will have NULL columns where the entries are missing.
I think that is what you are looking for.
Mike
First off, it is impossible to have "N number of Joins" or "N number of filters" without resorting to dynamic SQL. The SQL language was not designed for dynamic determination of the entities against which you are querying.
Second, one way to accomplish what you want (but would be built dynamically) would be something along the lines of:
Select ...
From master
Where Exists (
Select 1
From filter_1
Where filter_1 = master.col1
Union All
Select 1
From ( Select 1 As one ) As dummy
Where Not Exists (
Select 1
From filter_1
)
Intersect
Select 1
From filter_2
Where filter_2 = master.col2
Union All
Select 1
From ( Select 1 As one ) As dummy
Where Not Exists (
Select 1
From filter_2
)
...
Intersect
Select 1
From filter_N
Where filter_N = master.colN
Union All
Select 1
From ( Select 1 As one ) As dummy
Where Not Exists (
Select 1
From filter_N
)
)
I previously posted a - now deleted - answer based on wrong assumptions about your problem.
But I think you could go for a solution where you split your initial search problem into first constructing the set of ids from the master table, and then selecting the data by joining on that set of ids. Here I naturally assume you have a kind of ID on your master table and that the filter tables contain the filter values only. This can be combined into the statement below, where each SELECT in the eligible subquery provides a set of master ids; these are unioned to avoid duplicates, and that set of ids is joined to the table with the data.
SELECT * FROM tblData INNER JOIN
(
SELECT id FROM tblData td
INNER JOIN fa on fa.a = td.a
UNION
SELECT id FROM tblData td
INNER JOIN fb on fb.b = td.b
UNION
SELECT id FROM tblData td
INNER JOIN fc on fc.c = td.c
) eligible ON eligible.id = tblData.id
The test has been made against the tables shown below, which are included just as an appendix.
CREATE TABLE tblData (id int not null primary key identity(1,1), a varchar(40), b datetime, c int)
CREATE TABLE fa (a varchar(40) not null primary key)
CREATE TABLE fb (b datetime not null primary key)
CREATE TABLE fc (c int not null primary key)
Since you have filter tables, I am assuming that these tables are probably dynamically populated from a front-end. This would mean that you have these tables as #temp_table (or even a materialized table, doesn't matter really) in your script before filtering on the master data table.
Personally, I use the below code bit for filtering dynamically without using dynamic SQL.
SELECT *
FROM [masterdata] [m]
INNER JOIN
[filter_table_1] [f1]
ON
[m].[filter_column_1] = ISNULL(NULLIF([f1].[filter_column_1], ''), [m].[filter_column_1])
As you can see, the code falls back to the master data column itself when the column value in the filter table is a blank ('') record. The gist of this is that you have to actively populate the filter column with a blank value whenever you do not have any filter records on which you want to curtail the total set of the master data. Once you have populated the filter table with a blank, NULLIF turns it into NULL, ISNULL substitutes the master data column, and the JOIN condition always matches. This should work for all the cases you mentioned in your question (a small sketch of the "no filter" setup follows below).
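A sketch of that setup (hypothetical table and column names matching the join above):
-- no filtering on filter_column_1: the join condition degenerates to m.[filter_column_1] = m.[filter_column_1]
INSERT INTO filter_table_1 (filter_column_1) VALUES ('');
-- actual filtering: store the real values instead
-- INSERT INTO filter_table_1 (filter_column_1) VALUES ('some value');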
I have found this bit of code to be faster in terms of performance.
Hope this helps. Please let me know in the comments.

How to avoid a large IN clause?

I have 3 tables:
table_product (30,000 rows)
---------
ID
label

table_period (225,000 rows)
---------
ID
date_start
date_end
default_price
FK_ID_product

table_special_offer (10,000 rows)
-----
ID
label
date_start
date_end
special_offer_price
FK_ID_period
So I need to load data from all these tables; here is what I do:
1/ Load data from "table_product" like this:
select *
from table_product
where label like 'gun%'
2/ Load data from "table_period" like this:
select *
from table_period
where FK_ID_product IN (list of all the IDs selected in step 1)
3/ Load data from "table_special_offer" like this:
select *
from table_special_offer
where FK_ID_period IN (list of all the IDs selected in step 2)
As you might expect, the IN clause in step 3 can be very, very big (around 75,000 values), so I have a good chance of getting either a timeout or an error like "An expression services limit has been reached".
Have you ever had something like this, and how did you manage to avoid it?
PS :
the context : SQL server 2005, .net 2.0
(please don't tell me my design is bad, or that I shouldn't do "select *"; I just simplified my problem so it is a little bit simpler than the 500 pages describing my business).
Thanks.
Switch to using joins:
SELECT <FieldList>
FROM Table_Product prod
JOIN Table_Period per ON prod.Id = per.FK_ID_Product
JOIN Table_Special_Offer spec ON per.ID = spec.FK_ID_Period
WHERE prod.label LIKE 'gun%'
Something you should be aware of is the difference between IN vs JOIN vs EXISTS - great article here.
I finally have my answer: table variables (a bit like @smirkingman's solution, but not with CTEs), so:
-- table variables; columns mirror the source tables so that INSERT ... SELECT * works
declare @product table (id int primary key, label nvarchar(max))
declare @period table (id int primary key, date_start datetime, date_end datetime, default_price real, fk_id_product int)
declare @special_offer table (id int, label nvarchar(max), date_start datetime, date_end datetime, special_offer_price real, fk_id_period int)
insert into @product
select *
from table_product
where label like 'gun%'
insert into @period
select *
from table_period
where exists(
    select * from @product p where p.id = table_period.FK_ID_product
)
insert into @special_offer
select *
from table_special_offer
where exists(
    select * from @period p where p.id = table_special_offer.FK_ID_period
)
select * from @product
select * from @period
select * from @special_offer
That is the SQL side; in C# I use ExecuteReader, Read, and NextResult of the SqlDataReader class:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqldatareader.aspx
I got all I want:
- my data
- I don't have too much data (unlike the solutions with joins)
- I don't execute the same query twice (unlike the solution with a subquery)
- I don't have to change my mapping code (1 row = 1 business object)
Don't use an explicit list of values in the IN clause. Instead, write your query like:
... FK_ID_product IN (select ID
from table_product
where label like 'gun%')
SELECT *
FROM
table_product tp
INNER JOIN table_period tper
ON tp.ID = tper.FK_ID_product
INNER JOIN table_special_offer so
ON tper.ID = so.FK_ID_period
WHERE
tp.label like 'gun%'
First some code...
Using JOIN:
SELECT
table_product.* -- explicit table references, just for organisation's sake
, table_period.*
, table_special_offer.*
FROM
table_product
INNER JOIN table_period
ON table_product.ID = table_period.FK_ID_product
INNER JOIN table_special_offer
ON table_period.ID = table_special_offer.FK_ID_period
WHERE
table_product.label like 'gun%'
Using IN:
SELECT
    *
FROM
    table_special_offer
WHERE FK_ID_period IN
(
    SELECT
        ID
    FROM
        table_period
    WHERE FK_ID_product IN
    (
        SELECT
            ID
        FROM
            table_product
        WHERE label like 'gun%'
    )
)
Depending on how well your tables are indexed, both can be used. Inner joins, as the others have suggested, are definitely efficient at doing your query and returning all data for the 3 tables. If you only need the IDs from table_product and table_period, then the nested IN statements can be good for adapting search criteria on indexed tables (using IN can be OK if the criteria are integers, as I assume your FK_ID_product is).
An important thing to remember is that every database and relational table setup is going to act differently; you won't get the same optimised results from one DB to another. Try ALL the possibilities at hand and use the one that is best for you. The query analyser can be incredibly useful in times like these when you need to check performance.
I had this situation when we were trying to join customer accounts to their appropriate addresses via an ID join and a linked-table-based condition (we had another table which showed customers with certain equipment, which we had to do a string search on). Strangely enough, it was quicker for us to use both methods in one query:
--The query with the WHERE Desc LIKE '%Equipment%' was "joined" to the client table using the IN clause and then this was joined onto the addresses table:
SELECT
Address.*
, Customers_Filtered.*
FROM
Address AS Address
INNER JOIN
(SELECT Customers.* FROM Customers WHERE ID IN (SELECT CustomerID FROM Equipment WHERE Desc LIKE '%Equipment search here%')) AS Customers_Filtered
ON Address.CustomerID = Customers_Filtered.ID
This style of query (I apologise if my syntax isn't exactly correct) ended up being more efficient and easier to organise once the overall query got more complicated.
Hope this has helped - follow @AdaTheDev's article link, definitely a good resource.
A JOIN gives you the same results.
SELECT so.Col1
, so.Col2
FROM table_product pt
INNER JOIN table_period pd ON pd.FK_ID_product = pt.ID
INNER JOIN table_special_offer so ON so.FK_ID_Period = pd.ID
WHERE pt.label LIKE 'gun%'
I'd be interested to know if this might make an improvement:
WITH products(prdid) AS (
SELECT
ID
FROM
table_product
WHERE
label like 'gun%'
),
periods(perid) AS (
SELECT
ID
FROM
table_period
INNER JOIN products
ON FK_ID_product = prdid
),
offers(offid) AS (
SELECT
ID
FROM
table_special_offer
INNER JOIN periods
ON FK_ID_period = perid
)
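The WITH chain would then be followed directly by a final statement along these lines (a sketch, not part of the original suggestion):
SELECT *
FROM table_special_offer
INNER JOIN offers ON offers.offid = table_special_offer.ID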
... just a suggestion...

SQL query to look for an exact match using an inner join

I have 2 tables in SQL Server 2005 db with structures represented as such:
CAR:
CarID bigint,
CarField bigint,
CarFieldValue varchar(50);
TEMP: CarField bigint, CarFieldValue varchar(50);
Now, the TEMP table is actually a table variable containing data collected through a search facility. Based on the data contained in TEMP, I wish to filter out and get all DISTINCT CarIDs from the CAR table exactly matching those rows in the TEMP table. A simple inner join works well, but I only want to get back the CarIDs that match ALL the rows in TEMP exactly. Basically, each row in TEMP is supposed to denote an AND filter, whereas with the current inner join query they are acting more like OR filters. The more rows in TEMP, the fewer rows I expect in my result set for CAR. I hope I'm making sense with this... if not, please let me know and I'll try to clarify.
Any ideas on how I can make this work?
Thank u!
You use COUNT, GROUP BY and HAVING to find the cars that have exactly as many matching rows as you expect:
select CarID
from CAR c
join TEMP t on c.CarField = t.CarField and c.CarFieldValue = t.CarFieldValue
group by CarID
having COUNT(*) = <the number you expect>;
You can even make <the number you expect> be a scalar subquery like select COUNT(*) from TEMP.
SELECT *
FROM (
SELECT CarID,
COUNT(CarID) NumberMatches
FROM CAR c INNER JOIN
TEMP t ON c.CarField = t.CarField
AND c.CarFieldValue = t.CarFieldValue
GROUP BY CarID
) CarNums
WHERE NumberMatches = (SELECT COUNT(1) FROM TEMP)
Haven't tested this, but I don't think you need a count to do what you want. This query ought to be substantially faster because it avoids a potentially huge number of counts. This query finds all the cars which are missing a value and then filters them out.
select distinct carid from car where carid not in
(
select
carid
from
car c
left outer join temp t on
c.carfield = t.carfield
and c.carfieldvalue = t.carfieldvalue
where
t.carfield is null
)
Hrm...
;WITH FilteredCars
AS
(
SELECT C.CarId
FROM Car C
INNER JOIN Temp Criteria
ON C.CarField = Criteria.CarField
AND C.CarFieldValue = Criteria.CarFieldValue
GROUP BY C.CarId
HAVING COUNT(*) = (SELECT COUNT(*) FROM Temp)
)
SELECT *
FROM FilteredCars F
INNER JOIN Car C ON F.CarId = C.CarId
The basic premise is that, for ALL criteria to match, an INNER JOIN against your temp table must produce as many records as there are within that table. The HAVING clause at the end of the FilteredCars query should whittle the results down to those that match all criteria.