Equivalent Subquery for a Join - sql

I am looking for an answer to this question:
Is it possible to rewrite every join as an equivalent subquery?
I know that columns from a subquery cannot be selected in the outer query.
I ran this query in SQL Server:
select DISTINCT A.*,B.ParentProductCategoryID from [SalesLT].[Product] as
A inner join [SalesLT].[ProductCategory] as B on
A.ProductCategoryID=B.ProductCategoryID
select A.*
from [SalesLT].[Product] as A
where EXISTS(select B.ParentProductCategoryID from [SalesLT].
[ProductCategory] as B where A.ProductCategoryID=B.ProductCategoryID)
Both queries return the 293 rows I expected.
The problem is: how do I select a column from [SalesLT].[ProductCategory] in the second case?
Do I need to correlate the subquery in the SELECT clause to get this column in the output?

Is It possible to rewrite every Join to equivalent Subquery
No, because joins can 1) remove rows or 2) multiply rows
ex 1)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3)
SELECT t1.num AS t1num, t2.num AS t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
The row containing value 1 from t1 was removed. This does not happen in a subquery.
ex 2)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3), (3), (3), (3)
SELECT t1.num AS t1num, t2.num as t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
3 3
3 3
3 3
A subquery would not change the number of rows in the table being queried.
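To see the row-multiplication point concretely, here is a minimal sketch that replays example 2 in an in-memory SQLite database via Python's sqlite3 module (an assumption made for convenience; the original post targets SQL Server). It compares the INNER JOIN with an EXISTS filter on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (num INT);
    CREATE TABLE t2 (num INT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (3), (3), (3);
""")

# INNER JOIN: one output row per matching pair, so 3 shows up four times.
join_rows = conn.execute(
    "SELECT t1.num, t2.num FROM t1 "
    "INNER JOIN t2 ON t1.num = t2.num ORDER BY t1.num"
).fetchall()

# EXISTS: each t1 row is kept at most once, however many matches t2 holds.
exists_rows = conn.execute(
    "SELECT t1.num FROM t1 "
    "WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.num = t1.num) ORDER BY t1.num"
).fetchall()

print(len(join_rows), len(exists_rows))
```

The join returns five rows and the EXISTS version two, which is why the two forms are not interchangeable when the second table has repeated keys.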
In your example you use EXISTS, which only tests for a match and cannot return a value from the 2nd table.
This is how I would subquery:
select A.*
,(SELECT B.ParentProductCategoryID
FROM [SalesLT].[ProductCategory] B
WHERE B.ProductCategoryID = A.ProductCategoryID) AS [2nd table ProductCategoryID]
from [SalesLT].[Product] as A

You might use
select A.*,
(
select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID
) ParentProductCategoryID
from [SalesLT].[Product] as A
where EXISTS(select 1
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
However, I find the JOIN version much more intuitive.
There is no way for you to use any data from the EXISTS subquery in the outer query. The only purpose of the subquery is to evaluate whether the EXISTS is true or false for each product.

Related

SQL server - Finding duplicates on linked servers

I need to find duplicates in the same database/table/column across two linked SQL Servers.
Note that the column may also have duplicates inside the table itself on each individual server!
For example:
server1.tableName.columnName:
john
john
mary
kate
kate
server2.tableName.columnName:
kate
I want the result in this case to be kate, as it is the only entry that exists in both.
I tried this:
select table1.columnName, table2.columnName, count(*)
from [server1].[dbName].[dbo].[tableName] table1
inner join [server2].[dbName].[dbo].[tableName] table2
ON table1.columnName = table2.columnName
group by table1.columnName, table2.columnName having count(table1.columnName) > 1
This gives a set of results.
My question is: is this correct? Will I get an entry for every value in columnName that exists in dbName.tableName on both server1 and server2?
will I get an entry for any value in columnName that exists in dbName.tableName on both server1 and server2 ?
Not exactly. This would do what you want without the having -- the join is determining whether anything matches.
If you can leave the count out entirely, an alternative formulation uses exists:
select t1.columnName
from [server1].[dbName].[dbo].[tableName] t1
where exists (select 1
from [server2].[dbName].[dbo].[tableName] t2
where t1.columnName = t2.columnName
);
No, you'd only get results which have a duplicate within one of the two servers and is present in both.
The linked servers here are irrelevant, your question functions the same on any two tables. I believe you're asking for one of the two below queries.
DECLARE @table1 TABLE (Id INT)
DECLARE @table2 TABLE (Id INT)
INSERT INTO @table1 (Id)
VALUES (1), (2), (3)
INSERT INTO @table2 (Id)
VALUES (2), (2), (3), (4), (5), (5)
-- original query - 1 result
select table1.Id, table2.Id, count(*)
from @table1 table1
inner join @table2 table2
ON table1.Id = table2.Id
group by table1.Id, table2.Id having count(table1.Id) > 1
-- cross table duplicates - 2 results
select table1.id
from @table1 table1
where exists (select 1 from @table2 table2 where table1.Id = table2.Id)
-- cross/within table duplicates - 3 results
select unioned.Id
from (
select table1.Id
from @table1 table1
union all
select table2.Id
from @table2 table2
) unioned
group by unioned.Id
having count(*) > 1
You can left join the two tables on the column name, select the column name from each table, and exclude the NULL values. Any duplicates will show up in the output; use DISTINCT to remove them.
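As a quick check of the "exists in both" requirement on the sample data from the question, here is a small sketch using Python's sqlite3 module. The table names server1_names and server2_names are hypothetical stand-ins for the two linked-server tables, and INTERSECT is used as one more way to phrase the query, not as the method from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical local stand-ins for server1/server2 tableName
    CREATE TABLE server1_names (columnName TEXT);
    CREATE TABLE server2_names (columnName TEXT);
    INSERT INTO server1_names VALUES ('john'), ('john'), ('mary'), ('kate'), ('kate');
    INSERT INTO server2_names VALUES ('kate');
""")

# INTERSECT returns each value present in both tables once,
# regardless of how often it repeats inside either table.
common = conn.execute(
    "SELECT columnName FROM server1_names "
    "INTERSECT "
    "SELECT columnName FROM server2_names"
).fetchall()

print(common)
```

Only kate comes back, even though it is duplicated on the first server and john is duplicated without appearing on the second.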

Which is performed first: the WHERE clause or the JOIN clause?

Which clause is evaluated first in a SELECT statement?
I have a question about SELECT queries on this point.
Consider the example below:
SELECT *
FROM #temp A
INNER JOIN #temp B ON A.id = B.id
INNER JOIN #temp C ON B.id = C.id
WHERE A.Name = 'Acb' AND B.Name = C.Name
Does it check the WHERE clause first and then perform the INNER JOIN?
Or does it JOIN first and then check the condition?
If it performs the JOIN first and then the WHERE condition, how can it apply further WHERE conditions to the different JOINs?
The conceptual order of query processing is:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
But this is just a conceptual order. In fact the engine may decide to rearrange clauses. Here is proof. Let's make 2 tables with 1000000 rows each:
CREATE TABLE test1 (id INT IDENTITY(1, 1), name VARCHAR(10))
CREATE TABLE test2 (id INT IDENTITY(1, 1), name VARCHAR(10))
;WITH cte AS(SELECT -1 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) d FROM
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t1(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t2(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t3(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t4(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t5(n) CROSS JOIN
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t6(n))
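INSERT INTO test1(name) SELECT 'a' FROM cte
-- assumed step, not in the original post: fill test2 the same way so both tables hold 1000000 rows
INSERT INTO test2(name) SELECT name FROM test1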
Now run 2 queries:
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id AND t2.id = 100
WHERE t1.id > 1
SELECT * FROM dbo.test1 t1
JOIN dbo.test2 t2 ON t2.id = t1.id
WHERE t1.id = 1
Notice that the first query filters most rows out in the join condition, while the second query filters in the WHERE clause. Look at the plans produced:
1 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(100)
2 TableScan - Predicate:[Test].[dbo].[test2].[id] as [t2].[id]=(1)
This means that for the first query the optimizer decided to evaluate the join condition first to filter out rows. For the second query it evaluated the WHERE clause first.
Logical order of query processing phases is:
FROM - Including JOINs
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can have as many conditions as you like, on your JOINs as well as in your WHERE clause. For example:
Select * from #temp A
INNER JOIN #temp B ON A.id = B.id AND .... AND ...
INNER JOIN #temp C ON B.id = C.id AND .... AND ...
Where A.Name = 'Acb'
AND B.Name = C.Name
AND ....
You can refer to this join optimization:
SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
INNER JOIN T3 ON P2(T2,T3)
WHERE P(T1,T2,T3)
The nested-loop join algorithm would execute this query in the following manner:
FOR each row t1 in T1 {
FOR each row t2 in T2 such that P1(t1,t2) {
FOR each row t3 in T3 such that P2(t2,t3) {
IF P(t1,t2,t3) {
t:=t1||t2||t3; OUTPUT t;
}
}
}
}
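The pseudocode above translates almost line for line into Python. This is only a sketch of the nested-loop idea, not how any particular engine implements it; rows are modelled as tuples, and the sample rows and predicates are made up to mirror the query in the question (P1 and P2 compare ids, P is the WHERE condition).

```python
def nested_loop_join(T1, T2, T3, P1, P2, P):
    """Yield t1||t2||t3 for every combination passing P1, P2 and P."""
    for t1 in T1:
        for t2 in T2:
            if not P1(t1, t2):       # ON condition of the first join
                continue
            for t3 in T3:
                if not P2(t2, t3):   # ON condition of the second join
                    continue
                if P(t1, t2, t3):    # WHERE condition
                    yield t1 + t2 + t3


# Hypothetical rows of (id, name), reused for all three aliases.
temp = [(1, "Acb"), (2, "Zed")]

result = list(nested_loop_join(
    temp, temp, temp,
    P1=lambda a, b: a[0] == b[0],                          # A.id = B.id
    P2=lambda b, c: b[0] == c[0],                          # B.id = C.id
    P=lambda a, b, c: a[1] == "Acb" and b[1] == c[1],      # WHERE ...
))
print(result)
```

Written this way it is easy to see that a WHERE condition touching several tables is simply tested on the combined row, which answers the "how can it perform more where conditions for different JOINs" part of the question.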
You can refer to MSDN:
The rows selected by a query are filtered first by the FROM clause
join conditions, then the WHERE clause search conditions, and then the
HAVING clause search conditions. Inner joins can be specified in
either the FROM or WHERE clause without affecting the final result.
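The last sentence of that quote can be checked on a toy data set. The sketch below uses Python's sqlite3 module rather than SQL Server, and the tables a and b are hypothetical; it runs the same inner join once with the filter in the ON clause and once in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INT, name TEXT);
    CREATE TABLE b (id INT, name TEXT);
    INSERT INTO a VALUES (1, 'Acb'), (2, 'Xyz'), (3, 'Acb');
    INSERT INTO b VALUES (1, 'p'), (2, 'q'), (3, 'r');
""")

# Filter written as part of the join condition.
in_on = conn.execute(
    "SELECT a.id, b.name FROM a "
    "JOIN b ON a.id = b.id AND a.name = 'Acb' ORDER BY a.id"
).fetchall()

# Same filter written in the WHERE clause.
in_where = conn.execute(
    "SELECT a.id, b.name FROM a "
    "JOIN b ON a.id = b.id WHERE a.name = 'Acb' ORDER BY a.id"
).fetchall()

print(in_on == in_where)
```

Both forms return the same rows for an inner join. That equivalence does not carry over to outer joins, where a condition in ON and the same condition in WHERE can give different results.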
You can also use the SET SHOWPLAN_ALL ON before executing your query to show the execution plan of your query so that you can measure the performance difference in the two.
If you come to this site for the question about logical query processing, you really need to read this article on ITProToday by Itzik Ben-Gan.
Figure 3: Logical query processing order of query clauses
1 FROM
2 WHERE
3 GROUP BY
4 HAVING
5 SELECT
5.1 SELECT list
5.2 DISTINCT
6 ORDER BY
7 TOP / OFFSET-FETCH

SQL joining on >=

I have a table like this in Oracle:
a b
-- --
1000 1
100 2
10 3
1 4
My other table has numbers like '67' or '112' in a column called numbers, for example.
How can I join to this table using those values and get the correct result, where >= 1000 would be 1, >= 100 would be 2, >= 10 would be 3, and so on?
I tried this:
select g.b
from table o
join table2 g on o.column >= g.a
When I do this and the value is, say, 1002, I get back these results:
1
2
3
4
when I just want 1.
Easiest would be if your lookup table had ranges instead of just one number, such as row 1 = 1000,9999,1 and row 2 = 100,999,2 etc.
Then your join might be
SELECT OtherTable.n, lookup.b
from OtherTable
LEFT JOIN lookup on OtherTable.n between lookup.low and lookup.high
But, if you really want to use your original table, then on SQL Server, do this:
/*CREATE TABLE #A (a int,b int)
INSERT INTO #A VALUES (1000,1),(100,2),(10,3),(1,4)
CREATE TABLE #B (n INT)
INSERT INTO #B VALUES (67),(112),(4),(2001)
*/
SELECT B.n, A1.b
FROM #B B OUTER APPLY (SELECT TOP 1 a,b FROM #A A WHERE A.a <= B.n ORDER BY A.a DESC) A1
Here's one way to do it using a subquery to get the MAX of column a, and then rejoining on the same table to get b:
select t.numericvalue, t2.b
from (
select t.numericvalue, max(t2.a) maxb
from table1 t
join table2 t2 on t.numericvalue >= t2.a
group by numericvalue
) t join table2 t2 on t.maxb = t2.a
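The MAX-then-rejoin approach can be tried out with the data from the question. This is a sketch in Python's sqlite3 module rather than Oracle, and the table names lookup and numbers are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (a INT, b INT);   -- thresholds from the question
    CREATE TABLE numbers (n INT);         -- hypothetical values to classify
    INSERT INTO lookup VALUES (1000, 1), (100, 2), (10, 3), (1, 4);
    INSERT INTO numbers VALUES (67), (112), (4), (1002);
""")

rows = conn.execute("""
    SELECT m.n, l2.b
    FROM (
        -- largest threshold that each number reaches
        SELECT x.n AS n, MAX(l1.a) AS max_a
        FROM numbers x
        JOIN lookup l1 ON x.n >= l1.a
        GROUP BY x.n
    ) m
    JOIN lookup l2 ON l2.a = m.max_a      -- rejoin to fetch b
    ORDER BY m.n
""").fetchall()

print(rows)
```

Each number now maps to exactly one b: 1002 gets 1 instead of all four rows, because only the largest matching threshold survives the MAX.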

SQL -- Remove duplicate pairs

I'm using SQLite to store a set of undirected edges of a graph in two columns, u and v. For example:
u v
1 2
3 2
2 1
3 4
I have already been through it with SELECT DISTINCT * FROM edges and removed all duplicate rows.
However, there are still duplicates if we remember these are undirected edges. In the above example, the edge (1,2) appears twice, once as (1,2) and once as (2,1) which are both equivalent.
I wish to remove all such duplicates leaving only one of them, either (1,2) or (2,1) -- it doesn't really matter which.
Any ideas how to achieve this? Thanks!
If the same pair also exists reversed, keep the one where u > v.
SELECT DISTINCT u,v
FROM edges t1
WHERE t1.u > t1.v
OR NOT EXISTS (
SELECT * FROM edges t2
WHERE t2.u = t1.v AND t2.v = t1.u
)
This will find all the duplicates:
SELECT t1.u, t1.v FROM edges t1 INNER JOIN edges t2
ON t1.u = t2.v AND t1.v = t2.u
This will delete the duplicates:
DELETE FROM edges WHERE
EXISTS (SELECT * FROM edges t2 WHERE t2.u = edges.v AND t2.v = edges.u AND edges.u > t2.u)
Note that this will not delete duplicates like (2, 2) but I think you got those already with SELECT DISTINCT.
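Here is a small sketch that runs the delete against the sample edges from the question, using Python's sqlite3 module. The table name edges comes from the question; the statement refers to the outer table by its name instead of an alias, which is a choice made here to stay on plainly supported SQLite syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (u INT, v INT);
    INSERT INTO edges VALUES (1, 2), (3, 2), (2, 1), (3, 4);
""")

# Delete a row when its reversed twin exists and this row has the larger u,
# so exactly one of (u, v) / (v, u) is kept.
conn.execute("""
    DELETE FROM edges
    WHERE EXISTS (
        SELECT 1 FROM edges AS r
        WHERE r.u = edges.v AND r.v = edges.u AND edges.u > r.u
    )
""")

remaining = conn.execute("SELECT u, v FROM edges ORDER BY u, v").fetchall()
print(remaining)
```

Of the pair (1,2) and (2,1) only (1,2) is left; (3,2) and (3,4) have no reversed twin and are untouched.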
Testing with 9 numbers, so I add the numbers 1 to 9 to two tables:
declare @num int
set @num =1
while @num<10
begin
insert into t2 values (@num)
insert into t1 values (@num)
set @num += 1
end
Then coupling uniques without any repetition:
select t1.u, t2.v
from t1 cross join t2
where t1.u>t2.v

SQLServer join two tables

I have a question for you: I'm having a hard time trying to combine two tables and can't manage to find the correct query.
I have two tables:
T1: 1 column, has X records
T2: 1 column, has Y records
Note: Y can never be greater than X, but it is often less.
I want to join those tables in order to have a table with two columns:
t3: ColumnFromT1, columnFromT2.
When Y is less than X, the T2 values get repeated and spread over all my other values, but I want to get NULL once all the values from T2 have been used.
How could I achieve that?
Thanks
You could give each table a row number in a subquery. Then you can left join on that row number. To recycle rows from the second table, take the modulus % of the first table's row number.
Example:
select Sub1.col1
, Sub2.col1
from (
select row_number() over (order by col1) as rn
, *
from @T1
) Sub1
left join
(
select row_number() over (order by col1) as rn
, *
from @T2
) Sub2
on (Sub1.rn - 1) % (select count(*) from @T2) + 1 = Sub2.rn
Test data:
declare @t1 table (col1 int)
declare @t2 table (col1 datetime)
insert @t1 values (1), (2), (3), (4), (5)
insert @t2 values ('2010-01-01'), ('2012-02-02')
This prints:
1 2010-01-01
2 2012-02-02
3 2010-01-01
4 2012-02-02
5 2010-01-01
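If the goal is NULL once T2 runs out, as the question asks, rather than recycling the T2 rows, joining on the plain row number without the modulus should do it. A minimal sketch with Python's sqlite3 module follows; it assumes an SQLite build with window functions (3.25 or later), and t1 and t2 are ordinary tables standing in for the table variables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 INT);
    CREATE TABLE t2 (col1 TEXT);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO t2 VALUES ('2010-01-01'), ('2012-02-02');
""")

# Number the rows of each table, then pair them up position by position.
# Positions beyond the end of t2 find no partner and come back as NULL.
rows = conn.execute("""
    SELECT s1.col1, s2.col1
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1 FROM t1) s1
    LEFT JOIN
         (SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1 FROM t2) s2
      ON s1.rn = s2.rn
    ORDER BY s1.rn
""").fetchall()

for row in rows:
    print(row)
```

The first two rows pair with the two dates and rows 3 to 5 get None (NULL) in the second column.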
You are looking for a LEFT JOIN (http://www.w3schools.com/sql/sql_join_left.asp), e.g. T1 LEFT JOIN T2.
Say they both have the column CustomerID in common:
SELECT *
FROM T1
LEFT JOIN
T2 on t1.CustomerId = T2.CustomerId
This will return all records in T1 and those that match in T2 with nulls for the T2 values where they do not match.
Make sure you are joining the tables on a common column (or a common set of columns if more than one column is necessary to perform the join). If not, you are doing a Cartesian join ( http://ezinearticles.com/?What-is-a-Cartesian-Join?&id=3560672 ).