SQL Query to get the rows not part of join - sql

I have the tables below.
create table test(id int, data1 int);
create table test1(id int, data2 int);
insert into test values(1,1);
insert into test1 values(2,2);
insert into test1 values(3,3);
insert into test1 values(1,1);
Now I want the rows of test1 that don't participate in the join with test, i.e. rows (2,2) and (3,3). I want to be able to do this in MySQL.
I don't want to use an inner query (subquery) because of performance.
Thank you

Using LEFT JOIN/IS NULL:
SELECT t1.*
FROM TEST1 t1
LEFT JOIN TEST t ON t.id = t1.id
AND t.data1 = t1.data2
WHERE t.id IS NULL
Assuming the columns being joined on are not nullable, this is the fastest/most efficient method on MySQL. Otherwise, NOT IN/NOT EXISTS are better choices.
Using NOT EXISTS:
SELECT t1.*
FROM TEST1 t1
WHERE NOT EXISTS(SELECT NULL
FROM TEST t
WHERE t.id = t1.id
AND t.data1 = t1.data2)
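As a quick check that both forms return the same rows, here is a minimal sketch that replays the question's data in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption for the sake of a runnable example), so it says nothing about MySQL performance.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test  (id INT, data1 INT);
    CREATE TABLE test1 (id INT, data2 INT);
    INSERT INTO test  VALUES (1, 1);
    INSERT INTO test1 VALUES (2, 2), (3, 3), (1, 1);
""")

# Anti-join via LEFT JOIN / IS NULL: keep test1 rows with no partner in test.
left_join_rows = conn.execute("""
    SELECT t1.id, t1.data2
    FROM test1 t1
    LEFT JOIN test t ON t.id = t1.id AND t.data1 = t1.data2
    WHERE t.id IS NULL
    ORDER BY t1.id
""").fetchall()

# Same anti-join via NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT t1.id, t1.data2
    FROM test1 t1
    WHERE NOT EXISTS (SELECT NULL
                      FROM test t
                      WHERE t.id = t1.id AND t.data1 = t1.data2)
    ORDER BY t1.id
""").fetchall()

print(left_join_rows)   # [(2, 2), (3, 3)]
print(not_exists_rows)  # [(2, 2), (3, 3)]
```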

Without using subqueries (even the EXISTS variety, which I love), you'll need to do a left join and keep the records that didn't join, like so:
select a.* from test1 a
left join test b on a.id = b.id and a.data2 = b.data1
where b.id IS NULL

Perhaps something with the union?
select * from test as a
left outer join test1 as o on a.id = o.id
union all
select * from test as a
right outer join test1 as o on a.id = o.id
where a.id is null;
I assume what you want to achieve is an exclusion join:
http://www.xaprb.com/blog/2005/09/23/how-to-write-a-sql-exclusion-join/

Related

SQL antijoin with multiple keys

I'd like to implement an antijoin on two table but using two keys so that the result is all rows in Table A that do not contain the combinations of [key_1, key_2] found in Table B. How can I write this query in SQL?
If you want an anti-left join, the logic is:
select a.*
from tablea a
left join tableb b on b.key_1 = a.key_1 and b.key_2 = a.key_2
where b.key_1 is null
As for me, I like to implement such logic with not exists, because I find that it is more expressive about the intent:
select a.*
from tablea a
where not exists (
select 1 from tableb b where b.key_1 = a.key_1 and b.key_2 = a.key_2
)
The not exists query would take advantage of an index on tableb(key_1, key_2).
select a.*
from table_a a
left anti join table_b b on a.key_1 = b.key_1 and a.key_2 = b.key_2;
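To see how the two-key condition behaves, here is a small sketch using Python's sqlite3 module with made-up rows. The payload column, the index name, and the sample data are assumptions for illustration. It uses the NOT EXISTS form, since LEFT ANTI JOIN is not available in every dialect. Rows that match on only one of the two keys are kept; only the full [key_1, key_2] combination is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (key_1 INT, key_2 INT, payload TEXT);
    CREATE TABLE tableb (key_1 INT, key_2 INT);
    -- Composite index that the NOT EXISTS probe can use (name is hypothetical).
    CREATE INDEX idx_tableb_keys ON tableb (key_1, key_2);
    INSERT INTO tablea VALUES (1, 10, 'full match'),
                              (1, 20, 'key_1 only'),
                              (2, 10, 'key_2 only'),
                              (3, 30, 'no match');
    INSERT INTO tableb VALUES (1, 10), (9, 99);
""")

# Anti-join on both keys: drop only rows whose (key_1, key_2) pair exists in tableb.
anti_join_rows = conn.execute("""
    SELECT a.key_1, a.key_2, a.payload
    FROM tablea a
    WHERE NOT EXISTS (
        SELECT 1 FROM tableb b
        WHERE b.key_1 = a.key_1 AND b.key_2 = a.key_2
    )
    ORDER BY a.key_1, a.key_2
""").fetchall()

print(anti_join_rows)
# [(1, 20, 'key_1 only'), (2, 10, 'key_2 only'), (3, 30, 'no match')]
```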

How to limit records considered in a nested select within a join?

Curious to see if there is a way to write the following T-SQL statement. As written, it errors with "cannot bind TableA" in the nested select. Removing the error line seems to make it consider all records from TableB and only then perform the join.
select *
from TableA A
join (
select TableAid, TableBinfo
from TableB
where TableB.TableAid = A.TableAid -- error line
group by TableAid, TableBinfo
) B on
A.TableAid = B.TableAid
where A.TableAid = 123
Is the following SQL the best I can hope for?
I'd really like to limit the distinct comparison to just the one column in the one table rather than all the columns I select. I don't control the database and it doesn't have indexes on anything but primary keys.
select A.TableAid, B.TableBinfo
from TableA A
join TableB B on
A.TableAid = B.TableAid
where A.TableAid = 123
group by A.TableAid, B.TableBinfo
Your first example looks like you're trying to do an APPLY over a correlated subquery:
SELECT *
FROM TableA a
CROSS APPLY
(
SELECT t.TableBInfo
FROM TableB t
WHERE t.TableAId = a.TableAId
GROUP BY t.TableBInfo
) b
WHERE a.TableAId = 123

SQL alternative to sub-query in SELECT Item list

I have RDBMS tables and queries that work perfectly. I have offloaded the data from the RDBMS to HIVE tables. To run the existing queries on HIVE, we first need to make them compatible with HIVE.
Take the examples below, which use subqueries in the select list. They are syntactically valid and work fine on the RDBMS, but they will not work on HIVE: per the HIVE manual, HIVE supports subqueries only in the FROM and WHERE clauses.
Example 1 :
SELECT t1.one
,(SELECT t2.two
FROM TEST2 t2
WHERE t1.one=t2.two) t21
,(SELECT t3.three
FROM TEST3 t3
WHERE t1.one=t3.three) t31
FROM TEST1 t1 ;
Example 2:
SELECT C.*
, CASE
WHEN EXISTS
(SELECT 1
FROM tblOrder O
INNER JOIN tblProduct P
ON O.Product_id = P.Product_id
WHERE O.customer_id = C.customer_id
AND P.Product_Type IN (2, 5, 6, 9)
)
THEN 1
ELSE 0
END AS My_Custom_Indicator
FROM tblCustomer C
INNER JOIN tblOtherStuff S
ON C.CustomerID = S.CustomerID ;
Example 3 :
Select component_location_id, component_type_code,
( select clv.LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'AXLE'
and component_location_id = cl.component_location_id ) as AXLE,
( select clv.LOCATION_VALUE
from stg_dev.component_location_values clv
where identifier_code = 'SIDE'
and component_location_id = cl.component_location_id ) as SIDE
from stg_dev.component_locations cl ;
I want to know the possible alternatives to subqueries in the select list that are compatible with HIVE, so that I can transform the existing queries into HIVE format.
Any help and guidance is highly appreciated!
The query you provided could be transformed to a simple query with LEFT JOINs.
SELECT
t1.one, t2.two AS t21, t3.three AS t31
FROM
TEST1 t1
LEFT JOIN TEST2 t2
ON t1.one = t2.two
LEFT JOIN TEST3 t3
ON t1.one = t3.three
Since the subqueries carry no further restrictions, the joins will return the same data, provided each subquery returns at most one row for each row in TEST1.
Please note that your original query cannot handle 1..n relationships. In most DBMSs, a subquery in the SELECT list must return a result set with exactly one column and at most one row.
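The equivalence can be checked on a toy dataset. The sketch below uses Python's sqlite3 module as a stand-in engine (SQLite accepts both forms, which is why it can compare them). The sample rows are assumptions, chosen so that every value in TEST1 has at most one match in TEST2 and TEST3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (one INT);
    CREATE TABLE test2 (two INT);
    CREATE TABLE test3 (three INT);
    INSERT INTO test1 VALUES (1), (2), (3);
    INSERT INTO test2 VALUES (1), (2);
    INSERT INTO test3 VALUES (2), (3);
""")

# Original shape: scalar subqueries in the select list.
subquery_rows = conn.execute("""
    SELECT t1.one,
           (SELECT t2.two FROM test2 t2 WHERE t1.one = t2.two) AS t21,
           (SELECT t3.three FROM test3 t3 WHERE t1.one = t3.three) AS t31
    FROM test1 t1
    ORDER BY t1.one
""").fetchall()

# Rewritten shape: LEFT JOINs, which keep unmatched test1 rows as NULLs.
join_rows = conn.execute("""
    SELECT t1.one, t2.two AS t21, t3.three AS t31
    FROM test1 t1
    LEFT JOIN test2 t2 ON t1.one = t2.two
    LEFT JOIN test3 t3 ON t1.one = t3.three
    ORDER BY t1.one
""").fetchall()

print(subquery_rows)  # [(1, 1, None), (2, 2, 2), (3, None, 3)]
print(join_rows)      # [(1, 1, None), (2, 2, 2), (3, None, 3)]
```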
Based on HIVE manual:
SELECT t1.one, t2.two, t3.three
FROM TEST1 t1,TEST2 t2, TEST3 t3
WHERE t1.one=t2.two AND t1.one=t3.three;
SELECT t1.one, t2.two, t3.three
FROM TEST1 t1
INNER JOIN TEST2 t2 ON t1.one=t2.two
INNER JOIN TEST3 t3 ON t1.one=t3.three
WHERE t1.one=t2.two AND t1.one=t3.three;
SELECT t1.one, t2.two AS t21, t3.three AS t31
FROM TEST1 t1
INNER JOIN TEST2 t2 ON t1.one=t2.two
INNER JOIN TEST3 t3 ON t1.one=t3.three;
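One thing worth checking before adopting the inner-join forms: an inner join drops TEST1 rows that have no match, whereas a scalar subquery in the select list would have returned NULL for them. A minimal sketch with Python's sqlite3 module and assumed sample rows shows the difference next to a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (one INT);
    CREATE TABLE test2 (two INT);
    CREATE TABLE test3 (three INT);
    INSERT INTO test1 VALUES (1), (2);
    INSERT INTO test2 VALUES (1), (2);
    INSERT INTO test3 VALUES (2);   -- deliberately no row for one = 1
""")

# INNER JOIN: the test1 row without a test3 match disappears.
inner_rows = conn.execute("""
    SELECT t1.one, t2.two AS t21, t3.three AS t31
    FROM test1 t1
    INNER JOIN test2 t2 ON t1.one = t2.two
    INNER JOIN test3 t3 ON t1.one = t3.three
    ORDER BY t1.one
""").fetchall()

# LEFT JOIN: the same row is kept, with NULL for the missing side.
left_rows = conn.execute("""
    SELECT t1.one, t2.two AS t21, t3.three AS t31
    FROM test1 t1
    LEFT JOIN test2 t2 ON t1.one = t2.two
    LEFT JOIN test3 t3 ON t1.one = t3.three
    ORDER BY t1.one
""").fetchall()

print(inner_rows)  # [(2, 2, 2)]
print(left_rows)   # [(1, 1, None), (2, 2, 2)]
```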

SQL (sybase) query using TOP N performs very badly when inserted into table

I have a performance issue on Sybase ASE when inserting the result of the following query into a table or temp table:
INSERT INTO #temp (Id)
SELECT TOP 100 a.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
WHERE a.SomeColumn = 'blah' and b.SomeColumn = 'Blah'
ORDER BY a.Id
The WHERE clause isn't that important. The important thing is that the SELECT query runs in a split second on its own, but as soon as you try to insert its result into a table, it takes 2 minutes!
Looking at the query plan, the optimiser does not seem to take into account that the estimated rows should be 100, and it does a table scan of TableB. The select statement on its own seems to produce a sensible plan where the TOP 100 is taken into account, but the insert seems to make the optimiser take a very inefficient route. I have tried many permutations of this query, to no avail. Tables A and B are very large and the TOP N is a must. I have also tried set rowcount 100, with the same result.
Can anyone suggest a workaround for this?
Thanks
Have you tried:
INSERT INTO #temp (Id)
SELECT * FROM
(
SELECT TOP 100 a.Id
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
WHERE a.SomeColumn = 'blah' and b.SomeColumn = 'Blah'
ORDER BY a.Id
) t
It may produce a slightly different execution plan.
Why don't you try this and see if you still have issues? Technically, the INSERT INTO should also have been quite fast if the SELECT statement itself is fast.
--INSERT INTO #temp (Id)
SELECT TOP 100 a.Id
INTO #temp
FROM TableA a
INNER JOIN TableB b ON a.Id = b.Id
WHERE a.SomeColumn = 'blah' and b.SomeColumn = 'Blah'
ORDER BY a.Id

Is it possible to use IF or CASE in sql FROM statement

I have a long stored procedure and I would like to make a slight modification to the procedure without having to create a new one (for maintenance purposes).
Is it possible to use an IF or CASE in the FROM clause of the select statement to join other tables?
Like this:
from tableA a
join tableB b on a.indexed = b.indexed
IF @Param='Y'
BEGIN
join tableC c on a.indexed = c.indexed
END
It didn't seem to work for me. But I am wondering if this is even possible and/or if this even makes sense to do.
Thanks.
No, it is not possible. You can only accomplish this through the use of dynamic SQL.
The Curse and Blessings of Dynamic SQL
An Intro to Dynamic SQL
I would not advise using dynamic SQL; there are most likely better ways to perform this operation, but you would have to provide more info.
You can achieve something like it if you have a left outer join.
Consider:
declare @param bit = 1
select a.*, b.*, c.* from a
inner join b on a.id = b.a_id
left outer join c on b.id = c.b_id and @param = 1
This will return all columns from a, b, c.
Now try with
declare @param bit = 0
This will return all columns from a and b, and nulls for columns of c.
It won't work if both joins are inner.
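The parameter-in-the-ON-clause trick can be tried outside a stored procedure. Here is a minimal sketch using Python's sqlite3 module, where a bound parameter named :param plays the role of the T-SQL variable. The sample rows and the run helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INT);
    CREATE TABLE b (id INT, a_id INT);
    CREATE TABLE c (id INT, b_id INT);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (10, 1);
    INSERT INTO c VALUES (100, 10);
""")

# The flag is part of the LEFT JOIN condition, so c only matches when it is 1.
QUERY = """
    SELECT a.id, b.id, c.id
    FROM a
    INNER JOIN b ON a.id = b.a_id
    LEFT OUTER JOIN c ON b.id = c.b_id AND :param = 1
"""

def run(param):
    """Run the query with the flag bound as a named parameter (hypothetical helper)."""
    return conn.execute(QUERY, {"param": param}).fetchall()

with_c = run(1)
without_c = run(0)
print(with_c)     # [(1, 10, 100)]
print(without_c)  # [(1, 10, None)]
```

The row from a and b survives either way; only the columns of c switch between real values and NULL.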
No, this is not possible. Your best bet would probably be to select from both tables and only include the data you care about. If you provide an example of what you are trying to do, I can provide a better answer.
Attempt at an example:
SELECT t1.id, COALESCE(t2.name, t3.name)
FROM Table1 as t1
LEFT JOIN Table2 as t2
ON t1.id = t2.id
LEFT JOIN Table3 as t3
ON t1.id = t3.id
While what you proposed is not possible, you can play with your join conditions:
from tableA a
inner join tableB b ON a.indexed = b.indexed
left join tableC c ON a.indexed = c.indexed AND 1 = CASE @Param WHEN 'Y' THEN 1 ELSE 0 END
It would be more performant to just use one big IF:
IF @Param='Y'
from tableA a
inner join tableB b ON a.indexed = b.indexed
left join tableC c ON a.indexed = c.indexed
ELSE
from tableA a
inner join tableB b ON a.indexed = b.indexed
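The same branching idea can be sketched in application code: keep two fixed statements and pick one based on the flag, so each statement gets its own plan and no SQL is assembled from strings. The sketch below uses Python's sqlite3 module. The column name join_key (a stand-in for indexed), the info column, the fetch helper, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (join_key INT);
    CREATE TABLE tableB (join_key INT);
    CREATE TABLE tableC (join_key INT, info TEXT);
    INSERT INTO tableA VALUES (1), (2);
    INSERT INTO tableB VALUES (1), (2);
    INSERT INTO tableC VALUES (1, 'x');
""")

# Two static statements: one with the extra join, one without.
SQL_WITH_C = """
    SELECT a.join_key, c.info
    FROM tableA a
    INNER JOIN tableB b ON a.join_key = b.join_key
    LEFT JOIN tableC c ON a.join_key = c.join_key
    ORDER BY a.join_key
"""
SQL_WITHOUT_C = """
    SELECT a.join_key
    FROM tableA a
    INNER JOIN tableB b ON a.join_key = b.join_key
    ORDER BY a.join_key
"""

def fetch(param):
    """Pick the statement by flag value, mirroring IF @Param = 'Y' (hypothetical helper)."""
    sql = SQL_WITH_C if param == "Y" else SQL_WITHOUT_C
    return conn.execute(sql).fetchall()

rows_y = fetch("Y")
rows_n = fetch("N")
print(rows_y)  # [(1, 'x'), (2, None)]
print(rows_n)  # [(1,), (2,)]
```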
You haven't revealed your SELECT clause. The essence of what you want is as follows:
SELECT indexed
FROM tableA
INTERSECT
SELECT indexed
FROM tableB
INTERSECT
SELECT indexed
FROM tableC
WHERE @Param = 'Y'
Then use this table expression as dictated by your SELECT clause e.g. say you only want to project tableA:
WITH T
AS
(
SELECT indexed
FROM tableA
INTERSECT
SELECT indexed
FROM tableB
INTERSECT
SELECT indexed
FROM tableC
WHERE @Param = 'Y'
)
SELECT *
FROM tableA
WHERE indexed IN ( SELECT indexed FROM T );