SQL LEFT outer join with only some rows from the right? - sql

I have two tables TABLE_A and TABLE_B having the joined column as the employee number EMPNO.
I want to do a normal left outer join. However, TABLE_B has certain records that are soft-deleted (status='D'), I want these to be included. Just to clarify, TABLE_B could have active records (status= null/a/anything) as well as deleted records, in this case i don't want that employee in my result. If however there are only deleted records of the employee in TABLE_B i want the employee to be included in the result.I hope i'm making my requirement clear. (I could do a lengthy qrslt kind of thingy and get what I want, but I figure there has to be a more optimized way of doing this using the join syntax). Would appreciate any suggestions(even without the join). His newbness is trying the following query without the desired result:
SELECT TABLE_A.EMPNO
FROM TABLE_A
LEFT OUTER JOIN TABLE_B ON TABLE_A.EMPNO = TABLE_B.EMPNO AND TABLE_B.STATUS<>'D'
Much appreciate any help.

Just to clarify -- all records from TABLE_A should appear, unless there are rows in table B with statues other than 'D'?
You'll need at least one non-null column on B (I'll use 'B.ID' as an example, and this approach should work):
SELECT TABLE_A.EMPNO
FROM TABLE_A
LEFT OUTER JOIN TABLE_B ON
(TABLE_A.EMPNO = TABLE_B.EMPNO)
AND (TABLE_B.STATUS <> 'D' OR TABLE_B.STATUS IS NULL)
WHERE
TABLE_B.ID IS NULL
That is, reverse the logic you might think -- join onto TABLE_B only where you have rows that would exclude TABLE_A entries, and then use the IS NULL at the end to exclude those. This means that only those which didn't match (those with no row in TABLE_B, or with only 'D' rows) get included.
An alternative might be
SELECT TABLE_A.EMPNO
FROM TABLE_A
WHERE NOT EXISTS (
SELECT * FROM TABLE_B
WHERE TABLE_B.EMPNO = TABLE_A.EMPNO
AND (TABLE_B.STATUS <> 'D' OR TABLE_B.STATUS IS NULL)
)

The following query will get you the employee records that aren't deleted, or only the employ only has deleted records.
select
a.*
from
table_a a
left join table_b b on
a.empno = b.empno
where
b.status <> 'D'
or (b.status = 'D' and
(select count(distinct status) from table_b where empno = a.empno) = 1)
This is in ANSI SQL, but if I knew your RDBMS, I could give a more specific solution that may be a bit more elegant.

ah crud, this apparently works ><
SELECT TABLE_A.EMPNO
FROM TABLE_A
LEFT OUTER JOIN TABLE_B ON TABLE_A.EMPNO = TABLE_B.EMPNO
where TABLE_B.STATUS<>'D'
If you guys have any extra info to chime in with though, please feel free.
UPDATE:
Saw this question after sometime and thought i'll add more helpful info: This link has good info regarding ANSI syntax - http://www.oracle-base.com/articles/9i/ANSIISOSQLSupport.php
In particular this part from the linked page is informative:
Extra filter conditions can be added to the join to using AND to form a complex join. These are often necessary when filter conditions are required to restrict an outer join. If these filter conditions are placed in the WHERE clause and the outer join returns a NULL value for the filter column the row would be thrown away. if the filter condition is coded as part of the join the situation can be avoided.

SELECT A.*, B.*
FROM
Table_A A
INNER JOIN Table_B B
ON A.EmpNo = B.EmpNo
WHERE
NOT EXISTS (
SELECT *
FROM Table_B X
WHERE
A.EmpNo = X.EmpNo
AND X.Status <> 'D'
)
I think this does the trick. The left join is not needed because you only want to include employees with all (and at least one) deleted rows.

This is how I understand the question. You need to include only those employees for which either of the following is true:
an employee has only (soft-)deleted rows in TABLE_B;
an employee has only non-deleted rows in TABLE_B;
an employee has no rows in TABLE_B at all.
In other words, if an employee has both deleted and non-deleted rows in TABLE_B, omit that employee, otherwise include them.
This is how I think it could be solved:
SELECT DISTINCT a.EMPNO
FROM TABLE_A a
LEFT JOIN TABLE_B b1 ON a.EMPNO = b1.EMPNO
LEFT JOIN TABLE_B b2 ON b1.EMPNO = b2.EMPNO
AND (b1.STATUS = 'D' AND (b2.STATUS <> 'D' OR b2 IS NULL) OR
b2.STATUS = 'D' AND (b1.STATUS <> 'D' OR b1 IS NULL))
WHERE b2.EMPNO /* or whatever non-nullable column there is */ IS NULL
Alternatively, though, you could use grouping:
SELECT a.EMPNO
FROM TABLE_A a
LEFT JOIN TABLE_B b ON a.EMPNO = b1.EMPNO
GROUP BY a.EMPNO
HAVING 0 IN (COUNT(CASE b.STATUS WHEN 'D' THEN 1 ELSE NULL END),
COUNT(CASE b.STATUS WHEN 'D' THEN NULL ELSE 1 END))

Related

SQL inner join with conditional selection

I am new in SQL. Lets say I have 2 tables one is table_A and the other one is table_B. And I want to create a view with two of them which is view_1.
table_A:
id
foo
1
d
2
e
null
f
table_B
id
name
1
a
2
b
3
c
and when I use this query :
SELECT DISTINCT table_A.id, table_B.name
FROM table_A
INNER JOIN table_B ON table_B.id = table_A.id
the null value in table_A can't be seen in the view_1 since it is not found in table_B. I want view_1 to show also this null row like :
id
name
1
a
2
b
null
no entry
Should I create a 4. table? I couldn't find a way.
Try this Query:
SELECT DISTINCT a.id,(CASE When b.name IS NULL OR b.name = '' Then 'No Entry' else b.name end) name FROM table_A a
LEFT JOIN table_B b on a.id = b.id
You are looking for an outer join. Thus you keep all table_A rows and join table_B rows where they exist. If no match exists, the table_B columns in the joined row are NULL.
You replace NULLs with a value with COALESCE.
SELECT a.id, COALESCE(b.name, 'no entry') AS name
FROM table_a a
LEFT OUTER JOIN table_b b ON b.id = a.id
ORDER BY a.id NULLS LAST;
You haven't tagged your request with your DBMS. Not all DBMS support the NULLS LAST clause.
Please note that there is no DISTINCT in my query. It is not needed. And every time you think you must use DISTINCT, think twice. SELECT DISTINCT is very seldom needed. Most often it is used, because the query is kind of flawed and causes the undesired duplicates itself.

SQL function to create a one-to-one match between two tables?

I am trying to join 2 tables. Table_A has ~145k rows whereas Table_B has ~205k rows.
They have two columns in common (i.e. ISIN and date). However, when I execute this query:
SELECT A.*,
B.column_name
FROM Table_A
JOIN
Table_B ON A.date = B.date
WHERE A.isin = B.isin
I get a table with more than 147k rows. How is it possible? Shouldn't it return a table with at most ~145k rows?
What you are seeing indicates that, for some of the records in Table_A, there are several records in Table_B that satisfy the join conditions (equality on the (date, isin) tuple).
To exhibit these records, you can do:
select B.date, B.isin
from Table_A
join Table_B on A.date = B.date and A.isin = B.isin
group by B.date, B.isin
having count(*) > 1
It's up to you to define how to handle those duplicates. For example:
if the duplicates have different values in column column_name, then you can decide to pull out the maximum or minimum value
or use another column to filter on the top or lower record within the duplicates
if the duplicates are true duplicates, then you can use select distinct in a subquery to dedup them before joining
... other solutions are possible ...
If you want one row per table A, then use outer apply:
SELECT A.*,
B.column_name
FROM Table_A a OUTER APPLY
(SELECT TOP (1) b.*
FROM Table_B b
WHERE A.date = B.date AND A.isin = B.isin
ORDER BY ? -- you can specify *which* row you want when there are duplicates
) b;
OUTER APPLY implements a lateral join. The TOP (1) ensures that at most one row is returned. The OUTER (as opposed to CROSS) ensures that nothing is filtered out. In this case, you could also phrase it as a correlated subquery.
All that said, your data does not seem to be what you really expect. You should figure out where the duplicates are coming from. The place to start is:
select b.date, b.isin, count(*)
from tableb b
group by b.date, b.isin
having count(*) >= 2;
This will show you the duplicates, so you can figure out what to do about them.
Duplicate possibilities is already discuss.
When millions of records are use in join then often due to poor Cardianility Estimate,
record return are not accurate.
For this just change join order,
SELECT A.*,
B.column_name
FROM Table_A
JOIN
Table_B ON A.isin = B.isin
and
A.date = B.date
Also create non clustered index on both table.
Create NonClustered index isin_date_table_A on Table_A(isin,date)include(*Table_A)
*Table_A= comma seperated list Table_A column which is require in resultset
Create NonClustered index isin_date_table_B on Table_B(isin,date)include(column_nameA)
Update STATISTICS Table_A
Update STATISTICS Table_B
Keeping the DATE columns of both tables in the same format in the JOIN condition you should be getting the result as expected.
Select A.*, B.column_name
from Table_A
join Table_B on to_date(a.date,'DD-MON-YY') = to_date(b.date,'DD-MON-YY')
where A.isin = B.isin

Outer join not showing extra entries

I have made a query in which 3 tables are used. The first table has all the desired names which I need. The 2nd and 3rd table give me those names on which there is some bill amount. But I need all the names from the 1st table as well.
SELECT a.name,
nvl(c.bill_amount,0)
FROM table_1 a left outer join table_2 b
ON a.name = b.name
left outer join table_3 c on B.phone_number = C.phone_number
AND B.email = C.email
where b.status = 'YES'
and a.VALID = 'Y';
Now, the tables b and c give me limited number of names, lets say 5 on which bill is there. But in table_1, there are 10 names. I want to display them also with 0 bill_amount on their name. I'm using Oracle.
Applying a where clause on the right hand tale basiically makes it an inner join. To keep it OUTER, put the condition in the join conditions
Try:
SELECT a.name,
nvl(c.bill_amount,0)
FROM table_1 a
left outer join table_2 b
ON a.name = b.name
and b.status = 'YES' -- Put it here
left outer join table_3 c
on B.phone_number = C.phone_number
AND B.email = C.email
where a.VALID = 'Y'; -- Only items from the left hand table should go in the where clause
The answer above is right , I would just want to be more precise. The fact is when a left join does not match, the column of the right hand table are set to NULL.
Actually NULL is always propagating value in SQL, so b.status = 'YES' have the value NULL if the join does not math, and then the predicate does not match neither.
The general way to handle this would be (b.status = 'YES' or b.name IS NULL) : because b.name is the join column it is null if and only if join does not match, which may not be the case for b.status.
Because NULL is propagating you cannot use field = NULL but field IS NULL instead.
But it is okay to have it in the join clause when it is more clear.

Join Tables on Date Range in Hive

I need to join tableA to tableB on employee_id and the cal_date from table A need to be between date start and date end from table B. I ran below query and received below error message, Would you please help me to correct and query. Thank you for you help!
Both left and right aliases encountered in JOIN 'date_start'.
select a.*, b.skill_group
from tableA a
left join tableB b
on a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
RTFM - quoting LanguageManual Joins
Hive does not support join conditions that are not equality conditions
as it is very difficult to express such conditions as a map/reduce
job.
You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.
If your situation allows, do it in two queries.
First with the full join, which can have the range; Then with an outer join, matching on all the columns, but include a where clause for where one of the fields is null.
Ex:
create table tableC as
select a.*, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end;
with c as (select * from TableC)
insert into tableC
select a.*, cast(null as string) as skill_group
from tableA a
left join c
on (a.employee_id= c.employee_id
and a.cal_date = c.cal_date)
where c.employee_id is null ;
MarkWusinich had a great solution but with one major issue. If table a has an employee ID twice within the date range table c will also have that employee_ID twice (if b was unique if not more) creating 4 records after the join. As such if A is not unique on employee_ID a group by will be necessary. Corrected below:
with C as
(select a.employee_id, b.skill_group
from tableA a
, tableB b
where a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
group by a.employee_id, b.skill_group
) C
select a.*, c.skill_group
from tableA a
left join c
on a.employee_id = c.employee_id
and a.cal_date = c.cal_date;
Please note: If B was somehow intentionally not distinct on (employee_id, skill_group), then my query above would also have to be modified to appropriately reflect that.

SQL Where Condition with IF Statement

So I'm trying to write a query to pull some data and I have one condition that needs to be met and I can't seem to figure out how to actually execute it. What I'm trying to achieve is that if a column is not null in one table, then I want to check another table and see if there is a specific value in one those columns. So in a psuedo code type of way I'm trying to do this
SELECT id, user_name, created_date, transaction_number
FROM TableA
WHERE (IF TableA.response_id IS NULL OR
IF (SELECT type_id from TableB WHERE (type_id NOT IN ('4)) AND (id = TableA.response_id))
So from here what I'm trying to do is pull all transactions for customers that have no responses in them, but from those that do have responses I still want to grab transaction that's don't have a specific code associated to them. I'm not sure if it's possible to do it in this manner or if I need to create some temporary tables that can then be manipulated but I'm stuck on this one condition.
At first I thought you wanted the CASE statement from the wording of your question, but I think you're just looking for an OUTER JOIN with an OR statement:
SELECT DISTINCT a.id, a.user_name, a.created_date, a.transaction_number
FROM TableA A
LEFT JOIN TableB B ON A.response_id = B.Id
WHERE A.response_id IS NULL
OR B.type_id NOT IN (4)
A Visual Explanation of SQL Joins
where TableA.Response_id is null or (select count(1) from TableB WHERE (type_id NOT IN ('4)) AND (id = TableA.response_id)) = 0
provided that your subquery is logically correct.
Well I'm not 100% certain I follow, but assuming what you want is to see if there are response entries for a particular ID in Table A I think you want something like this.
SELECT a.id, user_name, created_date, transaction_number
FROM TableA a
LEFT JOIN TableB b
ON a.id=b.id
LEFT JOIN TableC c
ON a.id=c.id
WHERE isnull(b.id,c.id) IS NOT NULL
GROUP BY a.id, user_name, created_date, transaction_number
ISNULL will return b.id if it is not null, c.id if c.id is not null and NULL otherwise. That will tell you if there's a response for a.id in either TableB or TableC. That's assuming TableB & TableC are more like logs. If you're saying those table will certainly have an entry for a.id then it's just a matter of replacing b.id & c.id with b.[response_column] & c.[response_column] respectively.