How to exclude rows that don't join with another table? - sql

I have two tables: one has a primary key, and the other has it as a foreign key.
I want to pull data from the primary table only if the secondary table does not have an entry containing its key. It is sort of the opposite of a simple inner join, which returns only the rows that join together by that key.

SELECT <select_list>
FROM Table_A A
LEFT JOIN Table_B B
ON A.Key = B.Key
WHERE B.Key IS NULL
Full image of the joins, from the article: http://www.codeproject.com/KB/database/Visual_SQL_Joins.aspx

SELECT
*
FROM
primarytable P
WHERE
NOT EXISTS (SELECT * FROM secondarytable S
WHERE
P.PKCol = S.FKCol)
Generally, (NOT) EXISTS is a better choice than (NOT) IN or (LEFT) JOIN.
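To check that the LEFT JOIN ... IS NULL form and the NOT EXISTS form return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the name column are made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: key 1 is referenced twice, key 3 once, key 2 never.
    CREATE TABLE primarytable (PKCol INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE secondarytable (FKCol INTEGER);
    INSERT INTO primarytable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO secondarytable VALUES (1), (1), (3);
""")

# Anti-join written as LEFT JOIN + IS NULL on the joined key.
left_join = conn.execute("""
    SELECT P.PKCol, P.name
    FROM primarytable P
    LEFT JOIN secondarytable S ON P.PKCol = S.FKCol
    WHERE S.FKCol IS NULL
    ORDER BY P.PKCol
""").fetchall()

# The same anti-join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT P.PKCol, P.name
    FROM primarytable P
    WHERE NOT EXISTS (SELECT 1 FROM secondarytable S WHERE P.PKCol = S.FKCol)
    ORDER BY P.PKCol
""").fetchall()

print(left_join)
print(not_exists)
```

Both queries return only the row with key 2, the one key that never appears in the secondary table. Which form is faster depends on the database and its optimizer, so it is worth comparing the query plans on your own system.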

Use a "not exists" left join:
SELECT p.*
FROM primary_table p LEFT JOIN second s ON p.ID = s.ID
WHERE s.ID IS NULL

Another solution is:
SELECT * FROM TABLE1 WHERE id NOT IN (SELECT id FROM TABLE2)
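One thing to watch with NOT IN: if the subquery returns a NULL, the NOT IN predicate is never true and the query returns no rows at all. A small sketch with Python's sqlite3 module (the sample rows are invented for illustration) shows the difference from NOT EXISTS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: TABLE2 contains a NULL id.
    CREATE TABLE TABLE1 (id INTEGER);
    CREATE TABLE TABLE2 (id INTEGER);
    INSERT INTO TABLE1 VALUES (1), (2);
    INSERT INTO TABLE2 VALUES (1), (NULL);
""")

# "2 NOT IN (1, NULL)" evaluates to NULL rather than true, so row 2 is dropped.
not_in = conn.execute(
    "SELECT id FROM TABLE1 WHERE id NOT IN (SELECT id FROM TABLE2)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT id FROM TABLE1 t1
    WHERE NOT EXISTS (SELECT 1 FROM TABLE2 t2 WHERE t2.id = t1.id)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,)]
```

If the column in the subquery is nullable, either add a WHERE id IS NOT NULL to the subquery or prefer NOT EXISTS.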

SELECT P.*
FROM primary_table P
LEFT JOIN secondary_table S on P.id = S.p_id
WHERE S.p_id IS NULL

If you want to select the rows from the first table that are not present in the second table, you can also use EXCEPT. The column names can differ, but the number of columns and their data types should match.
Example:
select ID, FName
from FirstTable
EXCEPT
select ID, SName
from SecondTable
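Note that EXCEPT compares whole rows, not just the key. A minimal sketch with Python's sqlite3 module, using invented sample rows for FirstTable and SecondTable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: ID 2 exists in both tables but with a different name.
    CREATE TABLE FirstTable (ID INTEGER, FName TEXT);
    CREATE TABLE SecondTable (ID INTEGER, SName TEXT);
    INSERT INTO FirstTable VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO SecondTable VALUES (1, 'Ann'), (2, 'Robert');
""")

# Rows of FirstTable with no identical (ID, name) row in SecondTable.
rows = conn.execute("""
    SELECT ID, FName FROM FirstTable
    EXCEPT
    SELECT ID, SName FROM SecondTable
    ORDER BY ID
""").fetchall()

print(rows)  # [(2, 'Bob'), (3, 'Cy')]
```

Row (2, 'Bob') is kept even though ID 2 exists in SecondTable, because the names differ. To exclude on the key alone, select only the ID column on both sides of the EXCEPT.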

This was helpful in Cognos, because a SQL "not in" statement was allowed there but took too long to run. I had manually coded table A to join to table B in Cognos as A.key "not in" B.key, but the query was taking too long and had not returned results after 5 minutes.
For anyone else looking for a "NOT IN" solution in Cognos, here is what I did: create a query that joins tables A and B with a LEFT JOIN by selecting the link type "table A.Key has 0 to N values in table B", then add a filter (filters correspond to WHERE clauses) for: table B.Key is NULL.
It ran fast and worked like a charm.

Related

What's the purpose of a JOIN where no column from 2nd table is being used?

I am looking through some Hive queries we are running as part of analytics on our Hadoop cluster, but I am having trouble understanding one. This is the HiveQL query:
SELECT
c_id, v_id, COUNT(DISTINCT(m_id)) AS participants,
cast(date_sub(current_date, ${window}) as string) as event_date
from (
select
a.c_id, a.v_id, a.user_id,
case
when c.id1 is not null and a.timestamp <= c.stitching_ts then c.id2 else a.m_id
end as m_id
from (
select * from first
where event_date <= cast(date_sub(current_date, ${window}) as string)
) a
join (
select * from second
) b on a.c_id = b.c_id
left join third c
on a.user_id = c.id1
) dx
group by c_id, v_id;
I have changed the names but otherwise this is the select statement being used to insert overwrite to another table.
Regarding the join
join (
select * from second
) b on a.c_id = b.c_id
b is not used anywhere except in the join condition, so is this join serving any purpose at all?
Is it there to make sure that the result only has entries where c_id is present in the second table? Would a WHERE ... IN condition be better if that's all it is doing?
Or can I just remove this join without it making any difference at all?
Thanks.
A join (inner, left or right) can duplicate rows if the join key is not unique in the joined dataset. For example, if a contains a single row with c_id=1 and b contains two rows with c_id=1, the result will be two rows with a.c_id=1.
An inner join can also filter out rows whose join key is absent from the joined dataset. I believe this is what it was meant to do.
If the goal is to keep only rows with keys present in both datasets (a filter), you do not want duplication, and you do not use columns from the joined dataset, then it is better to use LEFT SEMI JOIN instead of JOIN. It works as a filter only, even if there are duplicated keys in the joined dataset:
left semi join (
select c_id from second
) b on a.c_id = b.c_id
This is a much safer way to keep only the rows that exist in both a and b while avoiding unintended duplication.
You can replace the join with WHERE IN/EXISTS, but it makes no difference: it is implemented as the same JOIN. Check the EXPLAIN output and you will see the same query plan. It is better to use LEFT SEMI JOIN, which implements uncorrelated IN/EXISTS in an efficient way.
If you prefer to move it to the WHERE:
WHERE a.c_id IN (select c_id from second)
or correlated EXISTS:
WHERE EXISTS (select 1 from second b where a.c_id=b.c_id)
But as I said, all of them are implemented internally using JOIN operator.
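The duplication risk is easy to reproduce outside Hive. Below is a rough sketch using Python's sqlite3 module; SQLite has no LEFT SEMI JOIN, so the EXISTS form plays that role here, and the table names a_rows and b_rows and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for "first" (a) and "second" (b).
    CREATE TABLE a_rows (c_id INTEGER, m_id TEXT);
    CREATE TABLE b_rows (c_id INTEGER);
    INSERT INTO a_rows VALUES (1, 'm1'), (2, 'm2');
    -- c_id = 1 appears twice in b, c_id = 2 not at all.
    INSERT INTO b_rows VALUES (1), (1);
""")

# Plain inner join: filters out c_id = 2 but duplicates the c_id = 1 row.
joined = conn.execute("""
    SELECT a.c_id, a.m_id
    FROM a_rows a
    JOIN b_rows b ON a.c_id = b.c_id
    ORDER BY a.c_id
""").fetchall()

# Semi-join written as EXISTS: same filter, no duplication.
semi = conn.execute("""
    SELECT a.c_id, a.m_id
    FROM a_rows a
    WHERE EXISTS (SELECT 1 FROM b_rows b WHERE a.c_id = b.c_id)
    ORDER BY a.c_id
""").fetchall()

print(joined)  # [(1, 'm1'), (1, 'm1')]
print(semi)    # [(1, 'm1')]
```

In the original query the outer COUNT(DISTINCT m_id) would hide such duplicates in the final numbers, but the intermediate result would still be larger than it needs to be.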

Delete records from Postgresql

I need to delete records from one table based on an inner join condition on two other tables, but the query runs for ages.
DELETE FROM public.blacklist WHERE subject_id NOT IN(
SELECT DISTINCT a.id FROM public.subject a
INNER JOIN stg.blacklist_init b ON a.subject_id=b.customer_code);
Any ideas how to achieve this?
Thank you.
You can use NOT EXISTS instead of NOT IN, and I think you don't need a DISTINCT
DELETE FROM public.blacklist bl
WHERE NOT EXISTS (
SELECT 0
FROM public.subject a
INNER JOIN stg.blacklist_init b
ON a.subject_id=b.customer_code
WHERE a.id = bl.subject_id
);
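Here is a small sketch of the same DELETE ... WHERE NOT EXISTS pattern, run against SQLite through Python's sqlite3 module so it can be tried without a PostgreSQL server. The schema names (public, stg) are dropped and the sample rows are invented, so treat it as an illustration of the logic rather than of PostgreSQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data mirroring the three tables in the question.
    CREATE TABLE subject (id INTEGER, subject_id TEXT);
    CREATE TABLE blacklist_init (customer_code TEXT);
    CREATE TABLE blacklist (subject_id INTEGER);
    INSERT INTO subject VALUES (10, 'C1'), (20, 'C2');
    INSERT INTO blacklist_init VALUES ('C1');
    INSERT INTO blacklist VALUES (10), (20), (30);
""")

# Delete blacklist rows whose subject has no match in blacklist_init.
conn.execute("""
    DELETE FROM blacklist
    WHERE NOT EXISTS (
        SELECT 1
        FROM subject a
        INNER JOIN blacklist_init b ON a.subject_id = b.customer_code
        WHERE a.id = blacklist.subject_id
    )
""")

remaining = conn.execute(
    "SELECT subject_id FROM blacklist ORDER BY subject_id"
).fetchall()
print(remaining)  # [(10,)]
```

Only subject 10 maps to a customer code present in blacklist_init, so rows 20 and 30 are removed.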

Are the SQL concepts LEFT OUTER JOIN and WHERE NOT EXISTS basically the same?

What's the difference between using a LEFT OUTER JOIN and using a sub-query that starts with WHERE NOT EXISTS (...)?
No, they are not the same thing, as they will not return the same rowset in the most simplistic use case.
The LEFT OUTER JOIN will return all rows from the left table, both where rows exist in the related table and where they do not. The WHERE NOT EXISTS() subquery will only return rows where the relationship is not met.
However, if you do a LEFT OUTER JOIN and check for IS NULL on the foreign key column in the WHERE clause, you get behavior equivalent to the WHERE NOT EXISTS.
For example this:
SELECT
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* IS NULL in the WHERE clause */
WHERE t_related.id IS NULL
Is equivalent to this:
SELECT
t_main.*
FROM t_main
WHERE
NOT EXISTS (
SELECT t_related.id
FROM t_related
WHERE t_main.id = t_related.id
)
But this one is not equivalent:
It will return rows from t_main both having and not having related rows in t_related.
SELECT
t_main.*
FROM
t_main
LEFT OUTER JOIN t_related ON t_main.id = t_related.id
/* WHERE clause does not exclude NULL foreign keys */
Note: this does not speak to how the queries are compiled and executed, which differs as well. It only addresses a comparison of the rowsets they return.
As Michael already answered your question here is a quick sample to illustrate the difference:
Table A
Key Data
1 somedata1
2 somedata2
Table B
Key Data
1 data1
Left outer join:
SELECT *
FROM A
LEFT OUTER JOIN B
ON A.Key = B.Key
Result:
Key Data Key Data
1 somedata1 1 data1
2 somedata2 null null
Using EXISTS:
SELECT *
FROM A WHERE EXISTS ( SELECT B.Key FROM B WHERE A.Key = B.Key )
Using NOT EXISTS:
SELECT *
FROM A WHERE NOT EXISTS ( SELECT B.Key FROM B WHERE A.Key = B.Key )
Result:
Key Data
2 somedata2
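The sample above can be reproduced with Python's sqlite3 module. This is only a sketch: SQLite stands in for your database, and the column name Key is quoted because KEY is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A ("Key" INTEGER, Data TEXT);
    CREATE TABLE B ("Key" INTEGER, Data TEXT);
    INSERT INTO A VALUES (1, 'somedata1'), (2, 'somedata2');
    INSERT INTO B VALUES (1, 'data1');
""")

# LEFT OUTER JOIN keeps every row of A; unmatched rows get NULLs for B's columns.
left_outer = conn.execute("""
    SELECT * FROM A
    LEFT OUTER JOIN B ON A."Key" = B."Key"
    ORDER BY A."Key"
""").fetchall()

# EXISTS keeps only the rows of A that have a match in B.
exists_rows = conn.execute("""
    SELECT * FROM A
    WHERE EXISTS (SELECT B."Key" FROM B WHERE A."Key" = B."Key")
""").fetchall()

# NOT EXISTS keeps only the rows of A that have no match in B.
not_exists_rows = conn.execute("""
    SELECT * FROM A
    WHERE NOT EXISTS (SELECT B."Key" FROM B WHERE A."Key" = B."Key")
""").fetchall()

print(left_outer)       # [(1, 'somedata1', 1, 'data1'), (2, 'somedata2', None, None)]
print(exists_rows)      # [(1, 'somedata1')]
print(not_exists_rows)  # [(2, 'somedata2')]
```

Python shows SQL NULL as None, so the second row of the outer join matches the "null null" line in the result table.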
Left outer join is more flexible than where not exists. You must use a left outer join if you want to return any of the columns from the child table. You can also use the left outer join to return records that match the parent table as well as all records in the parent table that have no match. Where not exists only lets you return the records with no match.
However, in the case where they do return equivalent rows and you do not need any of the columns in the right table, where not exists is likely to be the more performant choice (at least in SQL Server; I don't know about other databases).
I suspect the answer ultimately is, both are used (among other constructs) to perform the relational operation antijoin in SQL.
I suspect the OP wanted to know which construct is better when they are functionally the same (i.e. I want to see only rows where there is no match in the secondary table).
As such, WHERE NOT EXISTS will usually be as quick or quicker, so it is a good habit to get into.

How to make a one to one left outer join?

I was wondering, is there a way to make a kind of one to one left outer join:
I need a join that matches say table A with table B, for each record on table A it must search for its pair on table B, but there exists only 1 record that matches that condition, so when it has found its pair on B, it must stop and continue with the next row at table A.
What I have is a simple LEFT OUTER JOIN.
select * from A left outer join B on A.ID = B.ID order by (NAME) asc
Thanks in advance!
SQL doesn't work this way. In the first place it does not look at things row-by-row. In the second place what defines the record you want to match on?
Assuming you don't really care which row is selected, something like this might work:
SELECT *
FROM tableA a
LEFT OUTER JOIN
-- derived table b: rows of tableb restricted to the ids picked by MIN()
(SELECT b1.* FROM tableb b1
JOIN (SELECT MIN(id) AS id FROM tableb GROUP BY id) b2 ON b1.id = b2.id) b
ON a.id = b.id
But it is still pretty iffy that you will get the records you want when there are multiple records with the same id in table b.
The syntax you present in your question is correct. There is no difference in the query for joining on a one-to-one relationship than on a one-to-many.
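If table B can hold several rows per ID and you want at most one of them per row of A, you need some column that decides which row wins. The sketch below assumes a hypothetical tie-breaker column seq on tableB (it is not in the question) and picks the row with the smallest seq per id. It is run through Python's sqlite3 module with invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: seq is an assumed tie-breaker, not from the question.
    CREATE TABLE tableA (id INTEGER, name TEXT);
    CREATE TABLE tableB (id INTEGER, seq INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y');
    INSERT INTO tableB VALUES (1, 1, 'first'), (1, 2, 'second');
""")

rows = conn.execute("""
    SELECT a.id, a.name, b.val
    FROM tableA a
    LEFT OUTER JOIN (
        -- keep exactly one tableB row per id: the one with the lowest seq
        SELECT b1.id, b1.val
        FROM tableB b1
        JOIN (SELECT id, MIN(seq) AS seq FROM tableB GROUP BY id) pick
          ON b1.id = pick.id AND b1.seq = pick.seq
    ) b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 'x', 'first'), (2, 'y', None)]
```

Each row of tableA appears once: id 1 is paired with its lowest-seq match, and id 2 has no match so its val is NULL. This only yields one row per id if (id, seq) is unique in tableB.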

How can I implement SQL INTERSECT and MINUS operations in MS Access

I have researched and haven't found a way to run INTERSECT and MINUS operations in MS Access. Does any way exist?
INTERSECT is an inner join. MINUS is an outer join, where you choose only the records that don't exist in the other table.
INTERSECT
select distinct
a.*
from
a
inner join b on a.id = b.id
MINUS
select distinct
a.*
from
a
left outer join b on a.id = b.id
where
b.id is null
If you edit your original question and post some sample data then an example can be given.
EDIT: Forgot to add in the distinct to the queries.
INTERSECT is NOT an INNER JOIN. They're different. An INNER JOIN will give you duplicate rows in cases where INTERSECT will not. You can get equivalent results by:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
on a.PK = b.PK
Note that PK must be the primary key column or columns. If there is no PK on the table (BAD!), you must write it like so:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND a.Col3 = b.Col3 ...
With MINUS, you can do the same thing, but with a LEFT JOIN, and a WHERE condition checking for null on one of table b's non-nullable columns (preferably the primary key).
SELECT DISTINCT a.*
FROM a
LEFT JOIN b
on a.PK = b.PK
WHERE b.PK IS NULL
That should do it.
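As a sanity check, the join-based emulations can be compared with the real set operators in a database that has them. The sketch below uses SQLite (which spells MINUS as EXCEPT) via Python's sqlite3 module, not Access. The tables a and b and their rows are invented, and it assumes rows sharing a PK are identical in both tables, since the set operators compare whole rows while the joins compare only the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: rows with the same PK are identical in both tables.
    CREATE TABLE a (PK INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE b (PK INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (2, 'y'), (3, 'z'), (4, 'w');
""")

def run(sql):
    return conn.execute(sql).fetchall()

# INTERSECT emulated with an inner join, next to the native operator.
intersect_join = run(
    "SELECT DISTINCT a.* FROM a INNER JOIN b ON a.PK = b.PK ORDER BY a.PK")
intersect_native = run(
    "SELECT PK, val FROM a INTERSECT SELECT PK, val FROM b ORDER BY PK")

# MINUS emulated with a left join + IS NULL, next to SQLite's EXCEPT.
minus_join = run(
    "SELECT DISTINCT a.* FROM a LEFT JOIN b ON a.PK = b.PK "
    "WHERE b.PK IS NULL ORDER BY a.PK")
minus_native = run(
    "SELECT PK, val FROM a EXCEPT SELECT PK, val FROM b ORDER BY PK")

print(intersect_join, intersect_native)  # [(2, 'y'), (3, 'z')] twice
print(minus_join, minus_native)          # [(1, 'x')] twice
```

If rows can share a PK but differ in other columns, join on every column you want compared, as in the Col1/Col2/Col3 variant above.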
They're done through JOINs. The old fashioned way :)
For INTERSECT, you can use an INNER JOIN. Pretty straightforward. You just need to use a GROUP BY or DISTINCT if you don't have a pure one-to-one relationship going on. Otherwise, as others have mentioned, you can get more results than you'd expect.
For MINUS, you can use a LEFT JOIN and use the WHERE to limit it so you're only getting back rows from your main table that don't have a match with the LEFT JOINed table.
Easy peasy.
Unfortunately MINUS is not supported in MS Access - one workaround would be to create three queries, one with the full dataset, one that pulls the rows you want to filter out, and a third that left joins the two tables and only pulls records that only exist in your full dataset.
Same thing goes for INTERSECT, except you would be doing it via an inner join and only returning records that exist in both.
No MINUS in Access, but you can use a subquery.
SELECT DISTINCT a.*
FROM a
WHERE a.PK NOT IN (SELECT DISTINCT b.pk FROM b)
I believe this one does the MINUS
SELECT DISTINCT
a.CustomerID,
b.CustomerID
FROM
tblCustomers a
LEFT JOIN
[Copy Of tblCustomers] b
ON
a.CustomerID = b.CustomerID
WHERE
b.CustomerID IS NULL