get all rows from A plus missing rows from B - sql

This seems so obvious but I am failing.
In Teradata SQL, how to get all rows from table A, plus those from table B, that do not occur in table A, based on key field key?
This must have been asked a thousand times. But honestly I do not find the answer.
Full outer join seems to give me duplicate "inner join" results.
--Edit , based on first comment (thanks) --
so if I would do
select * from A
union all
select * from B
left join A
on A.key = B.key
where A.key IS NULL
I guess that would work (untested) but is that the most performant way?

Sometimes EXISTS or NOT EXISTS performs better than joins:
select * from A
union all
select * from B
where not exists (
select 1 from A
where A.key = B.key
)
I assume the key columns are already indexed.

Your version is fine . . . if you select the right columns:
select A.* from A
union all
select B.*
from B left join
A
on A.key = B.key
where A.key IS NULL;
I think Teradata does a good job optimizing joins. That said, EXISTS is also a very reasonable option.
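For anyone who wants to check the result shape before running it on real tables, here is a minimal sketch using Python's built-in sqlite3 module rather than Teradata. The column names k and src and the sample rows are made up for illustration, and nothing here says anything about Teradata performance.

```python
import sqlite3

# In-memory SQLite database standing in for Teradata (assumption: the
# UNION ALL + NOT EXISTS semantics are the same for this simple case).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (k INTEGER PRIMARY KEY, src TEXT);
    CREATE TABLE B (k INTEGER PRIMARY KEY, src TEXT);
    INSERT INTO A VALUES (1, 'A'), (2, 'A');
    INSERT INTO B VALUES (2, 'B'), (3, 'B');
""")

# All rows from A, plus the rows from B whose key does not occur in A.
rows = conn.execute("""
    SELECT A.k, A.src FROM A
    UNION ALL
    SELECT B.k, B.src FROM B
    WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.k = B.k)
    ORDER BY 1
""").fetchall()

print(rows)
```

Key 2 exists in both tables, and only the copy from A is kept.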

Related

In SQL is there a way to use select * on a join?

Using Snowflake, I have 2 tables, one with many columns and the other with a few. When I try to select * on their join, I get the following error:
SQL compilation error: duplicate column name
This makes sense because my joining columns are in both tables. I could probably use select with column names instead of *, but is there a way to avoid that, or at least have the query infer the column names dynamically from whichever table it reads?
I am quite sure Snowflake will let you select all columns from both sides of a join of two or more tables via
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
What you will not be able to do is refer to a shared column name in ORDER BY without qualifying it, thus this will not work:
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY x
Even though some databases know that, because you have JOIN ON a.x = b.x, there is only one x, Snowflake will not allow it (well, it didn't last time I tried this).
But with the above you can use the alias-qualified name or the output column position, so both of the following will work:
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY a.x
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY 1 -- assuming x is the first column
In general the * and a.* forms are super convenient, but are actually bad for performance.
When selecting this way you are at risk of getting the columns back in a different order if the table has been recreated, which makes the code that reads the results unstable. This also impacts VIEWs.
It also means all metadata for the table needs to be loaded to know what the complete form of the data will be. Whereas if you want only x, y, z and a w is later added to the table, the whole query plan can be compiled faster.
Lastly, if you write SELECT * FROM table in a sub-select and only a subset of those columns is needed, the compiler has to prune the rest, which it does not need to do when the columns are named. And if every column is attached to a correctly aliased table, then when a second table later gains a column with the same name, the reference does not become ambiguous the way a naked column would. That ambiguity only surfaces when the SQL is run, which might be an "annual report" that doesn't happen that often. Wow, what a long use-alias rant.
You can prefix the name of the column with the name of the table:
select table_a.id, table_b.name from table_a join table_b using (id)
The same works in combination with *:
select table_a.id, table_b.* from table_a join table_b using (id)
It works in "join" and "where" parts of the statement as well
select table_a.id, table_b.* from table_a join table_b
on table_a.id = table_b.id where table_b.name LIKE 'b%'
You can use table aliases to make the statement shorter:
select a.id, b.* from table_a a join table_b b
on a.id = b.id
Aliases can also be applied to fields for use in subqueries, client software and (depending on the SQL server) other parts of the statement, for example 'order by':
select a.id as a_id, b.* from table_a a join table_b b
on a.id = b.id order by a_id
If you're after a result that includes all the distinct non-join columns from each table in the join with the join columns included in the output only once (given they will be identical for an inner-join) you can use NATURAL JOIN.
e.g.
select * from d1 natural inner join d2 order by id;
See examples: https://docs.snowflake.com/en/sql-reference/constructs/join.html#examples
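To see the difference in output columns, here is a small sketch using Python's built-in sqlite3 module. This is SQLite, not Snowflake, so treat it only as an illustration of the general idea; the name and amount columns are made up for the example.

```python
import sqlite3

# SQLite stand-in (assumption: not Snowflake; used only to show column lists).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE d1 (id INTEGER, name TEXT);
    CREATE TABLE d2 (id INTEGER, amount TEXT);
    INSERT INTO d1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO d2 VALUES (1, 'x'), (3, 'y');
""")

def column_names(sql):
    """Return the output column names of a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# a.*, b.* keeps the join column from both tables.
qualified = column_names(
    "SELECT a.*, b.* FROM d1 AS a JOIN d2 AS b ON a.id = b.id"
)
# NATURAL JOIN emits the shared column only once.
natural = column_names("SELECT * FROM d1 NATURAL JOIN d2")

print(qualified)
print(natural)
```

The first list contains id twice, which is the kind of duplicate that the error in the question complains about; the second contains it once.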

LEFT JOIN THREE tables

How to create sql query to select the distinct table A data
as in the image
Thanks
One method is minus:
select . . .
from a
minus
select . . .
from b
minus
select . . .
from c;
Or, not exists:
select a.*
from a
where not exists (select 1 from b where . . . ) and
not exists (select 1 from c where . . . );
You don't clarify what the matching conditions are, so I've used . . . for generality.
These two versions are not the same. The first returns unique combinations of columns from a where those same columns are not in b or c. The second returns all columns from a, where another set is not in b or c.
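A quick way to see that difference is a sketch with Python's built-in sqlite3 module. SQLite spells the operator EXCEPT rather than MINUS, and the single id column plus the sample rows are assumptions made up for the example, since the question does not give the matching conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (3);  -- note the duplicate 1
    INSERT INTO b VALUES (2);
    INSERT INTO c VALUES (3);
""")

# Set operation: the result is de-duplicated.
set_based = conn.execute("""
    SELECT id FROM a
    EXCEPT SELECT id FROM b
    EXCEPT SELECT id FROM c
""").fetchall()

# Row filter: every surviving row of a is returned, duplicates included.
row_based = conn.execute("""
    SELECT a.id FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
      AND NOT EXISTS (SELECT 1 FROM c WHERE c.id = a.id)
""").fetchall()

print(set_based)
print(row_based)
```

The EXCEPT version returns id 1 once, while the NOT EXISTS version returns both copies.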
If you must use LEFT JOIN to implement what is really an anti join, then do this:
SELECT *
FROM a
LEFT JOIN b ON b.a_id = a.a_id
LEFT JOIN c ON c.a_id = a.a_id
WHERE b.a_id IS NULL
AND c.a_id IS NULL
This reads:
FROM: Get all rows from a
LEFT JOIN: Optionally get the matching rows from b and c as well
WHERE: In fact, no. Keep only those rows from a, for which there was no match in b and c
Using NOT EXISTS() is a more elegant way to run an anti-join, though. I tend to not recommend NOT IN() because of the delicate implications around three valued logic - which can lead to not getting any results at all.
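The three-valued logic problem is easy to reproduce. Here is a minimal sketch using Python's built-in sqlite3 module, with made-up sample rows, in which the subquery side contains a single NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER);
    CREATE TABLE b (a_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (NULL);  -- one NULL is enough
""")

# "1 NOT IN (2, NULL)" evaluates to NULL, not TRUE, so no row qualifies.
not_in = conn.execute("""
    SELECT a_id FROM a
    WHERE a_id NOT IN (SELECT a_id FROM b)
    ORDER BY a_id
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULL is harmless.
not_exists = conn.execute("""
    SELECT a_id FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id)
    ORDER BY a_id
""").fetchall()

print(not_in)
print(not_exists)
```

NOT IN returns nothing at all, while NOT EXISTS returns the rows 1 and 3 you would expect.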
Side note on using Venn diagrams for joins
A lot of people like using Venn diagrams to illustrate joins. I think this is a bad habit: Venn diagrams model set operations (like UNION, INTERSECT, or in your case EXCEPT / MINUS) very well, but joins are filtered cross products, which is an entirely different kind of operation. I've blogged about it here.
Select what isn't in B nor C nor in A inner join B inner join C
Select * from A
where A.id not in ( select coalesce(b.id,c.id) AS ID
from b full outer join c on (b.id=c.id) )
or also (you don't need a join, so you can avoid doing it):
select * from A
where a.id not in (select B.ID from B union select C.ID from C)
I would do like this:
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL
Someone already asked something related to your question; you should see it here.

How to exclude rows that don't join with another table?

I have two tables, one has primary key other has it as a foreign key.
I want to pull data from the primary table only if the secondary table does not have an entry containing its key. Sort of the opposite of a simple inner join, which returns only rows that join together by that key.
SELECT <select_list>
FROM Table_A A
LEFT JOIN Table_B B
ON A.Key = B.Key
WHERE B.Key IS NULL
Full image of join
From the article: http://www.codeproject.com/KB/database/Visual_SQL_Joins.aspx
SELECT
*
FROM
primarytable P
WHERE
NOT EXISTS (SELECT * FROM secondarytable S
WHERE
P.PKCol = S.FKCol)
Generally, (NOT) EXISTS is a better choice than (NOT) IN or (LEFT) JOIN
use a "not exists" left join:
SELECT p.*
FROM primary_table p LEFT JOIN second s ON p.ID = s.ID
WHERE s.ID IS NULL
Another solution is:
SELECT * FROM TABLE1 WHERE id NOT IN (SELECT id FROM TABLE2)
SELECT P.*
FROM primary_table P
LEFT JOIN secondary_table S on P.id = S.p_id
WHERE S.p_id IS NULL
If you want to select the rows from the first table that are not present in the second table, you can also use EXCEPT. The column names can differ between the two queries, but the data types must match.
Example:
select ID, FName
from FirstTable
EXCEPT
select ID, SName
from SecondTable
This was helpful to use in Cognos, because creating a SQL "not in" statement in Cognos was allowed, but it took too long to run. I had manually coded table A to join to table B in Cognos as A.key "not in" B.key, but the query was taking too long and had not returned results after 5 minutes.
For anyone else who is looking for a "NOT IN" solution in Cognos, here is what I did. Create a Query that joins table A and B with a LEFT JOIN in Cognos by selecting link type: table A.Key has "0 to N" values in table B, then add a Filter (these correspond to WHERE clauses) for: table B.Key is NULL.
It ran fast and worked like a charm.

Sql Join Query using MS Access

Hello, I have a problem getting rows from one table after comparing it with another. Details of both tables are as follows:
I am using an MS Access database.
TableA has data of numeric type (the field name is A; it is the primary key)
----------
Field A
==========
1
2
3
4
5
Table B has data of numeric type (the field name is A; it is a foreign key)
--------
Field A
========
2
4
Now I am using below query which is this
select a.a
from a a
, b b
where a.a <> b.b
I want to show all the data from Table A which is not equal to Table B. But the above query is not working as I described.
Can you help me in this regard.
Regards,
Fawad Munir
In an attempt at clarity, I've used upper case for tables and lower case for fields:
Select A.a
FROM A LEFT OUTER JOIN B ON A.a=B.b
WHERE B.b is null
This will show all the records in A that are not in B (I assume that's what you want).
Read up on Access outer joins. In the query designer you double click the join and select something like "all records from table a and only the matching records in table b".
In your question you said that the name of the field in table B is 'A'. Given that, I'd say that your query should be something like
select a.a
from a, b
where a.a <> b.a
But I'm not sure this will do what you want. I think you're trying to find rows in table A which do not have a matching row in table B, in which case you might try
SELECT A.A
FROM A
LEFT OUTER JOIN B
ON (B.A = A.A)
WHERE B.A IS NULL
Try that and see if it does what you want.
Share and enjoy.
I don't know exactly if Access would accept the syntax, but here is how I would do it in SQL Server.
select a.a
from TableA a
where a.a NOT IN (
select b.a
from TableB b
)
or even as above-mentioned:
select a.a
from TableA a
left outer join TableB b on b.a = a.a
where b.a IS NULL
It's not entirely clear what you are trying to achieve, but it sounds like you are attempting to solve the common problem of finding rows in Table A missing associated data in Table B. If this is the case, it appears you misunderstand the semantics of the join you tried. In which case, you have 2 problems, because understanding the JOIN operation is critical to working with relational databases.
In relation to the first problem, please research how to express a subquery using the IN operator. Something like
... WHERE a NOT IN (SELECT a from b)
In relation to the second problem, try your query without the WHERE restriction, and see what is returned. Once you understand what the join is doing, you will see why applying a WHERE restriction to it will not solve your problem.
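To make that second point concrete, here is a sketch using Python's built-in sqlite3 module (SQLite, not Access) with the data from the question. The table names TableA and TableB are chosen just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (a INTEGER PRIMARY KEY);
    CREATE TABLE TableB (a INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3), (4), (5);
    INSERT INTO TableB VALUES (2), (4);
""")

# Comma join + "<>": each row of TableA is paired with every row of TableB
# that differs from it, so 2 survives (paired with 4) and so does 4.
cross = [r[0] for r in conn.execute("""
    SELECT a.a FROM TableA a, TableB b
    WHERE a.a <> b.a
    ORDER BY a.a
""")]

# Anti-join: keep only the rows of TableA that found no partner at all.
anti = [r[0] for r in conn.execute("""
    SELECT a.a FROM TableA a
    LEFT OUTER JOIN TableB b ON b.a = a.a
    WHERE b.a IS NULL
    ORDER BY a.a
""")]

print(cross)
print(anti)
```

The first query returns every value of TableA, some of them twice, because "not equal to at least one row of B" is almost always true. The second returns only 1, 3 and 5.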
If I understand you correctly, you want to see every row in A for which column a contains a value that cannot be found in any column b value of B. You can get this data in several ways.
I think using NOT IN is the clearest, personally:
SELECT * FROM tableA WHERE columnA NOT IN
(SELECT columnB FROM tableB WHERE columnB IS NOT NULL)
Many people prefer a filtered JOIN:
SELECT tableA.* FROM tableA LEFT OUTER JOIN tableB
ON tableA.columnA = tableB.columnB WHERE tableB.columnB IS NULL
There is a NOT EXISTS variant as well:
SELECT * FROM tableA WHERE NOT EXISTS
(SELECT * FROM tableB WHERE columnB = tableA.columnA)

How do I find records that are not joined?

I have two tables that are joined together.
A has many B
Normally you would do:
select * from a,b where b.a_id = a.id
To get all of the records from a that have a record in b.
How do I get just the records in a that do not have anything in b?
select * from a where id not in (select a_id from b)
Or, as some other people on this thread say:
select a.* from a
left outer join b on a.id = b.a_id
where b.a_id is null
select * from a
left outer join b on a.id = b.a_id
where b.a_id is null
The following image will help to understand SQL LEFT JOIN:
Another approach:
select * from a where not exists (select * from b where b.a_id = a.id)
The "exists" approach is useful if there is some other "where" clause you need to attach to the inner query.
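As an illustration of attaching an extra condition to the inner query, here is a sketch using Python's built-in sqlite3 module. The status column and its values are hypothetical; they are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INTEGER, status TEXT);  -- status is hypothetical
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 'active'), (2, 'archived');
""")

# Records in a that have no *active* record in b. The extra filter sits
# inside the subquery, next to the correlation condition.
rows = conn.execute("""
    SELECT * FROM a
    WHERE NOT EXISTS (
        SELECT * FROM b
        WHERE b.a_id = a.id AND b.status = 'active'
    )
    ORDER BY id
""").fetchall()

print(rows)
```

Row 2 is returned even though it has a record in b, because that record does not satisfy the extra condition.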
SELECT id FROM a
EXCEPT
SELECT a_id FROM b;
You will probably get a lot better performance (than using 'not in') if you use an outer join:
select * from a left outer join b on a.id = b.a_id where b.a_id is null;
SELECT <columns>
FROM a WHERE id NOT IN (SELECT a_id FROM b)
In the case of one join it is pretty fast, but when we are removing records from a database which has about 50 million records and 4 or more joins due to foreign keys, it takes a few minutes to do it.
Much faster to use WHERE NOT IN condition like this:
select a.* from a
where a.id NOT IN(SELECT DISTINCT a_id FROM b where a_id IS NOT NULL)
-- And for more joins
AND a.id NOT IN(SELECT DISTINCT a_id FROM c where a_id IS NOT NULL)
I can also recommend this approach for deleting in case cascade delete is not configured.
This query takes only a few seconds.
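A minimal sketch of the delete variant, using Python's built-in sqlite3 module with made-up rows. It only shows the semantics; it says nothing about timing on a 50-million-row database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);
""")

# Delete the rows of a that nothing in b refers to. The IS NOT NULL filter
# keeps the NULL in b from turning the NOT IN test into "unknown".
cur = conn.execute("""
    DELETE FROM a
    WHERE id NOT IN (SELECT a_id FROM b WHERE a_id IS NOT NULL)
""")
deleted = cur.rowcount
remaining = conn.execute("SELECT id FROM a ORDER BY id").fetchall()

print(deleted, remaining)
```

Without the IS NOT NULL filter the NULL in b would make the condition unknown for every row, and nothing would be deleted.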
The first approach is
select a.* from a where a.id not in (select b.ida from b)
the second approach is
select a.*
from a left outer join b on a.id = b.ida
where b.ida is null
The first approach is very expensive. The second approach is better.
With PostgreSQL 9.4, I ran EXPLAIN on both queries, and the first query has a cost of cost=0.00..1982043603.32.
The join query, by contrast, has a cost of cost=45946.77..45946.78
For example, I search for all products that are not compatible with any vehicle. I have 100k products and more than 1m compatibilities.
select count(*) from product a left outer join compatible c on a.id=c.idprod where c.idprod is null
The join query took about 5 seconds, whereas the subquery version had not finished after 3 minutes.
Another way of writing it
select a.*
from a
left outer join b
on a.id = b.id
where b.id is null
Ouch, beaten by Nathan :)
This will protect you from nulls in the IN clause, which can cause unexpected behavior.
select * from a where id not in (select [a id] from b where [a id] is not null)