Join SQL query to get data from two tables - sql

I'm a newbie, just learning SQL and have this question: I have two tables with the same columns. Some registers are in the two tables but others only are in one of the tables. To illustrate, suppose table A = (1,2,3,4), table B=(3,4,5,6), numbers are registers. I need to select all registers in table B if they are not in table A, that is result=(5,6). What query should I use? Maybe a join. Thanks.

You can either use a NOT IN query like this:
SELECT col from A where col not in (select col from B)
or use an outer join:
select A.col
from A LEFT OUTER JOIN B on A.col=B.col
where B.col is NULL
The first is easier to understand, but the second is easier to use with more tables in the query.

Select register from TABLE_B b
Where not exists (Select register from TABLE_A a where a.register = b.register)
I assumed you have a column named register in TABLE_A and TABLE_B

Related

In SQL is there a way to use select * on a join?

Using Snowflake,have 2 tables, one with many columns and the other with a few, trying to select * on their join, get the following error:
SQL compilation error:duplicate column name
which makes sense because my joining columns are in both tables, could probably use select with columns names instead of *, but is there a way I could avoid that? or at least have the query infer the columns names dynamically from any table it gets?
I am quite sure snowflake will let you choose all from both halves of two+ tables via
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
what you will not be able to do is refer to the named of the columns in GROUP BY indirectly, thus this will not work
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY x
even though some databases know because you have JOIN ON a.x = b.x there is only one x, snowflake will not allow it (well it didn't last time I tried this)
but you can with the above use the alias name or the output column position thus both the following will work.
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY a.x
SELECT a.*, b.*
FROM table_a AS a
JOIN table_b AS b
ON a.x = b.x
ORDER BY 1 -- assuming x is the first column
in general the * and a.* forms are super convenient, but are actually bad for performance.
when selecting you are now are risk of getting the columns back in a different order if the table has been recreated, thus making reading code unstable. Which also impacts VIEWs.
It also means all meta data for the table need to be loaded to know what the complete form of the data will be in. Where if you want x,y,z only and later a w was added to the table, the whole query plan can be compiled faster.
Lastly if you are selecting SELECT * FROM table in a sub-select and only a sub-set of those columns are needed the execution compiler doesn't need to prune these. And if all variables are attached to a correctly aliased table, if later a second table adds the same named column, naked columns are not later ambiguous. Which will only occur when that SQL is run, which might be an "annual report" which doesn't happen that often. wow, what a long use alias rant.
You can prefix the name of the column with the name of the table:
select table_a.id, table_b.name from table_a join table_b using (id)
The same works in combination with *:
select table_a.id, table_b.* from table_a join table_b using (id)
It works in "join" and "where" parts of the statement as well
select table_a.id, table_b.* from table_a join table_b
on table_a.id = table_b.id where table_b.name LIKE 'b%'
You can use table aliases to make the statement sorter:
select a.id, b.* from table_a a join table_b b
on a.id = b.id
Aliases could be applies on fields to use in subqueries, client software and (depending on the SQL server) in the other parts of the statements, for example 'order by':
select a.id as a_id, b.* from table_a a join table_b b
on a.id = b.id order by a_id
If you're after a result that includes all the distinct non-join columns from each table in the join with the join columns included in the output only once (given they will be identical for an inner-join) you can use NATURAL JOIN.
e.g.
select * from d1 natural inner join d2 order by id;
See examples: https://docs.snowflake.com/en/sql-reference/constructs/join.html#examples

duplicate query result when join table

I face issue about duplicate data when join table, here my sample data table I have
-- Table A
I want to join with
-- Table B
this my query notation for join both table,
select a.trans_id, name
from tableA a
inner join tableB b
on a.ID_Trans = b.trans_id
and this the result, why I get the duplicating data which should show only two lines of data, please help me to solve this case.
Firstly, as you have been told multiple times in the comments, this is working exactly as you have written, and (more importantly) as intended. You have 2 rows in tableA and those 2 rows match 2 rows in your table tableB according to the ON clause. This means that each join operation, for the each of the rows in tableA, results in 2 rows as well; thus 4 rows (2 * 2 = 4).
Considering that your table, TableA only has one column then it seems that you should be cleaning up that data and deleting the duplicates. There are plenty of examples on how to do that already (example).
Perhaps the column you show us in TableA is one many, and thus instead you have a denormalisation issue, and instead there should be another table with the details of Id_trans and a PRIMARY KEY or UNIQUE CONSTRAINT/INDEX on it. Then you would join fron that table to TableB.
Finally, what you might be after is an EXISTS, which would look like this:
SELECT B.trans_id, B.[name]
FROM dbo.TableB B
WHERE EXISTS(SELECT 1
FROM dbo.TableA A
WHERE A.ID_Trans = B.trans_id); --Odd that it's called ID_Trans in one table, and Trans_ID in another
As the comments mentioned your query does exactly what you asked it to do but I think you wanted something like:
select a.trans_id, a.name, b.name
from tableA a
inner join tableB b on a.trans_id = b.trans_id
group by a.trans_id, a.name, b.name
Since there are two rows in both table with same ID join will make them four. You can use distinct to remove duplicates:
select distinct a.trans_id, name
from tableA a
inner join tableB b
on a.id_trans = b.trans_id
But I would suggest to use exists:
select trans_id, name
from tableB b
exists (select 1 from tableA a where a.trans_id=b.trans_id)

Does wrapping my Coalesce in a subquery make my query more efficient or does it do nothing?

Lets say I have a query where one field can appear in either Table A or Table B but not both. So to retrieve it I use Coalesce.
Something like
Select
...
Coalesce(A.Number,B.Number) Number
...
From Table A
Left Join Table B on A.C= B.C
Now lets say I want to join another table to that Number field
should I just do
Join Table Z on Z.Z = Coalesce(A.Number,B.Number)
Or is it better to wrap my original table in a query and join on the definite result. So something like
Select * from (
Select
...
Coalesce(A.Number,B.Number) Number
...
From Table A
Left Join Table B on A.C= B.C
) T
left join Table Z on Z.Number= T.Number
Does this make a difference?
if i were joining another table to the result of the first query instead of a sub query i would place the first part in a CTE whenever possible, i believe the performance would be the same as a subquery but CTEs are more readable in my opinion.
with cte1 as
(
Select
...
Coalesce(A.Number,B.Number) Number
...
From Table A
Left Join Table B
on A.C= B.C
)
select *
from cte1 a
Join Table Z
on Z.Z = a.number

Join two tables based on the partial matched column

i want to join two tables in hive based on the partial match. So far, i tried with the below SQL query:
select * from tableA a join tableB b on a.id like '%'+b.id+'%';
and instr but nothing working, is there a way?
JOIN in Hive only support equation conditions. Plus, you should use CONCAT for your string concatenation.
This is one solution instead.
select *
from tableA a, tableB b
where a.id like concat('%',b.id,'%')

Sql Join Query using MS Access

Hello I Have a problem in getting rows from one table after comparing both. Detail of Both Table are as follows:-
I am using Ms Access database.
TableA is having a data of numeric type (Field Name is A it is primary key)
----------
Field A
==========
1
2
3
4
5
Table B is having data of numeric type ( Field Name is A it is foreign key)
--------
Field A
========
2
4
Now I am using below query which is this
select a.a
from a a
, b b
where a.a <> b.b
I want to show all the data from Table A which is not equal to Table B. But the above query is not working as I described.
Can you help me in this regard.
Regards,
Fawad Munir
In an attempt at clarity, I've used upper case for tables and lower case for fields:
Select A.a
FROM A LEFT OUTER JOIN B ON A.a=B.b
WHERE B.b is null
This will show all the records in A that are not in B (I assume that's what you want).
Read up on Access outer joins. In the query designer you double click the join and select something like "all records from table a and only the matching records in table b".
In your question you said that the name of the field in table B is 'A'. Given that, I'd say that your query should be something like
select a.a
from a, b
where a.a <> b.a
But I'm not sure this will do what you want. I think you're trying to find rows in table A which do not have a matching row in table B, in which case you might try
SELECT A.A
FROM A
LEFT OUTER JOIN B
ON (B.A = A.A)
WHERE B.A IS NULL
Try that and see if it does what you want.
Share and enjoy.
I don't know exactly if Access would accept the syntax, but here how I would do in SQL Server.
select a.a
from TableA a
where a.a NOT IN (
select b.a
from TableB b
)
or even as above-mentioned:
select a.a
from TableA a
left outer join TableB b on b.a = a.a
where b.a IS NULL
Its not entirely clear what you are trying to achieve, but its sounds like you are attempting to solve the common problem of finding rows in Table A missing associated data in Table B. If this is the case, it appears you misunderstand the semantics of the join you tried. In which case, you have 2 problems, because the understanding the the JOIN operation is critical to working with relational databases.
In relation to the first problem, please research how to express a subquery using the IN operator. Something like
... WHERE a NOT IN (SELECT a from b)
In relation to the second problem, try your query without the WHERE restriction, and see what is returned. Once you understand what the join is doing, you will see why applying a WHERE restriction to it will not solve your problem.
If I understand you correctly, you want to see every row in A for which column a contains a value that cannot be found in any column b value of B. You can get this data in several ways.
I think using NOT IN is the clearest, personally:
SELECT * FROM tableA WHERE columnA NOT IN
(SELECT columnB FROM tableB WHERE columnB IS NOT NULL)
Many people prefer a filtered JOIN:
SELECT tableA.* FROM tableA LEFT OUTER JOIN tableB
ON tableA.columnA = tableB.columnB WHERE tableB.columnB IS NULL
There is a NOT EXISTS variant as well:
SELECT * FROM tableA WHERE columnA NOT EXISTS
(SELECT * FROM tableB WHERE columnB = tableA.columnA)