SQL Query Duplicating records - sql

I've got two tables.
Let's call them table_A and table_B.
Table_B contains the ForeignKey of table_A.
Table_A
ID Name
1 A
2 B
3 C
Table_B
ID table_a_fk
1 2
2 3
Now I want to get all the names out of table_a IF table_b does not contain the ID of the record in table_a.
I've tried it with this query:
SELECT a.name
FROM table_a a, table_b b
WHERE a.id != b.table_a_fk
With this Query I'm getting the right result I just get this result like 5times and I don't know why.
Hope someone can explain me that.

Your query creates a cartesian product between your two tables A and B. It is the cartesian product that generates those duplicate values. Instead, you want to use an anti-join, which is most commonly written in SQL using NOT EXISTS
SELECT a.name
FROM table_a a
WHERE NOT EXISTS (
SELECT *
FROM table_b b
WHERE a.id = b.table_a_fk
)
Another way to express an anti-join with NOT IN (only if table_b.table_a_fk is NOT NULL):
SELECT a.name
FROM table_a a
WHERE a.id NOT IN (
SELECT b.table_a_fk
FROM table_b b
)
Another, less common way to express an anti-join:
SELECT a.name
FROM table_a a
LEFT OUTER JOIN table_b b ON a.id = b.table_a_fk
WHERE b.id IS NULL

use distinct
SELECT distinct a.name
FROM table_a a, table_b b
WHERE a.id != b.table_a_fk
or better is...
Select distinct name
from tableA a
Where not exists (Select * from tableB
Where table_a_fk = a.id)

Related

SQL inner join with conditional selection

I am new in SQL. Lets say I have 2 tables one is table_A and the other one is table_B. And I want to create a view with two of them which is view_1.
table_A:
id
foo
1
d
2
e
null
f
table_B
id
name
1
a
2
b
3
c
and when I use this query :
SELECT DISTINCT table_A.id, table_B.name
FROM table_A
INNER JOIN table_B ON table_B.id = table_A.id
the null value in table_A can't be seen in the view_1 since it is not found in table_B. I want view_1 to show also this null row like :
id
name
1
a
2
b
null
no entry
Should I create a 4. table? I couldn't find a way.
Try this Query:
SELECT DISTINCT a.id,(CASE When b.name IS NULL OR b.name = '' Then 'No Entry' else b.name end) name FROM table_A a
LEFT JOIN table_B b on a.id = b.id
You are looking for an outer join. Thus you keep all table_A rows and join table_B rows where they exist. If no match exists, the table_B columns in the joined row are NULL.
You replace NULLs with a value with COALESCE.
SELECT a.id, COALESCE(b.name, 'no entry') AS name
FROM table_a a
LEFT OUTER JOIN table_b b ON b.id = a.id
ORDER BY a.id NULLS LAST;
You haven't tagged your request with your DBMS. Not all DBMS support the NULLS LAST clause.
Please note that there is no DISTINCT in my query. It is not needed. And every time you think you must use DISTINCT, think twice. SELECT DISTINCT is very seldom needed. Most often it is used, because the query is kind of flawed and causes the undesired duplicates itself.

Sparksql to select certain records against 3 tables

I have 3 tables and need to fetch the records as below
Table_A,
Table_B,
Table_C
Select only Table_A records which are common in Table_B & Table_C and ignore which are not common in both Table_B & Table_C finally results would be no duplicates.
Approach 1 Tried: inner join Table_A with Table_B and again separate inner join Table_A with Table_C finally did union.
Ab = Table_A.join(Table_B,Table_A["id"] == Table_B["id"], "inner").select(common columns)
Ac = Table_A.join(Table_C,Table_A["id"] == Table_C["id"], "inner").select(common columns)
result = Ab.union(Ac) <<Got more duplicates>>
result = result,dropDuplicates(["id"])
But still I got the duplicates.
Approach 2 Tried with SparkSql:
Table_A
left outer
Table_B
on A.id = B.id
left outer Table_C
on A.id = c.id
In this Approach, no duplicates but more records than Table_A also the uncommon records.
Any suggestion and best approach would be apprciated
In Spark SQL, I would recommend exists:
select a.*
from table_a a
where exists (select 1 from table_b b on b.id = a.id)
and exists (select 1 from table_c c on c.id = a.id)
This does the filtering you want, and will not duplicate records of table_a in the resuletset, even if there are multiple matches in table_b or table_c.

Oracle SQL: Get all users in one table but not another and join to a third table

I am wondering how to use oracle sql to get all the rows that are in one table but not another. The issue I am having is that the two tables don't have a field in common so I need to join to a third master table.
This is what I've tried which doesn't produce any errors but also produces 0 records which isn't possible but clearly I've done something wrong.
SELECT a.USER_ID, c.AD_ID, c.CREATED_DATE_ FROM $A$ a, $C$ c, $B$ b
WHERE (b.USER_ID IS NULL AND a.CUSTOMER_ID = c.CUSTOMER_ID)
I have three tables:
Table A has fields CUSTOMER_ID & USER_ID
Table B has field USER_ID
Table C has field CUSTOMER_ID
I need all the users that are in table C but not table B. They are all in Table A because that is the master list of users.
Any insight would be greatly appreciated.
SELECT
*
FROM
table_a
WHERE
NOT EXISTS (SELECT * FROM table_b WHERE table_b.user_id = table_a.user_id )
AND EXISTS (SELECT * FROM table_c WHERE table_c.customer_id = table_a.customer_id)
My solution:
select * from TableC tc
join TableA ta on tc.CUSTOMER_ID=ta.CUSTOMER_ID
left join TableB tb on tb.USER_ID=ta.USER_ID
where ta.USER_ID is null
I think you want:
select a.USER_ID, c.AD_ID, c.CREATED_DATE_
from a join
c
on a.customer_id = c.customer_id
where not exists (select 1 from b where b.user_id = a.user_id);

Select Name instead OF ID in table with ID-Ref Column SQL

Lets say we have 2 Tables:
Table A Table B
- A_ID - B_ID
- A_Name - A_ID
I need a select statement, that selects * from Table B showing the A_NAME instead of the A_ID.
By trying it I got the following select statement which ... doesn't work to well. It is giving me a lot of nulls, but no names.
SELECT B_ID,
(select A_NAME from TableA as A where A.A_ID = B.A_ID) as Name
FROM TableB as B
Thanks for all your Answers.
The final solution:
The shown query DOES work (even though it may be slow) and the solutions in the answers also do work.
The problem why it didn't give results for me was because of my data. On another database with the same schema all the commands work.
You should try LEFT JOIN
SELECT
B_ID, A_Name
FROM
tableB B LEFT JOIN tableA A
ON B.A_ID = A.A_ID
you can do it with a join:
SELECT B.B_ID, A.A_Name
FROM B
INNER JOIN A
ON A.A_ID = B.A_ID;
Edit:
If you want only the entries from table b you can to it with a left join, like #jarlh said:
SELECT B.B_ID, A.A_Name
FROM B
LEFT JOIN A
ON A.A_ID = B.A_ID;

select a value where it doesn't exist in another table

I have two tables
Table A:
ID
1
2
3
4
Table B:
ID
1
2
3
I have two requests:
I want to select all rows in table A that table B doesn't have, which in this case is row 4.
I want to delete all rows that table B doesn't have.
I am using SQL Server 2000.
You could use NOT IN:
SELECT A.* FROM A WHERE ID NOT IN(SELECT ID FROM B)
However, meanwhile i prefer NOT EXISTS:
SELECT A.* FROM A WHERE NOT EXISTS(SELECT 1 FROM B WHERE B.ID=A.ID)
There are other options as well, this article explains all advantages and disadvantages very well:
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
For your first question there are at least three common methods to choose from:
NOT EXISTS
NOT IN
LEFT JOIN
The SQL looks like this:
SELECT * FROM TableA WHERE NOT EXISTS (
SELECT NULL
FROM TableB
WHERE TableB.ID = TableA.ID
)
SELECT * FROM TableA WHERE ID NOT IN (
SELECT ID FROM TableB
)
SELECT TableA.* FROM TableA
LEFT JOIN TableB
ON TableA.ID = TableB.ID
WHERE TableB.ID IS NULL
Depending on which database you are using, the performance of each can vary. For SQL Server (not nullable columns):
NOT EXISTS and NOT IN predicates are the best way to search for missing values, as long as both columns in question are NOT NULL.
select ID from A where ID not in (select ID from B);
or
select ID from A except select ID from B;
Your second question:
delete from A where ID not in (select ID from B);
SELECT ID
FROM A
WHERE NOT EXISTS( SELECT 1
FROM B
WHERE B.ID = A.ID
)
This would select 4 in your case
SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)
This would delete them
DELETE FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)
SELECT ID
FROM A
WHERE ID NOT IN (
SELECT ID
FROM B);
SELECT ID
FROM A a
WHERE NOT EXISTS (
SELECT 1
FROM B b
WHERE b.ID = a.ID)
SELECT a.ID
FROM A a
LEFT OUTER JOIN B b
ON a.ID = b.ID
WHERE b.ID IS NULL
DELETE
FROM A
WHERE ID NOT IN (
SELECT ID
FROM B)