remove duplicates from sql union

I'm running some basic SQL on a few tables, using a UNION (rightly or wrongly),
but I need to remove the duplicates. Any ideas?
select * from calls
left join users a on calls.assigned_to= a.user_id
where a.dept = 4
union
select * from calls
left join users r on calls.requestor_id= r.user_id
where r.dept = 4

Union will remove duplicates. Union All does not.

Using UNION automatically removes duplicate rows unless you specify UNION ALL:
http://msdn.microsoft.com/en-us/library/ms180026(SQL.90).aspx
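
To see the difference concretely, here is a minimal sketch that runs both operators through Python's built-in sqlite3 module. The tables t1 and t2 and their contents are made up for illustration, and SQLite stands in for whatever database you are actually using.

```python
import sqlite3

# In-memory database with two hypothetical tables that share the value 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (n INTEGER);
    CREATE TABLE t2 (n INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2);
    INSERT INTO t2 VALUES (2), (3);
""")

# UNION compares whole rows and keeps each distinct row once.
union_rows = conn.execute(
    "SELECT n FROM t1 UNION SELECT n FROM t2 ORDER BY n"
).fetchall()

# UNION ALL simply appends the second result to the first.
union_all_rows = conn.execute(
    "SELECT n FROM t1 UNION ALL SELECT n FROM t2 ORDER BY n"
).fetchall()

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (2,), (2,), (2,), (3,)]
```

Note that UNION also collapses the two 2s that both come from t1, not only the one shared with t2.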

Others have already answered your direct question, but perhaps you could simplify the query to eliminate the question (or have I missed something, and a query like the following will really produce substantially different results?):
select *
from calls c join users u
on c.assigned_to = u.user_id
or c.requestor_id = u.user_id
where u.dept = 4
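
One way to check whether the two forms agree is to run both against a small data set. The sketch below uses Python's sqlite3 with a hypothetical schema (call_id, assigned_to, requestor_id, user_id, dept); it only shows that the results match on this sample, not that they match for every schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, dept INTEGER);
    CREATE TABLE calls (call_id INTEGER PRIMARY KEY,
                        assigned_to INTEGER, requestor_id INTEGER);
    INSERT INTO users VALUES (1, 4), (2, 4), (3, 5);
    INSERT INTO calls VALUES
        (10, 1, 1),  -- same dept-4 user in both roles
        (11, 1, 2),  -- two different dept-4 users
        (12, 3, 1),  -- only the requestor is in dept 4
        (13, 3, 3);  -- nobody in dept 4
""")

# The UNION query from the question.
union_rows = sorted(conn.execute("""
    SELECT * FROM calls
    LEFT JOIN users a ON calls.assigned_to = a.user_id
    WHERE a.dept = 4
    UNION
    SELECT * FROM calls
    LEFT JOIN users r ON calls.requestor_id = r.user_id
    WHERE r.dept = 4
""").fetchall())

# The single join with OR from this answer.
or_join_rows = sorted(conn.execute("""
    SELECT *
    FROM calls c JOIN users u
      ON c.assigned_to = u.user_id
      OR c.requestor_id = u.user_id
    WHERE u.dept = 4
""").fetchall())

print(union_rows == or_join_rows)  # True
print(len(union_rows))             # 4
```

Call 11 appears twice in both results, once per matching user, because the joined user columns differ between the two rows.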

Since you are still getting duplicates with plain UNION, I would check two things.
First, that they are exact duplicates. That is, if you run
SELECT DISTINCT * FROM (<your query>) AS subquery
do you get fewer rows?
Second, whether the rows really match in every column. UNION compares entire rows and removes duplicates from the whole combined result, including duplicates produced inside a single branch (for example by the left join), so any rows that survive must differ in at least one column, such as the columns pulled in from users.
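
As an illustration of rows that look duplicated but are not exact duplicates, the sketch below (Python's sqlite3, hypothetical columns and data) runs the question's query twice: once with SELECT * and once selecting only the calls columns. Restricting the select list is a suggestion of this example, not something the question states it can do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, dept INTEGER);
    CREATE TABLE calls (call_id INTEGER PRIMARY KEY,
                        assigned_to INTEGER, requestor_id INTEGER);
    INSERT INTO users VALUES (1, 4), (2, 4);
    -- Call 11 is assigned to user 1 and requested by user 2, both in dept 4.
    INSERT INTO calls VALUES (11, 1, 2);
""")

# SELECT * includes the joined user columns, which differ per branch.
star_rows = sorted(conn.execute("""
    SELECT * FROM calls
    LEFT JOIN users a ON calls.assigned_to = a.user_id
    WHERE a.dept = 4
    UNION
    SELECT * FROM calls
    LEFT JOIN users r ON calls.requestor_id = r.user_id
    WHERE r.dept = 4
""").fetchall())

# Selecting only the calls columns makes the two branches produce
# identical rows, so UNION can collapse them.
calls_only_rows = conn.execute("""
    SELECT calls.* FROM calls
    LEFT JOIN users a ON calls.assigned_to = a.user_id
    WHERE a.dept = 4
    UNION
    SELECT calls.* FROM calls
    LEFT JOIN users r ON calls.requestor_id = r.user_id
    WHERE r.dept = 4
""").fetchall()

print(star_rows)        # [(11, 1, 2, 1, 4), (11, 1, 2, 2, 4)]
print(calls_only_rows)  # [(11, 1, 2)]
```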

If you are using T-SQL, then it appears from the previous posts that UNION removes duplicates. If you are not, you could use DISTINCT. This doesn't quite feel right to me either, but it could get you the result you are looking for:
SELECT DISTINCT *
FROM
(
select * from calls
left join users a on calls.assigned_to= a.user_id
where a.dept = 4
union
select * from calls
left join users r on calls.requestor_id= r.user_id
where r.dept = 4
)a

If you are using T-SQL you could use a temporary table in a stored procedure and update or insert the records of your query accordingly.

Related

Combine two queries, which include join operations, into a single query

I have an "article" table and a "used" table for registering rentals.
I want to know which articles are free, in other words the ones that have never been rented (table article) or the ones that have been returned (table used).
I have 2 separate queries and they work the way I expected, but I'd like to combine them into a single query.
First query
SELECT a.article_id, a.mark, a.type, a.description
FROM article a
INNER JOIN used u ON u.article_id = a.article_id
WHERE return_date IS NOT NULL
Second query
SELECT article_id, mark, type
FROM article
WHERE NOT EXISTS
(SELECT *
FROM used
WHERE article.article_id = used.article_id)
The first query returns 25 records, while the second query returns 113 records. The final output should return 138 records.
How can I do it?
Thanks in advance for your help.

This is typically done with the UNION ALL operator, which appends the records of one query to the records of the other. Make sure both queries return the same number of columns with corresponding data types.
SELECT a.article_id, a.mark, a.type
FROM article a
INNER JOIN used u ON u.article_id = a.article_id
WHERE u.return_date IS NOT NULL
UNION ALL
SELECT article_id, mark, type
FROM article
WHERE NOT EXISTS
(SELECT *
FROM used
WHERE article.article_id = used.article_id)
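It seems you can also simplify this into a single LEFT JOIN, which avoids writing two queries. The WHERE clause keeps articles that have no row in used at all, plus rental rows that have been returned:
SELECT a.article_id, a.mark, a.type
FROM article a
LEFT JOIN used u
ON u.article_id = a.article_id
WHERE u.article_id IS NULL
OR u.return_date IS NOT NULL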
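
To compare the two approaches, here is a small sketch using Python's sqlite3 with made-up sample rows. It runs the UNION ALL query next to a LEFT JOIN filtered with WHERE u.article_id IS NULL OR u.return_date IS NOT NULL; that filter is this example's assumption about what the single-query form needs in order to return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (article_id INTEGER PRIMARY KEY, mark TEXT, type TEXT);
    CREATE TABLE used (article_id INTEGER, return_date TEXT);
    -- Article 1 was never rented, 2 was returned, 3 is still out.
    INSERT INTO article VALUES (1, 'A', 'x'), (2, 'B', 'y'), (3, 'C', 'z');
    INSERT INTO used VALUES (2, '2020-01-05'), (3, NULL);
""")

union_all_rows = sorted(conn.execute("""
    SELECT a.article_id, a.mark, a.type
    FROM article a
    INNER JOIN used u ON u.article_id = a.article_id
    WHERE u.return_date IS NOT NULL
    UNION ALL
    SELECT article_id, mark, type
    FROM article
    WHERE NOT EXISTS
      (SELECT * FROM used WHERE article.article_id = used.article_id)
""").fetchall())

left_join_rows = sorted(conn.execute("""
    SELECT a.article_id, a.mark, a.type
    FROM article a
    LEFT JOIN used u ON u.article_id = a.article_id
    WHERE u.article_id IS NULL          -- never rented
       OR u.return_date IS NOT NULL     -- rented and returned
""").fetchall())

print(union_all_rows)  # [(1, 'A', 'x'), (2, 'B', 'y')]
print(left_join_rows)  # [(1, 'A', 'x'), (2, 'B', 'y')]
```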

Prevent duplicate record when inner join query in SQL

I used an INNER JOIN to get data from two tables,
but when I run the SQL query,
I get the same record duplicated 48 times.
The SQL query I created is below:
SELECT
ABS_LIMIT.B1_NAME, ABS_LIMIT.B2_NAME, ABS_LIMIT.B3_NAME, ABS_LIMIT.ELEM_NAME
FROM
ABS_LIMIT
INNER JOIN
RTU_SCAN ON RTU_SCAN.B1_NAME = ABS_LIMIT.B1_NAME
WHERE
ABS_LIMIT.B3_NAME LIKE 'AMP%';
Does anyone have any idea how to remove the duplicate from the query result?

You never SELECT any columns from RTU_SCAN so you can use EXISTS rather than an INNER JOIN:
SELECT a.B1_NAME,
a.B2_NAME,
a.B3_NAME,
a.ELEM_NAME
FROM ABS_LIMIT a
WHERE EXISTS (SELECT 1 FROM RTU_SCAN r WHERE r.B1_NAME = a.B1_NAME)
AND a.B3_NAME LIKE 'AMP%';
Then, if there are duplicates in RTU_SCAN they will not propagate duplicate rows in the output.
Alternatively, you could use DISTINCT to remove duplicates:
SELECT DISTINCT
a.B1_NAME,
a.B2_NAME,
a.B3_NAME,
a.ELEM_NAME
FROM ABS_LIMIT a
INNER JOIN RTU_SCAN r
ON r.B1_NAME = a.B1_NAME
AND a.B3_NAME LIKE 'AMP%';
However, it will probably be less efficient to generate duplicates and then filter them out using DISTINCT compared to using EXISTS and not generating the duplicates in the first place.
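
The three variants can be compared on a tiny data set. This is a sketch using Python's sqlite3 with a reduced, hypothetical version of the two tables (only B1_NAME and B3_NAME), where RTU_SCAN holds three rows for the same B1_NAME.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ABS_LIMIT (B1_NAME TEXT, B3_NAME TEXT);
    CREATE TABLE RTU_SCAN (B1_NAME TEXT);
    INSERT INTO ABS_LIMIT VALUES ('S1', 'AMP1');
    INSERT INTO RTU_SCAN VALUES ('S1'), ('S1'), ('S1');
""")

# Plain INNER JOIN: one output row per matching RTU_SCAN row.
join_rows = conn.execute("""
    SELECT a.B1_NAME, a.B3_NAME
    FROM ABS_LIMIT a
    INNER JOIN RTU_SCAN r ON r.B1_NAME = a.B1_NAME
    WHERE a.B3_NAME LIKE 'AMP%'
""").fetchall()

# EXISTS: each ABS_LIMIT row is tested once, so it appears at most once.
exists_rows = conn.execute("""
    SELECT a.B1_NAME, a.B3_NAME
    FROM ABS_LIMIT a
    WHERE EXISTS (SELECT 1 FROM RTU_SCAN r WHERE r.B1_NAME = a.B1_NAME)
      AND a.B3_NAME LIKE 'AMP%'
""").fetchall()

# DISTINCT: the join still produces three rows, which are then collapsed.
distinct_rows = conn.execute("""
    SELECT DISTINCT a.B1_NAME, a.B3_NAME
    FROM ABS_LIMIT a
    INNER JOIN RTU_SCAN r ON r.B1_NAME = a.B1_NAME
    WHERE a.B3_NAME LIKE 'AMP%'
""").fetchall()

print(len(join_rows))  # 3
print(exists_rows)     # [('S1', 'AMP1')]
print(distinct_rows)   # [('S1', 'AMP1')]
```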

Using Select * in a SQL JOIN returns the wrong id value for the wrong table

I have two tables (PlayerDTO and ClubDTO) and am using a JOIN to fetch data as follows:
SELECT * FROM PlayerDTO AS pl
INNER JOIN ClubDTO AS cl
ON pl.currentClub = cl.id
WHERE cl.nation = 7
This returns the correct rows from PlayerDTO, but in every row the id column has been changed to the value of the currentClub column (eg instead of pl.id 3,456 | pl.currentClub 97, it has become pl.id 97 | pl.currentClub 97).
So I tried the query listing all the columns by name instead of Select *:
SELECT pl.id, pl.nationality, pl.currentClub, pl.status, pl.lastName FROM PlayerDTO AS pl
INNER JOIN ClubDTO AS cl
ON pl.currentClub = cl.id
WHERE cl.nation = 7
This works correctly and doesn’t change any values.
PlayerDTO has over 100 columns (I didn’t list them all above for brevity, but I included them all in the query) but obviously I don’t want to write every column name in every query.
So could somebody please explain why Select * changes the id value and what I need to do to make it work correctly? All my tables have a column called id, is that something to do with it?

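SELECT * is, according to the docs,
shorthand for “select all columns” (source: dev.mysql.com).
Both of your tables have an id column, so the result contains two columns named id. MySQL returns both, but a client that builds each row as a map keyed by column name can keep only one of them, typically the last one, which here is cl.id. So select what you want to select...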
SELECT pl.id, *otherfieldsyouwant* FROM PlayerDTO AS pl...
Or...
SELECT pl.* FROM PlayerDTO AS pl...
Typically, SELECT * is bad form. The odds that you are using every field are very low, and the more data you pull, the slower the query is.
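
A likely mechanism for the symptom in the question is that the client maps each row by column name, so the second id silently overwrites the first. The sketch below reproduces that with Python's sqlite3 standing in for MySQL; the reduced table definitions and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ClubDTO (id INTEGER PRIMARY KEY, nation INTEGER);
    CREATE TABLE PlayerDTO (id INTEGER PRIMARY KEY,
                            currentClub INTEGER, lastName TEXT);
    INSERT INTO ClubDTO VALUES (97, 7);
    INSERT INTO PlayerDTO VALUES (3456, 97, 'Example');
""")

cur = conn.execute("""
    SELECT * FROM PlayerDTO AS pl
    INNER JOIN ClubDTO AS cl ON pl.currentClub = cl.id
    WHERE cl.nation = 7
""")
names = [d[0] for d in cur.description]  # column names, including two "id"s
row = cur.fetchone()

# The raw row still holds both ids; the player's id comes first.
print(row[0])             # 3456
# Keying by name keeps only the last "id", which is the club's.
as_dict = dict(zip(names, row))
print(as_dict["id"])      # 97

# Restricting the select list to pl.* leaves a single id column.
cur = conn.execute("""
    SELECT pl.* FROM PlayerDTO AS pl
    INNER JOIN ClubDTO AS cl ON pl.currentClub = cl.id
    WHERE cl.nation = 7
""")
fixed = dict(zip([d[0] for d in cur.description], cur.fetchone()))
print(fixed["id"])        # 3456
```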

Selecting ambiguous column from subquery with postgres join inside

I have the following query:
select x.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
Since id0 is in both sessions and clicked_products, I get the expected error:
column reference "id0" is ambiguous
However, to fix this problem in the past I simply needed to specify a table. In this situation, I tried:
select sessions.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
However, this results in the following error:
missing FROM-clause entry for table "sessions"
How do I return just the id0 column from the above query?
Note: I realize I can trivially solve the problem by getting rid of the subquery all together:
select sessions.id0
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0;
However, I need to do further aggregations and so do need to keep the subquery syntax.

The only way you can do that is by using aliases for the columns returned from the subquery so that the names are no longer ambiguous.
Qualifying the column with the table name does not work, because sessions is not visible at that point (only x is).
True, this way you cannot use SELECT *, but you shouldn't do that anyway. For a reason why, your query is a wonderful example:
Imagine that you have a query like yours that works, and then somebody adds a new column with the same name as a column in the other table. Then your query suddenly and mysteriously breaks.
Avoid SELECT *. It is ok for ad-hoc queries, but not in code.

select x.id from
(select sessions.id0 as id, clicked_products.* from sessions
inner join
clicked_products on
sessions.id0 = clicked_products.session_id0 ) x;
However, you have to list any other columns you need from the sessions table explicitly, since you cannot use SELECT * for both tables.

I assume:
select x.id from (select sessions.id0 id
from sessions
inner join clicked_products
on sessions.id0 = clicked_products.session_id0 ) x;
should work.
Another option is to use a Common Table Expression, which is more readable and easier to test.
You still need aliases or unique column names, though.
In general, selecting everything with * is not a good idea: reading all columns is a waste of I/O.
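
As a sketch of the Common Table Expression variant with explicit aliases, followed by a further aggregation as the question requires: the example runs on Python's sqlite3 rather than Postgres, and the aliases session_id and click_id as well as the sample rows are invented for illustration. SQLite does not necessarily raise the same ambiguity error as Postgres, so this only demonstrates the fix, not the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (id0 INTEGER PRIMARY KEY);
    CREATE TABLE clicked_products (id0 INTEGER PRIMARY KEY,
                                   session_id0 INTEGER);
    INSERT INTO sessions VALUES (1), (2);
    INSERT INTO clicked_products VALUES (10, 1), (11, 1), (12, 2);
""")

rows = conn.execute("""
    WITH x AS (
        -- Give each id0 its own name so the outer query is unambiguous.
        SELECT sessions.id0         AS session_id,
               clicked_products.id0 AS click_id
        FROM sessions
        INNER JOIN clicked_products
            ON sessions.id0 = clicked_products.session_id0
    )
    SELECT x.session_id, COUNT(*) AS clicks
    FROM x
    GROUP BY x.session_id
    ORDER BY x.session_id
""").fetchall()

print(rows)  # [(1, 2), (2, 1)]
```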

Best way to tune NOT EXISTS in SQL queries

I am trying to tune SQL queries that contain a NOT EXISTS clause. My database is Netezza. I tried replacing NOT EXISTS with NOT IN and looked at the query plans; both show similar execution times. Can someone help me with this? Thanks in advance.
SELECT ETL_PRCS_DT, COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
WHERE NOT EXISTS (
SELECT *
FROM DEV_AM_EDS_1..AM_STATION
WHERE D1.STN_ID = STN_ID
)
GROUP BY ETL_PRCS_DT;

You can try a JOIN:
SELECT ETL_PRCS_DT, COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
LEFT JOIN DEV_AM_EDS_1..AM_STATION TAB2 ON D1.STN_ID = TAB2.STN_ID
WHERE TAB2.STN_ID IS NULL
GROUP BY ETL_PRCS_DT;
Then compare the execution plans. The JOIN might produce the same plan you already have.

You can try a join, but you sometimes need to be careful. If the join key is not unique in the second table, then you might end up with multiple rows. The following query takes care of this:
SELECT ETL_PRCS_DT,
COUNT (*) TOTAL_PRGM_HOLD_DUE_TO_STATION
FROM DEV_AM_EDS_1..AM_HOLD_TV_PROGRAM_INSTANCE D1
left outer join
(
select distinct STN_ID
from DEV_AM_EDS_1..AM_STATION ams
) ams
on d1.STN_ID = ams.STN_ID
WHERE ams.STN_ID is NULL
GROUP BY ETL_PRCS_DT;
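
To check that the rewrites return the same counts, here is a sketch in Python's sqlite3 with shortened, hypothetical table names (hold and station) and sample rows; it says nothing about Netezza's execution plans. The station table deliberately contains a duplicate key. On this sample the plain LEFT JOIN ... IS NULL form still matches NOT EXISTS, because the rows that matched (and were multiplied) are the ones the IS NULL filter discards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hold (ETL_PRCS_DT TEXT, STN_ID INTEGER);
    CREATE TABLE station (STN_ID INTEGER);
    INSERT INTO hold VALUES
        ('2020-01-01', 1), ('2020-01-01', 2),
        ('2020-01-02', 3), ('2020-01-02', 1);
    -- Station 1 is listed twice on purpose.
    INSERT INTO station VALUES (1), (1);
""")

not_exists_rows = conn.execute("""
    SELECT ETL_PRCS_DT, COUNT(*) AS total
    FROM hold d1
    WHERE NOT EXISTS (SELECT * FROM station s WHERE d1.STN_ID = s.STN_ID)
    GROUP BY ETL_PRCS_DT
    ORDER BY ETL_PRCS_DT
""").fetchall()

left_join_rows = conn.execute("""
    SELECT ETL_PRCS_DT, COUNT(*) AS total
    FROM hold d1
    LEFT JOIN station s ON d1.STN_ID = s.STN_ID
    WHERE s.STN_ID IS NULL
    GROUP BY ETL_PRCS_DT
    ORDER BY ETL_PRCS_DT
""").fetchall()

print(not_exists_rows)  # [('2020-01-01', 1), ('2020-01-02', 1)]
print(left_join_rows)   # [('2020-01-01', 1), ('2020-01-02', 1)]
```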
