Remove duplicate id with different description in SQL

Hi, I have data with duplicated ids, but the description (name) is different:
id name
1 A
1 B
How do I remove the duplicate? Using DISTINCT still returns all the rows, because they differ in name.

It's not clear from your question how you want to remove the duplicates, given that each id has different metadata attached to it.
Do you just want to de-duplicate the id column on its own, or do you want to merge the metadata so that only one row per id remains?
The simplest one is:
select distinct id from ...
But it looks like that's not what you meant.
The second option is to merge the metadata into an array with array_agg:
select id, array_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This removes the id duplication and collects all the metadata into an array.
If a plain string is fine for you, you can use string_agg instead:
select id, string_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This gives you the names as one comma-separated string per id.
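The queries above are BigQuery syntax. As a hedged, runnable sketch of the same idea, here it is in SQLite through Python's built-in sqlite3 module. SQLite has no STRING_AGG, so GROUP_CONCAT stands in for it; the table name t is hypothetical.

```python
import sqlite3

# In-memory database holding the two rows from the question (table name t is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "A"), (1, "B")])

# GROUP_CONCAT(expr, separator) is SQLite's counterpart of STRING_AGG.
# The order of the concatenated names is not guaranteed.
rows = conn.execute(
    "SELECT id, GROUP_CONCAT(name, ',') AS names FROM t GROUP BY id"
).fetchall()
print(rows)
```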
If you want something fancier, you can create a struct for your metadata (assuming you have more metadata columns in your real project):
select id, array_agg(struct(name, info)) as metadata
from (select 1 as id, 'A' as name, 'X' as info union all select 1 as id, 'B' as name, 'Y' as info)
group by 1
This gives you one row per id with an array of (name, info) structs.
All other options lose some data; for example, using min(name) or max(name) keeps only one name per id.
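To make that data loss concrete, here is a small SQLite sketch (run through Python's sqlite3; the table t and the extra row for id 2 are made up for illustration). MIN(name) returns one row per id, but the name B is silently dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "A"), (1, "B"), (2, "C")])

# One row per id, but only the smallest name per id survives.
rows = conn.execute(
    "SELECT id, MIN(name) AS name FROM t GROUP BY id ORDER BY id"
).fetchall()
print(rows)  # [(1, 'A'), (2, 'C')]
```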
If you can clarify your question, the community can help you more. For now, these are the options I see.

Consider the simple approach below:
select any_value(t).*
from your_table t
group by t.id
If you have an extra column that identifies the order of entries, for example ts (a timestamp), you can use the following:
select any_value(t having min ts).*
from your_table t
group by t.id
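ANY_VALUE(t HAVING MIN ts) is BigQuery-specific. A portable equivalent, sketched here in SQLite under the assumption that ts is the ordering column, numbers the rows of each id with ROW_NUMBER and keeps the first one. The table contents are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; 'ts' is the ordering column assumed in the answer.
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "A", "2021-01-02"),
        (1, "B", "2021-01-01"),  # earliest row for id 1
        (2, "C", "2021-01-05"),
    ],
)

# Number the rows of each id by ts and keep the earliest one.
rows = conn.execute(
    """
    SELECT id, name, ts
    FROM (SELECT id, name, ts,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts) AS rn
          FROM your_table)
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 'B', '2021-01-01'), (2, 'C', '2021-01-05')]
```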

Related

Is it possible to UNION distinct rows but disregard one column to determine uniqueness?

select d.id, d.registration_number
from DOCUMENTS d
union
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
Would it be possible to union those results based solely on the uniqueness of the registration_number, disregarding the id of the documents?
Or, is it possible to achieve the same result in a different way?
Just to add: actually I'm unioning 5 queries, each ~20 lines long, with 4 columns that should be disregarded in determining uniqueness.
You basically need to wrap the unioned data in an outer query that keeps only the rows you want:
SELECT min(id) AS id, registration_number
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
GROUP BY registration_number
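A quick SQLite check of the MIN(id) approach, run through Python's sqlite3. The sample rows are hypothetical; R2 appears in both tables, so only its smaller id should remain. Note that many engines require an alias (here u) on the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, registration_number TEXT)")
conn.execute("CREATE TABLE different_documents (id INTEGER, registration_number TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?)", [(1, "R1"), (2, "R2")])
conn.executemany("INSERT INTO different_documents VALUES (?, ?)", [(7, "R2"), (8, "R3")])

# UNION ALL keeps every row; GROUP BY then collapses each registration number.
rows = conn.execute(
    """
    SELECT MIN(id) AS id, registration_number
    FROM (SELECT id, registration_number FROM documents
          UNION ALL
          SELECT id, registration_number FROM different_documents) u
    GROUP BY registration_number
    ORDER BY registration_number
    """
).fetchall()
print(rows)  # [(1, 'R1'), (2, 'R2'), (8, 'R3')]
```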
UNION checks the combination of all the columns for uniqueness. You could, however, use UNION ALL (which does not remove duplicates) and then apply the logic yourself using the ROW_NUMBER window function:
SELECT id, registration_number
FROM (SELECT id, registration_number,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id) AS rn
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
) r
WHERE rn = 1
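The advantage over MIN(id) is that ROW_NUMBER keeps the whole chosen row together, which matters with the extra columns mentioned in the question. A hedged SQLite sketch (hypothetical data; the literal src column stands in for those extra columns; needs SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, registration_number TEXT)")
conn.execute("CREATE TABLE different_documents (id INTEGER, registration_number TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?)", [(1, "R1"), (5, "R2")])
conn.executemany("INSERT INTO different_documents VALUES (?, ?)", [(3, "R2"), (8, "R3")])

# 'src' plays the role of columns that must not affect uniqueness;
# ROW_NUMBER picks the lowest id per registration number and keeps its src.
rows = conn.execute(
    """
    SELECT id, registration_number, src
    FROM (SELECT id, registration_number, src,
                 ROW_NUMBER() OVER (PARTITION BY registration_number
                                    ORDER BY id) AS rn
          FROM (SELECT id, registration_number, 'documents' AS src
                FROM documents
                UNION ALL
                SELECT id, registration_number, 'different_documents'
                FROM different_documents) u) r
    WHERE rn = 1
    ORDER BY registration_number
    """
).fetchall()
print(rows)
# [(1, 'R1', 'documents'), (3, 'R2', 'different_documents'), (8, 'R3', 'different_documents')]
```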
Since the other answers are already correct, may I ask why you need to retrieve the other columns in that query, given that its primary purpose appears to be gathering unique registration numbers?
Wouldn't it be simpler to first gather the unique registration numbers and then retrieve the other info?
Or, in your actual query, first gather the info without the columns that should be disregarded, and then fetch those columns if needed?
For example, by creating a view with:
SELECT d.registration_number
FROM DOCUMENTS d
UNION
SELECT dd.registration_number
FROM DIFFERENT_DOCUMENTS dd
and then gathering the remaining information by joining that view back to the tables?
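A hedged SQLite sketch of that view-then-join idea. It assumes registration_number is unique within each table (otherwise the joins multiply rows) and, as an arbitrary business rule, prefers the id from documents over the one from different_documents; the sample data and the view name unique_registrations are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, registration_number TEXT)")
conn.execute("CREATE TABLE different_documents (id INTEGER, registration_number TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?)", [(1, "R1"), (2, "R2")])
conn.executemany("INSERT INTO different_documents VALUES (?, ?)", [(7, "R2"), (8, "R3")])

# Step 1: a view with only the unique registration numbers (UNION removes duplicates).
conn.execute(
    """
    CREATE VIEW unique_registrations AS
    SELECT registration_number FROM documents
    UNION
    SELECT registration_number FROM different_documents
    """
)

# Step 2: join back for the other columns; COALESCE prefers the documents id.
rows = conn.execute(
    """
    SELECT COALESCE(d.id, dd.id) AS id, u.registration_number
    FROM unique_registrations u
    LEFT JOIN documents d ON d.registration_number = u.registration_number
    LEFT JOIN different_documents dd ON dd.registration_number = u.registration_number
    ORDER BY u.registration_number
    """
).fetchall()
print(rows)  # [(1, 'R1'), (2, 'R2'), (8, 'R3')]
```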
Assuming registration_number is unique in each table, you can use NOT EXISTS:
select d.id, d.registration_number
from DOCUMENTS d
union all
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
where not exists (select 1
from DOCUMENTS d
where dd.registration_number = d.registration_number
);
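The NOT EXISTS query runs unchanged in SQLite; here it is checked against hypothetical data through Python's sqlite3. Rows from DOCUMENTS always win, and a row from DIFFERENT_DOCUMENTS is added only when its registration number is new.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DOCUMENTS (id INTEGER, registration_number TEXT)")
conn.execute("CREATE TABLE DIFFERENT_DOCUMENTS (id INTEGER, registration_number TEXT)")
conn.executemany("INSERT INTO DOCUMENTS VALUES (?, ?)", [(1, "R1"), (2, "R2")])
conn.executemany("INSERT INTO DIFFERENT_DOCUMENTS VALUES (?, ?)", [(7, "R2"), (8, "R3")])

# The NOT EXISTS filter applies only to the second SELECT;
# ORDER BY 2 sorts the combined result by registration_number.
rows = conn.execute(
    """
    SELECT d.id, d.registration_number FROM DOCUMENTS d
    UNION ALL
    SELECT dd.id, dd.registration_number FROM DIFFERENT_DOCUMENTS dd
    WHERE NOT EXISTS (SELECT 1 FROM DOCUMENTS d
                      WHERE dd.registration_number = d.registration_number)
    ORDER BY 2
    """
).fetchall()
print(rows)  # [(1, 'R1'), (2, 'R2'), (8, 'R3')]
```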

SQL query looping for each value in a list

New to SQL here. I am trying to get one row from a table matching a particular criterion.
Typically this would look like:
SELECT TOP 1 *
FROM myTable
WHERE id = 'abc'
The output may look like:
value id
--------------
1 abc
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have a list of 'id's. How would I execute something like this:
SELECT TOP 1 *
FROM myTable
FOR EACH id
WHERE id IN ('abc', 'edf', 'fgh')
Expected result:
value id
--------------
1 abc
10 edf
12 fgh
I don't know whether this is some sort of UNION or concat operation, but I would like to learn. I am working on Azure SQL Server.
The table has many entries for an 'id', and I am trying to get one entry per 'id'. Now I have list of 'id's.
A typical method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by id) as seqnum
from mytable t
) t
where seqnum = 1;
Note: you can filter on particular ids if you want. It is unclear whether that is really required for your question.
If you happen to be using SQL Server (as SELECT TOP suggests), you can use this more concise, but somewhat less performant, version:
select top (1) with ties t.*
from mytable t
order by row_number() over (partition by id order by (select null));
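Combining the ROW_NUMBER answer with the list of ids from the question gives one row per requested id. This is a hedged SQLite sketch (SQLite has no TOP ... WITH TIES, so only the subquery form is shown); the sample values are made up, and ordering by value to pick the smallest one is an assumption, since any deterministic order works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value INTEGER, id TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "abc"), (2, "abc"), (10, "edf"), (11, "edf"), (12, "fgh"), (99, "zzz")],
)

wanted = ["abc", "edf", "fgh"]
placeholders = ",".join("?" for _ in wanted)  # one '?' per id in the list

# Filter to the requested ids, number rows within each id, keep row 1.
rows = conn.execute(
    f"""
    SELECT value, id
    FROM (SELECT value, id,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS seqnum
          FROM mytable
          WHERE id IN ({placeholders}))
    WHERE seqnum = 1
    ORDER BY id
    """,
    wanted,
).fetchall()
print(rows)  # [(1, 'abc'), (10, 'edf'), (12, 'fgh')]
```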

SQL - Selecting unique values from one column then filtering based on another

I've had a search around and have seen quite a few questions about selecting distinct values, but none of them seem close enough to my query to help. This is the scenario:
ID Product_ID Product_type
123 56789 A
123 78901 B
456 12345 A
789 45612 B
The SQL I need should search a table similar to the one above and bring back the rows where Product_type is B, but only if the related ID appears exactly once in the table.
So in this case it would bring back only:
789 45612 B
Based on what I've found so far, I have tried:
SELECT DISTINCT(ID)
FROM "TABLE"
WHERE "PRODUCT_TYPE" = 'B'
as well as:
SELECT *
FROM "TABLE"
WHERE "PRODUCT_TYPE" = 'B'
GROUP BY "ID"
HAVING COUNT(ID) = 1
Neither has worked.
One way via a list of IDs appearing once:
select * from T where Product_type = 'B' and id in (
select id from T
group by id
having count(id) = 1)
Solution 1: use a correlated subquery to count the rows per id.
select * from "TABLE" t1
where Product_type = 'B'
and (select count(*) from "TABLE"
where id = t1.id) = 1
You can use group by for this type of query. However, you cannot filter down to the 'B's before the aggregation.
So, try this:
SELECT t.id, MAX(t.product_id) as product_id,
MAX(t.product_type) as product_type
FROM "TABLE" t
GROUP BY "ID"
HAVING COUNT(*) = 1 AND
MAX(PRODUCT_TYPE) = 'B';
This may look a little arcane, but the HAVING clause guarantees that there is only one row and that this row has a 'B'. Hence each MAX() returns the maximum over that single row, which is simply the value on that row.
EDIT:
Many databases will also allow you to take advantage of window functions for this:
select t.*
from (select t.*, count(*) over (partition by id) as id_cnt
from "TABLE" t
) t
where t.product_type = 'B' and id_cnt = 1;
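As a sanity check, the four answers above can be run side by side in SQLite on the sample data from the question. This is a hedged sketch: the table is named T instead of "TABLE", and the window-function variant needs SQLite 3.25 or later. Every query should return only the 789 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, product_id INTEGER, product_type TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(123, 56789, "A"), (123, 78901, "B"), (456, 12345, "A"), (789, 45612, "B")],
)

queries = {
    # IDs appearing once, via IN (...)
    "in_subquery": """
        SELECT * FROM T
        WHERE product_type = 'B'
          AND id IN (SELECT id FROM T GROUP BY id HAVING COUNT(id) = 1)""",
    # Correlated subquery counting rows per id
    "correlated": """
        SELECT * FROM T t1
        WHERE product_type = 'B'
          AND (SELECT COUNT(*) FROM T WHERE id = t1.id) = 1""",
    # Aggregate first, then filter in HAVING
    "group_having": """
        SELECT id, MAX(product_id), MAX(product_type) FROM T
        GROUP BY id
        HAVING COUNT(*) = 1 AND MAX(product_type) = 'B'""",
    # Window function: count rows per id without collapsing them
    "window": """
        SELECT id, product_id, product_type
        FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY id) AS id_cnt FROM T t) x
        WHERE x.product_type = 'B' AND x.id_cnt = 1""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each prints [(789, 45612, 'B')]
```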

SQL Separating Distinct Values using single column

Does anyone know a way of applying DISTINCT to a single column only? For lack of a better example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still return the other columns. I would use DISTINCT on the full query, but the rows all differ because of certain columns. I also need to output only the topmost term of each pair of duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select t.ID, t.Name, min(t.Term) as Term
from "Table" t
group by t.ID, t.Name
You can use ROW_NUMBER for this:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER()
OVER (PARTITION BY ID order by Term) as rn from "Table"
) as tbl
Where rn = 1
The ORDER BY inside OVER() determines which row of each ID is numbered 1 and therefore picked; ordering by Term returns the topmost term.
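The same ROW_NUMBER query, checked in SQLite against the sample rows from the question. Since Table is a keyword, the sketch uses a hypothetical table name terms; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (id INTEGER, name TEXT, term TEXT)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?, ?)",
    [
        (1, "Suzy", "A"), (1, "Suzy", "B"),
        (2, "John", "A"), (2, "John", "B"),
        (3, "Pete", "A"), (4, "Carl", "A"), (5, "Sally", "B"),
    ],
)

# One row per id: the row whose term sorts first is numbered 1.
rows = conn.execute(
    """
    SELECT id, name, term
    FROM (SELECT id, name, term,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY term) AS rn
          FROM terms) AS tbl
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()
print(rows)
# [(1, 'Suzy', 'A'), (2, 'John', 'A'), (3, 'Pete', 'A'), (4, 'Carl', 'A'), (5, 'Sally', 'B')]
```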

Get data origin after a UNION

I have a SQL query like this:
SELECT *
FROM (
(SELECT name FROM man)
UNION
(SELECT name FROM woman )
) AS my_table
ORDER BY name
How can I retrieve the source of each row?
For example if my result is like this:
Bob
Alice
Mario
...
I want to know whether the name 'Bob' was retrieved from the 'man' table or from the 'woman' table.
SELECT *
FROM (
(SELECT name, 'man' as source FROM man)
UNION ALL
(SELECT name, 'woman' FROM woman )
) AS my_table
ORDER BY name
I used UNION ALL because, if these tables are mutually exclusive, it is faster (it skips the duplicate check). If they are not, adding the source column makes the rows distinct anyway, so you will be able to see where the duplicates are. If they are not mutually exclusive but you only want to show one record per name, what business rule decides which record to show?
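A hedged SQLite sketch of the literal-source-column technique, with made-up names. SQLite rejects parentheses around the individual SELECTs of a UNION, so they are dropped here; otherwise the query matches the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE man (name TEXT)")
conn.execute("CREATE TABLE woman (name TEXT)")
conn.executemany("INSERT INTO man VALUES (?)", [("Bob",), ("Mario",)])
conn.executemany("INSERT INTO woman VALUES (?)", [("Alice",)])

# Each branch tags its rows with a string literal naming the source table.
rows = conn.execute(
    """
    SELECT name, source
    FROM (SELECT name, 'man' AS source FROM man
          UNION ALL
          SELECT name, 'woman' FROM woman) AS my_table
    ORDER BY name
    """
).fetchall()
print(rows)  # [('Alice', 'woman'), ('Bob', 'man'), ('Mario', 'man')]
```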
A select can include a literal string, so the simplest way is probably to do:
SELECT *
FROM (
(SELECT name, 'man' as source FROM man)
UNION
(SELECT name, 'woman' as source FROM woman )
) AS my_table
ORDER BY name
These only return one row per name if no name appears in both the man and woman tables.
If you expect duplicates, you will need some extra logic in the WHERE clause, and perhaps a third query in the union to cover names that exist in both tables.
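Instead of a third query, one option (a swapped-in technique, not the answer's) is to group the tagged union by name and label names found in both tables. This SQLite sketch assumes that a 'both' label is an acceptable business rule; the names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE man (name TEXT)")
conn.execute("CREATE TABLE woman (name TEXT)")
conn.executemany("INSERT INTO man VALUES (?)", [("Bob",), ("Kim",)])
conn.executemany("INSERT INTO woman VALUES (?)", [("Alice",), ("Kim",)])

# One row per name; a name seen in more than one source table is labelled 'both'.
rows = conn.execute(
    """
    SELECT name,
           CASE WHEN COUNT(DISTINCT source) > 1 THEN 'both'
                ELSE MIN(source) END AS source
    FROM (SELECT name, 'man' AS source FROM man
          UNION ALL
          SELECT name, 'woman' FROM woman) AS my_table
    GROUP BY name
    ORDER BY name
    """
).fetchall()
print(rows)  # [('Alice', 'woman'), ('Bob', 'man'), ('Kim', 'both')]
```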