SQL: Get duplicates in same table [duplicate] - sql

I have the following table:
name  email            number  type
1     abc@example.com  10      A
1     abc@example.com  10      B
2     def@def.com      20      B
3     ggg@ggg.com      30      B
1     abc@example.com  10      A
4     hhh@hhh.com      60      A
I want the following:
Result
name  email            number  type
1     abc@example.com  10      A
1     abc@example.com  10      B
1     abc@example.com  10      A
Basically, I want to find the lines where the three columns (name, email, number) are identical and see all of them, regardless of type.
How can I achieve this in SQL? I don't want a result that lists every combination once; I want to see every line that appears in the table multiple times.
I thought of doing a group by, but a group by gives me only the unique combinations, each line once. I tried a join of the table with itself, but somehow it got too bloated.
Any ideas?
EDIT: I want to display the type column as well, so group by isn't working and therefore, it's not a duplicate.

You can use exists for that case:
select t.*
from table t
where exists (select 1
from table
where name = t.name and email = t.email and
number = t.number and type <> t.type);
You can also use a window function if your DBMS supports them:
select *
from (select *, count(*) over (partition by name, email, number) Counter
from table
) t
where counter > 1;

Core SQL-99 compliant solution.
Have a sub-query that returns name, email, number combinations having duplicates. JOIN with that result:
select t1.*
from tablename t1
join (select name, email, number
from tablename
group by name, email, number
having count(*) > 1) t2
on t1.name = t2.name
and t1.email = t2.email
and t1.number = t2.number
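To check the join against the question's sample data, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the added order by are assumptions made for the demo, not part of the answer.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename (name integer, email text, number integer, type text)"
)
conn.executemany(
    "insert into tablename values (?, ?, ?, ?)",
    [
        (1, "abc@example.com", 10, "A"),
        (1, "abc@example.com", 10, "B"),
        (2, "def@def.com", 20, "B"),
        (3, "ggg@ggg.com", 30, "B"),
        (1, "abc@example.com", 10, "A"),
        (4, "hhh@hhh.com", 60, "A"),
    ],
)

# Join every row to the (name, email, number) combinations that occur more than once.
duplicates = conn.execute(
    """
    select t1.*
    from tablename t1
    join (select name, email, number
          from tablename
          group by name, email, number
          having count(*) > 1) t2
      on t1.name = t2.name
     and t1.email = t2.email
     and t1.number = t2.number
    order by t1.type  -- added only to make the demo output stable
    """
).fetchall()

for row in duplicates:
    print(row)
```

Only the three rows for name 1 come back, each with its own type value.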

You can use window functions:
select t.*
from (select t.*, count(*) over (partition by name, email, number) as cnt
from t
) t
where cnt > 1;
If you only want combos that have different types (which might be your real problem), I would suggest exists:
select t.*
from t
where exists (select 1
from t t2
where t2.name = t.name and t2.email = t.email and t2.number = t.number and t2.type <> t.type
);
For performance, you want an index on (name, email, number, type) for this version.
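The two queries answer slightly different questions, which a small experiment makes visible. This sketch assumes SQLite 3.25 or later (needed for window functions) and adds a hypothetical pair of rows for name 5 that are duplicated with the same type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name integer, email text, number integer, type text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "abc@example.com", 10, "A"),
        (1, "abc@example.com", 10, "B"),
        (1, "abc@example.com", 10, "A"),
        (2, "def@def.com", 20, "B"),
        (5, "xyz@example.com", 70, "A"),  # hypothetical: duplicated, same type
        (5, "xyz@example.com", 70, "A"),
    ],
)

# Window version: every row whose (name, email, number) occurs more than once.
by_count = conn.execute(
    """
    select name, email, number, type
    from (select t.*, count(*) over (partition by name, email, number) as cnt
          from t) t
    where cnt > 1
    """
).fetchall()

# exists version: only rows that have a sibling with a different type.
by_exists = conn.execute(
    """
    select t.*
    from t
    where exists (select 1
                  from t t2
                  where t2.name = t.name and t2.email = t.email
                    and t2.number = t.number and t2.type <> t.type)
    """
).fetchall()

print(len(by_count), len(by_exists))
```

The window version also returns the two rows for name 5, while the exists version skips them because no sibling row has a different type.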

Related

Removing rows from result set where column only has one value against a user

I have a result set
name stage value
---- ----- -----
jim 1 4
jim 1 8
paul 1 8
paul 1 8
I want to remove the rows where 8 is the only value against a person:
keep the 2 jim rows and lose the 2 paul rows.
You can use exists to keep only the people who have at least one value other than 8. For a select query:
select t.*
from t
where exists (select 1
              from t t2
              where t2.name = t.name and t2.value <> 8
             );
Similar logic (except using not exists rather than exists) can be used for a delete, if you really want to delete the rows from the table.
If you have a complex query that you don't want to repeat, then window functions are helpful:
select t.*
from (select t.*,
             sum(case when value <> 8 then 1 else 0 end) over (partition by name) as cnt_not_8
      from t
     ) t
where cnt_not_8 > 0;
If your database supports analytic functions, then you can use count as follows:
Select * from
(Select t.*,
Count(case when value <> 8 then 1 end) over (partition by name) as cnt
From your_table t) t
Where cnt > 0
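As a quick check of the count-based filter, the sketch below runs it on the question's four rows. It assumes SQLite 3.25 or later for window function support; the table name your_table follows the answer and the order by is added only for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (name text, stage integer, value integer)")
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [("jim", 1, 4), ("jim", 1, 8), ("paul", 1, 8), ("paul", 1, 8)],
)

# cnt is the number of non-8 values per name; names where it is 0 have only 8s.
kept = conn.execute(
    """
    select name, stage, value
    from (select t.*,
                 count(case when value <> 8 then 1 end)
                     over (partition by name) as cnt
          from your_table t) t
    where cnt > 0
    order by value
    """
).fetchall()

print(kept)
```

Both jim rows survive because jim has a 4, and both paul rows are dropped.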
Assuming your table also has an id column (an auto-increment integer), this query selects the highest id for each unique combination:
select max(id) from t group by name,stage,value
In your example, the two identical paul, 1, 8 rows collapse to a single id (the latest one), while each jim row keeps its own id.
You can then use that query in a where clause to filter out any duplicates:
select * from t
where id in (select max(id) from t group by name,stage,value)
Finally, you can also delete the rows that are not unique, if that's your goal:
delete from t
where id not in (select max(id) from t group by name,stage,value)
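The effect of the max(id) approach can be seen on the question's rows with a sketch like the following. The id column and its auto-increment definition are the assumption stated in the answer; SQLite is used here only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer primary key autoincrement,"
    " name text, stage integer, value integer)"
)
conn.executemany(
    "insert into t (name, stage, value) values (?, ?, ?)",
    [("jim", 1, 4), ("jim", 1, 8), ("paul", 1, 8), ("paul", 1, 8)],
)

# One id per unique (name, stage, value) combination: the highest one.
keep_ids = [
    row[0]
    for row in conn.execute(
        "select max(id) from t group by name, stage, value order by 1"
    )
]

# Delete every row that is not the latest of its combination.
conn.execute(
    "delete from t where id not in"
    " (select max(id) from t group by name, stage, value)"
)
remaining = conn.execute("select id, name, stage, value from t order by id").fetchall()

print(keep_ids)
print(remaining)
```

Note that this removes exact duplicates only: one paul row remains, so it solves a different problem from dropping everyone whose only value is 8.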

SQL: Count lost values by batch

I have a table test with columns Batch and ID. I would like to count how many IDs are missing in every batch compared with the earliest batch. For example, the query below compares batch 2 with batch 1 to get the value for batch 2.
SELECT COUNT(T1.ID) AS LOST_CNT FROM
(SELECT * FROM TEST WHERE BATCH=1)T1
LEFT JOIN (SELECT * FROM TEST WHERE BATCH=2)T2
ON T1.ID=T2.ID WHERE T2.ID IS NULL
I would like to get lost_cnt for every batch, as the number of batches will increase over time. Something like the query below does not return what I want. (I understand why; I am including it only as a failed attempt.)
SELECT A.BATCH,
COUNT(DISTINCT CASE WHEN A.ID IS NULL THEN M.ID ELSE NULL END) AS lost_cnt
FROM
(SELECT DISTINCT ID FROM TEST WHERE BATCH=(SELECT MIN(BATCH) FROM TEST)) M
LEFT JOIN TEST A ON M.ID=A.ID
GROUP BY 1;
Is there a way to get what I want?
It's not totally clear what you want to achieve, but I guess you want to find how many ids are missing compared to the first batch. You can just filter the table with the id in the first batch, count the number of id's in each batch and subtract from the count for the first batch.
with t as (
select *
from test
where id in (
select id
from test
where batch = (select min(batch) from test)
)
)
select
batch,
(select count(distinct id)
from t
where batch = (select min(batch) from test)
) - count(distinct id) as missing
from t
group by batch
order by batch;
sample data:
batch id
1 1
1 2
1 3
2 2
2 3
2 4
3 3
3 4
results:
batch missing
1 0
2 1
3 2
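The sample data and results above can be reproduced with a short script. This is a sketch that assumes SQLite as the engine; the query text is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (batch integer, id integer)")
conn.executemany(
    "insert into test values (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4)],
)

# Keep only ids that occur in the first batch, then subtract each batch's
# distinct count from the first batch's distinct count.
missing = conn.execute(
    """
    with t as (
        select *
        from test
        where id in (select id
                     from test
                     where batch = (select min(batch) from test))
    )
    select batch,
           (select count(distinct id)
            from t
            where batch = (select min(batch) from test)
           ) - count(distinct id) as missing
    from t
    group by batch
    order by batch
    """
).fetchall()

print(missing)
```

Id 4 never appears in batch 1, so it is filtered out by the CTE and does not affect the counts.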
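You can use the lag analytic function to find each batch's previous batch, then use NOT EXISTS to list the ids of each batch that do not appear in the previous one. Compute lag over the distinct batch values; otherwise a row's previous batch can be its own batch. Rows of the first batch have no previous batch, so they are all returned:
SELECT T.BATCH, T.ID
FROM YOUR_TABLE T
JOIN ( SELECT BATCH,
              LAG(BATCH) OVER( ORDER BY BATCH) AS PREV_BATCH
       FROM ( SELECT DISTINCT BATCH FROM YOUR_TABLE ) D ) B
  ON B.BATCH = T.BATCH
WHERE NOT EXISTS (
    SELECT 1
    FROM YOUR_TABLE TT
    WHERE TT.BATCH = B.PREV_BATCH
    AND TT.ID = T.ID)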
In Hive, I would approach this using window functions:
with first_batch as (
      select t.*, count(*) over () as num_in_first_batch
      from (select t.*,
                   min(batch) over () as min_batch
            from t
           ) t
      where batch = min_batch
     )
select t.batch,
       count(fb.id) as num_from_first_batch,
       (max(fb.num_in_first_batch) - count(fb.id)) as num_missing_from_first_batch
from t left join
     first_batch fb
     on t.id = fb.id
group by t.batch;

Combining access sql tables in a query side by side

I have 2 tables containing different data, linked by a column "id", except the id is repeated multiple times
For example,
Table 1:
id grade
1 A
1 C
Table 2:
Id company
1 Alpha
1 Beta
1 Charlie
The number of rows would be inconsistent, table 1 may sometimes have more/less/equal rows compared to table 2. How am I able to combine/merge them into this outcome:
id  grade  company
1   A      Alpha
1   C      Beta
1          Charlie
I am using a Microsoft Access query.
This is a real pain in MS Access. But you can do it by using a subquery to generate sequence numbers. Here is one method assuming that the rows are unique:
select id, max(grade) as grade, max(company) as company
from ((select id, grade, null as company,
              (select count(*)
               from table1 as tt1
               where tt1.id = t1.id and tt1.grade <= t1.grade
              ) as seqnum
       from table1 as t1
      ) union all
      (select id, null as grade, company,
              (select count(*)
               from table2 as tt2
               where tt2.id = t2.id and tt2.company <= t2.company
              ) as seqnum
       from table2 as t2
      )
     ) t12
group by id, seqnum;
This would be much simpler in almost any other database.
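The sequence-number idea is not specific to Access. As a rough illustration, the sketch below applies the same technique in SQLite, which is an assumption made only so the example can be run; SQLite does not accept parentheses around the members of a union, so they are dropped here, and explicit column aliases are added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (id integer, grade text)")
conn.execute("create table table2 (id integer, company text)")
conn.executemany("insert into table1 values (?, ?)", [(1, "A"), (1, "C")])
conn.executemany(
    "insert into table2 values (?, ?)",
    [(1, "Alpha"), (1, "Beta"), (1, "Charlie")],
)

# Number the rows of each table per id with a correlated count, stack both
# tables with union all, then collapse rows that share (id, seqnum).
combined = conn.execute(
    """
    select id, max(grade) as grade, max(company) as company
    from (select t1.id as id, t1.grade as grade, null as company,
                 (select count(*)
                  from table1 as tt1
                  where tt1.id = t1.id and tt1.grade <= t1.grade) as seqnum
          from table1 as t1
          union all
          select t2.id, null, t2.company,
                 (select count(*)
                  from table2 as tt2
                  where tt2.id = t2.id and tt2.company <= t2.company)
          from table2 as t2) t12
    group by id, seqnum
    order by id, seqnum
    """
).fetchall()

for row in combined:
    print(row)
```

The third row has no grade (None in Python), matching the blank cell in the desired outcome. The numbering relies on the values being unique within each id, as the answer assumes.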

how do I make multiple count under having clause

some sample data:
Id name value ref
1 ab xy
2 aba z
3 ab xy
4 abc def
5 gxr mdy
What I am trying to do is get the name/value pairs that appear more than once,
so row 1 and row 3 would be selected.
select name, value from table_x
where value is not null group by name having count(name) >= 2
and having count(value) >= 2;
I got stuck.
@vkp's answer is correct if you only care about finding the distinct name/value pairs that appear more than once. But if you actually want the individual rows that satisfy the criteria, try this:
SELECT t1.Name, t1.[Value]
FROM Table_X t1
JOIN
(
SELECT Name, [Value]
FROM Table_X
where [Value] IS NOT NULL
GROUP BY Name, [Value]
HAVING COUNT(1) >= 2
) t2 ON t1.Name = t2.Name AND t1.[Value] = t2.[Value]
Your syntax is incorrect. Group by both name and value, then check for count >= 2.
select name, value
from table_x
where value is not null
group by name, value
having count(*) >= 2;
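The difference between the two answers (distinct pairs versus individual rows) is easy to see on a small table. The sketch below assumes SQLite and a guessed column placement for the sample data (ab and xy taken as name and value, the ref column omitted), since the original layout is ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_x (id integer, name text, value text)")
conn.executemany(
    "insert into table_x values (?, ?, ?)",
    [
        (1, "ab", "xy"),
        (2, "aba", "z"),
        (3, "ab", "xy"),
        (4, "abc", "def"),
        (5, "gxr", "mdy"),
    ],
)

# Distinct name/value pairs that occur at least twice.
pairs = conn.execute(
    """
    select name, value
    from table_x
    where value is not null
    group by name, value
    having count(*) >= 2
    """
).fetchall()

# Individual rows belonging to those pairs, via a join on the grouped result.
rows = conn.execute(
    """
    select t1.id, t1.name, t1.value
    from table_x t1
    join (select name, value
          from table_x
          where value is not null
          group by name, value
          having count(*) >= 2) t2
      on t1.name = t2.name and t1.value = t2.value
    order by t1.id
    """
).fetchall()

print(pairs)
print(rows)
```

The grouped query reports the pair once, while the join returns rows 1 and 3 individually.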

select duplicate columns in sql server [duplicate]

Table 1
Id Name
1 xxxxx
1 ccccc
2 uuuuu
3 ddddd
I want to select the rows whose Id appears more than once in the table.
How can I do this?
You can find ids with multiple entries and then use LEFT JOIN/IS NOT NULL pattern to retrieve corresponding data from the original table :
SELECT t1.*
FROM tbl t1
LEFT JOIN ( SELECT id
FROM tbl
GROUP BY id
HAVING COUNT(*) > 1) t2 ON t1.id = t2.id
WHERE t2.id IS NOT NULL
Other possible solutions include using EXISTS or IN clauses instead of LEFT JOIN/IS NOT NULL.
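For illustration, the IN and EXISTS variants might look like the sketch below, run here against SQLite as an assumed engine. The EXISTS form shown additionally assumes that rows sharing an id differ in name, since the table has no other column to tell them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id integer, name text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [(1, "xxxxx"), (1, "ccccc"), (2, "uuuuu"), (3, "ddddd")],
)

# IN: keep rows whose id is in the set of ids that occur more than once.
with_in = conn.execute(
    """
    select *
    from tbl
    where id in (select id from tbl group by id having count(*) > 1)
    order by name
    """
).fetchall()

# EXISTS: keep rows that have another row with the same id but a different name.
with_exists = conn.execute(
    """
    select t1.*
    from tbl t1
    where exists (select 1
                  from tbl t2
                  where t2.id = t1.id and t2.name <> t1.name)
    order by t1.name
    """
).fetchall()

print(with_in)
print(with_exists)
```

Both forms return the two rows with id 1 for this data. If two rows could share both id and name, only the IN form would still find them.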
With window functions:
with Y as (
select *, count(*) over (partition by id) counter
from X)
select id, name from Y where counter > 1