Find non-unique rows in Oracle SQL

I have a question which looks easy but I can't figure it out.
I have the following:
Name Zipcode
ER 5354
OL 1234
AS 1234
BH 3453
BH 3453
HZ 1234
I want to find the rows where the zipcode (my ID) does not identify exactly one row.
So here I want to see:
OL 1234
AS 1234
HZ 1234
Or simply the zipcode would be enough.
Sorry, I forgot to mention an important part. If the name is the same, it's not a problem; it only matters when there are different names for the same zipcode.
So BH 3453 should not be returned.

I think this is what you want
select zipcode
from yourTable
group by zipcode
having count(*) > 1
It selects the zipcodes associated with more than one record.
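A minimal sketch of this first query against the sample data, using Python's built-in sqlite3 module as a stand-in for Oracle (an assumption: the query is plain enough to behave the same way in both). Note that 3453 is still returned because BH appears twice, which is what the updated question rules out.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Oracle table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (name TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO yourTable (name, zipcode) VALUES (?, ?)",
    [("ER", "5354"), ("OL", "1234"), ("AS", "1234"),
     ("BH", "3453"), ("BH", "3453"), ("HZ", "1234")],
)

# Zipcodes that occur on more than one row, whatever the name.
dup_zipcodes = [row[0] for row in conn.execute(
    """
    SELECT zipcode
    FROM yourTable
    GROUP BY zipcode
    HAVING COUNT(*) > 1
    ORDER BY zipcode
    """
)]
print(dup_zipcodes)  # ['1234', '3453']
```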
To answer your updated question:
select zipcode
from
(
select name, zipcode
from yourTable
group by name, zipcode
)
group by zipcode
having count(*) > 1
should do it. It might not be optimal in terms of performance, in which case you could use window functions as suggested by @a1ex07.
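The same check for the updated requirement, again sketched on SQLite through Python's sqlite3. The second query, using COUNT(DISTINCT name), is an alternative that is not part of the answer; it gives the same result on this data, but unlike the nested version it ignores NULL names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (name TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO yourTable (name, zipcode) VALUES (?, ?)",
    [("ER", "5354"), ("OL", "1234"), ("AS", "1234"),
     ("BH", "3453"), ("BH", "3453"), ("HZ", "1234")],
)

# The answer's query: collapse identical (name, zipcode) pairs first,
# then keep zipcodes that still have more than one distinct pair.
nested = [r[0] for r in conn.execute(
    """
    SELECT zipcode
    FROM (SELECT name, zipcode FROM yourTable GROUP BY name, zipcode)
    GROUP BY zipcode
    HAVING COUNT(*) > 1
    """
)]

# Alternative single-level form (not from the answer).
distinct_count = [r[0] for r in conn.execute(
    """
    SELECT zipcode
    FROM yourTable
    GROUP BY zipcode
    HAVING COUNT(DISTINCT name) > 1
    """
)]

print(nested, distinct_count)  # ['1234'] ['1234']
```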

Try this:
select yt.*
from YOUR_TABLE yt
, (select zipcode
from YOUR_TABLE
group by zipcode
having count(*) > 1
) m
where yt.zipcode = m.zipcode

If you need just the zipcode, use vc 74's solution. For all columns, a solution based on window functions supposedly outperforms the self-join approach:
SELECT a.zipcode, a.name
FROM
(
SELECT zipcode, name, count(1) over(partition by zipcode) as cnt
FROM your_table
)a
WHERE a.cnt >1
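A sketch of the window-function variant on SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later, which is an assumption about the Python build). The second query is an alternative, not from the answer, that applies the updated rule: comparing MIN(name) and MAX(name) over each zipcode partition keeps only zipcodes with at least two different names, so BH 3453 drops out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO your_table (name, zipcode) VALUES (?, ?)",
    [("ER", "5354"), ("OL", "1234"), ("AS", "1234"),
     ("BH", "3453"), ("BH", "3453"), ("HZ", "1234")],
)

# Answer's query: count rows per zipcode without collapsing them.
all_dups = conn.execute(
    """
    SELECT a.zipcode, a.name
    FROM (SELECT zipcode, name,
                 COUNT(1) OVER (PARTITION BY zipcode) AS cnt
          FROM your_table) a
    WHERE a.cnt > 1
    ORDER BY a.zipcode, a.name
    """
).fetchall()

# Updated rule: keep a row only if its zipcode has two or more different names.
different_names = conn.execute(
    """
    SELECT a.zipcode, a.name
    FROM (SELECT zipcode, name,
                 MIN(name) OVER (PARTITION BY zipcode) AS lo,
                 MAX(name) OVER (PARTITION BY zipcode) AS hi
          FROM your_table) a
    WHERE a.lo <> a.hi
    ORDER BY a.zipcode, a.name
    """
).fetchall()

print(all_dups)
print(different_names)  # [('1234', 'AS'), ('1234', 'HZ'), ('1234', 'OL')]
```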

Related

sql duplicates showing all data

Given this data
id Name group
1 Jhon 001
2 Paul 002
3 Mary 001
How can I get the duplicate values showing all the fields? The duplication is only on group; id and name won't be duplicated.
It should end up looking like one of these (either would be valid):
:::::::::::::::::::::::::::::::::::::::::::::::
group count values
001 2 1,3
:::::::::::::::::::::::::::::::::::::::::::::::
id name group
1 Jhon 001
3 Mary 001
I tried with
SELECT
group, COUNT(*)
FROM
people
GROUP BY
group
HAVING
COUNT(*) > 1
But if I add id and name to the GROUP BY, it doesn't find any duplicates.
Thanks in advance.
Try this.
SELECT Id, Name, [Group]
FROM people
WHERE [Group] IN(
SELECT [Group]
FROM people
GROUP BY [Group]
HAVING COUNT(*) > 1)
I would do an inner query to find the groups with more than one member, and then use that inner query to bring back a list of the names.
For example:
SELECT Id, Name, "group"
FROM people
WHERE "group" in
(SELECT "group"
FROM people
GROUP BY "group"
HAVING count(*) > 1);
Avoid using group as a column name, because GROUP is a reserved keyword in SQL:
SELECT *
FROM MyTable
WHERE groups IN(
SELECT groups
FROM MyTable
GROUP BY groups
HAVING COUNT(*) > 1)
Just use exists:
select p.*
from people p
where exists (select 1
from people p2
where p2."group" = p."group" and
p2.id <> p.id
);
This should be the most performant solution. With an index on people(group, id), it should have very good performance.
Note: All the advice to avoid using group as a column name is good advice. You should change the name.
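A sketch of the EXISTS approach on SQLite through Python's sqlite3 (an assumption for illustration; the question does not name a database). The column is still called group here, so it has to be double-quoted; the index name ix_people_group_id is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it must be double-quoted.
conn.execute('CREATE TABLE people (id INTEGER, name TEXT, "group" TEXT)')
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Jhon", "001"), (2, "Paul", "002"), (3, "Mary", "001")],
)
# Index on (group, id), as the answer suggests; the name is hypothetical.
conn.execute('CREATE INDEX ix_people_group_id ON people ("group", id)')

# A row qualifies if another row (different id) shares its group.
rows = conn.execute(
    """
    SELECT p.*
    FROM people p
    WHERE EXISTS (SELECT 1
                  FROM people p2
                  WHERE p2."group" = p."group"
                    AND p2.id <> p.id)
    ORDER BY p.id
    """
).fetchall()
print(rows)  # [(1, 'Jhon', '001'), (3, 'Mary', '001')]
```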

listagg function without group by

I have a lot of columns in my SELECT statement, many of which are derived calculations.
I am trying to group multiple rows into one using listagg() in select statement, but without having to group by rest of columns in select statement. Along the lines of listagg() within group() over (partition by id).
Right now I have something along the lines of:
select id, listagg(distinct annual_bill_rate, ', ') within group (order by bill_rate) as annual_bill_rate, email, state
from table
group by 1,3,4
Based on the documentation, it doesn't seem possible to avoid this GROUP BY, but are there alternatives? I have 30+ columns and can't group by all of them. Thank you!
Sample data:
id bill_rate email state
1 0.0035 a@gmail.com NJ
1 0.0045 a@gmail.com NJ
1 0.0055 a@gmail.com NJ
2 0.0065 b@gmail.com NY
2 0.0075 b@gmail.com NY
3 0.0085 c@gmail.com PA
Desired result, without GROUP BY:
id bill_rate email state
1 0.0035, 0.0045, 0.0055 a@gmail.com NJ
2 0.0065, 0.0075 b@gmail.com NY
3 0.0085 c@gmail.com PA
Here's a not so great idea to avoid typing the GROUP BY. It will almost definitely be slower and it's much more difficult to read and understand. I would be an unhappy fella if I ran into this in production code:
WITH table_distinct AS
(
SELECT DISTINCT id, email, state
FROM table
)
,table_group_by AS
(
SELECT id, listagg(distinct annual_bill_rate, ', ') within group (order by bill_rate) as annual_bill_rate
FROM table
GROUP BY id
)
SELECT
td.*,
tgb.annual_bill_rate
FROM table_distinct td
INNER JOIN table_group_by tgb
ON td.id = tgb.id;
Now you only need to monkey with the table_distinct CTE to add more columns to your result set.
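A sketch of this CTE-and-join pattern on SQLite via Python's sqlite3. SQLite has no LISTAGG, so GROUP_CONCAT is swapped in (a different function with similar behaviour whose element order is not guaranteed); the table name bills is hypothetical, since table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "bills" is a hypothetical table name.
conn.execute("CREATE TABLE bills (id INTEGER, bill_rate TEXT, email TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?, ?)",
    [(1, "0.0035", "a@gmail.com", "NJ"), (1, "0.0045", "a@gmail.com", "NJ"),
     (1, "0.0055", "a@gmail.com", "NJ"), (2, "0.0065", "b@gmail.com", "NY"),
     (2, "0.0075", "b@gmail.com", "NY"), (3, "0.0085", "c@gmail.com", "PA")],
)

rows = conn.execute(
    """
    WITH table_distinct AS (
        -- Add further per-id columns here; no GROUP BY needed.
        SELECT DISTINCT id, email, state FROM bills
    ),
    table_group_by AS (
        -- GROUP_CONCAT stands in for LISTAGG.
        SELECT id, GROUP_CONCAT(bill_rate, ', ') AS bill_rates
        FROM bills
        GROUP BY id
    )
    SELECT td.*, tgb.bill_rates
    FROM table_distinct td
    INNER JOIN table_group_by tgb ON td.id = tgb.id
    ORDER BY td.id
    """
).fetchall()

for row in rows:
    print(row)
```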
There is a solution to your problem that uses neither DISTINCT nor GROUP BY.
You can also use LISTAGG as an analytic function and then remove the duplicate rows using ROW_NUMBER. Please see below:
select * from
(select
id, listagg(annual_bill_rate, ', ') within group (order by bill_rate) over (partition by id) as annual_bill_rate,
email, state, row_number() over (partition by id order by id) RN
from table) Tab where RN=1;
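The analytic version can be sketched the same way on SQLite (3.25 or later assumed for window functions), again with GROUP_CONCAT standing in for LISTAGG and a hypothetical bills table. No DISTINCT or GROUP BY appears; ROW_NUMBER keeps one row per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (id INTEGER, bill_rate TEXT, email TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?, ?)",
    [(1, "0.0035", "a@gmail.com", "NJ"), (1, "0.0045", "a@gmail.com", "NJ"),
     (1, "0.0055", "a@gmail.com", "NJ"), (2, "0.0065", "b@gmail.com", "NY"),
     (2, "0.0075", "b@gmail.com", "NY"), (3, "0.0085", "c@gmail.com", "PA")],
)

rows = conn.execute(
    """
    SELECT id, bill_rates, email, state
    FROM (SELECT id,
                 -- Aggregate over the whole id partition, repeated on every row.
                 GROUP_CONCAT(bill_rate, ', ') OVER (PARTITION BY id) AS bill_rates,
                 email, state,
                 -- Number the rows of each partition so that one can be kept.
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
          FROM bills) tab
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```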

SQL: find duplicates, with a different field

I have to find duplicates in an Access table, where one field is different.
I'll try to explain. Assume this data set:
ID Country CountryB Customer
====================================================
1 Italy Austria James
2 Italy Austria James
3 USA Austria James
I have to find all the records with duplicated CountryB and Customer, but with different Country.
For instance, with the data above, IDs 1 and 2 are NOT duplicates (as they are from the same Country), while 1 and 3 (or 2 and 3) are.
The "best" query I got is the following one:
SELECT COUNT(*), CountryB, Customer FROM
(SELECT MIN(ID) as MinID, Country, CountryB, Customer FROM myTable GROUP BY Country, CountryB, Customer)
GROUP BY CountryB, Customer
HAVING COUNT(*)>1
I'm not sure if this is the smartest option, anyhow.
Furthermore, since I need to "mark" all the duplicates, I have to do something more, like this:
SELECT ID, a.Country, a.CountryB, a.Customer FROM myTable a
INNER JOIN
(
SELECT COUNT(*), CountryB, Customer FROM
(SELECT MIN(ID) as MinID, Country, CountryB, Customer FROM myTable GROUP BY Country, CountryB, Customer)
GROUP BY CountryB, Customer
HAVING COUNT(*)>1
) dt
ON a.Country=dt.Country and a.CountryB=dt.CountryB and a.Customer=dt.Customer
Any suggestions on this approach are greatly appreciated.
I finally found a solution.
The correct solution is in this answer:
SELECT DISTINCT HAVING Count unique conditions
I adapted it using this answer, since I'm using Access 2010:
Count Distinct in a Group By aggregate function in Access 2007 SQL
Therefore, in my example table above, I can use this query to find duplicate records:
SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
or this query to find all the IDs of the duplicated records:
SELECT ID FROM myTable a INNER JOIN
(
SELECT CountryB, Customer, Count(cd.Country)
FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
GROUP BY CountryB, Customer
HAVING COUNT(*) > 1
) dt
ON a.CountryB=dt.CountryB AND a.Customer=dt.Customer
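A sketch of the final query on SQLite via Python's sqlite3 rather than Access (an assumption; Access SQL differs in details such as naming unaliased expressions). The fourth row is hypothetical, added to show a record that is not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER, Country TEXT, CountryB TEXT, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [(1, "Italy", "Austria", "James"),
     (2, "Italy", "Austria", "James"),
     (3, "USA", "Austria", "James"),
     # Hypothetical extra row: a different customer, so never a duplicate.
     (4, "Italy", "Austria", "Maria")],
)

ids = [r[0] for r in conn.execute(
    """
    SELECT a.ID
    FROM myTable a
    INNER JOIN (
        -- (CountryB, Customer) pairs seen with more than one distinct Country.
        SELECT CountryB, Customer, COUNT(cd.Country) AS n
        FROM (SELECT DISTINCT Country, CountryB, Customer FROM myTable) AS cd
        GROUP BY CountryB, Customer
        HAVING COUNT(*) > 1
    ) dt ON a.CountryB = dt.CountryB AND a.Customer = dt.Customer
    ORDER BY a.ID
    """
)]
print(ids)  # [1, 2, 3]
```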

SQL To Find Most RECENT row Using GROUP BY

SQL newbie here, tearing my hair out trying to work this one out! I have a problem that is similar to this.
I have the following data, and all fields are defined as CHARACTER, not DATE or TIME, unfortunately, thanks to poor design by the original DBA:
Surname Name LoginDate LoginTime
Smith John 2014-06-25 13.00
Smith John 2014-06-24 14.00
Smith Susan 2014-06-26 09.00
Smith Susan 2014-06-26 11.30
Jones Bill 2014-06-25 09.30
Jones Bill 2014-06-25 12.30
Jones Bill 2014-06-26 07.00
What I'm trying to get in my output is the most recent login for each person, so I would expect to see:
Smith John 2014-06-25 13.00
Smith Susan 2014-06-26 11.30
Jones Bill 2014-06-26 07.00
I've tried different combinations of temporary tables, using CONCAT on the Date and Time and the MAX function but I just keep drawing a blank. I think I know the tools and commands I need to use I just can't seem to string them all together properly.
I know I have to group them by name/surname and then somehow combine the date and time in a way that lets me use the MAX function, but when I output them I can never get LoginDate and LoginTime to appear as separate fields, because they're not included in any GROUP BY that I use.
Is anyone able to show me how to do this as I haven't got a lot of hair to start with :)
Try this query:
With MaxTimeStamp as
(
SELECT Surname, Name, Max(TIMESTAMP(LoginDate, LoginTime)) as LoginDateTime
FROM YourTable
group by Surname, Name
)
select c.Surname, c.Name, d.LoginDate, d.LoginTime
from MaxTimeStamp c
Join YourTable d
on c.Surname = d.Surname
and c.Name = d.Name
and Date(c.LoginDateTime) = d.LoginDate
and Time(c.LoginDateTime) = d.LoginTime
You don't do this using group by. Here is a method using not exists:
select t.*
from table t
where not exists (select 1
from table t2
where t2.name = t.name and t2.surname = t.surname and
(t2.logindate > t.logindate or
t2.logindate = t.logindate and
t2.logintime > t.logintime
)
);
This transforms the query to: "Get me all rows from the table where there is no row with the same name and a later login." It would be simpler if the login information were stored as a single datetime.
Also, the above will work in just about all SQL databases. Many databases support window functions which can also be used for this problem.
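A sketch of the NOT EXISTS query on SQLite via Python's sqlite3, with the character columns kept as text. The table name logins is hypothetical, and the string comparison relies on the sample's zero-padded YYYY-MM-DD and HH.MM formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# All columns are text, as in the question; "logins" is a hypothetical name.
conn.execute(
    "CREATE TABLE logins (surname TEXT, name TEXT, logindate TEXT, logintime TEXT)"
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?, ?)",
    [("Smith", "John", "2014-06-25", "13.00"), ("Smith", "John", "2014-06-24", "14.00"),
     ("Smith", "Susan", "2014-06-26", "09.00"), ("Smith", "Susan", "2014-06-26", "11.30"),
     ("Jones", "Bill", "2014-06-25", "09.30"), ("Jones", "Bill", "2014-06-25", "12.30"),
     ("Jones", "Bill", "2014-06-26", "07.00")],
)

# Keep a row when no later login exists for the same person.
latest = conn.execute(
    """
    SELECT t.*
    FROM logins t
    WHERE NOT EXISTS (SELECT 1
                      FROM logins t2
                      WHERE t2.name = t.name AND t2.surname = t.surname
                        AND (t2.logindate > t.logindate
                             OR (t2.logindate = t.logindate
                                 AND t2.logintime > t.logintime)))
    ORDER BY t.surname, t.name
    """
).fetchall()
print(latest)
```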
In addition to Gordon Linoff's answer, if your SQL database supports window functions, you can also use this to get the desired result:
;With Cte As
(
Select *,
Row_Number() Over (Partition By Surname, Name Order By LoginDate Desc, LoginTime Desc) RN
From Table
)
Select Surname, Name, LoginDate, LoginTime
From Cte
Where RN = 1
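The ROW_NUMBER variant, sketched on SQLite 3.25+ through Python's sqlite3 (the leading semicolon is a SQL Server habit and is dropped here). As with Gordon Linoff's query, sorting the text dates and times works only because they are zero-padded; the table name logins is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logins (Surname TEXT, Name TEXT, LoginDate TEXT, LoginTime TEXT)"
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?, ?)",
    [("Smith", "John", "2014-06-25", "13.00"), ("Smith", "John", "2014-06-24", "14.00"),
     ("Smith", "Susan", "2014-06-26", "09.00"), ("Smith", "Susan", "2014-06-26", "11.30"),
     ("Jones", "Bill", "2014-06-25", "09.30"), ("Jones", "Bill", "2014-06-25", "12.30"),
     ("Jones", "Bill", "2014-06-26", "07.00")],
)

# Number each person's logins newest first, then keep number 1.
latest = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Surname, Name
                                  ORDER BY LoginDate DESC, LoginTime DESC) AS rn
        FROM logins
    )
    SELECT Surname, Name, LoginDate, LoginTime
    FROM cte
    WHERE rn = 1
    ORDER BY Surname, Name
    """
).fetchall()
print(latest)
```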
Why not a GROUP BY? Am I missing anything?
Select name, surname, LoginDate, max(LoginTime)
from table
where (name, surname, LoginDate) in (select name, surname, max(LoginDate) from table group by name, surname)
group by name, surname, LoginDate
Even though LoginDate is not a date or similar data type, it should run, I guess...
I have split it into two parts: first find the newest login per user with a GROUP BY, then join back to the data to find the matching record.
select
dt.Surname
, dt.Name
, dt.LoginDate
, dt.LoginTime
from
dataTable dt
join
(select Surname, Name, MAX(LoginDate+LoginTime) ts from dataTable group by surname, name) sub
on sub.Name = dt.Name
and sub.Surname = dt.Surname
and dt.LoginDate+dt.LoginTime = ts
WHERE
not sub.Name is null
You can also do it without the need to join, by splitting the timestamp out again.
select
sub.Surname
, sub.Name
, LEFT(ts,10) LoginDate
, RIGHT(ts,5) LoginTime
from
(select Surname, Name, MAX(LoginDate+LoginTime) ts from dataTable group by surname, name) sub

Find duplicate records in a table using SQL Server

I am validating a table that holds transaction-level data from an eCommerce site, and I need to find the exact errors.
I need help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is:
OrderNo shoppername amountpayed city Item
1 Sam 10 A Iphone
1 Sam 10 A Iphone--->>Duplication to be detected
1 Sam 5 A Ipod
2 John 20 B Macbook
3 John 25 B Macbookair
4 Jack 5 A Ipod
Suppose I use the below query:
Select shoppername,count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1
It will return:
Sam 2
John 2
But I don't want to find duplicates over just one or two columns. I want to find duplicates over all the columns together. I want the result as:
1 Sam 10 A Iphone
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
You can remove the duplicates by replacing the final SELECT statement with:
delete x where rn > 1
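A sketch of the same deduplication on SQLite via Python's sqlite3. SQLite does not support deleting through a CTE the way SQL Server's delete x does, so this swaps in a rowid-based DELETE; the partition columns (OrderNo, item) are the answer's, and the dbo. schema prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, amountPayed INTEGER,"
    " city TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, "Sam", 10, "A", "Iphone"), (1, "Sam", 10, "A", "Iphone"),
     (1, "Sam", 5, "A", "Ipod"), (2, "John", 20, "B", "Macbook"),
     (3, "John", 25, "B", "Macbookair"), (4, "Jack", 5, "A", "Ipod")],
)

# Compute row numbers in a subquery and match them back to the table by rowid;
# every row numbered above 1 within its (OrderNo, item) group is deleted.
conn.execute(
    """
    DELETE FROM sales
    WHERE rowid IN (
        SELECT rid
        FROM (SELECT rowid AS rid,
                     ROW_NUMBER() OVER (PARTITION BY OrderNo, item
                                        ORDER BY OrderNo) AS rn
              FROM sales)
        WHERE rn > 1
    )
    """
)

remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(remaining)  # 5
```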
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
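A sketch of this all-columns GROUP BY on SQLite via Python's sqlite3, using the question's sample rows (the dbo. schema prefix is dropped, since SQLite has none). Only the fully identical Iphone row comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, amountPayed INTEGER,"
    " city TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, "Sam", 10, "A", "Iphone"), (1, "Sam", 10, "A", "Iphone"),
     (1, "Sam", 5, "A", "Ipod"), (2, "John", 20, "B", "Macbook"),
     (3, "John", 25, "B", "Macbookair"), (4, "Jack", 5, "A", "Ipod")],
)

# A row is a duplicate only when every listed column matches.
dups = conn.execute(
    """
    SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS cnt
    FROM sales
    GROUP BY OrderNo, shoppername, amountPayed, city, item
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dups)  # [(1, 'Sam', 10, 'A', 'Iphone', 2)]
```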
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of duplicated records, use the following command:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first: different rows can produce the same checksum, so there can be false duplicates.
Try this
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
select shoppername,count(shoppername) as cnt
from sales
group by shoppername
having count(shoppername)>1)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername
First of all, I suspect the result is not accurate: there seem to be three 'Sam' rows in the original table. But that is not critical to the question.
Now for the question itself. Based on your table, the best way to show duplicate values is to use COUNT(*) with a GROUP BY clause. The query would look like this:
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
The reason is that, in your table, a record is identified by all of its columns together, which means records are considered duplicates only when the values in every column are exactly the same. You also want to show all fields for the duplicate records, so the GROUP BY must not miss any column; otherwise some would be missing, because you can only select columns that participate in the GROUP BY clause.
Now I would like to give you an example of WITH...ROW_NUMBER() OVER(...), which uses a table expression together with the ROW_NUMBER function.
Suppose you have nearly the same table, but with one extra column called Shipping Date, whose value may differ even when the rest are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row 2 is no longer a duplicate if you still take all columns as a unit. But what if you want to treat it as a duplicate in this case as well? Then you should use WITH...ROW_NUMBER() OVER(...), and the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *,ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]) as Identifier --if you consider the one with the later shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The above query returns the result together with the Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note that this row is different from the one with 2016-01-01. The 2016-02-02 row is returned because of PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]: Shipping Date is NOT one of the columns considered for duplicate records, which means the row with 2016-02-02 could still be a perfect result for your question.
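A sketch of this query on SQLite 3.25+ via Python's sqlite3 (an assumption; the answer targets SQL Server). The column name with a space is double-quoted instead of bracketed, and the dbo. prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, amountPayed INTEGER,'
    ' city TEXT, item TEXT, "Shipping Date" TEXT)'
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Sam", 10, "A", "Iphone", "2016-01-01"),
     (1, "Sam", 10, "A", "Iphone", "2016-02-02"),
     (1, "Sam", 5, "A", "Ipod", "2016-03-03"),
     (2, "John", 20, "B", "Macbook", "2016-04-04"),
     (3, "John", 25, "B", "Macbookair", "2016-05-05"),
     (4, "Jack", 5, "A", "Ipod", "2016-06-06")],
)

later_copies = conn.execute(
    """
    WITH tableexpression AS (
        SELECT *,
               -- Shipping Date only orders rows; it is not part of the duplicate key.
               ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed,
                                               city, item
                                  ORDER BY "Shipping Date") AS Identifier
        FROM sales
    )
    SELECT * FROM tableexpression
    WHERE Identifier > 1
    """
).fetchall()
print(later_copies)  # [(1, 'Sam', 10, 'A', 'Iphone', '2016-02-02', 2)]
```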
To summarize: using COUNT(*) with a GROUP BY clause is the best choice when you only want to show the columns of the GROUP BY clause in the result; otherwise you will miss the columns that do not participate in the GROUP BY.
WITH...ROW_NUMBER() OVER(...), on the other hand, is suitable in every scenario where you want to find duplicate records; however, the query is a little more complicated to write and somewhat over-engineered compared to the former.
If your goal is to delete duplicate records from the table, you have to use the latter WITH...ROW_NUMBER() OVER(...)...DELETE FROM...WHERE form.
Hope this helps!
You can use either of the methods below to find the output:
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Select shoppername, count(Item) as cnt
from dbo.sales
group by shoppername
having(count(Item) > 1)
Select EventID,count(*) as cnt
from dbo.EventInstances
group by EventID
having count(*) > 1
The following is running code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )