sql query - filtering duplicate values to create report - sql

I am trying to list all the duplicate records in a table. This table does not have a Primary Key and was created specifically for a report that lists out duplicates. It contains both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works and gives me all the duplicates, that is, every combination that occurs more than once.
But I want my report to display the values of all the columns. How can I do that without querying for each record separately?
For example:
The table has 10 fields, and [NewID] is the field that occurs multiple times. I need to create a report with all the data in all the fields for every row whose NewID is duplicated.
Please help.
Thank you.

You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Also, you might want to check your tags: you tagged mysql, but the syntax makes me think you mean sql-server.

Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)

select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID
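To check that joining back to the grouped NewID values returns every column of every duplicated row, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an in-memory SQLite database in place of [Test].[dbo].[Duplicates], a hypothetical extra Note column, and hypothetical sample rows.

```python
import sqlite3

# In-memory SQLite table standing in for [Test].[dbo].[Duplicates] (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Duplicates (OfficeCD TEXT, NewID INTEGER, Year INTEGER, Type TEXT, Note TEXT)"
)
conn.executemany(
    "INSERT INTO Duplicates VALUES (?, ?, ?, ?, ?)",
    [
        ("A", 1, 2020, "X", "first"),   # hypothetical rows
        ("B", 1, 2021, "Y", "second"),
        ("C", 2, 2020, "X", "single"),
    ],
)

# Join every row back to the set of NewID values that occur more than once,
# so all columns of each duplicated row are returned.
query = """
SELECT d.*
FROM Duplicates d
INNER JOIN (
    SELECT NewID
    FROM Duplicates
    GROUP BY NewID
    HAVING COUNT(*) > 1
) dd ON d.NewID = dd.NewID
ORDER BY d.OfficeCD
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the two rows sharing NewID 1 come back; the row with NewID 2 is filtered out because its group has a single member.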

Related

Select a NON-DISTINCT column in a query that return distincts rows

The following query returns the results that I need, but I have to add the ID of the row so that I can then update it. If I add the ID directly to the SELECT statement, it returns more results than I need: each ID is unique, so DISTINCT sees every row as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID to the SELECT statement while keeping DISTINCT values for MemberID, ProductID and UserID.
Any ideas?
Thank you
According to your comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of those records. It's not important which one, which is why I use DISTINCT to obtain each unique combination of employee, product and client.
You can use MIN() or MAX() with GROUP BY instead of DISTINCT:
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
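To see what MAX(ID) with GROUP BY returns, here is a minimal sketch using Python's sqlite3 module. It assumes an in-memory SQLite database instead of SQL Server and a few hypothetical rows; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserCustomerProductSalaryExceptions ("
    "ID INTEGER PRIMARY KEY, MemberID INTEGER, ProductID INTEGER, UserID INTEGER)"
)
conn.executemany(
    "INSERT INTO UserCustomerProductSalaryExceptions VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 7), (2, 10, 100, 7), (3, 10, 100, 7), (4, 11, 100, 7)],  # hypothetical
)

# MAX(ID) picks exactly one ID per duplicated (MemberID, ProductID, UserID) combination.
query = """
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions AS ucpse
WHERE EXISTS (
    SELECT NULL
    FROM UserCustomerProductSalaryExceptions AS upcse2
    WHERE ucpse.UserID = upcse2.UserID
      AND ucpse.MemberID = upcse2.MemberID
      AND ucpse.ProductID = upcse2.ProductID
    GROUP BY upcse2.UserID, upcse2.MemberID, upcse2.ProductID
    HAVING COUNT(upcse2.UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The three copies of (10, 100, 7) collapse to a single row carrying ID 3, and the non-duplicated row with ID 4 is excluded by the EXISTS filter.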
UPDATE:
From your comments, I think the query below is what you need:
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM UserCustomerProductSalaryExceptions AS ucpse
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
-- no HAVING here: keeping the MAX(ID) of every group, including groups
-- of one, stops rows that have no duplicates from being deleted too
)
If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1
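The ROW_NUMBER() idea can be tried outside SQL Server too. The sketch below assumes SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module and hypothetical rows. SQLite cannot DELETE through a CTE the way T-SQL's DELETE X does, so this is a rewrite that numbers the rows in a subquery and deletes by ID instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserCustomerProductSalaryExceptions ("
    "ID INTEGER PRIMARY KEY, MemberID INTEGER, ProductID INTEGER, UserID INTEGER)"
)
conn.executemany(
    "INSERT INTO UserCustomerProductSalaryExceptions VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 7), (2, 10, 100, 7), (3, 10, 100, 7), (4, 11, 100, 7)],  # hypothetical
)

# Number the rows inside each (MemberID, ProductID, UserID) group by ID,
# then delete every row whose number is greater than 1.
conn.execute("""
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID IN (
    SELECT ID
    FROM (
        SELECT ID,
               ROW_NUMBER() OVER (
                   PARTITION BY MemberID, ProductID, UserID ORDER BY ID
               ) AS DupRowNum
        FROM UserCustomerProductSalaryExceptions
    )
    WHERE DupRowNum > 1
)
""")
remaining = conn.execute(
    "SELECT ID FROM UserCustomerProductSalaryExceptions ORDER BY ID"
).fetchall()
print(remaining)
```

Because the partition is ordered by ID, the lowest ID in each group gets row number 1 and survives; changing the ORDER BY changes which copy is kept.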
The ID isn't necessary. Try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId
Judging by your data structure (which I would STRONGLY advise normalizing as soon as possible), you should probably be updating all the records. Each duplicate seems to matter because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID

Find duplicated rows that are not exactly same

Can I select all rows that have the same value in a column (for example the SSN field) but display each of them separately?
I've searched for an answer, but every one uses a COUNT(*) and GROUP BY section that requires the rows to be exactly the same.
Try This:
SELECT A, B FROM MyTable
WHERE A IN
(
SELECT A FROM MyTable GROUP BY A HAVING COUNT(*)>1
)
I have done this with SQL Server, but I hope this is what you need.
Here is another approach, which references the table only once, using an analytic function instead of a subquery to get the duplicate counts. It might be faster; it also might not, depending on the particular data.
SELECT * FROM (
SELECT col1, col2, col3, ssn, COUNT(*) OVER (PARTITION BY ssn) AS ssn_dup_count
FROM MyTable
) t
WHERE ssn_dup_count > 1
ORDER BY ssn_dup_count DESC
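Here is a small sketch of the analytic-function approach, assuming SQLite 3.25 or later via Python's sqlite3 module, a reduced MyTable with only col1 and ssn, and hypothetical rows. COUNT(*) OVER (PARTITION BY ssn) attaches the group size to every row without collapsing the rows the way GROUP BY does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("alice", "111"), ("bob", "111"), ("carol", "222"), ("dave", "111")],  # hypothetical
)

# Each row keeps its own columns and also carries the number of rows sharing its ssn.
query = """
SELECT col1, ssn, ssn_dup_count
FROM (
    SELECT col1, ssn, COUNT(*) OVER (PARTITION BY ssn) AS ssn_dup_count
    FROM MyTable
) t
WHERE ssn_dup_count > 1
ORDER BY ssn_dup_count DESC, col1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The extra col1 in the ORDER BY only makes the output order deterministic for the example.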
SELECT
*
FROM
MyTable
WHERE
EXISTS
(
SELECT
NULL
FROM
MyTable MT
WHERE
MyTable.SameColumnName = MT.SameColumnName
AND MyTable.DifferentColumnName <> MT.DifferentColumnName)
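Here is a runnable sketch of the EXISTS approach, assuming Python's sqlite3 module, with ssn standing in for SameColumnName and name for DifferentColumnName (both hypothetical), plus hypothetical rows. It shows one property worth knowing: rows that match on both columns are not returned, because the <> condition never holds between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, ssn TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "111", "Ann"), (2, "111", "Anne"), (3, "222", "Bo"),
     (4, "333", "Cy"), (5, "333", "Cy")],  # hypothetical
)

# A row qualifies when another row has the same ssn but a different name.
query = """
SELECT *
FROM MyTable
WHERE EXISTS (
    SELECT NULL
    FROM MyTable MT
    WHERE MyTable.ssn = MT.ssn
      AND MyTable.name <> MT.name
)
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Rows 4 and 5 share both ssn and name, so they are left out; use the GROUP BY/HAVING variants if exact copies should be listed as well.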
This fetches the required rows and sorts them so that the rows in each group appear together.
SELECT * FROM TABLENAME
WHERE SSN IN
(
SELECT SSN FROM TABLENAME GROUP BY SSN HAVING COUNT(SSN)>1
)
ORDER BY SSN
Here SSN is the column on which the duplicate-value check is done.

How to Get only Duplicate data from one table?

I have one table with a lot of data, containing both duplicate values and single values.
But I want to get only the rows with duplicated values, not the single ones.
SELECT columnWithDuplicates, count(*) FROM myTable
GROUP BY columnWithDuplicates HAVING (count(*) > 1);
The above query will show the duplicated values. And once you provide that to a business user, their next question will be what happened? How did these get there? Is there a pattern to the duplicates? What's often more informative is to see the whole rows containing those values to help determine why there are duplicates.
-- this query finds all the values in T that
-- exist in the derived table D where D is the list of
-- all the values in columnWithDuplicates that occur more than once
SELECT DISTINCT
T.*
FROM
myTable T
INNER JOIN
(
-- this identifies the duplicated values
-- courtesy of Brian Roach
SELECT
columnWithDuplicates
, count(*) AS rowCount
FROM
myTable
GROUP BY
columnWithDuplicates
HAVING
(count(*) > 1)
) D
ON D.columnWithDuplicates = T.columnWithDuplicates

How to keep only one row of a table, removing duplicate rows?

I have a table that has a lot of duplicates in the Name column. I'd like to keep only one row for each name.
The following lists the duplicates, but I don't know how to delete the duplicates and keep just one:
SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;
Thank you.
See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
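Here is a minimal sketch of the keep-the-minimum-ID delete, assuming Python's sqlite3 module and hypothetical rows. SQLite, like SQL Server and PostgreSQL, accepts a subquery on the table being deleted from; MySQL rejects it with error 1093 ("You can't specify target table ... for update in FROM clause"), and wrapping the subquery in an extra derived table is the usual workaround there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (ID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [(1, "amit"), (2, "akhil"), (3, "amit"), (4, "ravi"), (5, "akhil")],  # hypothetical
)

# Keep the lowest ID for every name (names that occur once keep their only row)
# and delete everything else.
conn.execute("""
DELETE FROM members
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM members
    GROUP BY name
)
""")
remaining = conn.execute("SELECT ID, name FROM members ORDER BY ID").fetchall()
print(remaining)
```

Note that the subquery has no HAVING clause; restricting it to groups with COUNT(*) > 1 would also delete every name that was never duplicated.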
In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.
#create a table with same schema as members
CREATE TABLE tmp (...);
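#insert the unique records (SELECT * with GROUP BY name only works with ONLY_FULL_GROUP_BY disabled,
#and MySQL enables it by default since 5.7.5)
INSERT INTO tmp SELECT * FROM members GROUP BY name;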
#swap it in
RENAME TABLE members TO members_old, tmp TO members;
#drop the old one
DROP TABLE members_old;
We have a huge database where deleting duplicates is part of the regular maintenance process. We use DISTINCT to select the unique records, then write them into a TEMPORARY TABLE. After a TRUNCATE, we write the TEMPORARY data back into the TABLE.
That is one way of doing it, and it works as a STORED PROCEDURE.
If you want to see which rows you are about to delete first, and only then delete them, list them like this:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Full example at http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
You can join the table with itself on the matching field and delete every row that has a match with a lower id:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2 ON t1.match_field = t2.match_field
WHERE t1.id > t2.id; -- keeps the lowest id for each match_field value
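SQLite does not support MySQL's multi-table DELETE t1 FROM ... JOIN syntax, so the sketch below, assuming Python's sqlite3 module and hypothetical rows, expresses the same self-join rule as a correlated EXISTS: delete a row when another row with the same match_field has a smaller id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, match_field TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "c"), (6, "c")],  # hypothetical
)

# A row is deleted when a row with the same match_field and a smaller id exists,
# so exactly one row (the lowest id) survives per match_field value.
conn.execute("""
DELETE FROM table_name
WHERE EXISTS (
    SELECT 1
    FROM table_name t2
    WHERE t2.match_field = table_name.match_field
      AND t2.id < table_name.id
)
""")
remaining = conn.execute("SELECT id, match_field FROM table_name ORDER BY id").fetchall()
print(remaining)
```

Comparing with < rather than <> is what makes this safe: with <>, every row in a duplicated group has a partner with a different id, so the whole group would be deleted.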
Delete duplicate rows, keeping one.
The table may have duplicate rows, and some rows may have no duplicates at all; this keeps exactly one row per name, whether the name is duplicated or single.
The table has two columns, id and name, and we want to remove the duplicate names while keeping one of each. It works fine at my end; use this query:
DELETE FROM tablename
WHERE id NOT IN(
SELECT id FROM
(
SELECT MIN(id)AS id
FROM tablename
GROUP BY name HAVING
COUNT(*) > 1
)AS a )
AND id NOT IN(
(SELECT ids FROM
(
SELECT MIN(id)AS ids
FROM tablename
GROUP BY name HAVING
COUNT(*) =1
)AS a1
)
)
The answer included screenshots of the table before and after the delete (not reproduced here): the query deletes the duplicate amit and akhil rows and keeps one record each for amit and akhil.
If you want to remove duplicate records from a table:
CREATE TABLE tmp SELECT lastname, firstname, MAX(sex) AS sex
FROM user_tbl
GROUP BY lastname, firstname;
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;
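The copy-and-swap approach can be sketched in SQLite through Python's sqlite3 module, assuming hypothetical rows and using SQLite's CREATE TABLE ... AS SELECT in place of MySQL's CREATE TABLE ... SELECT. Keep in mind that a table created this way does not inherit the original's indexes or constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_tbl (lastname TEXT, firstname TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO user_tbl VALUES (?, ?, ?)",
    [("Smith", "Ann", "F"), ("Smith", "Ann", "F"), ("Jones", "Bob", "M")],  # hypothetical
)

# Copy one row per (lastname, firstname) into a new table, drop the old table,
# then rename the copy into its place. MAX(sex) makes the non-grouped column explicit.
conn.executescript("""
CREATE TABLE tmp AS
    SELECT lastname, firstname, MAX(sex) AS sex
    FROM user_tbl
    GROUP BY lastname, firstname;
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;
""")
rows = conn.execute("SELECT * FROM user_tbl ORDER BY lastname").fetchall()
print(rows)
```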
Show the duplicated records:
SELECT `page_url`,count(*) FROM wl_meta_tags GROUP BY page_url HAVING count(*) > 1
Delete the duplicates, keeping one row per page_url:
DELETE FROM wl_meta_tags
WHERE meta_id NOT IN( SELECT meta_id
FROM ( SELECT MIN(meta_id)AS meta_id FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1 )AS a )
AND meta_id NOT IN( (SELECT ids FROM (
SELECT MIN(meta_id)AS ids FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) =1 )AS a1 ) )
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [emp_id] ORDER BY [emp_id]) AS Row, * FROM employee_salary
)
DELETE FROM CTE
WHERE ROW <> 1

Get list of records with multiple entries on the same date

I need to return a list of record id's from a table that may/may not have multiple entries with that record id on the same date. The same date criteria is key - if a record has three entries on 09/10/2008, then I need all three returned. If the record only has one entry on 09/12/2008, then I don't need it.
SELECT id, datefield, count(*) FROM tablename GROUP BY datefield
HAVING count(*) > 1
Since you mentioned needing all three records, I am assuming you want the data as well. If you just need the id's, you can just use the group by query. To return the data, just join to that as a subquery
select * from table
inner join (
select id, date
from table
group by id, date
having count(*) > 1) grouped
on table.id = grouped.id and table.date = grouped.date
The top post (Leigh Caldwell) will not return duplicate records and needs to be down-modded. It only identifies the duplicate keys. Furthermore, it will not work if your database doesn't allow the GROUP BY to omit any of the selected fields (many do not).
If your date field includes a timestamp, you'll need to truncate the time using one of the methods documented above (I prefer: dateadd(dd, 0, datediff(dd, 0, @DateTime))).
I think Scott Nichols gave the correct answer and here's a script to prove it:
declare @duplicates table (
id int,
datestamp datetime,
ipsum varchar(200))
insert into @duplicates (id,datestamp,ipsum) values (1,'9/12/2008','ipsum primis in faucibus')
insert into @duplicates (id,datestamp,ipsum) values (1,'9/12/2008','Vivamus consectetuer. ')
insert into @duplicates (id,datestamp,ipsum) values (2,'9/12/2008','condimentum posuere, quam.')
insert into @duplicates (id,datestamp,ipsum) values (2,'9/13/2008','Donec eu sapien vel dui')
insert into @duplicates (id,datestamp,ipsum) values (3,'9/12/2008','In velit nulla, faucibus sed')
select a.* from @duplicates a
inner join (select id,datestamp, count(1) as number
from @duplicates
group by id,datestamp
having count(1) > 1) b
on (a.id = b.id and a.datestamp = b.datestamp)
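The same proof script can be run without SQL Server. This sketch assumes Python's sqlite3 module, stores the dates as ISO-8601 text ('2008-09-12') instead of a datetime column, and reuses the five rows from the script above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE duplicates (id INTEGER, datestamp TEXT, ipsum TEXT)")
conn.executemany(
    "INSERT INTO duplicates VALUES (?, ?, ?)",
    [
        (1, "2008-09-12", "ipsum primis in faucibus"),
        (1, "2008-09-12", "Vivamus consectetuer. "),
        (2, "2008-09-12", "condimentum posuere, quam."),
        (2, "2008-09-13", "Donec eu sapien vel dui"),
        (3, "2008-09-12", "In velit nulla, faucibus sed"),
    ],
)

# Join every row back to the (id, datestamp) pairs that occur more than once.
query = """
SELECT a.*
FROM duplicates a
INNER JOIN (
    SELECT id, datestamp, COUNT(1) AS number
    FROM duplicates
    GROUP BY id, datestamp
    HAVING COUNT(1) > 1
) b ON a.id = b.id AND a.datestamp = b.datestamp
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the two id 1 rows from 2008-09-12 are returned; id 2 has two entries but on different dates, so it does not qualify.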
SELECT RecordID
FROM aTable
WHERE SameDate IN
(SELECT SameDate
FROM aTable
GROUP BY SameDate
HAVING COUNT(SameDate) > 1)
If I understand your question correctly, you could do something similar to:
select
a.recordID
from
tablewithrecords as a
left join (
select
recordID,
count(recordID) as recordcount
from
tablewithrecords
where
recorddate='9/10/08'
group by
recordID
) as b on a.recordID=b.recordID
where
b.recordcount>1
http://www.sql-server-performance.com/articles/dba/delete_duplicates_p1.aspx will get you going. Also, http://en.allexperts.com/q/MS-SQL-1450/2008/8/SQL-query-fetch-duplicate.htm
I found these by searching Google for 'sql duplicate data'. You'll see this isn't an unusual problem.
SELECT * FROM the_table WHERE ROW(record_id,date) IN
( SELECT record_id, date FROM the_table
GROUP BY record_id, date HAVING COUNT(*) > 1 )
For matching on just the date part of a Datetime:
select * from Table
where id in (
select alias1.id from Table alias1, Table alias2
where alias1.id != alias2.id
and datediff(day, alias1.date, alias2.date) = 0
)
I think. This is based on my assumption that you need them on the same day month and year, but not the same time of day, so I did not use a Group by clause. From the other posts it looks like I could have more cleverly used a Having clause. Can you use a having or group by on a datediff expression?
I'm not sure I understood your question, but maybe you want something like this:
SELECT id, COUNT(*) AS same_date FROM foo GROUP BY id, date HAVING same_date = 3;
This is just written from my mind and not tested in any way. Read the GROUP BY and HAVING section here. If this is not what you meant, please ignore this answer.
Note that there's some extra processing necessary if you're using a SQL DateTime field. If you've got that extra time data in there, then you can't just use that column as-is. You've got to normalize the DateTime to a single value for all records contained within the day.
In SQL Server here's a little trick to do that:
SELECT CAST(FLOOR(CAST(CURRENT_TIMESTAMP AS float)) AS DATETIME)
You cast the DateTime into a float, which represents the Date as the integer portion and the Time as the fraction of a day that's passed. Chop off that decimal portion, then cast that back to a DateTime, and you've got midnight at the beginning of that day.
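The same normalization can be shown in SQLite, where the built-in date() function strips the time of day (in SQL Server 2008 and later, CAST(... AS date) does the equivalent). This is a sketch assuming Python's sqlite3 module, timestamps stored as ISO-8601 text, and hypothetical rows; the table and column names are borrowed from the record_log_table answer in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_log_table (record_id INTEGER, date_created TEXT)")
conn.executemany(
    "INSERT INTO record_log_table VALUES (?, ?)",
    [
        (1, "2008-09-10 08:15:00"),  # hypothetical timestamps
        (1, "2008-09-10 17:40:00"),
        (1, "2008-09-11 09:00:00"),
        (2, "2008-09-10 12:00:00"),
    ],
)

# date() reduces each timestamp to its calendar day, so entries from the same
# day land in the same group even though their times differ.
query = """
SELECT record_id, date(date_created) AS log_date, COUNT(*) AS num_of_entries
FROM record_log_table
GROUP BY record_id, date(date_created)
HAVING COUNT(*) > 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Grouping on the raw timestamp instead would put every entry in its own group and return nothing.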
Without knowing the exact structure of your tables or what type of database you're using it's hard to answer. However if you're using MS SQL and if you have a true date/time field that has different times that the records were entered on the same date then something like this should work:
select record_id,
convert(varchar, date_created, 101) as log_date,
count(distinct date_created) as num_of_entries
from record_log_table
group by convert(varchar, date_created, 101), record_id
having count(distinct date_created) > 1
Hope this helps.
SELECT id, count(*) AS cnt
INTO #tmp
FROM tablename
WHERE date = @date
GROUP BY id
HAVING count(*) > 1
SELECT *
FROM tablename t
WHERE EXISTS (SELECT 1 FROM #tmp WHERE id = t.id)
AND t.date = @date
DROP TABLE #tmp
TrickyNixon writes:
The top post (Leigh Caldwell) will not return duplicate records and needs to be down modded.
Yet the question doesn't ask about duplicate records. It asks about duplicate record-ids on the same date...
GROUP-BY,HAVING seems good to me. I've used it in production before.
Something to watch out for:
SELECT ... FROM ... GROUP BY ... HAVING count(*)>1
Will, on most database systems, run in O(N log N) time. It's a good solution. (SELECT is O(N), sort is O(N log N), GROUP BY is O(N), HAVING is O(N), worst case. Best case, date is indexed and the sort operation is more efficient.)
Select ... from ..., ... where a.date = b.date
Granted, only idiots do a Cartesian join. But you're looking at O(N^2) time. For some databases, this also creates a "temporary" table. It's all insignificant when your table has only 10 rows, but it's gonna hurt when that table grows!
Ob link: http://en.wikipedia.org/wiki/Join_(SQL)
select id from tbl where date in
(select date from tbl group by date having count(*)>1)
GROUP BY with HAVING is your friend:
select id, date, count(*) from records group by id, date having count(*) > 1