How to Get only Duplicate data from one table? - sql

I have one table that contains both duplicated values and values that occur only once.
I want to get only the rows with duplicated values, not the values that occur once.

SELECT columnWithDuplicates, count(*) FROM myTable
GROUP BY columnWithDuplicates HAVING (count(*) > 1);

The above query will show the duplicated values. And once you provide that to a business user, their next question will be what happened? How did these get there? Is there a pattern to the duplicates? What's often more informative is to see the whole rows containing those values to help determine why there are duplicates.
-- this query finds all the values in T that
-- exist in the derived table D where D is the list of
-- all the values in columnWithDuplicates that occur more than once
SELECT DISTINCT
T.*
FROM
myTable T
INNER JOIN
(
-- this identifies the duplicated values
-- courtesy of Brian Roach
SELECT
columnWithDuplicates
, count(*) AS rowCount
FROM
myTable
GROUP BY
columnWithDuplicates
HAVING
(count(*) > 1)
) D
ON D.columnWithDuplicates = T.columnWithDuplicates
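
For experimenting with the two queries above, here is a minimal sketch that runs them against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the id column are made up for illustration; only the table and column names come from the answer.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, columnWithDuplicates TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")],
)

# Step 1: only the duplicated values and how often each occurs.
duplicated_values = conn.execute(
    """
    SELECT columnWithDuplicates, COUNT(*)
    FROM myTable
    GROUP BY columnWithDuplicates
    HAVING COUNT(*) > 1
    ORDER BY columnWithDuplicates
    """
).fetchall()

# Step 2: the full rows that carry one of those values.
duplicated_rows = conn.execute(
    """
    SELECT T.*
    FROM myTable T
    INNER JOIN (
        SELECT columnWithDuplicates
        FROM myTable
        GROUP BY columnWithDuplicates
        HAVING COUNT(*) > 1
    ) D ON D.columnWithDuplicates = T.columnWithDuplicates
    ORDER BY T.columnWithDuplicates, T.id
    """
).fetchall()

print(duplicated_values)
print(duplicated_rows)
```

The value "c" occurs once, so it appears in neither result.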

Related

SQL : Find Numbers of Rows in a Table according to Criteria

I want to get the number of rows in a table according to certain criteria.
Please see the table below:
I want to count the rows for each distinct value in the StationTo column.
You could group by the StationTo and use the aggregate count(*) function:
SELECT StationTo, COUNT(*)
FROM mytable
GROUP BY StationTo
EDIT:
If you just want the number of rows for a single StationTo, you could use a where clause:
SELECT COUNT(*)
FROM mytable
WHERE StationTo = 'P11004400000'
Hi, do you have a master table for the StationTo records? If so, you can join against it:
select s.stationto, count(d.stationto)
from stationtomaster s
left join data d on d.stationto = s.stationto
group by s.stationto
Select StationTo, Date, count(*) from table group by StationTo, Date
This means every StationTo that has rows displays its count.
Or use select count(distinct StationTo) from table, or Select count(*) from table where stationTo='yourvalue'
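
A small sketch of the three counting variants discussed in these answers, run on SQLite through Python's sqlite3 module. The station codes S1 to S3 and the two table layouts are assumptions for illustration; the original table was not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stationtomaster (stationto TEXT);
    CREATE TABLE data (stationto TEXT);
    INSERT INTO stationtomaster VALUES ('S1'), ('S2'), ('S3');
    INSERT INTO data VALUES ('S1'), ('S1'), ('S2');
    """
)

# Rows per station, for stations that have at least one row.
per_station = conn.execute(
    "SELECT stationto, COUNT(*) FROM data GROUP BY stationto ORDER BY stationto"
).fetchall()

# Joining from the master table also reports stations with zero rows,
# because COUNT(d.stationto) ignores the NULLs produced by the LEFT JOIN.
with_zeros = conn.execute(
    """
    SELECT s.stationto, COUNT(d.stationto)
    FROM stationtomaster s
    LEFT JOIN data d ON d.stationto = s.stationto
    GROUP BY s.stationto
    ORDER BY s.stationto
    """
).fetchall()

# Row count for one specific station.
single = conn.execute(
    "SELECT COUNT(*) FROM data WHERE stationto = ?", ("S1",)
).fetchone()[0]

print(per_station, with_zeros, single)
```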

BigQuery how to group by after flattening a collection of tables over timerange

I'm trying to do the following:
combine tables over a timerange using FROM TABLE_DATE_RANGE
FLATTEN that set of data
GROUP BY ColumnX
SELECT ColumnX, SUM(ColumnY), SUM(ColumnZ) over only unique ColumnX values.
here's the gist of my query:
SELECT
r.ColumnX
,SUM(r.ColumnY)
,SUM(r.ColumnZ)
FROM
(
SELECT *
FROM FLATTEN(
(
SELECT
ColumnX
,ColumnY
,ColumnZ
FROM TABLE_DATE_RANGE(projectx.events_,
TIMESTAMP('2015-09-01'), TIMESTAMP('2015-09-08'))), my_funky_object
)
WHERE ColumnY > 10
) r
GROUP BY
r.ColumnX
The problem is that I get far more rows than there are unique values of ColumnX. So I took a step back and output only the GROUP BY count of ColumnX to debug, and I got what looks like an intermediate result.
What is happening and how do I ensure that my outer select only aggregates over unique values of ColumnX?
You're getting the count of each distinct value of ColumnX, but you're only showing the count, not the value.
If your goal is to get an accurate count for the number of distinct values, try something like this:
SELECT
COUNT(*) ct
FROM (
SELECT
1
FROM
... rest of your query ...
GROUP BY r.ColumnX
)
That inner query will give you exactly one row (each with the value 1) for each distinct value of ColumnX. The outer select statement will count the number of such rows.
Another alternative is to use EXACT_COUNT_DISTINCT to get the exact count of distinct values. That's simpler but less scalable than using GROUP BY.
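
The counting pattern can be tried outside BigQuery. This sketch uses SQLite through Python's sqlite3 module, so TABLE_DATE_RANGE and FLATTEN are left out, and COUNT(DISTINCT ...) stands in for EXACT_COUNT_DISTINCT. The events table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ColumnX TEXT, ColumnY INTEGER, ColumnZ INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("x1", 11, 1), ("x1", 12, 2), ("x2", 20, 3), ("x3", 5, 4), ("x2", 30, 5)],
)

# One output row per distinct ColumnX that survives the filter.
grouped = conn.execute(
    """
    SELECT ColumnX, SUM(ColumnY), SUM(ColumnZ)
    FROM events
    WHERE ColumnY > 10
    GROUP BY ColumnX
    ORDER BY ColumnX
    """
).fetchall()

# Count the groups by wrapping the GROUP BY in an outer COUNT(*).
count_via_group = conn.execute(
    """
    SELECT COUNT(*) AS ct
    FROM (SELECT 1 FROM events WHERE ColumnY > 10 GROUP BY ColumnX)
    """
).fetchone()[0]

# The same number via COUNT(DISTINCT ...).
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT ColumnX) FROM events WHERE ColumnY > 10"
).fetchone()[0]

print(grouped, count_via_group, count_distinct)
```

Both counts match the number of rows the grouped query returns, which is a quick way to check that the outer aggregation really runs once per ColumnX value.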

sql query - filtering duplicate values to create report

I am trying to list all the duplicate records in a table. This table does not have a Primary Key and has been specifically created only for creating a report to list out duplicates. It comprises both unique and duplicate values.
The query I have so far is:
SELECT [OfficeCD]
,[NewID]
,[Year]
,[Type]
FROM [Test].[dbo].[Duplicates]
GROUP BY [OfficeCD]
,[NewID]
,[Year]
,[Type]
HAVING COUNT(*) > 1
This works and gives me all the duplicated combinations.
But I want to display the values of all the columns for those rows in my report. How can I do that without querying for each record separately?
For example:
Each table has 10 fields, and [NewID] is the field that occurs multiple times. I need to create a report with all the data in all the fields where NewID has been duplicated.
Please help.
Thank you.
You need a subquery:
SELECT * FROM yourtable
WHERE NewID IN (
SELECT NewID FROM yourtable
GROUP BY OfficeCD,NewID,Year,Type
HAVING Count(*)>1
)
Additionally, you might want to check your tags: you tagged mysql, but the syntax makes me think you mean sql-server.
Try this:
SELECT * FROM [Duplicates] WHERE NewID IN
(
SELECT [NewID] FROM [Duplicates] GROUP BY [NewID] HAVING COUNT(*) > 1
)
select d.*
from Duplicates d
inner join (
select NewID
from Duplicates
group by NewID
having COUNT(*) > 1
) dd on d.NewID = dd.NewID
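
The answers above group the subquery differently: one by all four columns, the others by NewID alone. A sketch on SQLite (through Python's sqlite3 module) shows that this choice changes which rows count as duplicates. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Duplicates (OfficeCD TEXT, NewID INTEGER, Year INTEGER, Type TEXT)"
)
conn.executemany(
    "INSERT INTO Duplicates VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 2010, "X"),
        ("A", 1, 2010, "X"),  # identical in every grouped column
        ("B", 2, 2010, "X"),
        ("C", 2, 2011, "Y"),  # same NewID as the row above, other columns differ
        ("D", 3, 2012, "Z"),
    ],
)

# Duplicate means: the same NewID occurs more than once.
by_newid = conn.execute(
    """
    SELECT * FROM Duplicates
    WHERE NewID IN (
        SELECT NewID FROM Duplicates GROUP BY NewID HAVING COUNT(*) > 1
    )
    ORDER BY NewID, OfficeCD
    """
).fetchall()

# Duplicate means: the whole (OfficeCD, NewID, Year, Type) combination repeats.
by_all_columns = conn.execute(
    """
    SELECT * FROM Duplicates
    WHERE NewID IN (
        SELECT NewID FROM Duplicates
        GROUP BY OfficeCD, NewID, Year, Type
        HAVING COUNT(*) > 1
    )
    ORDER BY NewID, OfficeCD
    """
).fetchall()

print(len(by_newid), len(by_all_columns))
```

Grouping by NewID alone matches the wording of the question ("where newID has been duplicated"); grouping by all four columns is stricter.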

Single Query to delete and display duplicate records

One of the questions asked in an interview was:
One table has 100 records. 50 of them
are duplicates. Is it possible with a single
query to delete the duplicate records
from the table as well as select and
display the remaining 50 records.
Is this possible in a single SQL query?
Thanks
SNA
With SQL Server you would use something like this:
-- table variables are declared with @
DECLARE @Table TABLE (ID INTEGER, PossibleDuplicate INTEGER)
INSERT INTO @Table VALUES (1, 100)
INSERT INTO @Table VALUES (2, 100)
INSERT INTO @Table VALUES (3, 200)
INSERT INTO @Table VALUES (4, 200)
-- delete the highest ID of every duplicated group and return the deleted rows
DELETE FROM @Table
OUTPUT Deleted.*
FROM @Table t
INNER JOIN (
SELECT ID = MAX(ID)
FROM @Table
GROUP BY PossibleDuplicate
HAVING COUNT(*) > 1
) d ON d.ID = t.ID
The OUTPUT clause returns the records that get deleted.
Update:
The above query deletes the duplicates and returns the rows that were deleted, not the rows that remain. If that distinction matters to you (after all, the remaining 50 rows should be identical to the 50 deleted rows), you could use the MERGE syntax introduced in SQL Server 2008 to achieve this.
Lieven's answer is a good explanation of how to output the deleted rows. I'd like to add two things:
If you want to do something with the output other than display it, you can specify OUTPUT INTO @Tbl (where @Tbl is a table variable you declare before the DELETE);
Using MAX, MIN, or any of the other aggregates can only handle one duplicate row per group. If a group can contain several duplicates, the following SQL Server 2005+ code handles that:
;WITH Duplicates AS
(
SELECT
ID,
ROW_NUMBER() OVER (PARTITION BY DupeColumn ORDER BY ID) AS RowNum
FROM MyTable
)
DELETE FROM MyTable
OUTPUT deleted.*
WHERE ID IN
(
SELECT ID
FROM Duplicates
WHERE RowNum > 1
)
Sounds unlikely, at least in ANSI SQL, since a delete only returns the count of the number of deleted rows.
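
The ROW_NUMBER approach can be tried on SQLite through Python's sqlite3 module. This is only a sketch under assumptions: the SQL Server OUTPUT clause is not available there, so the rows are selected first and deleted in the same transaction instead of in one statement. It needs an SQLite build with window functions (3.25 or later), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, DupeColumn INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 100), (4, 200), (5, 200), (6, 300)],
)

# Number the rows inside each group of equal DupeColumn values;
# RowNum = 1 is the row to keep, everything above it is a duplicate.
RANKED = """
    SELECT ID, DupeColumn,
           ROW_NUMBER() OVER (PARTITION BY DupeColumn ORDER BY ID) AS RowNum
    FROM MyTable
"""

with conn:  # commits both steps together, or rolls back on error
    deleted = conn.execute(
        f"SELECT ID, DupeColumn FROM ({RANKED}) WHERE RowNum > 1 ORDER BY ID"
    ).fetchall()
    conn.execute(
        f"DELETE FROM MyTable WHERE ID IN (SELECT ID FROM ({RANKED}) WHERE RowNum > 1)"
    )

remaining = conn.execute("SELECT ID, DupeColumn FROM MyTable ORDER BY ID").fetchall()
print(deleted)
print(remaining)
```

The group with DupeColumn = 100 has three rows, and two of them are removed in a single pass. An aggregate such as MAX(ID) would pick only one row per group.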

How to keep only one row of a table, removing duplicate rows?

I have a table that has a lot of duplicates in the Name column. I'd
like to only keep one row for each.
The following lists the duplicates, but I don't know how to delete the
duplicates and just keep one:
SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;
Thank you.
See the following question: Deleting duplicate rows from a table.
The adapted accepted answer from there (which is my answer, so no "theft" here...):
You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.
Example query:
DELETE FROM members
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM members
GROUP BY name
)
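To try the MIN(ID) approach safely, here is a minimal sketch against an in-memory SQLite database using Python's sqlite3 module. The member names are made up. One caveat: MySQL may reject a subquery on the table being deleted from unless it is wrapped in a derived table, which appears to be why a later answer nests the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (ID INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "ann"), (4, "cy"), (5, "bob"), (6, "ann")],
)

# Keep the lowest ID for every name and delete everything else.
with conn:
    cursor = conn.execute(
        """
        DELETE FROM members
        WHERE ID NOT IN (
            SELECT MIN(ID)
            FROM members
            GROUP BY name
        )
        """
    )
    deleted_count = cursor.rowcount

remaining = conn.execute("SELECT ID, name FROM members ORDER BY ID").fetchall()
print(deleted_count, remaining)
```

Names that occur only once are their own minimum, so they are kept as well.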
In case you don't have a unique index, my recommendation is to simply add an auto-incremental unique index. Mainly because it's good design, but also because it will allow you to run the query above.
It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.
#create a table with same schema as members
CREATE TABLE tmp (...);
#insert the unique records
INSERT INTO tmp SELECT * FROM members GROUP BY name;
#swap it in
RENAME TABLE members TO members_old, tmp TO members;
#drop the old one
DROP TABLE members_old;
We have a huge database where deleting duplicates is part of the regular maintenance process. We use DISTINCT to select the unique records then write them into a TEMPORARY TABLE. After TRUNCATE we write back the TEMPORARY data into the TABLE.
That is one way of doing it and works as a STORED PROCEDURE.
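A sketch of that maintenance pattern on SQLite through Python's sqlite3 module. SQLite has no TRUNCATE, so an unfiltered DELETE stands in for it; the table layout and rows are assumptions for illustration. Note that DISTINCT only collapses rows that are identical in every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [
        ("ann", "ann@x"),
        ("ann", "ann@x"),  # exact duplicate
        ("bob", "bob@x"),
        ("ann", "ann@y"),  # same name, different email: not an exact duplicate
        ("bob", "bob@x"),  # exact duplicate
    ],
)

with conn:  # run all steps in one transaction
    # 1. Copy the distinct rows into a temporary table.
    conn.execute("CREATE TEMP TABLE tmp_members AS SELECT DISTINCT * FROM members")
    # 2. Empty the original table (stand-in for TRUNCATE).
    conn.execute("DELETE FROM members")
    # 3. Write the distinct rows back and drop the temporary copy.
    conn.execute("INSERT INTO members SELECT * FROM tmp_members")
    conn.execute("DROP TABLE tmp_members")

remaining = conn.execute(
    "SELECT name, email FROM members ORDER BY name, email"
).fetchall()
print(remaining)
```

Because ann appears with two different emails, two ann rows survive. Keeping exactly one row per name needs a GROUP BY name or a MIN(ID) filter instead.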
To first see which rows you are about to delete, select them like this, then delete them:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Full example at http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
You can join the table with itself on the matched field and delete every row that has a matching row with a lower id:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2 ON t1.match_field = t2.match_field
WHERE t1.id > t2.id;
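The comparison operator in a self-join like this decides which copies get deleted, so it is worth previewing the affected ids first. The sketch below uses SQLite through Python's sqlite3 module. SQLite does not accept the multi-table DELETE ... JOIN form, so the ids are listed with a SELECT and then deleted with IN. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, match_field TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a")],
)

def ids_matched(operator):
    """Return the t1 ids a self-join selects for the given id comparison."""
    assert operator in ("<>", ">")  # only these two fixed operators are compared
    rows = conn.execute(
        f"""
        SELECT DISTINCT t1.id
        FROM table_name t1
        INNER JOIN table_name t2 ON t1.match_field = t2.match_field
        WHERE t1.id {operator} t2.id
        ORDER BY t1.id
        """
    ).fetchall()
    return [row[0] for row in rows]

not_equal = ids_matched("<>")  # every row that has any duplicate partner
greater = ids_matched(">")     # every row except the lowest id per group

# Delete only the rows selected by '>' so one row per match_field survives.
with conn:
    conn.execute(
        """
        DELETE FROM table_name
        WHERE id IN (
            SELECT t1.id
            FROM table_name t1
            INNER JOIN table_name t2 ON t1.match_field = t2.match_field
            WHERE t1.id > t2.id
        )
        """
    )

remaining = conn.execute("SELECT id, match_field FROM table_name ORDER BY id").fetchall()
print(not_equal, greater, remaining)
```

With `<>`, all three "a" rows are selected, so no copy would be left. With `>`, the row with the lowest id in each group is spared.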
Delete duplicate rows and keep one.
The table has some names with duplicate rows and some names with only a single row; in both cases one row per name is kept.
Suppose the table has two columns, id and name, and we have to remove the duplicate names while keeping one row for each.
This works fine on my end; use this query:
DELETE FROM tablename
WHERE id NOT IN(
SELECT id FROM
(
SELECT MIN(id)AS id
FROM tablename
GROUP BY name HAVING
COUNT(*) > 1
)AS a )
AND id NOT IN(
(SELECT ids FROM
(
SELECT MIN(id)AS ids
FROM tablename
GROUP BY name HAVING
COUNT(*) =1
)AS a1
)
)
Before the delete, the table contained duplicate rows for amit and akhil.
After the delete, the query has removed those duplicate rows and kept one record each for amit and akhil.
If you want to remove duplicate records from a table:
CREATE TABLE tmp SELECT lastname, firstname, sex
FROM user_tbl
GROUP BY lastname, firstname;
DROP TABLE user_tbl;
ALTER TABLE tmp RENAME TO user_tbl;
Show the duplicated records:
SELECT `page_url`,count(*) FROM wl_meta_tags GROUP BY page_url HAVING count(*) > 1
Delete the duplicates, keeping one record per page_url:
DELETE FROM wl_meta_tags
WHERE meta_id NOT IN( SELECT meta_id
FROM ( SELECT MIN(meta_id)AS meta_id FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) > 1 )AS a )
AND meta_id NOT IN( (SELECT ids FROM (
SELECT MIN(meta_id)AS ids FROM wl_meta_tags GROUP BY page_url HAVING COUNT(*) =1 )AS a1 ) )
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [emp_id] ORDER BY [emp_id]) AS Row, * FROM employee_salary
)
DELETE FROM CTE
WHERE ROW <> 1