Select statement to find duplicates on certain fields - sql

Can you help me with SQL statements to find duplicates on multiple fields?
For example, in pseudo code:
select count(field1,field2,field3)
from table
where the combination of field1, field2, field3 occurs multiple times
and from the above statement if there are multiple occurrences I would like to select every record except the first one.

To list the field combinations for which there are multiple records, you can use:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Check this link for more information on how to delete the rows.
http://support.microsoft.com/kb/139444
There should be a criterion for deciding how you define "first rows" before you use the approach in the link above. Based on that you'll need to use an order by clause and a sub query if needed. If you can post some sample data, it would really help.
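As a quick way to try the GROUP BY / HAVING query above without a server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for SQL Server here, and the sample rows are made up for illustration.

```python
import sqlite3

def duplicate_groups(rows):
    """Return (field1, field2, field3, count) for every combination seen more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (field1, field2, field3)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT field1, field2, field3, COUNT(*) AS cnt
        FROM table_name
        GROUP BY field1, field2, field3
        HAVING COUNT(*) > 1
        ORDER BY field1, field2, field3
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample rows: (1,1,1) occurs three times, (3,3,3) twice.
sample = [(1, 1, 1), (1, 1, 1), (1, 1, 1), (2, 2, 2), (3, 3, 3), (3, 3, 3)]
print(duplicate_groups(sample))
```

This only reports which combinations are duplicated and how often; it does not say which individual rows are the "extra" ones.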

You mention "the first one", so I assume that you have some kind of ordering on your data. Let's assume that your data is ordered by some field ID.
This SQL should get you the duplicate entries except for the first one. It basically selects all rows for which another row with (a) the same fields and (b) a lower ID exists. Performance won't be great, but it might solve your problem.
SELECT A.ID, A.field1, A.field2, A.field3
FROM myTable A
WHERE EXISTS (SELECT B.ID
FROM myTable B
WHERE B.field1 = A.field1
AND B.field2 = A.field2
AND B.field3 = A.field3
AND B.ID < A.ID)
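A small sketch of the EXISTS approach above, run on an in-memory SQLite database from Python (SQLite is an assumption for the demo, as are the sample rows). It also shows one thing to watch for: because the comparison uses =, rows whose compared fields are NULL never match each other and so are not reported.

```python
import sqlite3

def later_duplicates(rows):
    """Return the IDs of rows that repeat an earlier row (lower ID) on field1..field3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, field1, field2, field3)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
    ids = [
        row[0]
        for row in conn.execute(
            """
            SELECT A.ID
            FROM myTable A
            WHERE EXISTS (SELECT 1
                          FROM myTable B
                          WHERE B.field1 = A.field1
                            AND B.field2 = A.field2
                            AND B.field3 = A.field3
                            AND B.ID < A.ID)
            ORDER BY A.ID
            """
        )
    ]
    conn.close()
    return ids

# Hypothetical data: IDs 2 and 4 repeat ID 1; IDs 6 and 7 are equal only via NULLs.
sample = [
    (1, "a", "x", 10),
    (2, "a", "x", 10),
    (3, "b", "y", 20),
    (4, "a", "x", 10),
    (5, "b", "y", 21),
    (6, None, "z", 1),
    (7, None, "z", 1),
]
print(later_duplicates(sample))
```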

This is a fun solution with SQL Server 2005 that I like. I'm going to assume that by "for every record except for the first one", you mean that there is another "id" column that we can use to identify which row is "first".
SELECT id
, field1
, field2
, field3
FROM
(
SELECT id
, field1
, field2
, field3
, RANK() OVER (PARTITION BY field1, field2, field3 ORDER BY id ASC) AS [rank]
FROM table_name
) a
WHERE [rank] > 1

To see duplicate values:
with MYCTE as (
select row_number() over ( partition by name order by name) rown, *
from tmptest
)
select * from MYCTE where rown > 1

If you're using SQL Server 2005 or later (and the tags for your question indicate SQL Server 2008), you can use ranking functions to return the duplicate records after the first one if using joins is less desirable or impractical for some reason. The following example shows this in action, where it also works with null values in the columns examined.
create table Table1 (
Field1 int,
Field2 int,
Field3 int,
Field4 int
)
insert Table1
values (1,1,1,1)
, (1,1,1,2)
, (1,1,1,3)
, (2,2,2,1)
, (3,3,3,1)
, (3,3,3,2)
, (null, null, 2, 1)
, (null, null, 2, 3)
select *
from (select Field1
, Field2
, Field3
, Field4
, row_number() over (partition by Field1
, Field2
, Field3
order by Field4) as occurrence
from Table1) x
where occurrence > 1
Notice after running this example that the first record out of every "group" is excluded, and that records with null values are handled properly.
If you don't have a column available to order the records within a group, you can use the partition-by columns as the order-by columns.
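To check the behaviour described above, including the NULL handling, here is a sketch that loads the same sample rows into an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (needed for window functions); the original answer targets SQL Server, so treat this as an approximation of that example rather than the same engine.

```python
import sqlite3

def rows_after_first():
    """Return every row that is not the first (by Field4) in its Field1..Field3 group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (Field1 INT, Field2 INT, Field3 INT, Field4 INT)")
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3),
            (2, 2, 2, 1),
            (3, 3, 3, 1), (3, 3, 3, 2),
            (None, None, 2, 1), (None, None, 2, 3),
        ],
    )
    result = conn.execute(
        """
        SELECT Field1, Field2, Field3, Field4
        FROM (SELECT Field1, Field2, Field3, Field4,
                     ROW_NUMBER() OVER (PARTITION BY Field1, Field2, Field3
                                        ORDER BY Field4) AS occurrence
              FROM Table1) x
        WHERE occurrence > 1
        ORDER BY Field3, Field4
        """
    ).fetchall()
    conn.close()
    return result

print(rows_after_first())
```

The two rows with NULL in Field1 and Field2 land in the same partition, so the second of them is reported just like any other duplicate.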

CREATE TABLE #tmp
(
sizeId Varchar(MAX)
)
INSERT #tmp
VALUES ('44'),
('44,45,46'),
('44,45,46'),
('44,45,46'),
('44,45,46'),
('44,45,46'),
('44,45,46')
SELECT * FROM #tmp
DECLARE @SqlStr VARCHAR(MAX)
SELECT @SqlStr = STUFF((SELECT ',' + sizeId
FROM #tmp
ORDER BY sizeId
FOR XML PATH('')), 1, 1, '')
SELECT TOP 1 * FROM (
select items, count(*)AS Occurrence
FROM dbo.Split(@SqlStr,',')
group by items
having count(*) > 1
)K
ORDER BY K.Occurrence DESC

Try this query to find duplicate records on multiple fields
SELECT a.column1, a.column2
FROM dbo.a a
JOIN (SELECT column1,
column2, count(*) as countC
FROM dbo.a
GROUP BY column1, column2
HAVING count(*) > 1 ) b
ON a.column1 = b.column1
AND a.column2 = b.column2
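A sketch of the join-back pattern, run against in-memory SQLite from Python. It assumes the subquery groups on the same two columns that the join uses; column3 and the sample rows are hypothetical and only there to tell the rows apart. Unlike the "every record except the first" queries, this returns all rows of each duplicated group, the first one included.

```python
import sqlite3

def rows_in_duplicate_groups(rows):
    """Return every row whose (column1, column2) pair occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (column1, column2, column3)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.column1, t.column2, t.column3
        FROM a t
        JOIN (SELECT column1, column2, COUNT(*) AS countC
              FROM a
              GROUP BY column1, column2
              HAVING COUNT(*) > 1) b
          ON t.column1 = b.column1
         AND t.column2 = b.column2
        ORDER BY t.column3
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical rows: ('x', 1) and ('z', 3) are duplicated, ('y', 2) is not.
sample = [("x", 1, "r1"), ("x", 1, "r2"), ("y", 2, "r3"), ("z", 3, "r4"), ("z", 3, "r5")]
print(rows_in_duplicate_groups(sample))
```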

You can also try this query, which counts the distinct values of a column and orders the result by the column you choose:
select field1, field2, field3, count(distinct (field2))
from table_name
group by field1, field2, field3
having count(field2) > 1
order by field2;

Try this query to get a separate count for each selected field:
select field1, count(field1) as field1Count, field2,count(field2) as field2Counts, field3, count(field3) as field3Counts
from table_name
group by field1, field2, field3
having count(*) > 1

Related

Select with column that is not in the group by SQL Server

I want to select a column that is not in the GROUP BY.
My code:
SELECT
dbo.func(field1, field2), field3
FROM
table
WHERE
field4 = 1224
GROUP BY
dbo.func(field1, field2), field3
HAVING
COUNT(id) > 1
I also want to select the id column, like this:
SELECT
id, dbo.func(field1, field2), field3
FROM
table
WHERE
field4 = 1224
GROUP BY
dbo.func(field1, field2), field3
HAVING
COUNT(id) > 1
I suspect that you want to apply a count restriction and then return all matching records from the original table, along with the output of the scalar function. One approach is to use COUNT as an analytic function, with a partition that corresponds to the columns in your original GROUP BY clause. The difference here is that we don't actually aggregate the original table.
WITH cte AS (
SELECT id, dbo.func(field1, field2) AS out, field3,
COUNT(id) OVER (PARTITION BY dbo.func(field1, field2), field3) cnt
FROM yourTable
WHERE field4 = 1224
)
SELECT id, out, field3
FROM cte
WHERE cnt > 1;
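Here is a sketch of the windowed COUNT idea on in-memory SQLite from Python (SQLite 3.25 or later assumed). Since dbo.func is not shown in the question, the expression field1 + field2 is used as a purely hypothetical stand-in for it, and the sample rows are invented.

```python
import sqlite3

def ids_in_repeated_groups(rows):
    """Return (id, func_out, field3) for rows whose (func_out, field3) group has > 1 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id, field1, field2, field3, field4)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT id,
                   field1 + field2 AS func_out,   -- stand-in for dbo.func(field1, field2)
                   field3,
                   COUNT(id) OVER (PARTITION BY field1 + field2, field3) AS cnt
            FROM yourTable
            WHERE field4 = 1224
        )
        SELECT id, func_out, field3
        FROM cte
        WHERE cnt > 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical rows: ids 1 and 2 share (3, 'a'); id 5 is excluded by the field4 filter.
sample = [
    (1, 1, 2, "a", 1224),
    (2, 2, 1, "a", 1224),
    (3, 5, 5, "a", 1224),
    (4, 1, 2, "b", 1224),
    (5, 0, 3, "a", 999),
]
print(ids_in_repeated_groups(sample))
```

Because the count is computed per partition without collapsing rows, the id column stays available in the outer SELECT.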
You could join back to the original table to retrieve the matching row(s) with id:
SELECT t.id
, filter.funresult
, t.field3
FROM table t
JOIN (
SELECT dbo.func(field1,field2) as funresult
, field3
FROM table
WHERE field4 = 1224
GROUP BY
dbo.func(field1,field2)
, field3
HAVING COUNT(id) > 1
) filter
ON filter.funresult = dbo.func(t.field1, t.field2)
AND filter.field3 = t.field3

remove duplicate values of one column by sql query

I have a table with two columns. The values of one column are all null, and the other column has many duplicated values and blanks. How can I remove the duplicated values and blanks from that column with a query?
You can use a temporary table for this task, as below:
SELECT DISTINCT * INTO #tmpTable FROM MyTable
TRUNCATE TABLE MyTable
INSERT INTO MyTable SELECT * FROM #tmpTable
You can create an empty copy of the table. Then you run an INSERT INTO new_table SELECT DISTINCT * FROM old_table. Finally, drop the old table and rename the new one.
One way of doing this is with a CTE:
create table #dups (col1 int, col2 int)
insert into #dups
select null,null union all
select null,1 union all
select null,1 union all
select null,1 union all
select null,2 union all
select null,2 union all
select null,3 union all
select null,null
select * from #dups
;WITH cte
AS (SELECT col1,col2,ROW_NUMBER() OVER (PARTITION BY Col1,Col2
ORDER BY ( SELECT 0)) RN
FROM #dups
)
DELETE FROM cte
WHERE RN > 1 OR col2 is null
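For readers who want to experiment without SQL Server, here is a sketch of the same clean-up on in-memory SQLite from Python. Note the swapped technique: SQLite does not delete through a CTE, so this version keeps the row with the lowest rowid in each (col1, col2) group instead of using ROW_NUMBER, and removes rows where col2 is NULL. The table name dups is an assumption; the data matches the example above.

```python
import sqlite3

def dedupe_and_drop_nulls():
    """Keep one row per (col1, col2) group and drop rows where col2 is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dups (col1 INT, col2 INT)")
    conn.executemany(
        "INSERT INTO dups VALUES (?, ?)",
        [(None, None), (None, 1), (None, 1), (None, 1),
         (None, 2), (None, 2), (None, 3), (None, None)],
    )
    conn.execute(
        """
        DELETE FROM dups
        WHERE col2 IS NULL
           OR rowid NOT IN (SELECT MIN(rowid)   -- first row of each group survives
                            FROM dups
                            GROUP BY col1, col2)
        """
    )
    remaining = conn.execute("SELECT col1, col2 FROM dups ORDER BY col2").fetchall()
    conn.close()
    return remaining

print(dedupe_and_drop_nulls())
```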
1) If you want to keep the row with the lowest id value:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
2) If you want to keep the row with the highest id value:
DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
Please make a backup copy of your table before running this.
If it is unclear, send me your table name and I will substitute it into the query.
It is tested with MySQL 5.1.
I am not sure about other versions.
See the linked question:
Delete all Duplicate Rows except for One in MySQL?
If I understood correctly, you want to exclude NULL and blank values in column2?
SELECT COLUMN1, COLUMN2 FROM TABLE
WHERE
COLUMN2 IS NOT NULL AND COLUMN2 <> ''
GROUP BY
COLUMN1, COLUMN2
This query returns rows only when COLUMN2 has some data.
Try this:
DELETE FROM [Table]
WHERE (ColumnB IN
(SELECT ColumnB
FROM [Table] AS Table_1
GROUP BY ColumnB
HAVING (COUNT(*) > 1))) OR
(RTRIM(LTRIM(ISNULL(ColumnB,''))) = '')
The table has two columns. The first column contains only null values; the second column has duplicate values and blank values.

Select statement for Oracle SQL

I have a table say,
column1 column2
a apple
a ball
a boy
b apple
b eagle
b orange
c bat
c ball
c cork
Now I would like to fetch column1 based on the rows that don't contain 'apple', and also ignore a value in column1 if any of its rows has 'apple' in column2. So in the table above only 'c' must be returned.
I am kind of new to Oracle SQL, and I know that Select column1 from table where column2 != 'apple' will not work. I need some help with this, please.
You could use DISTINCT with NOT IN as follows:
QUERY 1 using NOT IN
select distinct col1
from t
where col1 not in (select col1 from t where col2 = 'apple')
QUERY 2 using NOT EXISTS
As per @jarlh's comment, you could use NOT EXISTS as follows:
select distinct col1
from t t1
where not exists (select 1 from t t2 where t2.col2 = 'apple' and t1.col1 = t2.col1)
SAMPLE DATA
create table t
(
col1 nvarchar(60),
col2 nvarchar(60)
)
insert into t values
('a','apple')
,('a','ball')
,('a','boy')
,('b','apple')
,('b','eagle')
,('b','orange')
,('c','bat')
,('c','ball')
,('c','cork')
Assuming that column1 is NOT NULL you could use:
SELECT DISTINCT t.column1
FROM table_name t
WHERE t.column1 NOT IN (SELECT column1
FROM table_name
WHERE column2 = 'apple');
To get all columns and rows change DISTINCT t.column1 to *.
Select * from tbl
Left join (
Select column1 from tbl
Where column2 like '%apple%'
Group by column1
) g on tbl.column1 = g.column1
Where g.column1 is null
It seems to me that you need to build a summary of all column1 values that have any reference to apple, and then list the rows that have no match in that summary list (g).
If I understand correctly, you need the values of column1 for which your table has no row with the same value of column1 and 'apple' in column2. You can translate this into SQL with:
Select column1
from your_table t1
where not exists (
select 1
from your_table t2
where t2.column1 = t1.column1
and t2.column2= 'apple'
)
This is only one of the possible ways to get your result, so you can rewrite it in many ways; I believe this form is close enough to the plain-language logic to show clearly how that logic can be written in SQL.
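The NOT EXISTS logic can be checked quickly with the question's sample data. This sketch uses an in-memory SQLite database from Python rather than Oracle, which is an assumption made only so the example is self-contained; DISTINCT is added so each column1 value is listed once.

```python
import sqlite3

def groups_without(value):
    """Return the column1 values that have no row with column2 equal to `value`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (column1 TEXT, column2 TEXT)")
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?)",
        [("a", "apple"), ("a", "ball"), ("a", "boy"),
         ("b", "apple"), ("b", "eagle"), ("b", "orange"),
         ("c", "bat"), ("c", "ball"), ("c", "cork")],
    )
    result = [
        row[0]
        for row in conn.execute(
            """
            SELECT DISTINCT t1.column1
            FROM your_table t1
            WHERE NOT EXISTS (SELECT 1
                              FROM your_table t2
                              WHERE t2.column1 = t1.column1
                                AND t2.column2 = ?)
            ORDER BY t1.column1
            """,
            (value,),
        )
    ]
    conn.close()
    return result

print(groups_without("apple"))
```

The string comparison here is case-sensitive, so 'Apple' would not match the stored 'apple'.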

How to obtain count of record differences in the same table, where there are distinct and nearly-distinct records

I've a table TABLEA with data as below
field1 field2 field3.......field16
123 10-JAN-12 0.8.......ABC
123 10-JAN-12 0.8.......ABC
.
.
.
123 10-JAN-12 0.7.......ABC
245 11-JAN-12 0.3.......CDE
245 11-JAN-12 0.3.......CDE
245 11-JAN-12 0.3.......XYZ
...
<unique rows>
When I do a
select field1, field2, ...field16
from TABLEA
I obtain M records, and when I do a
select distinct field1, field2...field16
from TABLEA
I obtain M-x records, where M is in the millions and x is a much smaller number.
I am trying to write SQL to get the x records (eventually, just get the count).
I've tried all Set operator keywords like
select field1...field16
from TABLEA
EXCEPT
select distinct field1..field16
from TABLEA
Or using UNION ALL instead of EXCEPT. But none of them return x, instead they all return 0 rows.
You can select the rows that are not distinct by
SELECT field1, ... , field16
FROM tablea
GROUP BY field1, ... , field16
HAVING count(*) > 1
Edit: Another approach would be to use an analytical function ROW_NUMBER(), partitioning by all your field columns. The first (i.e. distinct) row for a given set of fields has ROW_NUMBER = 1, the second = 2, the third = 3 etc. So you can select the x-rows with WHERE ROW_NUMBER > 1.
CREATE TABLE tablea (
field1 NUMBER, field2 DATE, field3 NUMBER, field16 VARCHAR2(10)
);
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.8, 'ABC');
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.8, 'ABC');
INSERT INTO tablea VALUES (123, DATE '2012-01-10', 0.7, 'ABC');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'CDE');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'CDE');
INSERT INTO tablea VALUES (245, DATE '2012-01-11', 0.3, 'XYZ');
To select the duplicate rows x:
SELECT *
FROM (
SELECT field1, field2, field3, field16,
ROWID AS rid,
ROW_NUMBER() OVER (PARTITION BY
field1, field2, field3, field16 ORDER BY ROWID) as rn
FROM tablea
)
WHERE rn > 1;
123 10.01.2012 0.8 ABC AAAJ6mAAEAAAAExAAB 2
245 11.01.2012 0.3 CDE AAAJ6mAAEAAAAExAAE 2
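To confirm that the ROW_NUMBER approach really yields the x extra rows, here is a sketch on in-memory SQLite from Python (SQLite 3.25 or later assumed, used here in place of Oracle). It loads the six sample rows, with the dates stored as text, and compares the number of rows with rn > 1 against the total count minus the distinct count.

```python
import sqlite3

def count_extra_rows():
    """Return (total rows M, distinct rows M-x, rows with rn > 1, i.e. x)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablea (field1, field2, field3, field16)")
    conn.executemany(
        "INSERT INTO tablea VALUES (?, ?, ?, ?)",
        [(123, "2012-01-10", 0.8, "ABC"), (123, "2012-01-10", 0.8, "ABC"),
         (123, "2012-01-10", 0.7, "ABC"), (245, "2012-01-11", 0.3, "CDE"),
         (245, "2012-01-11", 0.3, "CDE"), (245, "2012-01-11", 0.3, "XYZ")],
    )
    total = conn.execute("SELECT COUNT(*) FROM tablea").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT field1, field2, field3, field16 FROM tablea)"
    ).fetchone()[0]
    extra = conn.execute(
        """
        SELECT COUNT(*)
        FROM (SELECT ROW_NUMBER() OVER (PARTITION BY field1, field2, field3, field16
                                        ORDER BY rowid) AS rn
              FROM tablea)
        WHERE rn > 1
        """
    ).fetchone()[0]
    conn.close()
    return total, distinct, extra

total, distinct, extra = count_extra_rows()
print(total, distinct, extra)
```

If only the count is needed, the subtraction of the two COUNT(*) values gives the same number without a window function.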
You will get what you want with the EXCEPT query you posted above, but you must include the ALL keyword, because EXCEPT DISTINCT is the default. I have just added the ALL keyword to your query below:
select field1...field16
from TABLEA
EXCEPT ALL
select distinct field1..field16
from TABLEA
If you want a count of those x records, make the above query a subquery in the FROM clause of another query and put the count in that outer query, as shown below:
Select count(*)
From
(
select field1...field16
from TABLEA
EXCEPT ALL
select distinct field1..field16
from TABLEA
) B
Guess this is what you are looking for.
Good luck
You are not going to get a count of rows that are missing from your DISTINCT result if your column choices are the same. DISTINCT returns each distinct combination found in the full result, so a UNION ALL just repeats those rows, and EXCEPT never finds anything because every row of the full result also appears in the distinct result. What are you actually trying to do? Count where the duplicates occur? The answer you got from Wolfgang does that already.
declare @Table Table ( personID int identity, person varchar(8));
insert into @Table values ('Brett'),('Brett'),('Brett'),('John'),('John'),('Peter');
-- gives me all results
select person
from @Table
-- gives me distinct results (no repeats)
Select distinct person
from @Table
-- gives me nothing as nothing exists that is distinct that is not in total
select person
from @Table
except
select distinct person
from @Table
-- shows me counts of rows repeated by pivoting on one column and counting resultant rows from that. Having clause adds predicate specific logic to hunt for.
-- in this case duplicates or rows greater than one
Select person, count(*)
from @Table
group by person
having count(*) > 1
EDIT: you can get a difference of the distinct from the total if that is what you mean:
with dupes as
(
Select count(*) as cnts, sum(count(*)) over() as TotalDupes
from @Table
group by person
having count(*) > 1 -- dupes are defined by rows repeating
)
, uniques as
(
Select count(*) as cnts, sum(count(*)) over() as TotalUniques
from @Table
group by person
having count(*) = 1 -- non dupes are rows of only a single resulting row
)
select distinct TotalDupes - TotalUniques as DifferenceFromRepeatsToUniques
from dupes, uniques

How to retrieve the rows (with maximum value in a field) having a another common field?

I have a table; let it be called table1; with the following fields and data
[image of the table1 data: http://img228.imageshack.us/img228/3827/45939084.png]
I need a query that returns the record with the maximum value in Field3 for each group of records having the same value in Field2. So that the query returns:
[image of the expected result: http://img87.imageshack.us/img87/62/48847706.png]
How could this be done using SQL queries ?
This:
WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY field2 ORDER BY field3 DESC) AS rn
FROM table1
)
SELECT *
FROM q
WHERE rn = 1
or this:
SELECT q.*
FROM (
SELECT DISTINCT field2
FROM table1
) qo
CROSS APPLY
(
SELECT TOP 1 *
FROM table1 t
WHERE t.field2 = qo.field2
ORDER BY
t.field3 DESC
) q
Depending on the field2 cardinality, the first or the second query can be more efficient.
See this article for more details:
SQL Server: Selecting records holding group-wise maximum
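Since the original images with the table data are not available, here is a sketch of the ROW_NUMBER variant with invented rows, run on in-memory SQLite from Python (SQLite 3.25 or later assumed). The field1 values and all sample data are hypothetical. The CROSS APPLY variant is specific to SQL Server and is not reproduced here.

```python
import sqlite3

def max_per_group(rows):
    """Return, for each field2 value, the row with the largest field3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (field1, field2, field3)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH q AS (
            SELECT field1, field2, field3,
                   ROW_NUMBER() OVER (PARTITION BY field2
                                      ORDER BY field3 DESC) AS rn
            FROM table1
        )
        SELECT field1, field2, field3
        FROM q
        WHERE rn = 1
        ORDER BY field2
        """
    ).fetchall()
    conn.close()
    return result

# Hypothetical rows: group 'A' peaks at 30, 'B' at 5, 'C' has a single row.
sample = [(1, "A", 10), (2, "A", 30), (3, "B", 5), (4, "B", 2), (5, "C", 7)]
print(max_per_group(sample))
```

If two rows in a group tie on field3, ROW_NUMBER picks one of them arbitrarily; RANK would return both.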