Find duplicated rows that are not exactly same

Find duplicated rows that are not exactly same - sql

Can i select all rows that have same column value (for example SSN field) but display them all separably. ?
I've searched for this answer but they all have "count(*) and group by" section that demands the rows to be exactly same.

Try This:
SELECT A, B FROM MyTable
WHERE A IN
(
SELECT A FROM MyTable GROUP BY A HAVING COUNT(*)>1
)
I have done with SQL server. But hope this is what you need

Here is another approach, which only references the table once, using an analytic function instead of a subquery to get the duplicate counts It might be faster; it also might not, depending on the particular data.
SELECT * FROM (
SELECT col1, col2, col3, ssn, COUNT(*) OVER (PARTITION BY ssn) ssn_dup_count
)
WHERE ssn_dup_count > 1
ORDER BY ssn_dup_count DESC

SELECT
*
FROM
MyTable
WHERE
EXISTS
(
SELECT
NULL
FROM
MyTable MT
WHERE
MyTable.SameColumnName = MT.SameColumnName
AND MyTable.DifferentColumnName <> MT.DifferentColumnName)

This will fetch the required data and show them in order so that we can see the grouped data together.
SELECT * FROM TABLENAME
WHERE SSN IN
(
SELECT SSN FROM TABLENAMEGROUP BY SSN HAVING COUNT(SSN)>1
)
ORDER BY SSN
Here SSN is the column names fro which similar value check is done.

Related

How to delete the duplicate data in table (Postgres)

I want to delete the duplicated data in a table , I know there is a way use
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
to find them , buy I need to determine every column's value is equal , which means tableA.* = tableA.* (except id , id is the auto-increment primary key )
and I tried this:
SELECT
*,
COUNT( * )
FROM
myTable
GROUP BY
*
HAVING
COUNT( * )> 1
ORDER BY
id;
but it says I can't use GROUP BY * , so how can I find & delete the duplicated data(need every column's value is equal except id)?

using
SELECT * DISTINCT
DISTINCT remove duplicated result

You need to try something similar to be below query. You apply PARTITION BY for the columns other than Id (as it is incrementing unique value). PARTITION BY should be applied for columns, for which you want to check duplicates.
Also refer to Row_Number in Postgres & Common Table expression in Postgres
WITH DuplicateTableRows AS
(
SELECT Id, Row_Number() OVER (PARTITION BY col1, col2... ORDER BY Id)
FROM
Table1
)
DELETE FROM Table1
WHERE Id IN (SELECT Id FROM Table1 WHERE row_number > 1)

You can do this using JSON:
select (to_jsonb(b) - 'id')
from basket b
group by 1
having count(*) > 1;
The result is as JSON. Unfortunately, to extract the values back into a record, you need to list the columns individually.

How to use to functions - MAX(smthng) and after COUNT(MAX(smthng)

I don't understand why I can't use this in my code :
SELECT MAX(SMTHNG), COUNT(MAX(SMTHNG))
FROM SomeTable;
Searched for an answer but didn't find it in documentation about these aggregate functions.
Also I get an SQL-compiler error "Invalid column name "SMTHNG"".

You want to know what the maximum SMTHNG in the table is with:
SELECT MAX(SMTHNG) FROM SomeTable;
This is an aggregation without GROUP BY and hence results in one single row containing the maximum SMTHNG.
Now you also want to know how often this SMTHNG occurs and you add COUNT(MAX(SMTHNG)). This, however, does not work, because you can not aggregate an aggregate directly.
This doesn't work either:
SELECT ANY_VALUE(max_smthng), COUNT(*)
FROM (SELECT MAX(smthng) AS max_smthng FROM sometable) t;
because the sub query only contains one row, so it's too late to count.
So, either use a sub query and select from the table again:
SELECT ANY_VALUE(smthng), COUNT(*)
FROM sometable
WHERE smthng = (SELECT MAX(smthng) FROM sometable);
Or count per SMTHNG before looking for the maximum. Here is how to get the counts:
SELECT smthng, COUNT(*)
FROM sometable
GROUP BY smthng;
And the easiest way to get the maximum from this result is:
SELECT TOP(1) smthng, COUNT(*)
FROM sometable
GROUP BY smthng
ORDER BY COUNT(*) DESC;

First of all, please read my comment.
Depending on what you're trying to achieve, the statement have to be changed.
If you want to count the highest values in SMTHNG field, you may try this:
SELECT T1.SMTHNG, COUNT(T1.SMTHNG)
FROM SomeTable T1 INNER JOIN
(
SELECT MAX(SMTHNG) AS A
FROM SomeTable
) T2 ON T1.SMTHNG = T2.A
GROUP BY T1.SMTHNG;

use cte like below or subquery
with cte as
(
select count(*) as cnt ,col from table_name
group by col
) select max(cnt) from cte
you can not use double aggregate function at a time on same column

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards

This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final

Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company oder by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company
and ranked.address = mytable.address;

This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statment for A!

You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

Select all columns for a distinct column

I need to fetch distinct rows for a particular column from a table having over 200 columns. How can I achieve this?
I used
Select distinct col1 , * from table;
but it failed.
Please suggest an elegant way to achieve this.
Regards,
Tarun

The solution if you want one row for each distinct value in a particular column is:
SELECT col1, MAX(col2), MAX(col3), MAX(col4), ...
FROM mytable
GROUP BY col1
I chose MAX() arbitrarily. It could also be MIN() or some other aggregate function. The point is if you use GROUP BY to make sure to get one row per value in col1, then all the other columns must be inside aggregate functions.
There is no way to write MAX(*) to get that treatment for all columns. Sorry, you will have to write out all the column names (at least those that you need in this query, which might not be all 200).

We can generate a sequence of ROW_NUMBER() for every COL1 and then select the first entry of every sequence.
SELECT * FROM
(
SELECT E.* , ROW_NUMBER() OVER( PARTITION BY COL1 ORDER BY 1) AS ID
FROM YOURTABLE E
) MYDATA
WHERE MYDATA.ID = 1
A working example in fiddle

Need to select ALL columns while using COUNT/Group By

Ok so I have a table in which ONE of the columns have a FEW REPEATING records.
My task is to select the REPEATING records with all attributes.
CustID FN LN DOB City State
the DOB has some repeating values which I need to select from the whole table and list all columns of all records that are same within the DOB field..
My try...
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
This only returns two columns and one row 1st column is the DOB column that occurs more than once and the 2nd column gives count on how many.
I need to figure out a way to list all attributes not just these two...
Please guide me in the right direction.

I think a more general solution is to use windows functions:
select *
from (select *, count(*) over (partition by dob) as NumDOB
from table
) t
where numDOB > 1
The reason this is more general is because it is easy to change to duplicates across two or more columns.

Select *
FROM Table1 T
WHERE T.DOB IN( Select I.DOB
FROM Table1 I
GROUP BY I.DOB
HAVING COUNT(I.DOB) > 1)

Try joining with a subquery, which will also allow you to see the count
select t.*, a.SameDOB from Table1 t
join (
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
) a on a.dob = t.dob

select *
from table1, (select count(*) from table1) as cnt

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Find duplicated rows that are not exactly same - sql

Can i select all rows that have same column value (for example SSN field) but display them all separably. ? I've searched for this answer but they all have "count(*) and group by" section that demands the rows to be exactly same.

Try This: SELECT A, B FROM MyTable WHERE A IN ( SELECT A FROM MyTable GROUP BY A HAVING COUNT(*)>1 ) I have done with SQL server. But hope this is what you need

SELECT * FROM MyTable WHERE EXISTS ( SELECT NULL FROM MyTable MT WHERE MyTable.SameColumnName = MT.SameColumnName AND MyTable.DifferentColumnName <> MT.DifferentColumnName)

This will fetch the required data and show them in order so that we can see the grouped data together. SELECT * FROM TABLENAME WHERE SSN IN ( SELECT SSN FROM TABLENAMEGROUP BY SSN HAVING COUNT(SSN)>1 ) ORDER BY SSN Here SSN is the column names fro which similar value check is done.

Related

How to delete the duplicate data in table (Postgres)

How to use to functions - MAX(smthng) and after COUNT(MAX(smthng)

SQL: find most common values for specific members in column 1

Select all columns for a distinct column

Need to select ALL columns while using COUNT/Group By

Categories

Resources