SQL removing duplicates in a certain column

SQL removing duplicates in a certain column - sql

Good afternoon all!
Had a question regarding the removal of duplicates in one column and making it remove the whole row. I will provide an example in a screenshot in Excel as to not provide proprietary info.
I am looking to remove one of the rows highlighted in yellow for example but do not want to limit it to one Dr.Mike or one Health Partners clinic for example. Relatively new to this so any help would be greatly appreciated.
Thank you

You can do:
select distinct prov, clinic, address
from t;

I have used the below script to remove duplicates.
SELECT Prov, Clinic,Address, count(*)
FROM SomeTable
group by Prov, Clinic,Address
having count(*)>1
SELECT Prov, Clinic,Address, countrows = count(*)
INTO [holdkey]
FROM SomeTable
GROUP BY Prov, Clinic,Address
HAVING count(*) > 1
SELECT DISTINCT sometable.*
INTO holdingtable
FROM sometable, holdkey
WHERE sometable.Prov = holdkey.Prov
AND sometable.Clinic = holdkey.Clinic
AND sometable.Address = holdkey.Address
SELECT Prov, Clinic,Address, count(*)
FROM holdingtable
group by Prov, Clinic,Address
having count(*)>1
DELETE sometable
FROM sometable, holdkey
WHERE sometable.Prov = holdkey.Prov
AND sometable.Clinic = holdkey.Clinic
AND sometable.Address = holdkey.Address
INSERT sometable SELECT * FROM holdingtable

Related

How to limit duplicated rows

I would like help regarding an SQL query.
Looking around the site, I found several code snippets to return duplicate rows.
Here is the one I went with:
select unumber, name, localid
from table1
where unumber
in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber
which works fine, however, in the table I have other columns as well, like timestamp etc.
As such, when I run the query I indeed get the duplicate rows, however, I get the duplicates several times due to different timestamps for example.
Is there any way to limit the results to 'unique' duplicate rows only?
Hope this makes sense!
Thank you in advance!

For what you describe, you can just use select distinct:
select distinct unumber, name, localid
from table1
where unumber in (select unumber from table1 group by unumber having count (*) > 1 )
order by unumber;
However, I would be more likely to write this using window functions:
select unumber, name, localid
from (select t1.*,
count(*) over (partition by unumber) as cnt,
row_number() over (partition by unumber, name, localid order by unumber) as seqnum
from table1 t1
) t1
where cnt > 1 and seqnum = 1;

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards

This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final

Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company oder by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company
and ranked.address = mytable.address;

This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statment for A!

You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

Find duplicated rows that are not exactly same

Can i select all rows that have same column value (for example SSN field) but display them all separably. ?
I've searched for this answer but they all have "count(*) and group by" section that demands the rows to be exactly same.

Try This:
SELECT A, B FROM MyTable
WHERE A IN
(
SELECT A FROM MyTable GROUP BY A HAVING COUNT(*)>1
)
I have done with SQL server. But hope this is what you need

Here is another approach, which only references the table once, using an analytic function instead of a subquery to get the duplicate counts It might be faster; it also might not, depending on the particular data.
SELECT * FROM (
SELECT col1, col2, col3, ssn, COUNT(*) OVER (PARTITION BY ssn) ssn_dup_count
)
WHERE ssn_dup_count > 1
ORDER BY ssn_dup_count DESC

SELECT
*
FROM
MyTable
WHERE
EXISTS
(
SELECT
NULL
FROM
MyTable MT
WHERE
MyTable.SameColumnName = MT.SameColumnName
AND MyTable.DifferentColumnName <> MT.DifferentColumnName)

This will fetch the required data and show them in order so that we can see the grouped data together.
SELECT * FROM TABLENAME
WHERE SSN IN
(
SELECT SSN FROM TABLENAMEGROUP BY SSN HAVING COUNT(SSN)>1
)
ORDER BY SSN
Here SSN is the column names fro which similar value check is done.

Need to select ALL columns while using COUNT/Group By

Ok so I have a table in which ONE of the columns have a FEW REPEATING records.
My task is to select the REPEATING records with all attributes.
CustID FN LN DOB City State
the DOB has some repeating values which I need to select from the whole table and list all columns of all records that are same within the DOB field..
My try...
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
This only returns two columns and one row 1st column is the DOB column that occurs more than once and the 2nd column gives count on how many.
I need to figure out a way to list all attributes not just these two...
Please guide me in the right direction.

I think a more general solution is to use windows functions:
select *
from (select *, count(*) over (partition by dob) as NumDOB
from table
) t
where numDOB > 1
The reason this is more general is because it is easy to change to duplicates across two or more columns.

Select *
FROM Table1 T
WHERE T.DOB IN( Select I.DOB
FROM Table1 I
GROUP BY I.DOB
HAVING COUNT(I.DOB) > 1)

Try joining with a subquery, which will also allow you to see the count
select t.*, a.SameDOB from Table1 t
join (
Select DOB, COUNT(DOB) As 'SameDOB' from Table1
group by DOB
HAVING (COUNT(DOB) > 1)
) a on a.dob = t.dob

select *
from table1, (select count(*) from table1) as cnt

SQL - select only results having multiple entries

this seems simple but I cannot figure out how to do it or the proper description to correcltly google it :(
Briefly, have a table with:
PatientID | Date | Feature_of_Interest...
I'm wanting to plot some results for patients with multiple visits, when they have the feature of interest. No problem filtering out by feature of interest, but then I only want my resulting query to contain patients who have multiple entries.
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND (Filter out PatientID's that only appear once)
So - just not sure how to approach this. I tried doing:
WITH X AS (Above SELECT, Count(*),...,Group by PatientID)
Then re-running query, but it did not work. I can post that all out if needed, but am getting the impression I am approaching this completely backward, so will defer for now.
Using SQL Server 2008.

Try this:
WITH qry AS
(
SELECT a.*,
COUNT(1) OVER(PARTITION BY PatientID) cnt
FROM myTable a
WHERE Feature_Of_Interest = 'present '
)
SELECT *
FROM qry
WHERE cnt >1

You'll want to join a subquery
JOIN (
SELECT
PatientID
FROM myTable
WHERE Feature_Of_Interest is present
GROUP BY PatientID
HAVING COUNT(*) > 1
) s ON myTable.PatientID = s.PatientID

You could start with a counting query for visits:
SELECT PatientID, COUNT(*) as numvisits FROM myTable
GROUP BY PatientID HAVING(numvisits > 1);
Then you can base further queries off this one by joining.

Quick answer as I head off to bed, so its untested code but, in short, you can use a sub query..
SELECT PatientID,Date,...
FROM myTable
WHERE Feature_Of_Interest is present
AND patientid in (select PatientID, count(patientid) as counter
FROM myTable
WHERE Feature_Of_Interest is present group by patientid having counter>1)
Im surprised your attempt didnt work, it sounds a little like it should have, except you didnt say having count > 1 hence it probably just returned them all.

You should be able to get what you need by using a window function similar to this:
WITH ctePatient AS (
SELECT PatientID, Date, SUM(1) OVER (PARTITION BY PatientID) Cnt
FROM tblPatient
WHERE Feature_Of_Interest = 1
)
SELECT *
FROM ctePatient
WHERE Cnt > 1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL removing duplicates in a certain column - sql

You can do: select distinct prov, clinic, address from t;

Related

How to limit duplicated rows

SQL: find most common values for specific members in column 1

Find duplicated rows that are not exactly same

Need to select ALL columns while using COUNT/Group By

SQL - select only results having multiple entries

Categories

Resources